Get Time

GCC Hacks: Abusing C++ Extensions for Fun and Profit

By bmerry
TopCoder Member

GCC provides a number of extensions to the C++ language and standard library. Some are de-facto standards supported by several compilers, others are C99 features that GCC offers in C++, still others are purely GNU extensions. Several of these extensions can be put to use in a TopCoder round. Some will make your program faster; others will save you typing.

When considering using an extension, keep in mind that when I say that extension will make your program faster because it generates some special assembly instruction or reduces some overhead, I mean that your program will be very slightly faster. If your program is exceeding a time limit, it is quite likely that you have a bad algorithm and that it is orders of magnitude too slow; these extensions cannot help you here. However, in some cases you will know (or hope) that your program is only just too slow (maybe because slightly smaller test cases run in time), and you wish to squeeze out a little more performance.

long long
This extension is probably seen in the arena more than any other. The long long type has been part of GCC for a long time, and it has even been incorporated into C99. It is a 64-bit signed integral type (at least on x86, and I would guess most if not all other architectures); the companion unsigned long long type provides 64-bit unsigned integers. Since the x86 architecture does not have 64-bit registers, the resulting code is quite a bit slower than 32-bit arithmetic. Nevertheless, it is convenient when 32-bit integers just aren't big enough, or even when you're not sure if they're big enough and you want some breathing space. Usually when I'm worried about integer overflow, I convert everything I can to long long; if you have the right algorithm, the loss of performance will seldom push you over the time limit.

Variable-sized automatic arrays
Arrays allocated on the stack may be given a size that depends on run-time data, such as the values of variables or parameters. This is useful for allocating arrays to hold information when you don't know how big the array must grow.

There are some downsides to using this functionality. In particular, gdb does not seem to handle it very well, and tends to show an array with 2 values when you try to print it. You can get around this by asking for the value of arr[0]@N where N is the number of elements you wish to see. There is also the danger of stack overflow if you try to allocate too many elements (although this also applies to fixed-sized arrays, it does not constrain STL classes which internally allocate their data on the heap). In TopCoder matches there is usually some small upper bound on the size of input data, and so it is often possible to create a constant-sized array of sufficient size for all cases.

Inline functions
GCC has supported inline functions for years in both C and C++. They have been incorporated into C99, but have not yet been formally included in C++. An inline function in one that is inserted in-place at each call site, thus eliminating a function call overhead. Since each copy consumes extra instructions (and thus memory bandwidth and instruction cache), this is only worthwhile for very short functions (typically a line or two) which are called often. An additional benefit to using inline functions is that the inlined code can be optimised separately for each context in which it is used (doing constant folding, for example).

Inline functions are often recommended as a safer alternative to macros, since the usual function call semantics (such as type checking and single evaluation of the arguments) applies to inline functions. Since they are inserted in place, they are just as fast as macros.

To request that the compiler inlines a regular (non-member) function, precede the function return type with the keyword inline, for example
inline int sqr(int x) { return x * x; }
Inlining member functions is quite easy: any member function whose body is defined in-place (i.e. inside the class) will be automatically inlined. If you prefer to keep the function body separate, apply the inline keyword to the declaration, but not to the definition.

You're not guaranteed that the function will be inlined; this is merely a request to the compiler. Recursive functions obviously cannot be inlined; GCC also refuses to inline functions that use variable-sized arrays (see above). Also note that any call through a function pointer will not be inlined; this means that trying to inline a comparison function to be passed to sort is futile. If, however, you define a function object with an inlined operator() and pass an instance to sort, then the sort template will be specialised for that function class and the inlining will work.

Although the header <algorithm> provides min and max functions, these have some limitations. They have only one template parameter, which specifies the type for both arguments; this sometimes leads to errors when the given arguments are of different types. GCC provides the highly non-standard operators <? (min) and >? (max), which will coerce the arguments to a suitable type.

These operators are most commonly seen in their assignment forms <?= and >?=. For example, the following loop computes the minimum value of a function over a range:
int m = INT_MAX;
for (int i = 0; i < N; i++)
    m <?= func(i);
typeof operator
Many TopCoders pre-define little macros for various types of loops, to reduce the number of keys they have to hit during a contest. One of the problems in defining such macros is that the types involved are not always known. The non-standard typeof keyword can help here. It “returns” the type of its argument: a typeof expression can be used anywhere you'd normally write the name of a type, such as in a declaration. Suppose you wanted to write a macro that took two iterators and emitted a for loop that iterated between them. You could write this as follows:
#define for_range(i, first, last) for (typeof(first) i = (first); i != (last); ++i)
where the typeof operator determines the appropriate type for the loop variable.

Compound literals
I often define a small struct that will hold two or three items, such as the end-points and weight of an edge in a graph. For two-item structs there is already the utility pair template, but it has unfriendly field names (first and second compared to, say, target and cost), and it doesn't generalise well to more than two elements (although you can create a pair<pair<R, S>, T> if you really insist).

One advantage of pair that you lose when you do this is the make_pair function that creates pairs on the fly. You could of course define such a function for your class, or give it a constructor, but this can take vital coding time. GCC provides another alternative, which comes from C99: the compound literal. A compound literal is a mix between a cast and an initialiser. To understand what I mean, consider a struct defined as
struct edge
    int target;
    int cost;
An example of a compound literal is (edge) { t, c } where t and c are expressions. The values in the braces are used to initialise the values of the struct, in the order in which they are declared. This is useful for instantiating structs on-the-fly to pass to STL methods.

Although C++ provides the bitset template to efficiently manipulate sets of bits, there are some things that can be done within an integral type that cannot be done so easily with a bitset. For example, the lowest bit of an integer x can be cleared by x &= x - 1. The x86 instruction set provides some special instructions for doing some types of bit manipulation, and GCC provides pseudo-functions that generate these instructions:
__builtin_ctz(unsigned int x)
Counts the number of trailing zero bits. This is one less than the POSIX function ffs (which GCC also implements using the special x86 instruction).
__builtin_clz(unsigned int x)
Counts the number of leading zero bits. Note that GCC docs state that both of these functions are undefined if x is 0. Experimenting on my Athlon 64 with GCC 3.4 gives values of 0 and 31 respectively with -O2, but nonsense values when compiled without optimisation. Don't rely on any particular value.
__builtin_popcount(unsigned int x)
Counts the total number of one bits in x. Note that x86 does not have an instruction to do this, so you end up calling a real function; however, this is still more convenient than having to write the function yourself.
These functions also have variants with ll appended to the function name, that takes unsigned long long rather than unsigned int.

With the exception of long long, these extensions will not make your program immensely easier to write. They could, however, save the vital few seconds that make the difference between winning or losing. Even if you don't intend you use the extensions, it is worthwhile to know how they work so that you can understand code during the challenge phase.