Smart Optimisations – Tail Call

One of more interesting method of solving problems in programming is recursion. Unfortunately, the procedural languages have problem with this approach. They use a lot of stack memory that is quite slow and quite limited. The algorithms using loops are often suggested as workaround.

The compiler may be cleaver sometimes and avoid using the stack. This is a case with the tail recursion. To use this method the last command in procedure (its tail) should be the return from recursion call. The compiler doesn’t have to store current state or to generate a return address. This make the compiled code to look like a common loop and to work as fast.

Below are two illustrations of recursions: the common and the tail.

Graf 1 Graf 2

In my opinion the recursion make the code more readable. At the same time, the use of loops make be little easier for compilers to optimise. If someone decide to use the former method, the tail approach will provide better performance without reducing readability.

I’ve written two, simple procedures in C to test this hypothesis. I’ve implemented a simple generator of pseudo-random numbers. It could be a substitute for a roll of a common dice.

uint_fast16_t my_rand_tail(uint_fast16_t seed, size_t i)
{
	uint_fast32_t t, t1, t2;
	if (i) {
		t = seed;
		t1 = t ^ (t >> 1);
		t2 = (t ^ (t << 2)) + 0xf0000;
		t = (t2 - t1) & 0xffff;
		return my_rand_tail(t, --i);

	} else {
		return seed;
	}
}

uint_fast16_t my_rand_not_tail(uint_fast16_t seed, size_t i)
{
	uint_fast32_t t, t1, t2;
	if (i) {
		t = my_rand_not_tail(seed, --i);
		t1 = t ^ (t >> 1);
		t2 = (t ^ (t << 2)) + 0xf0000;
		t = (t2 - t1) & 0xffff;

	} else {
		t = seed;
	}
	return t;
}

The code readability looks the same. The performance is up to twice as good for the tail call version. I’ve tested on two different computers and two different compilers using the highest standard optimisation level (-O3). Below is the chart.

Chart

In real code the opportunities to use tail recursion are limited. Most algorithms cannot be adapted to this form. A lot of time is lost to call external or system procedures too. However, I will use this little optimisation every time I can. If it doesn’t limit the code readability, why shouldn’t we use it.