
Valgrind: no more printfs for code analysis

Posted on 03/02/2015, by Jesús Díaz (INCIBE)

Witnessing the effects of programming errors on a daily basis has made us well aware of the importance of producing quality source code. But how do we achieve this? Scattering printf calls will always be an option for debugging and testing, but there are obviously much more powerful tools. In this post we summarise the main features of Valgrind, the quintessential debugging and optimisation tool for C/C++ code.

Strictly speaking, Valgrind is a collection of tools (a framework) that performs dynamic analysis of programs written in C/C++. For every instruction that the software under analysis executes, Valgrind adds a number of extra instructions to analyse its behaviour. The main technique it uses is known as shadow memory: in brief, it associates a set of bits with each portion of memory, indicating whether the corresponding data is addressable, whether it has been correctly initialised, and so on.
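To get a feel for the idea, here is a toy sketch of our own (it bears no relation to how Valgrind is actually implemented): a shadow bitmap kept alongside a data buffer, with one "initialised" bit per byte that is set on every write and checked on every read.

#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE 64

/* Toy illustration of shadow memory: one shadow bit per data byte, set when
   the byte is written and checked before it is read. Valgrind keeps far
   richer state (addressability, bit-level definedness), but the principle
   is the same. */
static uint8_t data[BUF_SIZE];
static uint8_t shadow[BUF_SIZE / 8];   /* 1 bit per byte of data */

static void checked_write(size_t i, uint8_t value) {
    data[i] = value;
    shadow[i / 8] |= (uint8_t)(1u << (i % 8));   /* mark byte as initialised */
}

static int checked_read(size_t i, uint8_t *out) {
    if (!(shadow[i / 8] & (1u << (i % 8)))) {
        fprintf(stderr, "warning: reading uninitialised byte %zu\n", i);
        return -1;
    }
    *out = data[i];
    return 0;
}

int main () {
    uint8_t v;
    checked_write(3, 42);
    checked_read(3, &v);   /* fine: byte 3 was written first */
    checked_read(4, &v);   /* flagged: byte 4 was never initialised */
    return 0;
}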

The main feature Valgrind offers, therefore, is analysing how the program under test uses dynamic memory.

An analysis can be started simply by running the “valgrind ./program” command. [Note: for Valgrind’s output to be more informative, the -g option should be used when compiling the source code, and optimisation flags such as -O1 and -O2 should be avoided]. In this case, Valgrind reports by default events such as accesses to uninitialised memory, double free errors, out-of-range memory accesses, and so on. But it also offers more options. For example, if “--leak-check=full” is specified, it reports dynamically allocated memory that has not been freed by the end of the execution; if “--track-fds” is given, it also monitors the file descriptors used, indicating which ones have been left open; and so forth. For further information, see the man page (man valgrind) or its online manual.
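As a quick illustration (a made-up snippet, not one of the post’s examples), a program like the following, compiled with -g and run under “valgrind --leak-check=full --track-fds=yes ./program”, would trigger both kinds of report:

#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main () {
    /* Heap block that is never freed: reported by --leak-check=full. */
    int *leaked = malloc(100 * sizeof(int));
    leaked[0] = 1;

    /* File descriptor left open at exit: reported by --track-fds. */
    int fd = open("/dev/null", O_RDONLY);
    (void)fd;

    return 0;
}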

For example, Valgrind reports no problems when it analyses the following code (memorytest) with “valgrind --leak-check=full ./memorytest”.


1.  #include <stdio.h>
2.  #include <stdlib.h>
3.  #include <assert.h>
4.
5.  int main () {
6.
7.     int *array, i;
8.
9.     /* fprintf(stdout, "Accessing uninitialized array element: %d\n", array[1]); */
10.
11.    assert(array = (int *) malloc(sizeof(int)*100));
12.
13.    for (i=0; i<100; i++) {
14.    /* for (i=0; i<=100; i++) { */
15.       array[i] = i;
16.    }
17.
18.    fprintf(stdout, "\n");
19.
20.    free(array);
21.    /* free(array); */
22.    array = NULL;
23.
24.    return 0;
25. }

However, if we uncomment lines 9 and 21, and replace line 13 with line 14, Valgrind starts reporting problems:

[Figure 1: Valgrind output reporting the errors in memorytest]

It specifically notifies us that:

  1. We are using uninitialised memory (on line 9 of the source code).
  2. We are writing outside the range of the dynamic memory we allocated (the write happens on line 15, on a block allocated on line 11).
  3. We are freeing memory (on line 21) that was already freed (on line 20).

But that is not all; many more errors can be caught. Another vital case, in concurrent programming, is preventing deadlocks and race conditions when accessing memory shared by several threads. For this, the Helgrind tool is used, invoked with “valgrind --tool=helgrind ./program”.

A test code for the race conditions can be found on Pastebin. It is the classic example of two concurrent threads, one simulating a deposit into a bank account and the other a withdrawal: if the balance update is not protected (with binary semaphores, for example), the final result may be inconsistent. A minimal sketch of the same idea is shown below.
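The following is only a rough sketch along those lines (it is not the Pastebin listing, and the names are ours); compiled with -pthread, the commented-out lock/unlock calls are the kind of protection whose absence Helgrind flags:

#include <pthread.h>
#include <stdio.h>

static int balance = 100;   /* shared account balance */
/* Declared so the commented-out calls below compile when re-enabled. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *deposit(void *arg) {
    (void)arg;
    /* pthread_mutex_lock(&lock); */
    balance = balance + 50;   /* unprotected read-modify-write */
    /* pthread_mutex_unlock(&lock); */
    return NULL;
}

static void *withdraw(void *arg) {
    (void)arg;
    /* pthread_mutex_lock(&lock); */
    balance = balance - 30;   /* races with deposit() */
    /* pthread_mutex_unlock(&lock); */
    return NULL;
}

int main () {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, NULL);
    pthread_create(&t2, NULL, withdraw, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    fprintf(stdout, "final balance: %d\n", balance);
    return 0;
}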

If the program is analysed with Valgrind (“valgrind --tool=helgrind ./threadtest”), the tool reports the following:

[Figure 2: Helgrind output reporting possible data races]

It specifically tells us that there is a possible race condition on thread #3 (at line 47 of the code) and another on thread #2 (at line 31). To analyse the fixed version, all that is needed is to uncomment the commented lines in the source code.

There is still more: it is also possible to analyse how efficiently a piece of code uses the cache. The tool for this is Cachegrind and, following the same pattern, it is run with “valgrind --tool=cachegrind ./program”.

To illustrate its use, we will resort to another classic: the storage order of multidimensional arrays. In C/C++, arrays are stored in row-major order (so when, for example, the third element of the second row is accessed, neighbouring elements of that row are brought into the cache along with it). Therefore, if a loop traverses a two-dimensional array by columns first (instead of by rows), the processor will perform many more cache loads, causing (if the data is large enough) an effect known as cache thrashing. To verify it, another simple program:

1.  #include <stdio.h>
2.
3.  int main () {
4.
5.     int array[1024][1024], i, j;
6.
7.     for (i=0; i<1024; i++) {
8.        for (j=0; j<1024; j++) {
9.           array[j][i] = (i*1024)+j;
10.          /* array[i][j] = (i*1024)+j; */
11.       }
12.    }
13.
14.    return 0;
15.
16. }

For this code, Valgrind reports the following:

[Figure 3: Cachegrind output for the column-major loop]

In other words, the D1 miss rate is 98.7%. As illustrated in the following image, if the array access order is changed (commenting out line 9 and uncommenting line 10), the miss rate is reduced drastically to 6.2%.

[Figure 4: Cachegrind output for the row-major loop]

These are probably the three main tools within Valgrind’s suite, but there are more; they all follow the same invocation pattern, as shown in the sketch after this list:

  • Callgrind, which records the call history among the functions of a program and, among other things, reports the number of instructions attributable to each (very useful for code optimisation).
  • Massif, a heap profiler.
  • DRD, similar to Helgrind, but with more efficient memory use.
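As a small, made-up target for these profilers, the snippet below has a simple call hierarchy and a growing heap. Run under “valgrind --tool=callgrind ./program” it produces a callgrind.out.<pid> file with per-function instruction counts (readable with callgrind_annotate), and under “valgrind --tool=massif ./program” a massif.out.<pid> file with the heap profile over time (readable with ms_print).

#include <stdlib.h>
#include <string.h>

/* A deliberately naive helper so Callgrind has something to count. */
static size_t count_even(const int *v, size_t n) {
    size_t even = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] % 2 == 0)
            even++;
    return even;
}

/* Repeated allocations of growing size so Massif has a heap profile to draw.
   (Allocation failures are not checked; this is only a profiling target.) */
static size_t work(size_t rounds) {
    size_t total = 0;
    for (size_t r = 1; r <= rounds; r++) {
        size_t n = r * 1024;
        int *v = malloc(n * sizeof(int));
        memset(v, 0, n * sizeof(int));
        total += count_even(v, n);
        free(v);
    }
    return total;
}

int main () {
    size_t result = work(100);
    (void)result;
    return 0;
}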

As a final note, the tool owes its name to Valgrind, the main entrance to Valhalla in Norse mythology.