Monitoring UNIX Application Performance
Application monitoring is an important part of any project. Unfortunately, little attention is paid to building effective monitoring before the project goes live. Once it is live, that gap shows up as downtime, because the support person does not know that the application is having problems or has stopped working altogether.
In my line of work, we are constantly bugged by the application team asking why the system is slow or why it crashed. Most of the time the cause is a memory-management or performance problem, such as a leak or memory corruption in the program being run.
Memory management is prone to errors that are hard to detect. Common errors include:
- Use of uninitialized memory
- Reading/writing memory after it has been freed
- Reading/writing off the end of malloc’d blocks
- Reading/writing inappropriate areas on the stack
- Memory leaks — where pointers to malloc’d blocks are lost forever
- Mismatched use of malloc/new/new[] vs free/delete/delete[]
- Some misuses of the POSIX pthreads API
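As a rough illustration of the memory-leak item in the list above, here is a minimal C sketch. The function names, the DEMO_MAIN switch and the file name leak.c are all made up for this example. The leaky variant loses its only pointer to a malloc'd block, and the fixed variant frees it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated "hello, <name>" string.
   The caller owns the result and must free() it. */
char *make_greeting(const char *name)
{
    const char *prefix = "hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1; /* +1 for the '\0' */
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    snprintf(out, len, "%s%s", prefix, name);
    return out;
}

/* Deliberately leaky: the only pointer to the block is dropped on return. */
size_t greeting_length_leaky(const char *name)
{
    char *g = make_greeting(name);
    if (g == NULL)
        return 0;
    return strlen(g); /* g is never freed */
}

/* Fixed: same result, but the block is released before returning. */
size_t greeting_length(const char *name)
{
    char *g = make_greeting(name);
    if (g == NULL)
        return 0;
    size_t n = strlen(g);
    free(g);
    return n;
}

#ifdef DEMO_MAIN /* assumed switch: build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    printf("%zu\n", greeting_length("world"));
    printf("%zu\n", greeting_length_leaky("world"));
    return 0;
}
#endif
```

Built with something like gcc -g -O0 -DDEMO_MAIN leak.c -o leak and run as valgrind --leak-check=full ./leak, the 13-byte block allocated on behalf of greeting_length_leaky would be expected to show up as definitely lost, while greeting_length should produce no leak report.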
To monitor the health of a server, the usual suspects can be used: iostat, vmstat and netstat. But when we need to dig into memory-management issues like those above, we need a more serious tool.
This is where Valgrind comes in. Valgrind works directly with executables, with no need to recompile, relink or modify the program being checked. It reports whether the program leaks memory and points out where the leaked blocks were allocated, so you can decide what needs fixing.
Valgrind simulates every single instruction as the program executes. For this reason, it finds errors not only in the application but also in all supporting dynamically linked (.so) libraries, including the GNU C library, the X client libraries, Qt if you work with KDE, and so on. Some of the errors it reports may therefore come from those libraries rather than from your own code.
With Valgrind’s tool suite you can automatically detect many memory management and threading bugs, avoiding hours of frustrating bug-hunting, making your programs more stable. You can also perform detailed profiling to help speed up your programs.
The Valgrind distribution includes five useful debugging and profiling tools:
Memcheck detects memory-management problems, and is aimed primarily at C and C++ programs. When a program is run under Memcheck’s supervision, all reads and writes of memory are checked, and calls to malloc/new/free/delete are intercepted. As a result, Memcheck can detect if the program:
- Accesses memory it shouldn’t (areas not yet allocated, areas that have been freed, areas past the end of heap blocks, inaccessible areas of the stack).
- Uses uninitialised values in dangerous ways.
- Leaks memory.
- Does bad frees of heap blocks (double frees, mismatched frees).
- Passes overlapping source and destination memory blocks to memcpy() and related functions.
Memcheck reports these errors as soon as they occur, giving the source line number at which it occurred, and also a stack trace of the functions called to reach that line. Memcheck tracks addressability at the byte-level, and initialisation of values at the bit-level. As a result, it can detect the use of single uninitialised bits, and does not report spurious errors on bitfield operations. Memcheck runs programs about 10–30x slower than normal.
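To make the overlapping-blocks check more concrete, here is a small C sketch. The drop_prefix helper and the DEMO_MAIN switch are hypothetical names for this example. It shifts a string left inside its own buffer, so the source and destination overlap, and it uses memmove(). Calling memcpy() in the same spot would be the kind of overlap Memcheck is described as reporting.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: drops the first n characters of a NUL-terminated
   string in place. Source and destination overlap, so memmove() is the
   right call; memcpy() is not defined for overlapping blocks. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n > len)
        n = len; /* never read past the terminator */
    memmove(s, s + n, len - n + 1); /* +1 copies the terminating '\0' */
}

#ifdef DEMO_MAIN /* assumed switch: build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    char buf[] = "valgrind";
    drop_prefix(buf, 3);
    printf("%s\n", buf);
    return 0;
}
#endif
```

The clamp on n is there so that an oversized prefix leaves an empty string instead of reading beyond the buffer, which would be another of the errors listed above.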
Cachegrind is a cache profiler. It performs detailed simulation of the I1, D1 and L2 caches in the CPU and so can accurately pinpoint the sources of cache misses in your code. It identifies the number of cache misses, memory references and instructions executed for each line of source code, with per-function, per-module and whole-program summaries. It is useful with programs written in any language. Cachegrind runs programs about 20–100x slower than normal.
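A classic way to see what Cachegrind measures is to sum the same array in two different orders. The sketch below is only an illustration; the array size, the function names and the DEMO_MAIN switch are assumptions made for this example. Both functions return the same total, but the column-by-column walk jumps a full row ahead on every access and would be expected to show more D1 read misses on its inner-loop line.

```c
#include <stdio.h>

#define N 512 /* assumed size: each row is 512 ints wide */

static int grid[N][N];

/* Fill the grid with a simple, reproducible pattern. */
void fill_grid(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = (i + j) % 7;
}

/* Walks memory in the order it is laid out (row by row). */
long sum_row_major(void)
{
    long total = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += grid[i][j];
    return total;
}

/* Walks column by column: consecutive accesses are N ints apart. */
long sum_col_major(void)
{
    long total = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += grid[i][j];
    return total;
}

#ifdef DEMO_MAIN /* assumed switch: build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    fill_grid();
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}
#endif
```

Compiling with -g and without optimisation keeps the two loops as written. The program can then be run with valgrind --tool=cachegrind, and cg_annotate can be used on the resulting output file to compare the two inner loops line by line. Newer Valgrind releases may need --cache-sim=yes for the cache-miss counts to be collected.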
Callgrind, by Josef Weidendorfer, is an extension to Cachegrind. It provides all the information that Cachegrind does, plus extra information about callgraphs. It was folded into the main Valgrind distribution in version 3.2.0. Available separately is an amazing visualisation tool, KCachegrind, which gives a much better overview of the data that Callgrind collects; it can also be used to visualise Cachegrind’s output.
Massif is a heap profiler. It performs detailed heap profiling by taking regular snapshots of a program’s heap. It produces a graph showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations. The graph is supplemented by a text or HTML file that includes more information for determining where the most memory is being allocated. Massif runs programs about 20x slower than normal.
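For a feel of what a heap profile captures, consider a workload that builds up a pile of allocations and then releases them. The C sketch below is a made-up example (heap_spike and the DEMO_MAIN switch are not part of Valgrind). Under Massif, the heap usage would be expected to ramp up during the first loop and fall back once the blocks are freed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical workload: allocates count blocks of block_size bytes,
   touches them, then frees everything. Returns the number of payload
   bytes that were live at the peak, or 0 if an allocation failed. */
size_t heap_spike(size_t count, size_t block_size)
{
    char **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL)
        return 0;

    size_t live = 0;
    int failed = 0;
    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(block_size);
        if (blocks[i] == NULL) {
            failed = 1;
            break;
        }
        memset(blocks[i], 0, block_size); /* touch the memory */
        live += block_size;
    }

    for (size_t i = 0; i < count; i++)
        free(blocks[i]); /* free(NULL) is a no-op for unfilled slots */
    free(blocks);
    return failed ? 0 : live;
}

#ifdef DEMO_MAIN /* assumed switch: build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    printf("%zu\n", heap_spike(100, 1024));
    return 0;
}
#endif
```

Running the built program with valgrind --tool=massif writes a massif.out.<pid> file. In Valgrind 3.3 and later, ms_print turns that file into a text graph with the allocation sites behind the peak.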
Helgrind is a thread debugger which finds data races in multithreaded programs. It looks for memory locations which are accessed by more than one (POSIX p-)thread, but for which no consistently used (pthread_mutex_) lock can be found. Such locations are indicative of missing synchronisation between threads, and could cause hard-to-find timing-dependent problems. It is useful for any program that uses pthreads.
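The locking discipline Helgrind looks for can be sketched with a shared counter. Everything named here (run_counter, worker, MAX_THREADS, DEMO_MAIN) is invented for the illustration. Every access to the counter made while the worker threads are running happens with the same pthread mutex held, which is the consistent locking described above. Removing the lock and unlock calls would leave the counter accessed by several threads with no common lock, the pattern Helgrind is meant to flag.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16 /* assumed upper bound for this sketch */

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter, always under counter_lock. */
static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts up to nthreads workers, waits for them, returns the final count. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0; /* no worker threads exist yet, so no lock is needed */
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return counter; /* all workers have been joined */
}

#ifdef DEMO_MAIN /* assumed switch: build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    printf("%ld\n", run_counter(4, 1000));
    return 0;
}
#endif
```

A build along the lines of gcc -g -pthread -DDEMO_MAIN counter.c -o counter (the file name is hypothetical) can then be checked with valgrind --tool=helgrind ./counter.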
Big-name software projects that use or have used Valgrind include Firefox, OpenOffice, StarOffice, AbiWord, Opera, KDE, GNOME, Qt, libstdc++, MySQL, PostgreSQL, Perl, Python, PHP, Samba, RenderMan, NASA Mars lander software, SAS, The GIMP, Ogg Vorbis, Unreal Tournament and Medal of Honour, among others.
If you’re on Ubuntu, you can install it with the usual:
sudo apt-get install valgrind
or, for Fedora/Red Hat:
sudo yum -y install valgrind
To try it out, simply place the word valgrind just before the normal command used to invoke the program. For example:
mobile1:~$ valgrind ps -ef
==6752== Memcheck, a memory error detector.
==6752== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==6752== Using LibVEX rev 1804, a library for dynamic binary translation.
==6752== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==6752== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation framework.
==6752== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==6752== For more details, rerun with: -v
==6752== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 21 from 1)
==6752== malloc/free: in use at exit: 1,102 bytes in 48 blocks.
==6752== malloc/free: 961 allocs, 913 frees, 42,627 bytes allocated.
==6752== For counts of detected errors, rerun with: -v
==6752== searching for pointers to 48 not-freed blocks.
==6752== checked 407,936 bytes.
==6752== LEAK SUMMARY:
==6752== definitely lost: 156 bytes in 11 blocks.
==6752== possibly lost: 0 bytes in 0 blocks.
==6752== still reachable: 946 bytes in 37 blocks.
==6752== suppressed: 0 bytes in 0 blocks.
==6752== Rerun with --leak-check=full to see details of leaked memory.
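Any memory-related error is pointed out in the error report. In our example, Valgrind found 4 errors, and 156 bytes were definitely lost to memory leaks. The numbers are consistent with each other: 961 allocations minus 913 frees leaves the 48 blocks still in use at exit, which split into 11 definitely lost and 37 still reachable (156 + 946 = 1,102 bytes).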