True confession time: my web pages serve several purposes. One of these is to act as a notebook that I will have access to where ever I have a computer and an Internet connection. This page is under construction. In the short term the purpose of this page is to provide a draft of an article on memory and to provide links to garbage collection sources. In the longer term I hope to provide a polished article that will be of interest to other software engineers. This page was prompted by J.D. Marrow, who kindly sent me a reference to Hans-J. Boehm's Web page on garbage collection.
Dynamicly created data structures like trees, linked lists and hash tables (which can be implemented as arrays of linked lists) are key to the construction of many large software systems. For example, a compiler for a programming language will maintain symbol tables and type information which is dynamically constructed by reading the source program. Many modern compilers also read the source program (e.g., parse it) and translate it into an internal tree form (commonly called an abstract syntax tree) that is also dynamically created. Graphics programs, like 3D rendering packages, also make extensive use of dynamic data structures. In fact it is rare to find any program that is larger than a couple of thousand lines that does not make use of dynamically allocated data structures.
To avoid consuming vast amounts of virtual memory and destroying performance through page swapping, many programs that use dynamic allocation also dynamically deallocate memory when the programmer believes it is no longer needed. This leads to a number of problems including:
Poor performance. The openGL graphics library supports "display lists" which describe the 3D polygons that make up a shape. A complex scene may have a large number of display lists. Deallocation of a display list, one element at a time, can be very time consuming and can have a large impact on program performance. This can be avoided using a block based memory allocator, where the display list elements are allocated from large memory blocks. When a display list is no longer needed, a pool of blocks can be deallocated. While this technique is effective, designing, implementing and debugging a good block allocator is time consuming.
Memory leaks. A memory leak takes place when memory is allocated but never deallocated. A major computer manufacturer had a subtle memory leak in one of the window types for their window system. One group of users would run large test programs over night and then expect to scroll back through the windows buffer. The buffer was not infinite and only buffered a certain number of lines. The software displaying this scrolling buffer had a memory leak. It would recover most, but not all of the memory allocated as the window scrolled. As the test ran over night, the window software used up more and more memory, until the entire virtual memory was consumed. The window would then crash, destroying the test run information the user wanted. Tools like purify where not available at the time and this bug tool a huge amount of effort to track down and fix.
Pointers to deallocated memory. When memory is deallocated there may still be pointers in use to the deallocated memory. The deallocated memory will be recovered by the memory allocation system and reused. Since it may now have two pointers to it (e.g., the live pointer to the old deallocated data structure and the pointer to the newly allocated data structure) the program behavior may be bizzare, running correctly sometimes and getting the wrong result in others.
Even carefully crafted code written by experienced software engineers tends to suffer from problems with memory leaks and references to deallocated memory. Products like like purify (for UNIX) and BoundsChecker (for Windows NT) are literally worth their weight in gold when trying to track down these errors.
There are a variety of reasons that the Java programming language seems to be popular. Some of these reasons actually involve software engineering issues, rather than hype and religion. Java does not support explicit pointers (which are a source of a lot of complexity in C and C++) and it supports garbage collection. This can greatly reduce the amount programming effort needed to manage dynamic data structures. Although the programmer still allocates data structures, they are never explicitly deallocated. Instead, they are "garbage collected" when no live references to them are detected. This avoids the problem of having a live pointer to a dead object. If there is a live pointer to an object, the garbage collection system will not deallocate it. When an object is no longer referenced, its memory is automaticly recovered. So, in theory, Java avoids many of the problems listed above. The cost, however, is in performance. Automatic garbage collection is usually not as efficient as programmer managed allocation and deallocation. Also, garbage collectors tend to deallocate objects from a low level, which can hurt performance (e.g., deallocating an openGL display list at the element level). There is a rich body of work on garbage collector architecture and garbage collection algorithms (stemming largely from LISP runtime support, which makes heavy use of garbage collection). A lot of this work is aimed at avoiding the more severe pitfalls of garbage collection.
Many garbage collection systems rely on compiler support to help the system determine when a pointer goes out of scope, so the object the pointer points to can be deallocated. However, most C and C++ compilers don't support garbage collection. There are several packages that support garbage collection for C and C++. See, for example, Hans-J. Boehm's Web page on the Boehm-Demers-Weiser conservative garbage collector. Another version of the web page on the Hoehm-Demers-Weiser garbage collector is also available at a site hosted by Xerox PARC. This garbage collector is designed to be a replacement for malloc in C and new in C++. The Web pages also provides a reference list. So far I have not looked into the pros and cons of this software (as I mentioned above, this Web page is still under construction).
In pool allocation, objects are allocated from a pool of large blocks. When a point in the program is reached where this memory is no longer needed, the entire pool can be deallocated at once. For a discussion of pool allocation of C++ objects see Overloading New in C++.
Jamie Zawinski's argument for garbage collection vs. ad hoc memory management (or no memory management and memory leaks).
Hoard is not a garbage collecting allocator. However, it does very fast multiprocessor memory allocation for multithreaded software. Accross multiple processors with a threaded application the authors of Hoard claim that they have very close to linear allocator performance. Hoard would be an excellent platform on which a garbage collecting allocator could be built.
Ian Kaplan
Last updated, December, 2001