Memory and Object Management, Part II

The C/C++ Users Journal, August, 1999

The Suzuki method for teaching music emphasizes performing. Every student always has a recital piece ready. For the youngest students the recital piece is usually "Twinkle, Twinkle, Little Star." This can lead to rather tedious recitals, but for the parents of students of that age, seeing their children perform is more important than assessing the esthetic merits of their work.

The same approach can be applied to software development. Here at The Journeyman’s Shop we believe that it is much more important to get something working early on than to polish and extend code that isn’t yet known to work. That’s the brilliance of "Hello, world": getting a compiler and library to the point where "Hello, world" works is a significant accomplishment, even though there’s a great deal more work to be done to produce a fully functional compiler and library [1].

This month we’re going to continue our discussion of memory management techniques by looking at how to develop a memory manager for a linked list. We’ll start with a very simple version, and make sure that it works. We’ll work our way up to an industrial strength memory manager in small steps, eventually producing a couple of C++ templates for a deceptively simple but powerful memory manager. By starting out with simple C code, we’ll avoid dealing with templates and their complexities (including compiler quirks) until we know that our underlying implementation is sound. We’ll resist the temptation to do everything at once, because we know from experience that it’s much easier to add in a few carefully chosen features after we’ve built a solid foundation.

The Requirements

Let’s say that we’ve been asked to help improve the performance of a customer’s application. It’s too slow, and it uses too much memory. They’ve identified their linked list manager as the most likely culprit, and they’ve asked us to write a memory manager to sit underneath their linked list manager. Their application manages several linked lists of complex numbers, and during a typical run of the application there can be as few as a couple of hundred elements in use or as many as a couple of hundred thousand. They’ve been using malloc and free to allocate and release list nodes, but they’re willing to find all the places where the linked list manager calls these two functions and replace the calls with calls to our code, with whatever arguments we need. That makes our job quite a bit easier: we don’t have to fit into a pre-defined interface, which means that we can give our memory manager whatever information we need. We’ll write two functions, one to allocate a list element, and one to free it.

Why Are We Writing a Memory Manager?

Before we start designing our code we need to be sure that we understand the nature of the problem that we’re trying to solve. In the case of a memory manager there are two common problems: the current memory manager is too slow, and the current memory manager wastes too much space. These problems often arise because the memory manager that comes with a C compiler is designed to work reasonably well for most applications, which means that it implements a compromise between speed and space efficiency. This compromise is often unimportant, but sometimes it’s the primary cause of performance problems.

Most memory managers for C and C++ maintain a linked list of available memory blocks, known as a "free list," sorted by the addresses of the blocks. The function malloc walks through the free list, looking for a suitable block. In pseudo-code:


typedef struct mem_b
    {
    size_t size;
    struct mem_b *prev, *next;
    } mem_block;

void *malloc(size_t sz)
    {
    mem_block *cur = free_list_head;
    sz += sizeof(size_t);   /* adjust for header;
                                see below */
    while (cur && cur->size < sz)
        cur = cur->next;

If malloc reaches the end of the free list without finding a block that’s large enough, it usually asks the operating system for more memory. If that fails, it can’t allocate the requested memory, so it returns NULL. If it finds a suitable block, it checks whether the entire block is needed to satisfy the allocation request. If so, it unlinks the block from the free list and returns its address; otherwise it carves off a portion that’s large enough to satisfy the request and returns the address of that portion. Like this:


    if (cur->size == sz)
        {   /* unlink block */
        cur->next->prev = cur->prev;
        cur->prev->next = cur->next;
        return (char *)cur + sizeof(size_t);
        }
    else
        {   /* carve out needed amount */
        mem_block *res;
        cur->size -= sz;
        res = (mem_block *)((char *)cur + cur->size);
        res->size = sz;
        return (char *)res + sizeof(size_t);
        }

Notice that in the else block, the returned block is taken from the upper end of the block pointed to by cur. This saves having to unlink the free block from the list. All that’s needed is to adjust the size of the free block to reflect the amount that we’ve removed. In addition, the code adjusts the returned pointer to point past the stored size. That’s why the earlier code increased the amount requested by sizeof(size_t) [2]. You’ll see why we need to keep this size inside the allocated block when we get to freeing a block.

The function free also walks through the free list, this time looking for the location in the list where the block being freed should go. That’s easy to do, if the C compiler supports comparison of arbitrary memory addresses. In pseudo-code, it looks like this:


void free(void *p)
    {
    mem_block *cur = free_list_head;
    mem_block *ptr = (mem_block *)((char *)p
        - sizeof(size_t));
    while (cur < ptr)
        cur = cur->next;

Now free knows where to insert the block being freed. The next step is to decide whether to merge that block with either of its neighbors in the list. That is, if the block being freed occupies memory that’s adjacent to the block that comes before it or after it in the list, then the adjacent blocks should be combined into one block. Otherwise we’d end up with lots of little blocks, and have a hard time satisfying allocation requests for large blocks. That’s part of a general problem known as "fragmentation." The merge code is fairly simple:


    if ((char *)ptr + ptr->size == (char *)cur)
        {   /* merge with successor */
        cur->next->prev = cur->prev;
        cur->prev->next = cur->next;
        ptr->size += cur->size;
        cur = cur->next;
        }
    if ((char *)ptr == (char *)cur->prev
            + cur->prev->size)
        { /* merge with predecessor */
        cur->prev->size += ptr->size;
        }
    else
        {   /* insert */
        cur->prev->next = ptr;
        ptr->prev = cur->prev;
        cur->prev = ptr;
        ptr->next = cur;
        }

The code that handles merging with a successor block removes the successor block from the free list then adjusts the size of the block being freed by the size of the successor block. That way, the code that follows doesn’t have to be duplicated to deal with a block that needs to be merged with both adjacent blocks.

There are two things to note in these functions. The first is that both walk through the free list until they find the location where they need to work. That means that the time they take is proportional to the length of the free list. Even with merging of adjacent blocks, the free list can become fairly long, so this search can take a long time. The second is that the block that malloc returns is larger than the size requested. That’s because free needs to know how large the block being freed is, so that it can determine whether to merge it with the next block in the free list.

Starting Simply

If we didn’t have to merge adjacent free blocks, free wouldn’t have to walk through the free list looking for the right place to insert a block. Merging blocks is important if our memory manager has to handle blocks of different sizes. If we know that we’re going to be dealing with blocks of a single size, we can eliminate that code. That’s the key to a fast allocator for a linked list: all the blocks in the list are the same size, so we don’t need to worry about merging blocks. We can eliminate that code by writing our own free list manager. Let’s assume that our customer’s complex numbers are represented as two double values:


typedef struct list_el
    {
    struct list_el *next, *prev;
    double re, im;
    } list_element;

Our functions for allocating and freeing elements of this type look like this:


#include <stdlib.h>

list_element *free_list = NULL;

list_element *alloc_element(void)
    {
    if (free_list != NULL)
        {   /* allocate from free list */
        list_element *res = free_list;
        free_list = free_list->next;
        return res;
        }
    else
        return malloc(sizeof(list_element));
    }

void free_element(list_element *elem)
    {
    elem->next = free_list;
    free_list = elem;
    }

And, finally, a simple test harness looks like this:


int main()
    {
    list_element *elt = alloc_element();
    free_element(elt);
    }

This memory manager will be considerably faster than the version of malloc and free that we looked at above. However, it has a potentially serious cost. Consider what happens if the application starts out by creating a hundred thousand nodes, then settles into a steady state where it uses only ten thousand or so. We’ve got ninety thousand nodes in our free list, and no way of making that memory available to the rest of the application. We should consider setting a maximum length for the free list. It doesn’t take much code:


#define MAX_SIZE 10000
int nelems = 0;

list_element *alloc_element(void)
    {
    if (free_list != NULL)
        {   /* allocate from free list */
        list_element *res = free_list;
        free_list = free_list->next;
        --nelems;
        return res;
        }
    else
        return malloc(sizeof(list_element));
    }

void free_element(list_element *elem)
    {
    if (MAX_SIZE <= nelems)
        free(elem);
    else
        {
        elem->next = free_list;
        free_list = elem;
        ++nelems;
        }
    }

One thing we haven’t done so far is eliminate the extra space that malloc and free imposed. We’re still allocating an extra few bytes for each block, even though our memory manager doesn’t need that information. The answer to this problem is to use malloc to allocate blocks that can hold several of our nodes, and parcel out the nodes ourselves. This is known as suballocation. The allocator now looks like this:


#define BLOCK_SIZE (sizeof(list_element) * 40)
char *block = NULL;
unsigned avail = 0;

list_element *free_list = NULL;

list_element *alloc_element(void)
    {
    if (free_list != NULL)
        {   /* allocate from free list */
        list_element *res = free_list;
        free_list = free_list->next;
        return res;
        }
    else 
        {   /* suballocate */
        if (avail == 0)
            {
            block = malloc(BLOCK_SIZE);
            if (block == NULL)
                return NULL;
            avail = BLOCK_SIZE;
            }
        avail -= sizeof(list_element);
        return (list_element*)(block + avail);
        }
    }

The suballocator is quite simple. It uses the variable avail to keep track of how many bytes are still available for use from block. When avail is 0, which occurs the first time the suballocator is used and whenever all the memory in a block has been used, it allocates a new block. Each list element is carved off the top of the block. Since the list elements are not being allocated with malloc, they cannot be passed to free when we are done with them. This means we cannot use the second version of free_element, only the first. We’ve eliminated malloc’s extra data from our list elements, at the cost of not being able to return unused list elements to the global free store. That’s often acceptable. If we know that the application works with a large number of list elements and never drastically reduces the number of elements, the space we save by having smaller element blocks can easily outweigh the space that’s lost by not being able to recycle elements in other parts of the code.

In our actual code, the functions alloc_element and free_element would be written in a single source file, and their supporting data would be declared static within that source file.

Adapting the Code for Arbitrary Data Types

The code we’ve looked at so far satisfies our customer’s requirements, so the paying part of this project is finished. For our own purposes we may want to go further, and turn it into more generic code so that we can reuse it in future projects without having to make drastic changes to it. Once we’ve gotten our manager’s approval for putting in this non-billable time, we can look at how to make this code suitable for use with linked lists holding arbitrary data types instead of two doubles.

First, there’s a simple case involving essentially the same requirements as before, namely a single list element type throughout the application, but for different data fields within the element than what we had. That’s easy to deal with: we simply change the definition of list_element to reflect the new type, and recompile. The code should all still work.

A more challenging problem is being able to handle allocations for more than one element type within a single program. Now we can no longer have a single source file with static data. We need to encapsulate our memory manager’s data so that we can provide separate memory managers for different sizes of objects. In C we do this by putting the data inside a struct and changing alloc_element and free_element to take a pointer to a struct of that type rather than getting their data from global objects. Like this:


#include <stdlib.h>

typedef struct mm_list_el
    {
    struct mm_list_el *next, *prev;
    } mm_list_element;

typedef struct mem_d
    {
    mm_list_element *free_list;
    size_t size;
    } mem_data;

void *alloc_element(mem_data *list)
    {
    if (list->free_list != NULL)
        {   /* allocate from free list */
        mm_list_element *res = list->free_list;
        list->free_list = list->free_list->next;
        return res;
        }
    else
        return malloc(list->size);
    }

void free_element(mem_data *list, void *elem)
    {
    ((mm_list_element*)elem)->next =
        list->free_list;
    list->free_list = elem;
    }

The user of our code needs to create and initialize an object of type mem_data, and pass that object whenever the code allocates or frees a list element.


typedef struct list_el
    {
    struct list_el *next, *prev;
    double re, im;
    } complex_list_element;

mem_data complex_allocator =
    { 0, sizeof(complex_list_element) };

int main()
    {
    complex_list_element *elt =
        alloc_element(&complex_allocator);
    free_element(&complex_allocator, elt);
    return 0;
    }

In C++ we’d turn all this into a class, and avoid having to explicitly pass the extra pointer:


#include <stdlib.h>

class mem_mgr
{
public:
    mem_mgr(size_t sz) : free_list(0), size(sz) {}
    void *alloc_element();
    void free_element(void *elem);
private:
    struct mm_list_element
    {
    mm_list_element *next, *prev;
    };
    mm_list_element *free_list;
    size_t size;    
};

void *mem_mgr::alloc_element()
    {
    if (free_list != NULL)
        {   /* allocate from free list */
        mm_list_element *res = free_list;
        free_list = free_list->next;
        return res;
        }
    else
        return malloc(size);
    }

void mem_mgr::free_element(void *elem)
    {
    ((mm_list_element*)elem)->next = free_list;
    free_list = (mm_list_element*)elem;
    }

typedef struct list_el
    {
    struct list_el *next, *prev;
    double re, im;
    } complex_list_element;

mem_mgr complex_allocator(sizeof(complex_list_element));

int main()
    {
    complex_list_element *elt =
        (complex_list_element *)
        complex_allocator.alloc_element();
    complex_allocator.free_element(elt);
    return 0;
    }

Moving to Templates

This isn’t very good C++ code, however. Instead, it’s C code translated rather mechanically into C++. We can improve it tremendously by using templates instead of classes. The first step is to use the size of the list element as a template argument instead of passing it into the constructor.


#include <stdlib.h>

template<size_t size> class mem_mgr
{
public:
    static void *alloc_element();
    static void free_element(void *elem);
private:
    struct mm_list_element
    {
    mm_list_element *next, *prev;
    };
    static mm_list_element *free_list;
};

template<size_t size>
void *mem_mgr<size>::alloc_element()
    {
    if (free_list != NULL)
        {   /* allocate from free list */
        mm_list_element *res = free_list;
        free_list = free_list->next;
        return res;
        }
    else
        return malloc(size);
    }

template<size_t size>
void mem_mgr<size>::free_element(void *elem)
    {
    ((mm_list_element*)elem)->next = free_list;
    free_list = (mm_list_element*)elem;
    }

template<size_t size>
typename mem_mgr<size>::mm_list_element
    *mem_mgr<size>::free_list = NULL;

typedef struct list_el
    {
    struct list_el *next, *prev;
    double re, im;
    } complex_list_element;

typedef mem_mgr<sizeof(complex_list_element)> 
    complex_allocator;

int main()
    {
    complex_list_element *elt =
        (complex_list_element *)
        complex_allocator::alloc_element();
    complex_allocator::free_element(elt);
    return 0;
    }

There is a subtle change that you should notice here. Instead of using a class with ordinary members, I used a template with static members. When we used a class we needed a separate object for each different element size, because each different element size needs a separate free list. With a template, each time we instantiate a template with a different size we get a different type. That means that mem_mgr<24>::free_list is already distinct from mem_mgr<16>::free_list. We don’t need separate objects to keep these items distinct. The type system will do it for us. All we need to do is call alloc_element and free_element on the appropriate type.

Refining the Template

Now we have a template that lets us create allocators for arbitrary sizes of list elements. That’s nice in itself, but there are still too many details exposed to our users. They have to tell the template how big their element type is, they have to cast the return from alloc_element, and there’s the danger that they’ll call free_element with an object of the wrong type. These are easily fixed by adding a second template:


template<class C> class mgr
{
typedef mem_mgr<sizeof(C)> impl;
public:
    static C *alloc_element()
        { return (C*)impl::alloc_element(); }
    static void free_element(C *ptr)
        { impl::free_element(ptr); }
};

Now we have type safety, and our users don’t have to concern themselves with sizeof. In use, our template looks like this:


int main()
    {
    complex_list_element *elt =
        mgr<complex_list_element>::alloc_element();
    mgr<complex_list_element>::free_element(elt);
    return 0;
    }

Conclusion

In this column we’ve gradually developed a memory manager for linked lists. Each version has incorporated a simple main function to serve as a driver. Simply calling alloc_element and free_element once, however, shouldn’t be enough to satisfy us that our code works. We need a more comprehensive test, even if we have a separate testing group with responsibility for overall testing. Before we release our code to our testing group we should be convinced that it will almost certainly pass all of their tests. Next month we’ll look at how to approach testing memory managers. It’s difficult to do well, but it’s easy to do much better than what we’ve done so far.

1. Java, of course, doesn’t fit this pattern. To get "Hello, world" working in Java is a much larger task than it is in C or C++. It requires that about half of the core library (by "core library" I mean the packages java.io, java.lang, and java.util) be written and compilable, and that about a quarter of the core library work correctly. This is in part because Java does not distinguish between interface and implementation, so library implementors must write function bodies for every function that is named in another function, even when that code is not actually called. It is also because functions in Java’s library are highly interdependent. For example, in order for "Hello, world" to work the library has to be able to read a system resource that tells it what the newline character is on the platform where it is running. Resources are accessed through a Hashtable, so the code that implements Hashtable has to work, and system resources are read from files, so the code for opening and reading from files must also be working.

2. I haven’t addressed the problem of assuring appropriate alignment here. This code is intended only as a sketch of what the system library does. In the actual runtime library the code makes sure that the memory block is aligned suitably for any data type.