The Suzuki method for teaching music emphasizes performing. Every student always has a recital piece ready. For the youngest students the recital piece is usually "Twinkle, Twinkle, Little Star." This can lead to rather tedious recitals, but for the parents of students of that age, seeing their children perform is more important than assessing the esthetic merits of their work.
The same approach can be applied to software development. Here at The Journeyman’s Shop we believe that it is much more important to get something working early on than to polish and extend code that isn’t yet known to work. That’s the brilliance of "Hello, world": getting a compiler and library to the point where "Hello, world" works is a significant accomplishment, even though there’s a great deal more work to be done to produce a fully functional compiler and library [1].
This month we’re going to continue our discussion of memory management techniques by looking at how to develop a memory manager for a linked list. We’ll start with a very simple version and make sure that it works. Then we’ll work our way up to an industrial-strength memory manager in small steps, eventually producing a couple of C++ templates for a deceptively simple but powerful memory manager. By starting out with simple C code, we’ll avoid dealing with templates and their complexities (including compiler quirks) until we know that our underlying implementation is sound. We’ll resist the temptation to do everything at once, because we know from experience that it’s much easier to add a few carefully chosen features after we’ve built a solid foundation.
Let’s say that we’ve been asked to help improve the performance of a
customer’s application. It’s too slow, and it uses too much memory.
They’ve identified their linked list manager as the most likely culprit,
and they’ve asked us to write a memory manager to sit underneath their
linked list manager. Their application manages several linked lists of
complex numbers, and during a typical run of the application there can
be as few as a couple of hundred elements in use, or as many as a
couple of hundred thousand. They’ve been using malloc and free to
allocate and release list nodes, but they’re willing to find all the
places where the linked list manager calls these two functions and
replace the calls with calls to our code, with whatever arguments we
need. That makes our job quite a bit easier: we don’t have to fit into
a pre-defined interface, which means that we can give our memory
manager whatever information we need. We’ll write two functions, one
to allocate a list element and one to free it.
Before we start designing our code we need to be sure that we understand the nature of the problem that we’re trying to solve. In the case of a memory manager there are two common problems: the current memory manager is too slow, and the current memory manager wastes too much space. These problems often arise because the memory manager that comes with a C compiler is designed to work reasonably well for most applications, which means that it implements a compromise between speed and space efficiency. This compromise is often unimportant, but sometimes it’s the primary cause of performance problems.
Most memory managers for C and C++ maintain a linked list of
available memory blocks, known as a "free list," sorted by the
addresses of the blocks. The function malloc walks through the free
list, looking for a suitable block. In pseudo-code:
typedef struct mem_b
{
    size_t size;
    struct mem_b *prev, *next;
} mem_block;

void *malloc(size_t sz)
{
    mem_block *cur = free_list_head;
    sz += sizeof(size_t);   /* adjust for header; see below */
    while (cur && cur->size < sz)
        cur = cur->next;
If malloc reaches the end of the free list without finding a block
that’s large enough, it usually asks the operating system for more
memory. If that fails, it can’t allocate the requested memory, so it
returns NULL. If it finds a suitable block and the entire block is
needed to satisfy the allocation request, it unlinks the block from
the free list and returns its address, adjusted past the stored size;
otherwise it carves off a portion that’s large enough to satisfy the
allocation request and returns the address of that portion. Like this:
    if (cur->size == sz)
    {   /* unlink block */
        cur->next->prev = cur->prev;
        cur->prev->next = cur->next;
        return (char *)cur + sizeof(size_t);
    }
    else
    {   /* carve out needed amount */
        mem_block *res;
        cur->size -= sz;
        res = (mem_block *)((char *)cur + cur->size);
        res->size = sz;
        return (char *)res + sizeof(size_t);
    }
Notice that in the else block, the returned block is taken from the
upper end of the block pointed to by cur. This saves having to unlink
the free block from the list. All that’s needed is to adjust the size
of the free block to reflect the amount that we’ve removed. In
addition, the code adjusts the returned pointer to point past the
stored size. That’s why the earlier code increased the amount
requested by sizeof(size_t) [2]. You’ll see why we need to keep this
size inside the allocated block when we get to freeing a block.
The function free also walks through the free list, this time looking
for the location in the list where the block being freed should go.
That’s easy to do, if the C compiler supports comparison of arbitrary
memory addresses. In pseudo-code, it looks like this:
void free(void *p)
{
    mem_block *cur = free_list_head;
    mem_block *ptr = (mem_block *)((char *)p - sizeof(size_t));
    while (cur < ptr)
        cur = cur->next;
Now free knows where to insert the block being freed. The next step is
to decide whether to merge that block with either of its neighbors in
the list. That is, if the block being freed occupies memory that’s
adjacent to the block that comes before it or after it in the list,
then the adjacent blocks should be combined into one block. Otherwise
we’d end up with lots of little blocks, and have a hard time
satisfying allocation requests for large blocks. That’s part of a
general problem known as "fragmentation." The merge code is fairly
simple:
    if ((char *)ptr + ptr->size == (char *)cur)
    {   /* merge with successor */
        cur->next->prev = cur->prev;
        cur->prev->next = cur->next;
        ptr->size += cur->size;
        cur = cur->next;
    }
    if ((char *)cur->prev + cur->prev->size == (char *)ptr)
    {   /* merge with predecessor */
        cur->prev->size += ptr->size;
    }
    else
    {   /* insert */
        cur->prev->next = ptr;
        ptr->prev = cur->prev;
        cur->prev = ptr;
        ptr->next = cur;
    }
The code that handles merging with a successor block removes the successor block from the free list, then adjusts the size of the block being freed by the size of the successor block. That way, the code that follows doesn’t have to be duplicated to deal with a block that needs to be merged with both adjacent blocks.
There are two things to note in these functions. The first is that
both walk through the free list until they find the location where they
need to work. That means that the time they take is proportional to the
length of the free list. Even with merging of adjacent blocks, the free
list can become fairly long, so this search can take a long time. The
second is that the block that malloc returns is larger than the size
requested. That’s because free needs to know how large the block being
freed is, so that it can determine whether to merge it with the next
block in the free list.
If we didn’t have to merge adjacent free blocks, free wouldn’t have to
walk through the free list looking for the right place to insert a
block. Merging blocks is important if our memory manager has to handle
blocks of different sizes. If we know that we’re going to be dealing
with blocks of a single size, we can eliminate that code. That’s the
key to a fast allocator for a linked list: all the blocks in the list
are the same size, so we don’t need to worry about merging blocks. We
can eliminate that code by writing our own free list manager. Let’s
assume that our customer’s complex numbers are represented as two
double values:
typedef struct list_el
{
    struct list_el *next, *prev;
    double re, im;
} list_element;
Our functions for allocating and freeing elements of this type look like this:
list_element *free_list = NULL;

list_element *alloc_element(void)
{
    if (free_list != NULL)
    {   /* allocate from free list */
        list_element *res = free_list;
        free_list = free_list->next;
        return res;
    }
    else
        return malloc(sizeof(list_element));
}

void free_element(list_element *elem)
{
    elem->next = free_list;
    free_list = elem;
}
And, finally, a simple test harness looks like this:
int main()
{
    list_element *elt = alloc_element();
    free_element(elt);
    return 0;
}
This memory manager will be considerably faster than the version of
malloc and free that we looked at above.
However, it has a potentially serious cost. Consider what happens if the
application starts out by creating a hundred thousand nodes, then
settles into a steady state where it uses only ten thousand or so. We’ve
got ninety thousand nodes in our free list, and no way of making that
memory available to the rest of the application. We should consider
setting a maximum length for the free list. It doesn’t take much
code:
#define MAX_SIZE 10000
int nelems = 0;

list_element *alloc_element(void)
{
    if (free_list != NULL)
    {   /* allocate from free list */
        list_element *res = free_list;
        free_list = free_list->next;
        --nelems;
        return res;
    }
    else
        return malloc(sizeof(list_element));
}

void free_element(list_element *elem)
{
    if (MAX_SIZE <= nelems)
        free(elem);
    else
    {
        elem->next = free_list;
        free_list = elem;
        ++nelems;
    }
}
One thing we haven’t done so far is eliminate the extra space that
malloc and free imposed. We’re still allocating an extra few bytes for
each block, even though our memory manager doesn’t need that
information. The answer to this problem is to use malloc to allocate
blocks that can hold several of our nodes, and parcel out the nodes
ourselves. This is known as suballocation. The allocator now looks
like this:
#define BLOCK_SIZE (sizeof(list_element) * 40)
char *block = NULL;
unsigned avail = 0;
list_element *free_list = NULL;

list_element *alloc_element(void)
{
    if (free_list != NULL)
    {   /* allocate from free list */
        list_element *res = free_list;
        free_list = free_list->next;
        return res;
    }
    else
    {   /* suballocate */
        if (avail == 0)
        {
            block = malloc(BLOCK_SIZE);
            if (block == NULL)
                return NULL;
            avail = BLOCK_SIZE;
        }
        avail -= sizeof(list_element);
        return (list_element *)(block + avail);
    }
}
The suballocator is quite simple. It uses the variable avail to keep
track of how many bytes are still available for use from block. When
avail is 0, which occurs the first time the suballocator is used and
whenever all the memory in a block has been used, it allocates a new
block. Each list element is carved off the top of the block. Since the
list elements are not being allocated with malloc, they cannot be
passed to free when we are done with them. This means we cannot use
the second version of free_element, only the first. We’ve eliminated
malloc’s extra data from our list elements, at the cost of not being
able to return unused list elements to the global free store. That’s
often acceptable. If we know that the application works with a large
number of list elements and never drastically reduces the number of
elements, the space we save by having smaller element blocks can
easily outweigh the space that’s lost by not being able to recycle
elements in other parts of the code.
In our actual code, the functions alloc_element and free_element would
be written in a single source file, and their supporting data would be
declared static within that source file.
The code we’ve looked at so far satisfies our customer’s requirements, so the paying part of this project is finished. For our own purposes we may want to go further, and turn it into more generic code so that we can reuse it in future projects without having to make drastic changes to it. Once we’ve gotten our manager’s approval for putting in this non-billable time, we can look at how to make this code suitable for use with linked lists holding arbitrary data types instead of two doubles.
First, there’s a simple case involving essentially the same requirements as before, namely a single list element type throughout the application, but with different data fields in the element than we had before. That’s easy to deal with: we simply change the definition of list_element to reflect the new type and recompile. The code should all still work.
A more challenging problem is being able to handle allocations for
more than one element type within a single program. Now we can no
longer have a single source file with static data. We need to
encapsulate our memory manager’s data so that we can provide separate
memory managers for different sizes of objects. In C we do this by
putting the data inside a struct and changing alloc_element and
free_element to take a pointer to a struct of that type rather than
getting their data from global objects. Like this:
#include <stdlib.h>

typedef struct mm_list_el
{
    struct mm_list_el *next, *prev;
} mm_list_element;

typedef struct mem_d
{
    mm_list_element *free_list;
    size_t size;
} mem_data;

void *alloc_element(mem_data *list)
{
    if (list->free_list != NULL)
    {   /* allocate from free list */
        mm_list_element *res = list->free_list;
        list->free_list = list->free_list->next;
        return res;
    }
    else
        return malloc(list->size);
}

void free_element(mem_data *list, void *elem)
{
    ((mm_list_element *)elem)->next = list->free_list;
    list->free_list = elem;
}
The user of our code needs to create and initialize an object of type
mem_data, and pass that object whenever the code allocates or frees a
list element.
typedef struct list_el
{
    struct list_el *next, *prev;
    double re, im;
} complex_list_element;

mem_data complex_allocator =
    { NULL, sizeof(complex_list_element) };

int main()
{
    complex_list_element *elt =
        alloc_element(&complex_allocator);
    free_element(&complex_allocator, elt);
    return 0;
}
In C++ we’d turn all this into a class, and avoid having to explicitly pass the extra pointer:
#include <stdlib.h>

class mem_mgr
{
public:
    mem_mgr(size_t sz) : free_list(0), size(sz) {}
    void *alloc_element();
    void free_element(void *elem);
private:
    struct mm_list_element
    {
        mm_list_element *next, *prev;
    };
    mm_list_element *free_list;
    size_t size;
};

void *mem_mgr::alloc_element()
{
    if (free_list != NULL)
    {   /* allocate from free list */
        mm_list_element *res = free_list;
        free_list = free_list->next;
        return res;
    }
    else
        return malloc(size);
}

void mem_mgr::free_element(void *elem)
{
    ((mm_list_element *)elem)->next = free_list;
    free_list = (mm_list_element *)elem;
}
typedef struct list_el
{
    struct list_el *next, *prev;
    double re, im;
} complex_list_element;

mem_mgr complex_allocator(sizeof(complex_list_element));

int main()
{
    complex_list_element *elt =
        (complex_list_element *)
        complex_allocator.alloc_element();
    complex_allocator.free_element(elt);
    return 0;
}
This isn’t very good C++ code, however. Instead, it’s C code translated rather mechanically into C++. We can improve it tremendously by using templates instead of classes. The first step is to use the size of the list element as a template argument instead of passing it into the constructor.
#include <stdlib.h>

template<size_t size> class mem_mgr
{
public:
    static void *alloc_element();
    static void free_element(void *elem);
private:
    struct mm_list_element
    {
        mm_list_element *next, *prev;
    };
    static mm_list_element *free_list;
};

template<size_t size>
void *mem_mgr<size>::alloc_element()
{
    if (free_list != NULL)
    {   /* allocate from free list */
        mm_list_element *res = free_list;
        free_list = free_list->next;
        return res;
    }
    else
        return malloc(size);
}

template<size_t size>
void mem_mgr<size>::free_element(void *elem)
{
    ((mm_list_element *)elem)->next = free_list;
    free_list = (mm_list_element *)elem;
}

template<size_t size>
typename mem_mgr<size>::mm_list_element
    *mem_mgr<size>::free_list;

typedef struct list_el
{
    struct list_el *next, *prev;
    double re, im;
} complex_list_element;

typedef mem_mgr<sizeof(complex_list_element)>
    complex_allocator;

int main()
{
    complex_list_element *elt =
        (complex_list_element *)
        complex_allocator::alloc_element();
    complex_allocator::free_element(elt);
    return 0;
}
There is a subtle change that you should notice here. Instead of using
a class with ordinary members, I used a template with static members.
When we used a class we needed a separate object for each different
element size, because each different element size needs a separate
free list. With a template, each time we instantiate the template with
a different size we get a different type. That means that
mem_mgr<24>::free_list is already distinct from
mem_mgr<16>::free_list. We don’t need separate objects to keep these
items distinct; the type system will do it for us. All we need to do
is call alloc_element and free_element on the appropriate type.
Now we have a template that lets us create allocators for arbitrary
sizes of list elements. That’s nice in itself, but there are still too
many details exposed to our users. They have to tell the template how
big their element type is, they have to cast the return from
alloc_element, and there’s the danger that they’ll call free_element
with an object of the wrong type. These problems are easily fixed by
adding a second template:
template<class C> class mgr
{
    typedef mem_mgr<sizeof(C)> impl;
public:
    static C *alloc_element()
        { return (C *)impl::alloc_element(); }
    static void free_element(C *ptr)
        { impl::free_element(ptr); }
};
Now we have type safety, and our users don’t have to concern
themselves with sizeof. In use, our template looks like this:
int main()
{
    complex_list_element *elt =
        mgr<complex_list_element>::alloc_element();
    mgr<complex_list_element>::free_element(elt);
    return 0;
}
In this column we’ve gradually developed a memory manager for linked
lists. Each version has incorporated a simple main function to serve
as a driver. Simply calling alloc_element and free_element once,
however, shouldn’t be enough to satisfy us that our code works. We
need a more comprehensive test, even if we have a separate testing
group with responsibility for overall testing. Before we release our
code to our testing group we should be convinced that it will almost
certainly pass all of their tests. Next month we’ll look at how to
approach testing memory managers. It’s difficult to do well, but it’s
easy to do much better than we’ve done so far.
1. Java, of course, doesn’t fit this pattern. To get "Hello, world"
working in Java is a much larger task than it is in C or C++. It
requires that about half of the core library (by "core library" I mean
the packages java.io, java.lang, and java.util) be written and
compilable, and that about a quarter of the core library work
correctly. This is in part because Java does not distinguish between
interface and implementation, so library implementors must write
function bodies for every function that is named in another function,
even when that code is not actually called. It is also because
functions in Java’s library are highly interdependent. For example, in
order for "Hello, world" to work the library has to be able to read a
system resource that tells it what the newline character is on the
platform where it is running. Resources are accessed through a
HashTable, so the code that implements HashTable has to work, and
system resources are read from files, so the code for opening and
reading from files must also be working.
2. I haven’t addressed the problem of assuring appropriate alignment here. This code is intended only as a sketch of what the system library does. In the actual runtime library the code makes sure that the memory block is aligned suitably for any data type.
Copyright © 1999-2006 by Pete Becker. All rights reserved.