Memory and Object Management

The C/C++ Users Journal, June, 1999

Back about ten years ago an engineer I worked with gave me a valuable piece of advice. He said that the first thing he does when he starts a new project is design the memory manager. He was exaggerating a bit: designing the memory manager is usually about the third thing to do on a new project. He made a good point, though. Here in The Journeyman’s Shop we have learned over the years that looking carefully at an application’s memory usage characteristics and using the resulting information to choose an appropriate memory allocation strategy can often shrink an application’s memory requirements and make it run faster, sometimes by as much as a couple of orders of magnitude. Don’t be fooled by the constant talk of bigger RAM and faster CPUs making efficiency unimportant. We’ve been hearing that talk for thirty years, and it’s always been wrong. Today’s memory management problems can be masked by tomorrow’s technology, but the day after tomorrow we’ll create new memory management problems as we try to solve bigger and harder problems, problems that we wouldn’t dream of working on today. The demands made by new software applications are always pushing the limits imposed by the hardware that the applications run on. If your application is too big or too slow you probably can’t wait for faster hardware: you have to fix the problem now. One of the first things you should look at is how your application uses memory. Changes here can often produce dramatic improvements.

For example, a few weeks ago I wrote some Java code that read and summarized data from about fifty text files. The first version of the program took about four minutes to run. It was far too slow. One problem was that the code opened, read, and closed the same file about half a dozen times, each time looking for different information. By changing the code to open each file only once and keep the file open until it was no longer needed I cut the runtime to a bit under a minute. That is, the application was faster by about a factor of four. Not bad, but still not as fast as it should have been. Then I looked at how the application used memory. During a single run it allocated about a million blocks of memory totaling about 40 million bytes. With a few simple changes I reduced this to 35,000 blocks and 1.7 million bytes. It now runs in about eight seconds [1], faster than the previous improved version by about a factor of seven. Both fixes were needed, but improving the application’s memory management produced a much larger speed increase.

Overview of Memory Usage Issues

Whenever you add a variable to your program you have to make several decisions that affect the overall memory usage of your program. These decisions can affect the size, speed, and safety of the program. Just as with any other engineering decision, good program design involves finding the right balance of these three factors for the particular program that you are working on. Over the years you’ve probably developed a set of rules and habits that you apply when making these decisions. For most of us, these rules and habits do a pretty good job most of the time, but they fail miserably the rest of the time. One of our goals in The Journeyman’s Shop is to turn our habits into conscious guidelines, so that we will be more aware of the decisions we are making as we write code. That way we will be more likely to recognize situations where our guidelines do not apply and we have to think more carefully about what we are doing. If I had thought more about memory usage when I was writing the code that I talked about in the previous paragraph I might have avoided the problems that I ran into, and not had to go back and rewrite code that I thought was finished.

Adding a variable to a program involves answering three questions: is the variable really needed; where in the program do I need to be able to use the variable; and how should I define the variable to get the best balance between size, speed, and safety. The last of these questions is much more complicated than it may appear. It involves understanding the application that we are developing, as well as the tools that the programming language we are using provides for object and memory management. We’ll spend most of our time in the next few months on that question, after we talk about a few fundamental concepts and briefly go over the first two questions.

Object Management versus Memory Management

It’s important, when talking about memory usage, to distinguish between managing objects [2] and managing memory. Objects are the things we deal with in our source code. They often have names and they are usually created explicitly in the code. For example, the following code creates eight objects:


int limit = 3;                          // two objects

void demo()
    {
    fstream out("test.dat");  // two objects
    complex *data =                     // one object
        new complex(1.0, 0.0);          // three objects
    }

Three of these objects, limit, out, and data, have names that we’ve given to them. Four of these objects are literal constants (3, "test.dat", 1.0, and 0.0), and their names describe their contents. One object, the object of type complex created by new, does not have a name.

Object management consists of two decisions: the decision to create an object and the decision to dispose of it. In many cases we let the compiler make those decisions for us. We don’t know, nor do we care, where the value 3 is stored in the program above. All that matters is that we get the right value when we use it [3]. Once we get away from literal constants, however, we get more control over when objects are created and when they are disposed of. In the example above the object named limit is created at program startup and isn’t disposed of until program termination. The objects named out and data are created on entry into the function demo and disposed of on exit from demo. The object created by new is created when program execution reaches the new expression and it is never disposed of.

Memory, on the other hand, is simply storage space where we can place objects. Memory management involves allocating storage space for objects as needed, and recycling that storage space when it is no longer needed. It is obviously closely involved in object management, in that creating an object requires allocating space for it and disposing of an object makes the space it occupied available for reuse. It warrants separate consideration, however, because it deals with how we allocate and recycle memory rather than about deciding when to do so. We can often ignore memory management entirely, trusting the compiler and its runtime system to handle it for us. If this works, we can complete our application more quickly than if we wrote and debugged a memory manager ourselves. However, the memory management tools that come with the compiler were designed to work well in most cases, and we sometimes hit upon a case that’s outside the area where they work well. When that happens we need to take control of memory management ourselves.

Tools Provided by the Language

The design of a programming language involves, in part, deciding how programs written in that language will manage objects and memory. In C and C++ decisions about how the compiler itself manages objects are reflected in the notion of storage duration. In the code snippet above we saw examples of objects with the two storage durations supported by these languages: limit has static duration, lasting throughout the program’s execution; out and data have auto duration, lasting only while the block containing them is being executed. In addition, the runtime support libraries for C and C++ provide a free store, accessible through calls to malloc and free, where we can create objects that will continue to exist until we explicitly release them. The notion of storage duration doesn’t incorporate explicit management of the free store, so when we are creating an application, instead of talking about storage duration we usually talk about the lifetime of an object. C and C++ give you tools to create objects with three possible lifetimes: objects with static duration exist throughout the program’s execution; objects with auto duration exist while the function containing them is being executed; and objects on the free store exist from the time they are explicitly created until the time they are explicitly destroyed.

These three possible lifetimes usually involve three separate memory management techniques. Objects with static duration are usually simply created as part of the executable image; their memory is never released. Objects with auto duration are usually created on the stack [4]: on entry into a function space is allocated for all the auto objects defined in that function, and on exit that space is made available for use by any other function that may be called. Objects whose lifetime you control are simply allocated with malloc and released with free.

Now, in the first two cases, the compiler is in charge of the object. Once you’ve decided to give control to the compiler you have no further say in the memory management technique. With objects whose lifetime you control, however, you have complete control. If you don’t like the technique that your compiler’s implementation of malloc and free uses you can write your own functions and use them [5].

In C++ we use the keywords new and delete to create and dispose of objects with arbitrary lifetimes. When we say something like


complex *data = new complex(1.0, 0.0);

we’re telling the compiler to allocate space from the free store for an object of type complex and to initialize that object using the constructor for the type complex that takes two arguments of type double. When we say


delete data;

we’re telling the compiler to call the destructor for complex, passing to it the address held in data, and then to release the memory that this object occupied back to the free store. The C++ runtime library provides two functions, named operator new and operator delete, that are used to talk to the free store. If you don’t like the technique that your compiler’s implementation of operator new and operator delete uses you can write your own versions. They have to be named operator new and operator delete: when the compiler sees the keyword new it simply generates a call to a function with that name, without knowing whether the version that gets linked in is the library’s or yours. The C++ standard explicitly allows this replacement. In addition, the standard does not allow the function malloc to use operator new to get memory, and it does not allow the function free to use operator delete. This means that it is safe for you to call malloc from inside your version of operator new and free from inside your version of operator delete, without any danger of infinite recursion. For example, to try this out you might write something simple like this:


void *operator new(size_t sz)
    {
    void *res = malloc(sz == 0 ? 1 : sz);
    if (res == NULL)
        throw bad_alloc();
    return res;
    }

void operator delete(void *ptr)
    {
    free(ptr);
    }

In addition, C++ allows you to write a version of operator new and of operator delete for any class. When you create an object whose type is a class with its own version of operator new or operator delete the compiler uses that function instead of the global function to allocate or free the memory. For example:


class sample
{
public:
    static void *operator new(size_t);
    static void operator delete(void *);
};

void *sample::operator new(size_t sz)
    {
    void *res = malloc(sz == 0 ? 1 : sz);
    if (res == NULL)
        throw bad_alloc();
    return res;
    }

void sample::operator delete(void *ptr)
    {
    free(ptr);
    }

That’s all there is to it. The tools provided by C and C++ for object management and for memory management are very simple. Of course, that simplicity can be misleading; these tools require thorough understanding and careful use.

Deciding Whether a Variable is Really Needed

The cheapest way to manage objects is not to create them at all. That’s why many programmers tend to reuse variables: creating one variable is usually faster and uses less memory than creating several. C encourages this by only allowing creation of variables at the beginning of a block, making it much easier to see how many variables you’ve created. When you look at the beginning of a function and see the definitions of several integer variables that are used only to control loops within the function you’ll probably be tempted to consolidate them into a single variable. That’s often appropriate, but if you do it, be aware of the risks that this involves. Using the same variable for different purposes at different points in the code can lead to errors. Let’s take a somewhat contrived example:


int show(char **errors, char **warnings)
    {
    int i, j;
    for (i = 0; i < 100; ++i)
        if (errors[i] == NULL)
            break;
        else
            printf("%s\n", errors[i]);
    for (j = 0; j < 100; ++j)
        if (warnings[j] == NULL)
            break;
        else
            printf("%s\n", warnings[j]);
    return i;
    }

If you read this code carelessly, overlooking the return statement at the end, you might decide to eliminate j and use i as the only control variable. If you do this you’ll change the meaning of the function: instead of returning the number of error messages that it printed, it will return the number of warning messages.

Of course, a major part of the problem in this code is the poor names chosen for the loop control variables - giving them names like error_count and warning_count might have helped prevent this problem. Nevertheless, the problem is still real: we have to be careful not to let our zeal for eliminating unnecessary variables lead us into writing code that doesn’t work correctly.

Still, it’s often appropriate to hunt for ways to eliminate variables. After all, creating a variable takes time and uses memory space. For an application that’s required to be fast and small, eliminating variables can lead to speed improvements.

Where Do I Need To Use a Variable

One of the most important guidelines for writing good C code is to create objects in the smallest scope possible. For beginners that’s an important lesson: don’t use a global variable to control a loop. In most cases it’s good advice for more advanced programmers, too. But in some cases this guideline leads to poor performance, because giving a variable a short lifetime can mean that you end up creating and destroying it more often. This can be expensive, both because of the cost of initialization and destruction and because of the cost of allocating and releasing space for the variable. Giving a key variable a longer lifetime can often speed up an application. That’s one of the things I did in our implementation of the Java library to make programs run faster.

Many of the Java functions that write output provide overloaded versions, one that takes a single char, one that takes an array of chars, and one that takes an array of chars with a starting index and a count. If I were writing the version that takes a single char in C++ I’d write it like this:


void write(char *buf, int, int);
     
void write(char b)
    {
    char buf[1];
    buf[0] = b;
    write(buf, 0, 1);
    }

That is, when we need to write a single char we simply insert it into a one-element array of chars and write out that array. In Java, code that uses this approach looks like this:


class Writer
{
public void write(char buf[], int off, int len)
    {
    // code omitted
    }
public void write(char b)
    {
    char buf[] = new char[1];
    buf[0] = b;
    write(buf, 0, 1);
    }
}

In both code samples we’ve followed the guideline of making the scope of buf as small as possible: we’ve restricted buf to the function that it is used in. In the C++ code this works fine. In the Java code it works correctly, but it’s a significant bottleneck. A clue to the difference is in the use of the keyword new in the Java code: we’re going out to the free store to allocate a one-element array every time we need to write a char. In the C++ code the array has auto storage duration, so it is created on the stack each time we write a char. The difference in speed between the two object creation mechanisms is profound. We can get the Java code to run much faster by broadening the scope of the array [6]:


class Writer
{
public void write(char buf[], int off, int len)
    {
    // code omitted
    }
public void write(char b)
    {
    buf[0] = b;
    write(buf, 0, 1);
    }
private char buf[] = new char[1];
}

This makes code that writes individual chars dramatically faster, at the cost of making buf accessible to all of the methods in Writer rather than just the one that needs it.

The problem in this example was simply that creating an object can be slow. The same problem can occur in C++, even without going to the free store, with objects that have constructors. It often pays to move the creation of an object to an outer scope where it won’t be constructed as many times. For example, none of us would write code like this:


void slow(char *msg[])
    {
    for (; *msg != 0; ++msg)
        {
        fstream out("test.out", ios::out | ios::ate);
        out << *msg << '\n';
        }
    }

Here, we’re writing each string in the array msg to the file named test.out. For each string, we open the file, write the message, and close it, repeating the sequence as many times as necessary. While the performance of this loop might be satisfactory if we were simply printing out the handful of elements from a command line, we’d rarely write production code that does this because opening and closing the file so many times would be prohibitively slow [7]. Instead, we’d write the code like this:


void faster(char *msg[])
    {
    fstream out("test.out", ios::out | ios::ate);
    for (; *msg != 0; ++msg)
        out << *msg << '\n';
    }

Note that we’ve expanded the lifetime of the output stream: it now exists throughout our function, rather than only inside the loop where it is actually used. The code does the same thing as the earlier version, but it is much faster. The cost is that the stream out is now visible and usable throughout the function instead of being confined to the for loop where we actually write the messages. In a small function like this that change isn’t particularly risky. In a larger function, though, it adds to the risk that the variable out will be used incorrectly later on in the code. And if we changed the stream out to a global variable because we needed it in several functions we’d greatly increase the risk involved in using it. We could reduce that risk by adding code to ensure that it was properly initialized before any possible use, but as we’ve seen in the past several columns, that’s hard to get right, and it adds to the bulk of code that needs to be maintained. That’s why programmers are urged to define variables in the narrowest scope possible: doing so reduces the risk that variables will be used incorrectly, and reduces the need for additional code to protect against misuse. Narrowing the scope as much as possible, however, can slow a program down significantly, so we often compromise by giving such a variable a broader scope than is technically necessary in order to get reasonable performance.

Corrections

Matthew Healy and Eric Nagler both caught me in a misstatement. In the April issue I said that you can only use the default constructor to initialize C++ objects in an array. That’s wrong. You can use aggregate initialization, like this:


complex basis[2] =
    {
    complex(1.0, 0.0),
    complex(0.0, 1.0)
    };

The standard says that the two unnamed objects in the initializer will be constructed, and then copied into the array. The situation that I had in mind was creating an array using operator new. There is no language construct that lets you specify initializers for the elements in such an array. All you can do is create the array and then assign objects:


void create()
    {
    complex *basis = new complex[2];
    basis[0] = complex(1.0, 0.0);
    basis[1] = complex(0.0, 1.0);
    }

Jason Todd raised a good point about the initialization code that I discussed in the March issue. The code that I wrote looked like this:


if (jt_init() != JT_SUCCESS
    || atexit(jt_cleanup) != 0)
        quit("unable to initialize thread");

He asked why I called atexit from this code, rather than having jt_init call it. The reason is that I was thinking of these functions as analogues of C++’s constructors and destructors, and put both calls in the same function. He’s right, though: it makes much better sense for jt_init to handle registering its own cleanup function. That way, if the cleanup requirements for a particular module change, the code that initializes that module doesn’t need to change. Instead of two tests, the initialization code would have only one:


if (jt_init() != JT_SUCCESS)
    quit("unable to initialize thread");

1. The system I ran this on had 192 Megabytes of RAM, more than enough for the original version of the application, so the slow speed was not the result of swapping data out to the hard disk.

2. Note that throughout this series of columns I will use the term "object" in the compiler-writer’s sense: an object is a block of memory that holds a value of some sort, together with a set of operations that can be performed on that value. From this perspective, an int is an object: it occupies four bytes (the actual size depends on your compiler), and you can store values in it, add its value to another value, subtract values from the value it holds, and so on. This is the origin of the more restricted use of the term in the newer notion of object-oriented programming, where "object" usually does not include built-in types such as int, which have their own sets of rules. The underlying concept, however, is the same in both cases.

3. Some implementations of FORTRAN used to create literal values as actual data storage, initialized to the appropriate value. Since FORTRAN used pass-by-reference, this meant that it was possible to modify the value of a literal constant by writing FORTRAN code analogous to this C++ code:


void modify(int &ir)
    {
    ++ir;
    }

int main()
    {
    modify(1);      // not legal C++; shown only to mirror the FORTRAN behavior
    cout << 1 << endl;
    }

Fortunately, we’ve learned since then that literals shouldn’t be modifiable. Well, except for quoted strings, of course.

4. This isn’t required. A compiler that called malloc and free at the appropriate points for every auto variable would conform to the language definition. It would also generate applications that ran rather slowly.

5. Don’t name them malloc and free, however. Although many people think these are supposed to be freely replaceable, the C language definition does not permit you to create your own functions with these names if any of the source files in your application contains the line


#include <stdlib.h>

Instead, if you need to replace malloc and free everywhere in some pre-existing code, write functions with their own appropriate names and use macros to replace the names malloc and free in the source code with the names of your functions.

6. However, this code does not address synchronization issues. As written, if two separate threads call write(char b) at the same time with two different characters there’s no telling what characters will actually be written. So write(char b) needs to be synchronized, slowing it down. This is still worthwhile, in part because write(char buf[], int start, int len) is already synchronized, so getting the monitor in write(char b) simply means that write(char buf[], int start, int len) doesn’t ever have to wait for the monitor when it is called from write(char b).

7. We might actually want to do this if we were logging data in an unstable environment, and wanted to minimize the amount of data that would be lost if the system crashed. Closing the file gives us the best possible guarantee that all the data that we’ve written will be flushed to the disk.