Initialization and Cleanup in C++

If the last time you read Huckleberry Finn was when you were fourteen, or if you haven’t read it at all, I suggest that you read it soon. Twain warns us that

Persons attempting to find a motive in this narrative will be prosecuted; persons attempting to find a moral in it will be banished; persons attempting to find a plot in it will be shot.

Nevertheless, here at The Journeyman’s Shop we think there are important lessons in it. The one I want to look at this month comes from a discussion between Tom Sawyer and Huckleberry Finn about the best way to help the runaway slave Jim to escape from the folks who have captured him. Tom says they need a saw, and Huck asks him, "What do we want of a saw?":

"What do we want of a saw? Hain’t we got to saw the leg of Jim’s bed off, so as to get the chain loose?"

"Why, you just said a body could lift up the bedstead and slip the chain off."

"Well, if that ain’t just like you, Huck Finn. You can get up the infant-schooliest ways of going at a thing. Why, hain’t you ever read any books at all? - Baron Trenck, nor Casanova, nor Benvenuto Chelleeny, nor Henri IV, nor none of them heroes? Who ever heard of getting a prisoner loose in such an old-maidy way as that? No; the way all the best authorities does is to saw the bed leg in two, and leave it just so, and swallow the sawdust, so it can’t be found, and put some dirt and grease around the sawed place so the very keenest seneskal can’t see no sign of its being sawed, and thinks the bed leg is perfectly sound. Then, the night you’re ready, fetch the leg a kick, down she goes; slip off your chain, and there you are."

In order to make our code more reliable and our development processes more predictable, we learn and apply abstract principles to software development. Unfortunately, we tend to become like Tom: we simply do things the way we’ve been told we’re supposed to do them, without thinking about whether the advice we’ve gotten is appropriate in this particular case. The structured programming revolution has given us programmers who blanch when they see a return statement anywhere other than at the end of a function. The object oriented revolution has given us programmers who write elaborate classes to perform what should be simple tasks. To become good programmers we have to learn to recognize the situations where we don’t need to swallow the sawdust.

For example, in comp.lang.c++ I recently made the statement that if the design of a class hierarchy does not call for deleting objects of derived types through pointers to the base type, then the base class does not need to have a virtual destructor. Now, that’s a fairly simple statement, and a practical one. It’s the other side of the rule stated in the C++ language definition, that the effect of deleting a derived object through a pointer to its base is undefined if the base class does not have a virtual destructor. Most of the replies I received didn’t discuss the practical meaning of this statement, however, and instead simply parroted a common design guideline that classes with virtual functions should have a virtual destructor. I think that’s a very useful guideline, but like most guidelines, it’s an oversimplification. The reason it’s useful is that it is easier to apply than the true rule, which is the one that I stated earlier. It provides a simple test: if there are virtual functions, there should be a virtual destructor. This works fairly well, because most classes that have virtual functions are intended to be used as polymorphic bases. That is, objects of derived types will be manipulated through pointers to the base. That often means that they will be contained in collections that deal only in base pointers, and in order to clean up such a collection properly those objects will be deleted through pointers to their base, so the base should have a virtual destructor.
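To make the true rule concrete, here is a minimal sketch of the case where the virtual destructor really is needed: an object of a derived type is deleted through a pointer to its base. The names shape, square, events, and delete_through_base are hypothetical, invented for this illustration, and the string log stands in for real cleanup work.

```cpp
#include <string>

// Hypothetical names throughout; the log records which destructors ran.
std::string events;

class shape
{
public:
    virtual ~shape() { events += "shape dtor;"; }
    virtual double area() const = 0;
};

class square : public shape
{
public:
    explicit square(double s) : side(s) {}
    ~square() { events += "square dtor;"; }
    double area() const { return side * side; }
private:
    double side;
};

// Deletes a square through a pointer to its base and reports
// which destructors ran, in order.
std::string delete_through_base()
{
    events.clear();
    shape *p = new square(2.0);
    delete p;   // well defined only because ~shape is virtual
    return events;
}
```

Calling delete_through_base() returns "square dtor;shape dtor;". Take the word virtual off ~shape and the delete expression has undefined behavior, which is exactly the situation the language rule describes.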

On the other hand, consider a mixin class that defines an interface for writing objects to a file:


class streamable
{
private:
    virtual bool write(ostream&) const = 0;
};

The documentation for this class tells you that you should derive from it when you want to be able to write objects to a file, and override streamable::write to write out the data contained in your class. To write an object, call the global function write:


bool write(ostream&, const streamable&);

This function will write out the object that you call it on, as well as any streamable objects contained in that object and all streamable objects referred to by pointers or references contained in that object.

The coding guideline would dictate that streamable have a virtual destructor. However, a virtual destructor is not needed here, because streamable is not intended to be used as a polymorphic base. In particular, the write function will never delete a streamable object, so there is no need for streamable to have a virtual destructor.
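Here is one way such a mixin might be used, as a sketch rather than the actual library. It assumes that the global write function is a friend of streamable (the declarations above don't say how it reaches the private virtual function), it ignores the business of following contained objects and pointers, and the class point and the function point_as_text are hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class streamable
{
    // Assumption: friendship is how the global write function
    // reaches the private virtual function.
    friend bool write(std::ostream&, const streamable&);
private:
    virtual bool write(std::ostream&) const = 0;
};

// Simplified: forwards to the object's own write function only.
bool write(std::ostream& os, const streamable& obj)
{
    return obj.write(os);
}

// Hypothetical client class that mixes in streamable.
class point : public streamable
{
public:
    point(int x_, int y_) : x(x_), y(y_) {}
private:
    virtual bool write(std::ostream& os) const
    {
        os << x << ' ' << y;
        return os.good();
    }
    int x;
    int y;
};

// The point is an auto object. Nothing here ever deletes it
// through a streamable*, so no virtual destructor is involved.
std::string point_as_text()
{
    point p(3, 4);
    std::ostringstream os;
    write(os, p);
    return os.str();
}
```

point_as_text() returns "3 4". The streamable part of the object is only ever reached through a reference, and the object is destroyed as a point.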

Some folks will cling to their guideline, however, and claim that the cost of providing a virtual destructor is negligible, so we may as well do it. Granted, adding a virtual destructor isn’t like swallowing sawdust, but it is misleading. Someone will look at that virtual destructor and conclude that it’s OK to delete an object derived from streamable through a base pointer, and when they do it, they’ll discover that they don’t understand what it actually does. That’s because when we wrote it we didn’t understand what it was supposed to do. We didn’t design it, we simply included it because we’re supposed to. Don’t do that. Design your classes so that they meet well-defined requirements. As Scott Meyers says about the guidelines in his book Effective C++,

If you follow all the guidelines all the time, you are unlikely to fall into the most common traps surrounding C++, but by their very nature guidelines have exceptions. That’s why each item has an explanation. The explanations are the most important part of the book, because only by understanding the rationale behind an item can you reasonably determine whether it applies to the program you are developing and to the unique constraints under which you toil.

When you see a programming guideline, make sure that you understand the reason for that guideline. Then you’ll understand when you should ignore it.

Initialization and Cleanup in C++

Last month we talked about how to supply initial values to data objects in C, and about how to make sure that any code that our application needs to run to clean things up will be run when the application terminates. This month we’re going to talk about the same things, but in C++. In C we have a constant tension between wanting to defer initializing data objects until we have enough information to initialize them correctly and the risk that we might accidentally use such data objects before they have been properly initialized. C++ eliminates this problem, at least in the course of ordinary programming. If you go out of your way to write perverse code you can evade the initialization and cleanup rules that C++ imposes, but anyone who writes that sort of code deserves whatever happens to them. Here in The Journeyman’s Shop we don’t write perverse code. We give the C++ compiler a fair chance at handling initialization and cleanup correctly.

Of course, handling initialization and cleanup correctly also depends on using classes that have been written correctly. We’re not going to talk about that this month. Rather, we’ll talk about how the existence of constructors and destructors affects the initialization and cleanup code that we looked at last month. We’ll leave the issues involved in writing sensible constructors and destructors for another time.

Assuming that we’re using a class with a sensible constructor and destructor, C++ in most cases guarantees that the constructor will be run before you can use any member functions on an object, and that the destructor will be run when you dispose of an object. This does not apply to built-in types, so initialization of integers, pointers, etc. is subject to the same rules as in C. Further, since a struct written for use in C does not have a constructor or destructor, when you move C code into C++ you won’t automatically get initialization of your structs.
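A small sketch of what that means in practice. The struct c_point and the function zeroed_sum are hypothetical. The point is only that built-in types and C-style structs get a known value when you ask for one explicitly.

```cpp
// A struct as it might appear in a header shared with C code (hypothetical).
struct c_point
{
    int x;
    int y;
};

// Returns the sum of values that were explicitly zeroed.
int zeroed_sum()
{
    c_point p = {};     // empty braces zero every member
    int n = int();      // value-initialization yields 0
    // By contrast, writing "c_point q;" or "int m;" here would leave
    // the values indeterminate, just as in C.
    return p.x + p.y + n;
}
```

zeroed_sum() returns 0. Nothing in the language supplies those zeros for you when the initializers are left off.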

Auto Objects

When you define an object inside a function without the static keyword the compiler generates code to call the object’s constructor at the point in the code where the definition occurs, and it generates code to call the object’s destructor at the curly brace where the object goes out of scope. When more than one object is defined in the same scope, their destructors are run in the reverse order of their construction.

To see this, let’s look at a simple example:


#include <iostream>
using std::cout;

class instrument
{
public:
    instrument() : obj_id(count++)
        {
        cout << "instrument ctor("
            << obj_id << ")\n";
        }
    ~instrument()
        {
        cout << "instrument dtor("
            << obj_id << ")\n";
        }
private:
    int obj_id;
    static int count;   
};

int instrument::count;

int main()
    {
    instrument i0;
    instrument i1;
    cout << "in function body\n";
    return 0;
    }

When you run it you should see this:


instrument ctor(0)
instrument ctor(1)
in function body
instrument dtor(1)
instrument dtor(0)

Now, that might not look like much, but it gives class designers a very powerful mechanism to help assure the consistency of their classes. In particular, when a class needs to obtain a resource like memory or a mutex lock or a file handle and make sure that that resource is released when the class no longer needs it, the class designer should use the constructor to obtain the resource and the destructor to release it. From our current perspective as users of such a class this use of constructors and destructors makes our work much easier. Consider what we’d have to do to write something similar in C:


#include <stdio.h>

static int count;

typedef struct
    {
    int obj_id;
    } instrument;

void init_instrument(instrument *inst)
    {
    inst->obj_id = count++;
    printf("instrument ctor(%d)\n", inst->obj_id);
    }

void cleanup_instrument(instrument *inst)
    {
    printf("instrument dtor(%d)\n", inst->obj_id);
    }

int main()
    {
    instrument i0;
    instrument i1;
    init_instrument(&i0);
    init_instrument(&i1);
    printf("in function body\n");
    cleanup_instrument(&i1);
    cleanup_instrument(&i0);
    return 0;
    }

I hope you’ll agree with me that the C++ version of main is much simpler than the C version. Of course, you have to get used to the fact that defining variables can result in constructor and destructor calls, but once you’ve made that adjustment, it’s much easier to read and understand the C++ code.

Now that’s rather elementary, and I hope you’re not offended that I spent so much time on it. I think it’s one of the most powerful features of C++. The guarantee that auto objects will be properly initialized and disposed of goes a long way toward making our code safer. The compiler does all the heavy lifting for us. It won’t forget to clean up our objects when we’re finished with them.
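To show the same guarantee applied to a resource rather than to a print statement, here is a minimal sketch. The counter open_handles is a hypothetical stand-in for a real system resource, and handle_guard and handles_while_working are names made up for this example.

```cpp
// Hypothetical stand-in for a system resource: a count of open handles.
int open_handles = 0;

class handle_guard
{
public:
    handle_guard() { ++open_handles; }      // acquire in the constructor
    ~handle_guard() { --open_handles; }     // release in the destructor
private:
    // Copying would release the same handle twice, so it is disallowed.
    handle_guard(const handle_guard&);
    handle_guard& operator=(const handle_guard&);
};

// Returns the number of handles that were open while the guard was alive.
int handles_while_working()
{
    handle_guard guard;
    return open_handles;    // the value is copied before guard is destroyed
}
```

handles_while_working() returns 1, and open_handles is back to 0 as soon as the call returns. The function contains no cleanup code of its own.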

This points up what I regard as a major flaw in Java: it has nothing that corresponds to a destructor. I mentioned earlier three examples of resources that need to be released when we’re finished with them: memory, mutex locks, and file handles. Java has its own mechanisms for handling two of these resources: garbage collection handles memory, and synchronized blocks take care of mutex locks. There’s no way you can guarantee that a file handle will be closed when you no longer need it. You’ve got to write that code yourself, every place where it’s appropriate. That means you have to know the details of the class you’re using, so that you can write appropriate cleanup code for it. For example, let’s try to write the sample program that we’ve been looking at in Java:


class Instrument
{
public Instrument()
    {
    obj_id = count++;
    System.out.println("Instrument ctor("
        + obj_id + ")");
    }
public void cleanup()
    {
    System.out.println("Instrument dtor("
        + obj_id + ")");
    }
private int obj_id;
private static int count;
};

public class Test
{
public static void main(String args[])
    {
    Instrument i0 = new Instrument();
    Instrument i1 = new Instrument();
    System.out.println("in function body");
    i1.cleanup();
    i0.cleanup();
    }
};

Test.main has to explicitly call Instrument’s cleanup method. Java will not do this for you.

You may have noticed that I didn’t use the class’s finalize method to handle the cleanup. It would be possible to write the class Instrument with a method named finalize instead of the method named cleanup, but that isn’t quite the same thing. Java will only run the finalize method when the memory used by an Instrument object is being recycled. That is, you cannot predict when the code in finalize will be run. Further, if the class is using a system resource that is not recycled by the operating system, then you have to change Java’s default setting to tell it to run finalizers for all objects that haven’t yet been finalized when the program terminates. By default Java does not run finalizers at program termination.

Now let’s make the program a bit more complicated. Instead of simply printing out that we’re in the function body, let’s check the number of command line arguments and if there were no arguments passed, display an error message and exit. Our first pass at the C code might look like this:


#include <stdlib.h>

int main(int argc, char *argv[])
    {
    instrument i0;
    instrument i1;
    init_instrument(&i0);
    init_instrument(&i1);
    if (argc == 1)
        {
        printf("missing argument\n");
        return EXIT_FAILURE;
        }
    printf("in function body\n");
    cleanup_instrument(&i1);
    cleanup_instrument(&i0);
    return 0;
    }

When we compile this program and run it with no command line arguments, we get the following output:


instrument ctor(0)
instrument ctor(1)
missing argument

Our cleanup code didn’t run. The problem, of course, is that we returned from the middle of main without running the cleanup code. That’s one of the major reasons that structured programming prohibits having more than one exit point from a function: it’s easy to forget to do the necessary cleanup. We can fix this easily, by putting the two alternate execution paths in the two branches of our if statement and adding a status variable:


#include <stdlib.h>

int main(int argc, char *argv[])
    {
    int status;
    instrument i0;
    instrument i1;
    init_instrument(&i0);
    init_instrument(&i1);
    if (argc == 1)
        {
        printf("missing argument\n");
        status = EXIT_FAILURE;
        }
    else
        {
        printf("in function body\n");
        status = 0;
        }
    cleanup_instrument(&i1);
    cleanup_instrument(&i0);
    return status;
    }

This works, but look at how the control logic and the details of the code are intertwined: we start out by creating a status variable, then we create and initialize the two variables that we need, then we check the command line and issue an appropriate message and set the status variable, then we clean up our two variables, then we return the value of the status variable. That’s a lot of bookkeeping. With destructors it’s much simpler:


#include <stdlib.h>

int main(int argc, char *argv[])
    {
    instrument i0;
    instrument i1;
    if (argc == 1)
        {
        cout << "missing argument\n";
        return EXIT_FAILURE;
        }
    cout << "in function body\n";
    return 0;
    }

The code is much more cleanly encapsulated: the error condition is handled entirely within a single block. No other code in main has to do anything in order to return the right status code.

The Java version suffers from a similar problem. We can put both our mainline code and our error code in separate branches of an if statement in order to make sure that cleanup is handled properly:


public class Test
{
public static void main(String args[])
    {
    Instrument i0 = new Instrument();
    Instrument i1 = new Instrument();
    if (args.length == 0)
        System.out.println("missing argument");
    else
        System.out.println("in function body");
    i1.cleanup();
    i0.cleanup();
    }
};

Although this code looks better than the C version, that’s mostly because in Java main doesn’t return a value, so we don’t have to keep track of a status code in this particular case. If this function returned a value we’d have to set it in both branches of the if statement, and do the cleanup at the end before returning that value, just as we did in C.

A better solution in Java is to wrap the core code in a try block and put the cleanup code in a finally clause:


public class Test
{
public static void main(String args[])
    {
    Instrument i0 = new Instrument();
    Instrument i1 = new Instrument();
    try
        {
        if (args.length == 0)
            {
            System.out.println("missing argument");
            return;
            }
        System.out.println("in function body");
        }
    finally
        {
        i1.cleanup();
        i0.cleanup();
        }
    }
};

Compare this with the C++ code that does the same thing, and you’ll see that having destructors lets you write much cleaner code.

Arrays of Objects

When you create an array of objects in C++ you run into a constraint: unless you supply an explicit initializer for each element, the objects in the array are initialized with the default constructor. We didn’t look at constructors that take arguments in the preceding section, but as you probably know, a class can have multiple constructors, and you can pass arguments to them:


class instrument
{
public:
    instrument() : obj_id(count++)
        {
        cout << "instrument ctor("
            << obj_id << ")\n";
        }
    instrument(int val) : obj_id(val)
        {
        cout << "instrument ctor("
            << obj_id << ")\n";
        }
    ~instrument()
        {
        cout << "instrument dtor("
            << obj_id << ")\n";
        }
private:
    int obj_id;
    static int count;   
};

Now we can create an object of type instrument that has an arbitrary value for obj_id, not just the one that we generate internally:


int main()
    {
    instrument i0;      // uses default ctor
    instrument i1(3);   // uses ctor that takes int
    cout << "in function body\n";
    return 0;
    }

You can see if you run it that the output is now


instrument ctor(0)
instrument ctor(3)
in function body
instrument dtor(3)
instrument dtor(0)

You can’t do this so directly when you create an array of objects in C++. Unless you write out an initializer for every element, each element is initialized with the default constructor, that is, the constructor that takes no arguments. For example:


int main()
    {
    instrument i0[3];   // array of 3 objects
    cout << "in function body\n";
    return 0;
    }

The output of this program is


instrument ctor(0)
instrument ctor(1)
instrument ctor(2)
in function body
instrument dtor(2)
instrument dtor(1)
instrument dtor(0)
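One qualification is worth a sketch: when an array definition carries a brace-enclosed initializer list, each element can be given its own constructor argument. The version of instrument below is a trimmed-down variant made up for this illustration. It has no default constructor, it writes to a hypothetical string stream named trace instead of cout, and the nested-brace syntax assumes a C++11 or later compiler.

```cpp
#include <sstream>
#include <string>

std::ostringstream trace;   // collects the messages instead of printing them

class instrument
{
public:
    instrument(int val) : obj_id(val) { trace << "ctor(" << obj_id << ") "; }
    ~instrument() { trace << "dtor(" << obj_id << ") "; }
private:
    int obj_id;
};

// Builds an array whose elements each receive a constructor argument,
// then reports the order of construction and destruction.
std::string array_trace()
{
    trace.str("");
    {
        instrument group[3] = { {10}, {20}, {30} };   // one initializer per element
        trace << "body ";
    }   // the elements are destroyed here, last one first
    return trace.str();
}
```

array_trace() returns "ctor(10) ctor(20) ctor(30) body dtor(30) dtor(20) dtor(10) ". The elements are constructed in index order and destroyed in the reverse order, just as with the default-constructed array.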

Static Initializers

Initialization of static objects in C++ can be tricky. That’s because initialization can involve executing code, and the compiler has to figure out when to run that code. The C++ language definition has some weasel words here: the rule is that a static object must be initialized either before main is entered or before the first use of any function or object defined in the same translation unit as the static object. The provision about initializing before first use is there to allow for dynamically loaded libraries, whose initializers can’t be run until the library is actually loaded.

In cases that don’t involve dynamically loaded libraries it’s easy to see how this works:


instrument istatic;

int main()
    {
    instrument i0(3);
    cout << "in function body\n";
    return 0;
    }

Here, the initializer for istatic will be run before main begins, and the initializer for i0 will run on entry to main:


instrument ctor(0)
instrument ctor(3)
in function body
instrument dtor(3)
instrument dtor(0)

Further, static objects defined in the same translation unit are initialized in the order of their definitions:


instrument i0(1);
instrument i1(3);

int main()
    {
    cout << "in function body\n";
    return 0;
    }

Here, i0 will be initialized first, then i1.

The point where this gets tricky is when you have more than one translation unit in your application. The language definition doesn’t say anything about the order in which static objects defined in separate translation units will be initialized. This was a deliberate decision: it’s hard to come up with a rule for initialization order that avoids possible problems and is easy to understand. If you define static objects that don’t depend on one another you shouldn’t have any problems:


test1.cpp
---------
instrument i0(1);

test2.cpp
---------
instrument i1(3);

int main()
    {
    cout << "in function body\n";
    return 0;
    }

If you compile and link these two source files (with the appropriate #include directives, of course) the output from the program can be either


instrument ctor(1)
instrument ctor(3)
in function body
instrument dtor(3)
instrument dtor(1)

or


instrument ctor(3)
instrument ctor(1)
in function body
instrument dtor(1)
instrument dtor(3)

Don’t count on either of these two orders. Your compiler may fool you. If the order of initialization is important, put the objects in the same file. Or use one of the tricks that we’ll talk about next time.

Local Statics

Local statics in C++ act pretty much the same way as they do in C: they are initialized the first time that the flow of control in the program reaches them. However, the compiler has to generate code to invoke destructors for local statics. The problem here is that if the local static hasn’t been constructed, then the compiler shouldn’t run its destructor. Further, just like all other statics, the compiler has to make sure that destructors for local statics are run at the right time, that is, destructors must be run in the reverse order of construction. If you think about it a bit you’ll see that there really is no way for the compiler to generate a simple function that will simply call all of the destructors in the right order: it doesn’t know what order they will be constructed in, so it doesn’t know what order their destructors should be called in. This means that the compiler has to generate code that will keep track of the order of construction, and undo it properly. Before you do anything tricky in destructors for static objects, be sure that your compiler gets the order right. Better yet, don’t do tricky things there. It’s just too risky.
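The construct-on-first-use behavior is easy to observe. In this sketch the counters ctor_calls and dtor_calls and the three functions are hypothetical names, and the instrument class is cut down to counting its own constructor and destructor calls.

```cpp
int ctor_calls = 0;
int dtor_calls = 0;

class instrument
{
public:
    instrument() { ++ctor_calls; }
    ~instrument() { ++dtor_calls; }
};

// The local static is constructed the first time control reaches
// its definition, and only that once.
instrument& shared_instrument()
{
    static instrument inst;
    return inst;
}

// If this function is never called, its local static is never
// constructed, so no destructor may run for it at exit.
void never_called()
{
    static instrument unused;
}

// Calls shared_instrument twice and reports how many constructions happened.
int constructions_after_two_calls()
{
    shared_instrument();
    shared_instrument();
    return ctor_calls;
}
```

Before the first call ctor_calls is 0. After constructions_after_two_calls() it is 1, not 2, and dtor_calls stays at 0 until the program terminates. The generated code has to remember at run time that inst was constructed and unused was not.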

atexit functions and Destructors for Static Objects

Be very, very careful if you mix atexit functions and static objects with destructors. The rules are pretty clear: functions registered with atexit are interleaved with destructors for static objects in the reverse order of their registration. If you create a static object, then register an atexit function, then create another static object, the compiler should generate code that invokes the destructor for the second object, calls the atexit function, then invokes the destructor for the first object. If your code relies on this sequence being handled correctly it is very likely to crash. If you’re concerned about writing portable code (as we are in The Journeyman’s Shop) don’t rely on intermixing atexit functions with static destructors.
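If you want to find out what your own compiler does, a sketch along these lines may help. The names exit_handler and set_up_exit_sequence are hypothetical, and the instrument class is reduced to a destructor that reports its id. Local statics are used so that the order of construction and registration is fixed by the order of the statements.

```cpp
#include <cstdio>
#include <cstdlib>

class instrument
{
public:
    explicit instrument(int id) : obj_id(id) {}
    ~instrument() { std::printf("instrument dtor(%d)\n", obj_id); }
private:
    int obj_id;
};

void exit_handler()
{
    std::printf("atexit function\n");
}

// Constructs a static object, registers an atexit function, then
// constructs a second static object. Returns what std::atexit
// returned (0 means the registration succeeded).
int set_up_exit_sequence()
{
    static instrument first(1);
    static int status = std::atexit(exit_handler);
    static instrument second(2);
    return status;
}
```

Call set_up_exit_sequence() once from main and let the program terminate normally. According to the rule described above, the last three lines of output should be the destructor for object 2, then the atexit function, then the destructor for object 1. If you see anything else, you know where your compiler stands.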

Coming Up

Next month we’ll wrap up this discussion of initialization and cleanup by looking at techniques for handling more complex initializations. We’ll look at things like deferred initialization, the nifty-counter trick, and placement new. See you then!

The C/C++ Users Journal, April, 1999
