Initialization and Cleanup, Part III

The C/C++ Users Journal, May, 1999

Controlling Initialization and Cleanup

In the last two installments of The Journeyman’s Shop we’ve looked at the mechanisms that C and C++ provide for initializing and cleaning up variables. Most of the time these mechanisms fit fairly closely to what we need to do. We use them almost without thinking, and they do what we need. For example, it’s fairly common to open a file, write some data to it, and close it. In C++ we often do this with an auto object:


#include <fstream>

void write(const char *name)
    {
    std::ofstream out(name);
    out << "Hello, world!\n";
    }

When program execution reaches the end of the function write, the destructor for out is run. That closes the file, flushing any buffer that may have been created and ensuring that the data has actually been written to the disk. In C there’s a bit more work to do, because there are no destructors:


#include <stdio.h>

void write(const char *name)
    {
    FILE *out = fopen(name, "w");
    fputs("Hello, world!\n", out);
    fclose(out);
    }

If you forget to close the file, the C runtime system will close it when the program terminates, but it’s still good practice to close files as soon as you’re finished with them. If you don’t, you run the risk of running out of file handles, which prevents you from opening new files later in your program¹. This isn’t a major source of errors, because it’s fairly easy to spot a missing call to fclose when a file is used only in the function that opened it.
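The claim that the destructor of an auto object flushes and closes the file can be checked with a small sketch. The helper names write_greeting and read_first_line are assumptions made for illustration; they are not part of the examples above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes one line and relies on the destructor
// of the auto object to flush and close the file.
void write_greeting(const char *name)
    {
    std::ofstream out(name);
    out << "Hello, world!\n";
    }   // out's destructor runs here

// Hypothetical helper: reads back the first line of the file.
std::string read_first_line(const char *name)
    {
    std::ifstream in(name);
    std::string line;
    std::getline(in, line);
    return line;
    }
```

Calling write_greeting and then read_first_line on the same file name should give back the text that was written, because the file was closed when write_greeting returned.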

Things get harder when you need to initialize an object that will be used from many different locations in the program. You then have two choices: you can create the object in a function and pass a pointer or reference to it from that function into other functions that need it, or you can create a global object and reference it directly from the places where it is needed. The first of these is preferable if the use of the object isn’t too widespread, because you can dispose of the object in the function that created it, without having to do any extra bookkeeping to make sure that you aren’t disposing of it too early and to make sure that you haven’t forgotten about disposing of it. However, passing extra arguments through a series of function calls adds overhead and complexity, and often the second alternative is a better choice. That’s where C++’s destructors offer a much more powerful mechanism than C provides.


#include <fstream>

std::ofstream out("test.log");

int main()
    {
    out << "Hello, world!\n";
    return 0;
    }

Notice that I haven’t written any code to close the file. C++ requires that the destructor for out be run after main has finished, and that destructor does the same thing as it did when out was an auto object: it closes the file after flushing any internal buffers. We can do something similar in C:


#include <stdio.h>

FILE *out;

int main()
    {
    out = fopen("test.dat", "w");
    fputs("Hello, world!\n", out);
    return 0;
    }

But be careful: this only works correctly because the C language guarantees that any files that are open when the program terminates normally (by returning from main or by calling exit) will be flushed and closed. There is no general cleanup mechanism in C, and if there is any possibility that a resource that you have allocated in a global variable won’t be cleaned up by the runtime system, then you must clean it up yourself. In fact, the absence of a call to fclose in the example above bothers me enough that I’d put it in anyway:


#include <stdio.h>

FILE *out;

int main()
    {
    out = fopen("test.dat", "w");
    fputs("Hello, world!\n", out);
    fclose(out);
    return 0;
    }

When you use global variables like this, main is primarily responsible for initializing and cleaning up the global variables. Do as little else as possible in main, and call a function to do the real work. That way you can look at initialization and cleanup of the global variables without being distracted by other code.
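As a rough sketch of that division of labor, the following code keeps setup and cleanup of the global in one function and pushes the real work into another. The names program_main and run are hypothetical; program_main stands in for main here only so that the sketch can be called with a file name.

```cpp
#include <cstdio>

static std::FILE *out = nullptr;    // the global resource

// Hypothetical worker: the real work of the program lives here.
static int run()
    {
    return std::fputs("Hello, world!\n", out) == EOF ? 1 : 0;
    }

// Stand-in for main: it only sets up the global, delegates the work,
// and cleans up afterward.
int program_main(const char *name)
    {
    out = std::fopen(name, "w");
    if (out == nullptr)
        return 1;
    int result = run();
    if (std::fclose(out) != 0)
        result = 1;
    out = nullptr;
    return result;
    }
```

With this layout, everything that affects the lifetime of out can be read in one short function.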

There are times, though, when we don’t want to have global variables automatically initialized like this. We may want to defer initialization to the first time that a global variable is used. That way, if it is never used, we don’t need to initialize it at all. In C++ we may need to force initialization of a global variable at a particular time, to overcome the uncertainty of initialization order for globals. Although those cases require a bit more thought, they are fairly easy to handle.

Controlling Global Objects

The simplest way to defer initialization of a global object is to use some sort of a flag to indicate whether the object has been initialized. Any code that needs to use that object checks the flag, and if the object has not yet been initialized, it initializes it and sets the flag to indicate that the object has been initialized. That’s actually simpler than it sounds, particularly if the object to be initialized is a pointer. The pointer itself can be used as a flag, since it will be initialized to NULL when the program starts. By checking for a null pointer we can determine whether the pointer has been initialized.


#include <stdio.h>
#include <stdlib.h>

FILE *out;

void f()
    {
    if (out == NULL)
        out = fopen("test.dat", "w");
    fputs("Called f\n", out);
    }

void g()
    {
    if (out == NULL)
        out = fopen("test.dat", "w");
    fputs("Called g\n", out);
    }

void close_out()
    {
    /* out is still NULL if neither f nor g was called */
    if (out != NULL)
        fclose(out);
    }

int main()
    {
    atexit(close_out);
    f();
    g();
    return 0;
    }

This code pushes the responsibility for opening out down into the functions that use it. In a more complex program where f and g might not be called at all, this avoids opening the file if it isn’t actually used. This code is a bit verbose, however, and it might be worthwhile to provide a function to handle opening the file:


void open_out()
    {
    if (out == NULL)
        out = fopen("test.dat", "w");
    }

Each function that will use out should call this function before attempting to use it. The responsibility for cleaning up any resources allocated by a function like this still lies with main: it must register an atexit function to take care of any necessary cleanup.
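The listings in this section use the pointer itself as the flag. When the object is not a pointer, a separate flag can do the same job. Here is a minimal sketch under that assumption; the table squares and every function name in it are hypothetical.

```cpp
// Hypothetical global that takes some work to set up and is not a
// pointer, so a separate flag records whether it has been initialized.
static int squares[16];
static bool squares_ready;  // static storage: starts out false
static int fill_count;      // counts initializations, for illustration

void init_squares()
    {
    if (!squares_ready)
        {
        for (int i = 0; i < 16; ++i)
            squares[i] = i * i;
        squares_ready = true;
        ++fill_count;
        }
    }

int square_of(int i)
    {
    init_squares();         // check the flag before every use
    return squares[i];
    }

int times_initialized()
    {
    return fill_count;
    }
```

However many times square_of is called, the table is filled only once, and it is never filled if square_of is never called.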

Functions Returning Pointers or References

Another technique for deferring initialization is to write a function that returns a pointer or a reference to the object, and handle initialization inside that function. When we do that, we no longer need to make the name of the object itself visible throughout the program. All that’s needed is the function itself.


static FILE *out_file;

FILE *open_out()
    {
    if (out_file == NULL)
        out_file = fopen("test.dat", "w");
    return out_file;
    }

With this approach, functions that need to write to the file can either call this function each time they need the FILE pointer or they can create a local variable to hold the pointer.


void f()
    {
    fputs("Called f\n", open_out());
    }

void g()
    {
    FILE *out = open_out();
    fputs("Called g\n", out);
    fputs("Logging additional data\n", out);
    }

Another variation on this idea is to make the object a local static object rather than a global one, and let the compiler worry about checking whether the object has been initialized. This is particularly useful in C++, because it lets you defer running the constructor for an object:


std::ostream& open_out()
    {
    static std::ofstream out("test.dat");
    return out;
    }

One drawback to this approach is that it involves a function call every time you need to use the object. That’s often not as big a penalty as it might appear to be, because it avoids the risk that someone will forget to check whether the variable has been initialized, and will simply use it. That’s the sort of problem that will hide from you until you’re about to ship your product. Then some tester will come across it and you’ll have to work late several nights to figure out what’s going wrong. So in general this is a case where I’d be willing to accept the overhead of a function call. Of course, if you’re writing in C++ you can always consider making this function inline².
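To see the deferred construction at work, here is a small sketch in which a class counts how many times its constructor has run. The class logger and the function the_logger are hypothetical names chosen for this illustration.

```cpp
#include <string>

class logger
{
public:
    logger() { ++constructed; }
    void write(const std::string &s) { contents += s; }
    const std::string &text() const { return contents; }
    static int constructed;     // number of logger objects constructed
private:
    std::string contents;
};

int logger::constructed = 0;

// Returns a reference to a local static object. Its constructor runs
// the first time control passes through the definition, not at
// program startup.
logger &the_logger()
    {
    static logger instance;
    return instance;
    }
```

Before the first call to the_logger the counter is zero; after any number of calls it is one, and every caller sees the same object.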

The Nifty Counter Trick

C++ programmers often run into problems with the initialization order of global objects. The rule is quite simple, but dangerous: global objects that are defined in the same translation unit are initialized in the order of their definitions. Programmers run into problems when they forget that this rule says nothing about the order of initialization of global objects defined in different translation units. That order is up to the implementation, and there is no requirement that it be consistent from one compilation of your application to another, or even from one run of your application to another. In short, if your code depends on having one global object constructed before another, then you must define the two objects in the same translation unit or take over managing construction yourself.
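The safe half of the rule, two globals in one translation unit, can be sketched as follows. The names prefix and banner are hypothetical.

```cpp
#include <string>

// Both objects are defined in this translation unit, so they are
// initialized in the order of their definitions: prefix, then banner.
std::string prefix("log: ");
std::string banner(prefix + "started");     // prefix is already constructed
```

If the definition of banner were moved to a different source file, nothing would guarantee that prefix had been constructed by the time banner's initializer ran.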

Let’s take a quick look at how this sort of problem can arise. Suppose you have a global file object like the one we’ve been looking at throughout this discussion. We’ve only written code that uses this object from within main or functions called from main. What happens when we try to use it from the constructor of another global object?


// database.h
#include <fstream>
extern std::ofstream out;

class database
{
public:
	database()
		{
		out << "Opening database\n";
		}
};

// main.cpp
#include "database.h"

std::ofstream out("test.dat");
database db;

int main()
    {
    return 0;
    }

This code works fine. In main.cpp we define the object out before we define the object db. This means that out will be constructed before db is constructed, and the use of out in db’s constructor will work correctly. If we reverse the order of the definitions of out and db in main.cpp then db’s constructor will run first, and it will attempt to insert text into an uninitialized stream. Of course, that’s not a good thing to do. It’s easy to avoid in a case like this, because both objects are defined in the same file. But suppose we wanted to separate the details of our application’s logging operations from the database itself, that is, suppose we wanted to put out into its own file? The obvious solution, simply moving the definition of out, doesn’t work:


// datalog.h
#include <fstream>
extern std::ofstream out;

// database.h
#include "datalog.h"

class database
{
public:
	database()
		{
		out << "Opening database\n";
		}
};

// datalog.cpp
#include "datalog.h"

std::ofstream out("test.dat");

// main.cpp
#include "database.h"

database db;

int main()
    {
    return 0;
    }

Here we’ve defined db and out in two separate files, and it’s up to our compiler to decide what order to initialize them in. It may happen to work, if the compiler initializes out before it initializes db, or it may not, if the compiler initializes db first. If your code has this sort of dependency, don’t rely on the compiler’s choice of order, even if it happens to work, and even if your compiler carefully documents what order it will initialize these objects in. If you do this, you’re deep into the realm of nonportable code. Take control, perhaps with one of the techniques we’ve already discussed³.
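One way to take control with an earlier technique is to replace the global object with a function that returns a reference to a local static object. The sketch below assumes a hypothetical accessor named log_out, and it uses a string stream in place of the log file so that the result is easy to inspect.

```cpp
#include <sstream>

// datalog: hypothetical accessor that replaces the global object out.
// A string stream stands in for the log file in this sketch.
std::ostringstream &log_out()
    {
    static std::ostringstream buffer;
    return buffer;
    }

// database: the constructor calls the accessor, so the stream is
// constructed before its first use, whichever translation unit the
// implementation chooses to initialize first.
class database
{
public:
    database()
        {
        log_out() << "Opening database\n";
        }
};

database db;    // a global object whose constructor writes to the log
```

By the time main starts, db has been constructed and the message is already in the stream.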

There’s another technique, though, that you should be aware of. It’s known as the "nifty counter trick", and it’s been used in many implementations of iostreams. The problem that implementors of iostreams run into is exactly the one that we saw above: objects like cout are defined somewhere in the runtime library, and the C++ language does not guarantee that they will be constructed before they are used in other code. For example:


#include <iostream>

class could_be_dangerous
{
public:
	could_be_dangerous()
	{
	std::cout << "Here I am!\n";
	}
};

could_be_dangerous d;

int main()
    {
    return 0;
    }

This sort of code is quite common, and library implementors must ensure that cout is constructed before any code uses it. None of the techniques we’ve discussed so far can be used here, because they would all require rewriting the code that uses cout, either by replacing cout with a function call or by adding a check for construction. Neither one is suitable here⁴.

The nifty counter trick involves creating a static object in every file that uses the object that we need to initialize. The static object’s constructor checks a flag, and if we haven’t done the initialization yet it does the initialization and sets the flag. The static object’s destructor takes care of cleanup. This is all accomplished by putting the definition of the static object into the header file that declares the object that we’re going to initialize. Like this:


// datalog.h
#include <fstream>
extern std::ofstream out;

class out_initializer
{
public:
	out_initializer()
	{
	if (init_count++ == 0)
		init_out();
	}

	~out_initializer()
	{
	if (--init_count == 0)
		cleanup_out();
	}
private:
	static int init_count;
	static void init_out();
	static void cleanup_out();
};

static out_initializer out_initializer_object;

// datalog.cpp
#include "datalog.h"
#include <new>

std::ofstream out;

int out_initializer::init_count;

void out_initializer::init_out()
    {
    new (&out) std::ofstream("test.dat");
    }

void out_initializer::cleanup_out()
    {
    out.close();
    }

With this technique, our earlier example with the database would be unchanged:


// main.cpp
#include "database.h"

database db;

int main()
    {
    return 0;
    }

The difference is that now it will work correctly. That’s because the #include directive pulls in the contents of database.h, which in turn pulls in the contents of datalog.h, which defines a static object of type out_initializer. The definition of this object will always occur before any use of the object out itself, because it is in the header that declares out. A translation unit cannot use the name out without using this header, so every translation unit that uses out will get a static object of type out_initializer, and the definition of that object will always come before the definition of any object whose constructor uses out. That means that, regardless of which translation unit the compiler initializes first, there will always be a static object that will be constructed before the first use of out, and the constructor for that static object will initialize out. This guarantees that out will always be initialized before it is used.
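The counting itself can be tried out in a single file. The sketch below simulates three translation units by creating three initializer objects in one scope, and it replaces init_out and cleanup_out with counters. The names resource_initializer, init_calls, cleanup_calls, and simulate_three_translation_units are all hypothetical.

```cpp
// Hypothetical stand-ins for init_out and cleanup_out: they only
// count how often the real work would have been done.
int init_calls = 0;
int cleanup_calls = 0;

class resource_initializer
{
public:
    resource_initializer()
        {
        if (init_count++ == 0)
            ++init_calls;       // first initializer: set up the resource
        }
    ~resource_initializer()
        {
        if (--init_count == 0)
            ++cleanup_calls;    // last initializer: tear it down
        }
private:
    static int init_count;
};

int resource_initializer::init_count = 0;

// Simulates three translation units, each with its own static
// initializer object, by creating three objects in one scope.
void simulate_three_translation_units()
    {
    resource_initializer a;
    resource_initializer b;
    resource_initializer c;
    }
```

Three objects are constructed and destroyed, but the setup and the teardown each happen exactly once.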

There are a couple of drawbacks to the nifty counter technique. First, you may have noticed that it uses placement new to construct an object of type std::ofstream on top of the object out. Somewhere along the line the compiler will also generate a call to construct the object out. The writer of a library that uses this trick must make sure that multiple constructor calls like this will leave the object in a sensible state. Typically that means writing the default constructor to do nothing, so that it won’t change any of the initialization done by the call to placement new. And, of course, there’s a corresponding destructor call as the program terminates. That could happen before the destructor for the last out_initializer object calls close. Again, the destructor must not do anything that makes this final call fail. This is tricky code to get right. Don’t try this at home.

The second drawback is that all of those objects of type out_initializer must actually be constructed, even though it is only the first one that actually performs the initialization of out. That’s unavoidable, and although it doesn’t look like much, it can be very expensive on a system that uses virtual memory: it could cause pages to be pulled in off of the disk in order to execute these constructors. That could slow down program initialization considerably.

In the form I’ve described, the nifty counter trick can only be used safely by implementors of the standard library. They know their compiler’s quirks, and can make sure that any peculiarities that they rely on will not be changed in some future release. There’s a slight variation that’s much more useful. It’s completely portable, because it eliminates the compiler-generated constructor and destructor calls. Instead of using an object named out, use a reference. Initialize it like this:


// datalog.cpp
// (datalog.h must now declare: extern std::ofstream &out;)
#include "datalog.h"
#include <new>

static char out_data[sizeof(std::ofstream)];
std::ofstream &out = (std::ofstream &)out_data;

int out_initializer::init_count;

void out_initializer::init_out()
    {
    new (&out_data) std::ofstream("test.dat");
    }

void out_initializer::cleanup_out()
    {
    // std::ofstream is a typedef for std::basic_ofstream<char>
    (&out)->~basic_ofstream();
    }

What we’ve done here is to create an array of char named out_data consisting of enough bytes to hold an object of type std::ofstream, told the compiler to pretend that this array of char is actually an object of type std::ofstream, and initialized the reference out to refer to this chunk of memory. Since the chunk of memory is defined as an array of char the compiler won’t generate any code to initialize it⁵, so we don’t have to worry about multiple constructors or destructors. That tricky-looking expression at the end of cleanup_out simply invokes the destructor for ofstream on the object that we created.
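One caveat about the listing above: a plain array of char is only guaranteed to be aligned for char. The following sketch shows the same idea of raw storage plus a reference, with an alignas specifier added. That specifier comes from C++11, so it is an addition to the technique as described here. The class gadget and the function names are hypothetical, and this is one way the approach might look rather than a definitive implementation.

```cpp
#include <new>

// Hypothetical class standing in for std::ofstream; live counts the
// objects that currently exist.
class gadget
{
public:
    explicit gadget(int v) : value(v) { ++live; }
    ~gadget() { --live; }
    int value;
    static int live;
};

int gadget::live = 0;

// Raw storage; alignas (an addition, C++11) makes it suitably aligned.
alignas(gadget) static unsigned char gadget_data[sizeof(gadget)];

// A reference to the storage: the compiler generates no constructor
// or destructor call for it.
gadget &the_gadget = reinterpret_cast<gadget &>(gadget_data);

void init_gadget()
    {
    new (gadget_data) gadget(42);   // construct in place
    }

void cleanup_gadget()
    {
    the_gadget.~gadget();           // destroy in place
    }
```

No gadget exists until init_gadget is called, and none exists after cleanup_gadget, so the program alone decides when the constructor and destructor run.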

Conclusion

Most of the time you can simply use the built-in mechanisms for initialization and cleanup of objects and they’ll do just what you need. Occasionally you may run into a tricky situation where the built-in mechanisms don’t provide enough flexibility, or are too flexible. In those cases you need to take control of initialization and cleanup yourself, using flags, local static objects, or counters to keep track of whether an object has been initialized and acting accordingly. None of these techniques is particularly hard to implement. The hard parts are in recognizing that you need something other than what the compiler will give you and in deciding on which of the available techniques is best suited to your application.

Summary of Initialization and Cleanup
All Variables
    What should I initialize it to
    How should I initialize it
    When should I initialize it
Variables that require scarce resources
    How should I dispose of its resources
    When should I dispose of its resources
Lifetime and initialization
    Global static variables are initialized once, at program startup
    Local static variables are initialized once,
        when the block in which they are defined is first entered
    Auto variables are initialized every time the
        block in which they are defined is entered
Cleaning up in C
    atexit
Cleaning up in C++: destructors
    Destructors for static objects, both global and local, are run at program 
        termination, in reverse order of construction
    Destructors for auto objects are run when the block in which they are defined is 
        exited by any means, including throwing an exception
When the language doesn’t do what you need
    Use default 0 initialization to flag variables that haven’t been initialized
    Use a function that returns a reference to a local static object
    Use the nifty counter trick
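The summary entry about destructors for auto objects can be illustrated with a short sketch that records the order of events when a block is left by an exception. The class tracer, the string trace, and the function leave_by_exception are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

std::string trace;      // records the order of events

class tracer
{
public:
    explicit tracer(char id) : id_(id) { trace += id_; }
    ~tracer() { trace += '~'; trace += id_; }
private:
    char id_;
};

// The inner block is left by an exception; both auto objects are
// still destroyed, in reverse order of construction, before the
// handler runs.
void leave_by_exception()
    {
    try
        {
        tracer a('a');
        tracer b('b');
        throw std::runtime_error("leaving early");
        }
    catch (const std::runtime_error &)
        {
        trace += '!';
        }
    }
```

After a call to leave_by_exception, trace holds "ab~b~a!": a and b are constructed, b and then a are destroyed, and only then does the handler run.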

1. Older versions of DOS used to limit programs to twenty file handles, and most compilers grabbed five handles for their own use. Having only fifteen handles available made programmers quite careful of how they used file handles.

2. Even the version with the local static can be inlined. The C++ language definition says that each inline version of open_out refers to the same static object. That’s a little tricky for the compiler to do, so I’d be a bit hesitant to rely on it. It’s not something I’d avoid, but I’d make sure that it’s on the list of porting issues for the code.

3. Another possibility is to use compiler-specific mechanisms for controlling order of initialization. This often is easy to do, and if you can do it in a way that guarantees that you’ll get a compile-time error if you change to a compiler that doesn’t support that mechanism you can do it safely. This only postpones the porting issue, though: once you move to another compiler you’ll have to either come up with another nonportable technique for use with that compiler or write code that works with all compilers.

4. In libraries that we write, however, we aren’t constrained by this sort of prior usage. We can design the interface to our libraries to avoid this problem entirely, often by using a function call instead of an object, as we saw earlier. That’s probably a good idea in most cases.

5. The block does get initialized to all zeros, but that’s done before any constructors are run, so we don’t have to worry about it getting in our way.