In the last two installments of The Journeyman’s Shop we’ve looked at the mechanisms that C and C++ provide for initializing and cleaning up variables. Most of the time these mechanisms fit fairly closely to what we need to do. We use them almost without thinking, and they do what we need. For example, it’s fairly common to open a file, write some data to it, and close it. In C++ we often do this with an auto object:
#include <fstream>
void write(const char *name)
{
    std::ofstream out(name);
    out << "Hello, world!\n";
}
When program execution reaches the end of the function write, the destructor for out runs. That closes the file, flushing any buffer that may have been created and ensuring that the data has actually been written out to the file. In C there's a bit more work to do, because there are no destructors:
#include <stdio.h>
void write(const char *name)
{
    FILE *out = fopen(name, "w");
    fputs("Hello, world!\n", out);
    fclose(out);
}
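Neither version above checks whether fopen succeeded, and neither notices a write error that only shows up when fclose flushes the buffer. The sketch below is one more defensive way to write the C version; write_checked and its return convention (0 for success, -1 for failure) are hypothetical, and it uses the <cstdio> spellings so that it also compiles as C++.

```cpp
#include <cstdio>

// Hypothetical checked variant of write(): returns 0 on success, -1 on failure.
int write_checked(const char *name)
{
    std::FILE *out = std::fopen(name, "w");
    if (out == nullptr)
        return -1;                      // nothing was opened, so nothing to close
    int status = 0;
    if (std::fputs("Hello, world!\n", out) == EOF)
        status = -1;
    // fclose flushes the buffer, so a failed write may surface only here.
    if (std::fclose(out) == EOF)
        status = -1;
    return status;
}
```

The caller decides what to do about a failure; the function itself only guarantees that it never writes to or closes a null FILE pointer.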
If you forget to close the file, the C runtime system will close it when the program terminates normally, but it's still good practice to close files whenever you're finished with them. If you don't, you run the risk of running out of file handles, preventing you from opening new files later in your program.[1] This isn't a major source of errors, because it's fairly easy to recognize a missing call to fclose when a file is used only in the function that opened it.
Things get harder when you need to initialize an object that will be used from many different locations in the program. You then have two choices: you can create the object in a function and pass a pointer or reference to it from that function into other functions that need it, or you can create a global object and reference it directly from the places where it is needed. The first of these is preferable if the use of the object isn’t too widespread, because you can dispose of the object in the function that created it, without having to do any extra bookkeeping to make sure that you aren’t disposing of it too early and to make sure that you haven’t forgotten about disposing of it. However, passing extra arguments through a series of function calls adds overhead and complexity, and often the second alternative is a better choice. That’s where C++’s destructors offer a much more powerful mechanism than C provides.
#include <fstream>
std::ofstream out("test.log");
int main()
{
    out << "Hello, world!\n";
    return 0;
}
Notice that I haven't written any code to close the file. C++ requires that the destructor for out be run after main has finished, and that destructor does the same thing as it did when out was an auto object: it closes the file after flushing any internal buffers. We can do something similar in C:
#include <stdio.h>
FILE *out;
int main()
{
    out = fopen("test.dat", "w");
    fputs("Hello, world!\n", out);
    return 0;
}
But be careful: this only works correctly because the C language guarantees that any files that are open when the program terminates normally will be properly closed. There is no general cleanup mechanism in C, and if there is any possibility that a resource that you have allocated in a global variable won't be cleaned up by the runtime system, then you must clean it up yourself. In fact, the absence of a call to fclose in the example above bothers me enough that I'd put it in anyway:
#include <stdio.h>
FILE *out;
int main()
{
    out = fopen("test.dat", "w");
    fputs("Hello, world!\n", out);
    fclose(out);
    return 0;
}
When you use global variables like this, main is primarily responsible for initializing and cleaning up the global variables. Do as little else as possible in main, and call a function to do the real work. That way you can look at initialization and cleanup of the global variables without being distracted by other code.
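A minimal sketch of that division of labor follows. The names program_main (a stand-in for main, so the sketch can be exercised on its own) and run are hypothetical; program_main opens the global, delegates, and closes it, while run only uses it.

```cpp
#include <cstdio>

// Hypothetical global shared by the rest of the program.
std::FILE *out = nullptr;

// Does the real work; it assumes out is open and never opens or closes it.
static void run()
{
    std::fputs("Hello, world!\n", out);
}

// Hypothetical stand-in for main(): initialize, delegate, clean up.
int program_main(const char *name)
{
    out = std::fopen(name, "w");
    if (out == nullptr)
        return 1;
    run();
    int status = (std::fclose(out) == 0) ? 0 : 1;
    out = nullptr;
    return status;
}
```

Reading program_main from top to bottom shows every initialization and cleanup step for out, with none of the application logic in the way.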
There are times, though, when we don’t want to have global variables automatically initialized like this. We may want to defer initialization to the first time that a global variable is used. That way, if it is never used, we don’t need to initialize it at all. In C++ we may need to force initialization of a global variable at a particular time, to overcome the uncertainty of initialization order for globals. Although those cases require a bit more thought, they are fairly easy to handle.
The simplest way to defer initialization of a global object is to use some sort of flag to indicate whether the object has been initialized. Any code that needs to use that object checks the flag, and if the object has not yet been initialized, it initializes it and sets the flag to indicate that the object has been initialized. That's actually simpler than it sounds, particularly if the object to be initialized is a pointer. The pointer itself can be used as a flag, since a global pointer is zero-initialized and so holds NULL when the program starts. By checking for a null pointer we can determine whether the pointer has been initialized.
#include <stdio.h>
#include <stdlib.h>
FILE *out;
void f()
{
    if (out == NULL)
        out = fopen("test.dat", "w");
    fputs("Called f\n", out);
}
void g()
{
    if (out == NULL)
        out = fopen("test.dat", "w");
    fputs("Called g\n", out);
}
void close_out()
{
    if (out != NULL)    /* f and g might never have opened the file */
        fclose(out);
}
int main()
{
    atexit(close_out);
    f();
    g();
    return 0;
}
This code pushes the responsibility for opening out down into the functions that use it. In a more complex program, where f and g might not be called at all, this avoids opening the file if it isn't actually used. This code is a bit verbose, however, and it might be worthwhile to provide a function to handle opening the file:
void open_out()
{
    if (out == NULL)
        out = fopen("test.dat", "w");
}
Each function that will use out should call this function before attempting to use it. The responsibility for cleaning up any resources allocated by a function like this still lies with main: it must register an atexit function to take care of any necessary cleanup.
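Putting these pieces together, here is a sketch of deferred opening in C-style C++. The open_count variable and the file name journeyman_lazy_demo.txt are hypothetical, added only so the example can show that the file is opened exactly once. main would register close_out with atexit; close_out checks for a null pointer because in a larger program neither f nor g might ever run.

```cpp
#include <cstdio>

// A null pointer doubles as the "not yet opened" flag: objects with static
// storage duration are zero-initialized before the program starts.
static std::FILE *out;
// Hypothetical counter, present only to make the deferral observable.
static int open_count;

void open_out()
{
    if (out == nullptr) {
        out = std::fopen("journeyman_lazy_demo.txt", "w");
        ++open_count;
    }
}

// Intended to be registered with atexit by main.
void close_out()
{
    if (out != nullptr) {       // the file may never have been opened
        std::fclose(out);
        out = nullptr;
    }
}

void f()
{
    open_out();
    if (out != nullptr)
        std::fputs("Called f\n", out);
}

void g()
{
    open_out();
    if (out != nullptr)
        std::fputs("Called g\n", out);
}
```

Because close_out resets out to null, calling it twice (once explicitly, once through atexit) is harmless.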
Another technique for deferring initialization is to write a function that returns a pointer or a reference to the object, and handle initialization inside that function. When we do that, we no longer need to make the name of the object itself visible throughout the program. All that’s needed is the function itself.
static FILE *out_file;
FILE *open_out()
{
    if (out_file == NULL)
        out_file = fopen("test.dat", "w");
    return out_file;
}
With this approach, functions that need to write to the file can either call this function each time they need the FILE pointer or create a local variable to hold the pointer.
void f()
{
    fputs("Called f\n", open_out());
}
void g()
{
    FILE *out = open_out();
    fputs("Called g\n", out);
    fputs("Logging additional data\n", out);
}
Another variation on this idea is to make the object a local static object rather than a global one, and let the compiler worry about checking whether the object has been initialized. This is particularly useful in C++, because it lets you defer running the constructor for an object:
#include <fstream>
std::ostream& open_out()
{
    static std::ofstream out("test.dat");
    return out;
}
One drawback to this approach is that it involves a function call every time you need to use the object. That's often not as big a penalty as it might appear to be, because it avoids the risk that someone will forget to check whether the variable has been initialized and will simply use it. That's the sort of problem that will hide from you until you're about to ship your product. Then some tester will come across it and you'll have to work late several nights to figure out what's going wrong. So in general this is a case where I'd be willing to accept the overhead of a function call. Of course, if you're writing in C++ you can always consider making this function inline.[2]
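Here is a sketch that makes the deferral of a local static visible. It assumes a hypothetical logger class that counts its constructions and keeps its text in memory (standing in for std::ofstream so nothing touches the disk). Nothing is constructed until the first call to open_out, and later calls reuse the same object. Since C++11 the language also guarantees that this first-call initialization is thread-safe.

```cpp
#include <sstream>
#include <string>

// Hypothetical counter that records how many logger objects have been built.
static int constructions = 0;

// Hypothetical stand-in for std::ofstream that keeps its text in memory.
class logger
{
public:
    logger() { ++constructions; }
    void write(const std::string &text) { buffer << text; }
    std::string contents() const { return buffer.str(); }
private:
    std::ostringstream buffer;
};

logger &open_out()
{
    // Constructed the first time control passes through here, not at startup.
    static logger out;
    return out;
}
```

Callers never see an unconstructed logger, because the only way to reach the object is through open_out.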
C++ programmers often run into problems with the initialization order of global objects. The rule is quite simple, but dangerous: global objects that are defined in the same translation unit are initialized in the order of their definitions. Programmers run into problems when they forget that this rule says nothing about the order of initialization of global objects defined in different translation units. The language definition says nothing about this. It is up to the implementation to determine the order, and there is no requirement that this order be consistent from one compilation of your application to another, or even from one run of your application to another. In short, if your code depends on having one global object constructed before another global object is constructed then you must define the two objects in the same translation unit or take over managing construction yourself.
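A small illustration of the part of the rule that is guaranteed, using two hypothetical globals defined in one translation unit: the second one's initializer can safely read the first. If log_banner were moved to a different source file, that guarantee would disappear.

```cpp
#include <string>

// Both objects are defined in this translation unit, so log_banner is
// initialized before db_banner, whose initializer reads it.
std::string log_banner = "log opened";
std::string db_banner = log_banner + ", database opened";
```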
Let’s take a quick look at how this sort of problem can arise. Suppose you have a global file object like the one we’ve been looking at throughout this discussion. We’ve only written code that uses this object from within main or functions called from main. What happens when we try to use it from the constructor of another global object?
// database.h
#include <fstream>
extern std::ofstream out;
class database
{
public:
    database()
    {
        out << "Opening database\n";
    }
};
// main.cpp
#include "database.h"
std::ofstream out("test.dat");
database db;
int main()
{
    return 0;
}
This code works fine. In main.cpp we define the object out before we define the object db. This means that out will be constructed before db is constructed, and the use of out in db's constructor will work correctly. If we reverse the order of the definitions of out and db in main.cpp, then db's constructor will run first, and it will attempt to insert text into an uninitialized stream. Of course, that's not a good thing to do. It's easy to avoid in a case like this, because both objects are defined in the same file. But suppose we wanted to separate the details of our application's logging operations from the database itself; that is, suppose we wanted to put out into its own file? The obvious solution, simply moving the definition of out, doesn't work:
// datalog.h
#include <fstream>
extern std::ofstream out;
// database.h
#include "datalog.h"
class database
{
public:
    database()
    {
        out << "Opening database\n";
    }
};
// datalog.cpp
#include "datalog.h"
std::ofstream out("test.dat");
// main.cpp
#include "database.h"
database db;
int main()
{
    return 0;
}
Here we've defined db and out in two separate files, and it's up to our compiler to decide what order to initialize them in. It may happen to work, if the compiler initializes out before it initializes db, or it may not, if the compiler initializes db first. If your code has this sort of dependency, don't rely on the compiler's choice of order, even if it happens to work, and even if your compiler carefully documents what order it will initialize these objects in. If you do this you're deep into the realm of nonportable code. Take control, perhaps with one of the techniques we've already discussed.[3]
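As one way of taking control, here is a sketch of the database example built on an accessor function with a local static. The name log_stream is hypothetical, and std::ostringstream stands in for std::ofstream so that the result can be inspected; in a real program log_stream would be defined in datalog.cpp and only declared in datalog.h. Whichever translation unit is initialized first, the first call to log_stream constructs the stream.

```cpp
#include <sstream>

// Hypothetical accessor; the stream is built on first use, not at startup.
std::ostringstream &log_stream()
{
    static std::ostringstream out;
    return out;
}

class database
{
public:
    database()
    {
        // Safe even while globals are still being initialized.
        log_stream() << "Opening database\n";
    }
};

database db;
```

The cost is the one discussed earlier: every use of the stream goes through a function call.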
There's another technique, though, that you should be aware of. It's known as the "nifty counter trick", and it's been used in many implementations of iostreams. The problem that implementors of iostreams run into is exactly the one that we saw above: objects like cout are defined somewhere in the runtime library, and the C++ language does not guarantee that they will be constructed before they are used in other code. For example:
#include <iostream>
class could_be_dangerous
{
public:
    could_be_dangerous()
    {
        std::cout << "Here I am!\n";
    }
};
could_be_dangerous d;
int main()
{
    return 0;
}
This sort of code is quite common, and library implementors must ensure that cout is constructed before any code uses it. None of the techniques we've discussed so far can be used here, because they would all require rewriting the code that uses cout, either by replacing cout with a function call or by adding a check for construction. Neither one is suitable here.[4]
The nifty counter trick involves creating a static object in every file that uses the object that we need to initialize. The static object’s constructor checks a flag, and if we haven’t done the initialization yet it does the initialization and sets the flag. The static object’s destructor takes care of cleanup. This is all accomplished by putting the definition of the static object into the header file that declares the object that we’re going to initialize. Like this:
// datalog.h
#include <fstream>
extern std::ofstream out;
class out_initializer
{
public:
    out_initializer()
    {
        if (init_count++ == 0)
            init_out();
    }
    ~out_initializer()
    {
        if (--init_count == 0)
            cleanup_out();
    }
private:
    static int init_count;
    static void init_out();
    static void cleanup_out();
};
static out_initializer out_initializer_object;
// datalog.cpp
#include <new>
#include "datalog.h"
std::ofstream out;
int out_initializer::init_count;    // zero-initialized before any constructor runs
void out_initializer::init_out()
{
    new (&out) std::ofstream("test.dat");
}
void out_initializer::cleanup_out()
{
    out.close();
}
With this technique, our earlier example with the database would be unchanged:
// main.cpp
#include "database.h"
database db;
int main()
{
    return 0;
}
The difference is that now it will work correctly. That's because the #include directive pulls in the contents of database.h, which in turn pulls in the contents of datalog.h, which defines a static object of type out_initializer. The definition of this object will always occur before any use of the object out itself, because it is in the header that declares out. A translation unit cannot use the name out without using this header, so every translation unit that uses out will get a static object of type out_initializer, and the definition of that object will always come before the definition of any object whose constructor uses out. That means that, regardless of which translation unit the compiler initializes first, there will always be a static object that will be constructed before the first use of out, and the constructor for that static object will initialize out. This guarantees that out will always be initialized before it is used.
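The standard library gives this machinery a public name: std::ios_base::Init, whose constructor ensures that cout and the other standard stream objects have been constructed. libstdc++, for example, long placed a static Init object in <iostream>, just as datalog.h places out_initializer_object. The sketch below is the earlier could_be_dangerous class made explicit about its dependency; the constructed flag is a hypothetical addition used only to show that the constructor ran.

```cpp
#include <ios>
#include <iostream>

class could_be_dangerous
{
public:
    could_be_dangerous()
    {
        // Constructing an Init object guarantees the standard streams exist
        // before the next line uses cout.
        std::ios_base::Init streams_ready;
        std::cout << "Here I am!\n";
        constructed = true;
    }
    static bool constructed;   // hypothetical, for demonstration only
};

bool could_be_dangerous::constructed = false;   // constant-initialized

could_be_dangerous d;
```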
There are a couple of drawbacks to the nifty counter technique. First, you may have noticed that it uses placement new to construct an object of type std::ofstream on top of the object out. Somewhere along the line the compiler will also generate a call to construct the object out. The writer of a library that uses this trick must make sure that multiple constructor calls like this will leave the object in a sensible state. Typically that means writing the default constructor to do nothing, so that it won't change any of the initialization done by the call to placement new. And, of course, there's a corresponding destructor call as the program terminates. That could happen before the destructor for the last out_initializer object calls close. Again, the destructor must not do anything that makes this final call fail. This is tricky code to get right. Don't try this at home.
The second drawback is that all of those objects of type out_initializer must actually be constructed, even though it is only the first one that actually performs the initialization of out. That's unavoidable, and although it doesn't look like much, it can be very expensive on a system that uses virtual memory: it could cause pages to be pulled in from the disk in order to execute these constructors. That could slow down program initialization considerably.
In the form I've described, the nifty counter trick can only be used safely by implementors of the standard library. They know their compiler's quirks, and can make sure that any peculiarities that they rely on will not be changed in some future release. There's a slight variation that's much more useful. It's completely portable, because it eliminates the compiler-generated constructor and destructor calls. Instead of using an object named out, use a reference: change the declaration in datalog.h to extern std::ofstream &out; and initialize it like this:
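// datalog.cpp
// (datalog.h must now declare: extern std::ofstream &out;)
#include <new>
#include "datalog.h"
// Raw storage for the stream; alignas (C++11) gives it the alignment
// that a std::ofstream requires.
alignas(std::ofstream) static char out_data[sizeof(std::ofstream)];
std::ofstream &out = *reinterpret_cast<std::ofstream *>(out_data);
int out_initializer::init_count;
void out_initializer::init_out()
{
    new (out_data) std::ofstream("test.dat");
}
void out_initializer::cleanup_out()
{
    // Explicit destructor call; basic_ofstream is the class template
    // behind the typedef std::ofstream.
    out.~basic_ofstream();
}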
What we've done here is to create an array of char named out_data consisting of enough bytes to hold an object of type std::ofstream, told the compiler to pretend that this array of char is actually an object of type std::ofstream, and initialized the reference out to refer to this chunk of memory. Since the chunk of memory is defined as an array of char, the compiler won't generate any code to initialize it,[5] so we don't have to worry about multiple constructors or destructors. That tricky-looking expression at the end of cleanup_out simply invokes the destructor for ofstream on the object that we created.
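Here is a self-contained sketch of that variation in a single file. std::string stands in for std::ofstream so the object's state is easy to inspect, and the names greeting, greeting_initializer, greeting_storage, init_calls, and cleanup_calls are hypothetical. The part a header would contain comes first, followed by the part datalog.cpp would contain; the alignas specifier (C++11) gives the raw storage the alignment a std::string needs.

```cpp
#include <new>
#include <string>

// --- what the header would contain ---
extern std::string &greeting;

class greeting_initializer
{
public:
    greeting_initializer()  { if (init_count++ == 0) init(); }
    ~greeting_initializer() { if (--init_count == 0) cleanup(); }
    static int init_calls;      // hypothetical, to make the behavior observable
    static int cleanup_calls;   // hypothetical, to make the behavior observable
private:
    static int init_count;
    static void init();
    static void cleanup();
};

// Every translation unit that includes the header gets one of these.
static greeting_initializer greeting_initializer_object;

// --- what the source file would contain ---
namespace {
// Raw storage: the compiler generates no constructor or destructor for it.
alignas(std::string) unsigned char greeting_storage[sizeof(std::string)];
}

std::string &greeting = *reinterpret_cast<std::string *>(greeting_storage);

int greeting_initializer::init_count;      // zero before any constructor runs
int greeting_initializer::init_calls;
int greeting_initializer::cleanup_calls;

void greeting_initializer::init()
{
    new (greeting_storage) std::string("Hello, world!");
    ++init_calls;
}

void greeting_initializer::cleanup()
{
    greeting.~basic_string();   // explicit destructor call on the placed object
    ++cleanup_calls;
}
```

A local greeting_initializer object behaves like the one a second translation unit would get: it bumps the counter without constructing the string again, and only the destruction of the last initializer runs cleanup.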
Most of the time you can simply use the built-in mechanisms for initialization and cleanup of objects and they'll do just what you need. Occasionally you may run into a tricky situation where the built-in mechanisms don't provide enough flexibility, or are too flexible. In those cases you need to take control of initialization and cleanup yourself, using flags, local static objects, or counters to keep track of whether an object has been initialized and acting accordingly. None of these techniques is particularly hard to implement. The hard parts are in recognizing that you need something other than what the compiler will give you and in deciding on which of the available techniques is best suited to your application.
Summary of Initialization and Cleanup

All variables
- What should I initialize it to?
- How should I initialize it?
- When should I initialize it?

Variables that require scarce resources
- How should I dispose of its resources?
- When should I dispose of its resources?

Lifetime and initialization
- Global static variables are initialized once, at program startup.
- Local static variables are initialized once, when control first passes through their definition.
- Auto variables are initialized every time the block in which they are defined is entered.

Cleaning up in C
- atexit

Cleaning up in C++: destructors
- Destructors for static objects, both global and local, are run at program termination, in reverse order of construction.
- Destructors for auto objects are run when the block in which they are defined is exited by any means, including throwing an exception.

When the language doesn't do what you need
- Use default 0 initialization to flag variables that haven't been initialized.
- Use a function that returns a reference to a local static object.
- Use the nifty counter trick.
1. Older versions of DOS used to limit programs to twenty file handles, and most compilers grabbed five handles for their own use. Having only fifteen handles available made programmers quite careful of how they used file handles.
2. Even the version with the local static can be inlined. The C++ language definition says that each inline version of open_out refers to the same static object. That's a little tricky for the compiler to do, so I'd be a bit hesitant to rely on it. It's not something I'd avoid, but I'd make sure that it's on the list of porting issues for the code.
3. Another possibility is to use compiler-specific mechanisms for controlling order of initialization. This often is easy to do, and if you can do it in a way that guarantees that you’ll get a compile-time error if you change to a compiler that doesn’t support that mechanism you can do it safely. This only postpones the porting issue, though: once you move to another compiler you’ll have to either come up with another nonportable technique for use with that compiler or write code that works with all compilers.
4. In libraries that we write, however, we aren’t constrained by this sort of prior usage. We can design the interface to our libraries to avoid this problem entirely, often by using a function call instead of an object, as we saw earlier. That’s probably a good idea in most cases.
5. The block does get initialized to all zeros, but that’s done before any constructors are run, so we don’t have to worry about it getting in our way.
Copyright © 1999-2006 by Pete Becker. All rights reserved.