A few years ago there was a comic strip called "Alley Oop." No, it wasn’t about polymorphic highways. It was about a caveman. I don’t remember much else about it, but one line always stuck in my mind. Oop at one point said that thinking about some problem reminded him of the guy who tried to eat a piece of ’gator tail: the more he chewed on it, the bigger it got. This discussion of error handling is doing the same thing. My initial plan for this column was to spend the first installment talking about errors, and then to move on to other topics. As I started writing that first installment, I realized that it was going to expand into at least two installments. As I write the second one, it’s clear that the topic is even larger than that, and will spill over into next month as well1. If you like talking about error handling, then that’s probably a good thing. If you don’t like it, don’t worry: it won’t go beyond next month.
Last month we talked about handling errors inside the function that detects them. The three possible approaches are terminating execution, fixing the problem, and ignoring the problem. If none of these is appropriate, then we can’t handle the problem locally and need to report the problem to the calling function. The writer of the calling function then has to make the same choice: handle the problem by terminating execution, fixing it, or ignoring it; or notify that function’s caller. This month we’re going to look at the second alternative, notifying the calling function that something went wrong.
Obviously, once we’ve decided to notify the calling function that something went wrong, we have to decide how to notify it. There are a number of techniques in use, and a bit later on we’ll talk about several of them. Before doing that, though, we need to give some thought to what factors we ought to consider in choosing our reporting technique. This is in part an architectural decision, and may be dictated by decisions already made by the system architects. In particular, if we’re writing a subsystem for the application and need to report a problem to another subsystem, the reporting mechanism has probably already been designed as part of the system architecture. However, if we’re reporting errors within the subsystem that we’re writing, we have a great deal more flexibility in our choice of technique. We can choose a technique that’s different from the technique used to communicate problems between subsystems, so long as we keep things within our own subsystem and follow the prescribed conventions when we report problems to other subsystems.
The first thing to think about is how much information we need to convey to the caller. Sometimes all that our function needs to do is tell the caller that it could not do what it was asked to do. The standard C function malloc does this: when malloc is unable to allocate the amount of memory that was requested there are a number of possible reasons for the failure, but malloc simply reports that it failed. That’s probably the right choice: giving the caller more information probably wouldn’t help in recovering from the error. On the other hand, if we were writing a function that took a file name as an argument, opened the file, allocated a 30 byte buffer, and read 30 bytes from the file into the buffer, it’s important to the caller to know whether a failure occurred because the function could not open the file, because it could not allocate the buffer, or because it couldn’t read 30 bytes from the file. If our function does not indicate which of these was the problem then our caller will have a much more difficult job if it has to try to fix the problem.
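Here’s a rough sketch of what such a function might look like in C. The names read_header, read_status, and HEADER_SIZE are made up for illustration, and the particular set of status codes is just one reasonable choice, not the only one:

```c
#include <stdio.h>
#include <stdlib.h>

#define HEADER_SIZE 30

/* Hypothetical status codes: one for each failure the caller might
   want to handle differently. */
enum read_status {
    READ_OK = 0,
    READ_OPEN_FAILED,
    READ_ALLOC_FAILED,
    READ_SHORT
};

/* Opens the file `name`, allocates a HEADER_SIZE-byte buffer, and fills
   the buffer from the file. On success, stores the buffer in *out; the
   caller must free() it. On failure, sets *out to NULL and returns a
   status that says which step failed. */
enum read_status read_header(const char *name, unsigned char **out)
{
    FILE *fp;
    unsigned char *buf;
    size_t got;

    *out = NULL;
    fp = fopen(name, "rb");
    if (fp == NULL)
        return READ_OPEN_FAILED;
    buf = malloc(HEADER_SIZE);
    if (buf == NULL) {
        fclose(fp);
        return READ_ALLOC_FAILED;
    }
    got = fread(buf, 1, HEADER_SIZE, fp);
    fclose(fp);
    if (got != HEADER_SIZE) {
        free(buf);
        return READ_SHORT;
    }
    *out = buf;
    return READ_OK;
}
```

A caller that only cares whether the call worked can compare the result with READ_OK; a caller that wants to recover can look at the specific code.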
A word of caution, however: it’s easy to overdesign an error handling system, and end up producing something that’s so complicated that users of your code will be tempted to ignore the error handling rather than figure out how to use it correctly. Don’t give in to the temptation to produce the ultimate error handling system: that’s almost certainly more than you need. Instead, consider what information is needed by the caller to properly handle the error condition that you are reporting. Provide all the information that’s needed, and nothing more.
To see this more clearly, let’s look at malloc in a bit more detail. Its job is to allocate a block of memory of the requested size. It’s possible to implement malloc so that it first checks the requested size against some maximum allowed size, and fails if the request is too large. If the request is for an allowable size, malloc then looks through the memory blocks that it has available to find one that’s large enough to satisfy the request. If none is found, malloc fails. The specification for malloc does not distinguish between these two cases: both result in returning a null pointer. It would be possible to invent a more sophisticated error reporting mechanism for malloc, one that would indicate which of these cases caused the failure. From a user’s perspective, however, this is rarely important. What matters is simply that malloc was unable to allocate the amount of memory requested, and that’s all that should be reported to the caller2.
The key to designing an error handling system is understanding the context that it will be used in. Provide all the information that’s needed by the caller, and nothing more.
Once you’ve decided what information you want to transmit when an error occurs, you need to decide how to transmit that information. There are three basic techniques in common use: return a value that indicates that an error occurred, set a flag for the caller to check, or transfer execution to somewhere other than the normal return point. We’ll talk about the last of these next time around. This month we’re going to look at how to convey error information without disrupting the normal flow of the application.
By far the most common technique for indicating that an error occurred is for a function to return a value that the caller recognizes as an error indicator. Although this seems like a simple technique, it’s actually quite a bit more complicated than it looks.
The minimal form of this technique is for a function to return a boolean value that simply indicates success or failure. The standard C function fclose does this: it returns 0 on success, and EOF on failure. If you’re rigorously checking for failures in your code, calls to fclose should look something like this:
if (fclose(fp))
    /* handle error here */
This technique is fairly easy to use. The only thing to watch out for is getting the sense of the test right: for fclose, a non-zero return value indicates failure; for some other function a zero return value might indicate failure, requiring that the result be negated in the test. In particular, in C++, a return type of bool as an indicator of success will require that the result be negated:
bool do_something();
if (!do_something())
    /* handle error here */
Don’t underestimate the possibility of coding mistakes with something as simple as this. Make sure that the documentation for your function clearly spells out what the error indication is. For example, in describing fclose, the C standard says:
The fclose function returns zero if the stream was successfully closed, or EOF if any errors were detected.
The documentation for our hypothetical do_something should be equally clear:
The do_something function returns true if something was done, or false if nothing was done.
Another common technique when a function returns a number of useful values is to designate a special value to indicate failure. Probably the most commonly used function that does this is malloc: it returns a pointer to the memory block that it allocated, or a null pointer if it was unable to satisfy the allocation request. Most of the time, interpreting the return value from such a function is no more complicated than interpreting the return value from a function that returns a boolean success indicator:
if ((ptr = malloc(100)) == NULL)
    /* handle error here */
There’s a danger in this approach, however: it’s possible that the special value that we’ve chosen is actually a legitimate value for the function to return. For example, the Win32 API defines a function, SetFilePointer, that sets a new read/write location in a file. Its prototype looks like this3:
unsigned SetFilePointer(HANDLE file,
    long dist, long *hdist, unsigned dir);
It takes a 64-bit value, in a slightly peculiar form, as the offset to seek to. The dist argument is the low 32 bits of the offset, and the hdist argument points to a long that holds the high 32 bits of the offset. If the hdist argument is a null pointer then the high 32 bits are all zeroes. If the function fails it returns 0xFFFFFFFF. If it succeeds it returns the low 32 bits of the resulting offset, and puts the high 32 bits of the resulting offset in the location pointed to by hdist if hdist is not NULL. The trouble with this is that 0xFFFFFFFF can also be the correct result for a successful call. That is, getting back a value of 0xFFFFFFFF does not mean that the seek failed, only that it might have failed. You then have to call GetLastError to determine whether this value actually means that an error occurred. We’ll look at the idea behind GetLastError in a little more detail in a moment. For now, though, the point is that 0xFFFFFFFF does not absolutely indicate that an error occurred. On the other hand, getting a return value other than 0xFFFFFFFF tells you that the call did succeed.
In a case like this, where the special value that indicates that an error occurred isn’t really special, checking for errors is more complicated:
if (SetFilePointer(hnd, 128, NULL, FILE_BEGIN)
    == 0xFFFFFFFF &&
    GetLastError() != NO_ERROR)
    /* handle error here */
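You don’t have to go to Win32 to run into this. The standard C function strtol returns LONG_MAX or LONG_MIN when the converted value is out of range, but both are also perfectly good results for in-range input, so the return value alone doesn’t settle the question; you have to look at errno as well. The following is a minimal sketch of a wrapper that hides the two-step check. The name parse_long and its 0/-1 return convention are my own invention for this example:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts `text` to a long.
   Returns 0 and stores the value in *out on success, -1 on failure. */
int parse_long(const char *text, long *out)
{
    char *end;
    long val;

    errno = 0;                      /* strtol never resets errno itself */
    val = strtol(text, &end, 10);
    if (end == text)
        return -1;                  /* no digits were found */
    if ((val == LONG_MAX || val == LONG_MIN) && errno == ERANGE)
        return -1;                  /* the value was out of range */
    *out = val;                     /* includes a genuine LONG_MAX */
    return 0;
}
```

The wrapper also shows the redesign that avoids the ambiguity in the first place: the return value carries only success or failure, and the result travels through a separate parameter.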
Aside from this complication, returning a special value is pretty much the same as returning a boolean value: it tells you whether an error occurred, but doesn’t tell you anything about what the error was. Just as with a boolean value, you must be sure that users of your code know what the return code for an error is, so they can test for it properly.
A natural extension to both of these techniques is to define more than one error value, and return the value that most closely describes what actually went wrong. This isn’t much different from what we talked about earlier: instead of using a boolean value to indicate failure, use an integer. If you’re overloading the return value to provide actual data as well as an error indication, like malloc does, it’s a bit harder. You have to come up with a few more invalid values to use as error flags. That often doesn’t require doing anything tricky, but you have to think about it carefully. In the case of malloc, for example, many implementations align pointers returned by malloc to 8-byte boundaries. This means that the low three bits in the pointer are available for indicating bad values. I don’t recommend that you play this sort of game with pointers in your own code, however. The point is that with a bit of creativity you can find values that are easily distinguished from valid ones, and use those values to indicate that an error has occurred. On the other hand, in some cases it might be even easier to change the design a bit and use the return value solely to indicate success or failure, and give the actual results back to the caller in some other way.
From the caller’s perspective, calling a function that can return any of several values to indicate a failure is a bit more complicated than calling a function that has only one failure indicator. Instead of using an if statement as we did above, we may need to use a switch statement. For example, in the Java library that I’ve been working on for the past year, there’s a thread support package that’s written in C. A function like sleep that is supposed to pause execution of the calling thread for the time specified in its argument can return early if some other thread calls interrupt on the sleeping thread. In that case, sleep is supposed to throw an exception. This means that the C code that implements sleep has to be able to indicate that the call succeeded, that it failed for any of several reasons that aren’t relevant here, or that it failed to pause for the requested time because it was interrupted. The Java code that calls the C code interprets the return code and throws the appropriate exception:
public static void sleep(long millis, int nanos)
    throws InterruptedException, UnknownError
{   // block calling thread for specified time
    switch (NativeThread.threadSleep(currentThread().mth,
        millis, nanos))
    {
    case INTERRUPTED:
        throw new InterruptedException();
    case UNKNOWN_ERROR:
        throw new UnknownError();
    }
}
If you’re considering providing multiple error codes you should also try to provide a simple way for the calling code to determine that an error occurred, in case the details are not important to the caller. In the case of threadSleep, above, there is only one success code, so checking for success would be easy:
if (NativeThread.threadSleep(currentThread().mth,
    millis, nanos) != SUCCESS)
    /* handle error here */
If your function also has several possible success indicators things get a bit more complicated. In a function that simply counts items a valid count will never be less than zero, so you can use negative values to indicate errors:
if (count(data_items) < 0)
    /* handle error here */
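To make that concrete, here’s one way such a counting function might be written. The function name, its arguments, and the two error codes are hypothetical; the only point is that every error code is negative, so a single comparison catches them all:

```c
#include <stddef.h>

/* Hypothetical error codes; all are negative, so none of them can be
   mistaken for a valid count. */
#define COUNT_ERR_NULL   (-1)   /* no array was supplied */
#define COUNT_ERR_RANGE  (-2)   /* the length was negative */

/* Returns the number of non-zero entries in items[0..n-1],
   or a negative error code. */
int count_nonzero(const int *items, int n)
{
    int i, total = 0;

    if (items == NULL)
        return COUNT_ERR_NULL;
    if (n < 0)
        return COUNT_ERR_RANGE;
    for (i = 0; i < n; ++i)
        if (items[i] != 0)
            ++total;
    return total;
}
```

Callers who don’t care why the call failed test for a result less than zero; callers who do care can compare against the individual codes.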
Try to avoid making your return codes so detailed that they require your users to write complicated code to interpret them. If your documentation tells users that the values -1, -2, -3, 7, 14, and 19 indicate errors and that any other value indicates success, you’ll find your e-mail account flooded with complaints4. Make it as easy as you can for users of your code to recognize error reports.
A slightly more complicated technique for embedding error information in a return value is to return the number of items that were successfully handled. From the caller’s perspective, this means checking whether the return value is equal to the number of items that should have been handled. For example, fread, from the standard C library, works this way:
struct data_item
{
    /* your data goes here */
};
#define DSIZE 20
struct data_item data[DSIZE];
if (fread(data, sizeof(struct data_item), DSIZE, fp) != DSIZE)
    /* handle error here */
Many of the functions in the standard C library use this technique5.
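A short count from fread tells you that something stopped the read, but not what. The standard library lets you ask the stream afterward, with feof and ferror. Here’s a small sketch of how a caller might wrap that up; read_ints and the ITEMS_ codes are names I’ve invented for the example:

```c
#include <stdio.h>

/* Hypothetical status codes for read_ints. */
enum { ITEMS_OK = 0, ITEMS_EOF = 1, ITEMS_IO_ERROR = 2 };

/* Tries to read `want` ints from fp into buf. Stores the number actually
   read in *got, and reports why the read came up short, if it did. */
int read_ints(FILE *fp, int *buf, size_t want, size_t *got)
{
    *got = fread(buf, sizeof *buf, want, fp);
    if (*got == want)
        return ITEMS_OK;
    /* A short count alone doesn't say why; ask the stream. */
    return ferror(fp) ? ITEMS_IO_ERROR : ITEMS_EOF;
}
```

Note that the items that were read successfully are still delivered, so the caller can decide whether a partial result is good enough.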
Sometimes the best way to indicate that an error occurred is to set a flag for the caller to check later. For example, the class java.io.PrintWriter in Java does this.
PrintWriter out = new PrintWriter(
    new BufferedWriter(
        new FileWriter(FileDescriptor.out)));
out.print(3);
The last line converts the value 3 into a string and pushes the characters from that string one at a time into the BufferedWriter that writes blocks of characters to the FileWriter that sends them to stdout6. Both BufferedWriter and FileWriter throw exceptions if an attempted write fails. The print method in PrintWriter catches any exceptions that are thrown, and sets an internal flag to indicate that an error occurred. It does not rethrow the exception. If we want to find out whether we succeeded in writing 3 to stdout we have to ask the PrintWriter object:
if (out.checkError())
    /* handle error here */
Note that this separates error checking from the operation that might have caused the error. This lets you perform a series of operations without checking for errors, and simply check at the end whether they all succeeded. For routine operations such as writing information to stdout this is usually appropriate:
out.print("Hello, ");
out.println("world!");
if (out.checkError())
    /* handle error here */
The call to checkError will indicate whether something went wrong in the calls to print and println, but it won’t indicate which one of them failed. In fact, most of us wouldn’t bother checking for success at all in this case. But the flag is there if we want to look at it7.
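C programmers have had a flag like this all along: every FILE stream carries an error indicator that is set when an operation on the stream fails, and stays set until something like clearerr or rewind clears it. The function ferror reads it, which makes it a rough C counterpart to checkError. The sketch below uses a made-up function, write_greeting, to show the same write-now-check-later pattern:

```c
#include <stdio.h>

/* Writes a two-part greeting to fp without checking either call, then
   consults the stream's error indicator once. Returns 0 if no error was
   recorded, -1 if the indicator is set. */
int write_greeting(FILE *fp)
{
    fputs("Hello, ", fp);
    fputs("world!\n", fp);
    fflush(fp);     /* push buffered output so write failures show up */
    return ferror(fp) ? -1 : 0;
}
```

As with checkError, a failure here doesn’t tell you which of the calls went wrong, only that at least one of them did.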
Another error flag that most of us are probably more familiar with is errno. Many of the functions in the standard C library indicate that an error occurred by setting errno to a non-zero value. You can check for errors by looking at errno, but there are a couple of things you have to watch out for. First, the value of errno is set to 0 when your program starts executing, but no function ever sets it back to 0. This means that when you’re going to call a function and you want to use errno to tell you whether the call succeeded, you must set errno to 0 before the call. Like this:
#include <errno.h>
#include <math.h>
#include <stdio.h>
void show_log(double val)
{
    errno = 0;
    double d = log(val);
    if (errno != 0)
        perror("Log error");
    else
        printf("%f\n", d);
}
Note that I’ve used the standard function perror to display an error message when log sets errno to a non-zero value. You can also use strerror to get a C-style string describing the error that occurred.
errno works just like the error flag in PrintWriter that we looked at earlier, in that you can postpone checking for errors until you’ve done several function calls. If any one of them failed, errno will contain the error code for the last one that failed. Be careful if you do this, though: there’s a great deal of variation among compilers as to which functions set errno and which ones don’t. That’s because the specification for errno is a bit fuzzy in the C standard. It says that some functions must set errno when an error occurs, and the specifications for those functions indicate what values they will set errno to for the various errors that may occur. However, it also allows other functions to set errno, and it allows implementations to provide values in addition to those required by the standard. In practice, this means that you must check your compiler’s documentation if you’re going to use errno to detect errors. Otherwise you won’t know which functions can set it, and you won’t know what the various values can be8.
A flag like errno can also be used to supplement the information that a return value provides. Rather than cobble up an incomprehensible scheme for compressing several dozen bits of error information into a return value, you might want to stick to using a single value to indicate an error, and allow the caller to look elsewhere if more information is needed. By separating detection of the error from reporting the details of the error, you make it possible for callers of your function to write simpler code when they are not concerned with the details of what went wrong, without depriving callers who need more data of the information that they need.
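If you want to build something like this into your own code, it doesn’t take much. The following is a minimal sketch, not a finished design: the function digit_value reports failure with a single return value, and a companion function, digit_error, hands back the details to callers who ask. All of the names here are hypothetical, and a multi-threaded program would need to keep a separate copy of the detail code for each thread.

```c
#include <stddef.h>

/* Hypothetical detail codes, modeled loosely on errno. */
enum { DIGIT_NO_ERROR = 0, DIGIT_NOT_A_DIGIT, DIGIT_NULL_RESULT };

/* Detail for the most recent failure. A multi-threaded program would
   need one of these per thread. */
static int digit_last_error = DIGIT_NO_ERROR;

/* Returns the detail code recorded by the most recent failing call. */
int digit_error(void)
{
    return digit_last_error;
}

/* Converts the character ch to its numeric value. Returns 0 on success,
   -1 on failure; after a failure the reason is available from
   digit_error(). */
int digit_value(char ch, int *result)
{
    if (result == NULL) {
        digit_last_error = DIGIT_NULL_RESULT;
        return -1;
    }
    if (ch < '0' || ch > '9') {
        digit_last_error = DIGIT_NOT_A_DIGIT;
        return -1;
    }
    *result = ch - '0';
    return 0;
}
```

Like errno, the detail code is only meaningful right after a call that reported failure; a successful call leaves it alone.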
As we saw earlier, Win32 provides something similar with the function GetLastError. It’s a lot like errno, in that when you call it you get back a number that indicates what the most recent error was in the thread that you called it from. This gives the calling function a great deal of flexibility in handling errors reported by system calls. If it doesn’t need to try to recover from the error it doesn’t need to look at the detailed information:
HANDLE hnd;
if ((hnd = GetStdHandle(STD_OUTPUT_HANDLE))
    == INVALID_HANDLE_VALUE)
    exit(1);
If it needs to do more, it can:
HANDLE hnd;
if ((hnd = GetStdHandle(STD_OUTPUT_HANDLE))
    == INVALID_HANDLE_VALUE)
{
    void *msg;
    /* with FORMAT_MESSAGE_ALLOCATE_BUFFER the buffer argument is
       really the address of a pointer, hence the cast */
    FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER
        | FORMAT_MESSAGE_FROM_SYSTEM,
        NULL,
        GetLastError(),
        0,
        (char *)&msg,
        0,
        NULL);
    printf("%s\n", (char *)msg);
    exit(1);
}
In general, this is a good approach9. Make it as easy as you can for functions that call your code to recognize that an error occurred. This makes it much more likely that the writer of the calling function will take the time to check for errors. That, in turn, will make your application much more robust.
Next month we’ll bring this discussion of error handling to an end when we talk about callbacks, signals, longjmp, and exceptions.
1. Of course, none of us has ever run into this sort of problem with our software projects.
2. Some memory-intensive applications require more sophisticated error reporting for memory management problems. The point here is not that such information is never needed, but that it should not be provided unless it is needed. As specified, malloc may not be the right tool for memory-intensive applications. That does not mean that the specification for malloc should be changed. Rather, it means that memory-intensive applications should not rely solely on malloc for memory management.
3. I’ve removed the Windowsisms to make the code more recognizable to ordinary C programmers. In fact, SetFilePointer deals in HANDLEs, DWORDs, LONGs, and PLONGs.
4. An extreme example of this occurs in Microsoft’s OLE or COM or ActiveX (What do you want to call it today?). Many of the support functions return a 32-bit value which you then decode with a macro to figure out whether the call succeeded.
5. Without looking it up, what does the return value of printf tell you? Can you do anything useful with this?
6. It’s not actually necessary to go through all this to write to stdout in Java. You can call print on the object System.out instead.
7. Java requires that every function that calls a function that throws an exception must either enclose that call in a try...catch block to handle that exception, or declare itself as throwing that exception, too. That’s a major annoyance when all you’re trying to do is display a debugging message on the console, so PrintWriter wisely hides any exceptions from you when you use it.
8. There’s another issue with errno: since the C standard doesn’t deal with multi-threaded environments, errno is specified simply as a global value. When you have multiple threads this obviously doesn’t work: another thread can change the value of errno before you get a chance to look at it. The solution is equally obvious: there has to be a separate errno for each thread. That’s actually a minor issue: every compiler that I know of that supports multiple threads provides a per-thread errno.
9. That is, separating the error indicator itself from the details of the error. The amount of noise involved in using Win32, on the other hand, is a significant distraction.
Copyright © 1999-2006 by Pete Becker. All rights reserved.