More on Error Handling

The C/C++ Users Journal, January, 1999

A few years ago there was a comic strip called "Alley Oop." No, it wasn’t about polymorphic highways. It was about a caveman. I don’t remember much else about it, but one line always stuck in my mind. Oop at one point said that thinking about some problem reminded him of the guy who tried to eat a piece of ’gator tail: the more he chewed on it, the bigger it got. This discussion of error handling is doing the same thing. My initial plan for this column was to spend the first installment talking about errors, and then to move on to other topics. As I started writing that first installment, I realized that it was going to expand into at least two installments. As I write the second one, it’s clear that the topic is even larger than that, and will spill over into next month as well [1]. If you like talking about error handling, then that’s probably a good thing. If you don’t like it, don’t worry: it won’t go beyond next month.

Last month we talked about handling errors inside the function that detects them. The three possible approaches are terminating execution, fixing the problem, and ignoring the problem. If none of these is appropriate, then we can’t handle the problem locally and need to report the problem to the calling function. The writer of the calling function then has to make the same choice: handle the problem by terminating execution, fixing it, or ignoring it; or notify that function’s caller. This month we’re going to look at the second alternative, notifying the calling function that something went wrong.

Obviously, once we’ve decided to notify the calling function that something went wrong, we have to decide how to notify it. There are a number of techniques in use, and a bit later on we’ll talk about several of them. Before doing that, though, we need to give some thought to what factors we ought to consider in choosing our reporting technique. This is in part an architectural decision, and may be dictated by decisions already made by the system architects. In particular, if we’re writing a subsystem for the application and need to report a problem to another subsystem, the reporting mechanism has probably already been designed as part of the system architecture. However, if we’re reporting errors within the subsystem that we’re writing, we have a great deal more flexibility in our choice of technique. We can choose a technique that’s different from the technique used to communicate problems between subsystems, so long as we keep things within our own subsystem and follow the prescribed conventions when we report problems to other subsystems.

The first thing to think about is how much information we need to convey to the caller. Sometimes all that our function needs to do is tell the caller that it could not do what it was asked to do. The standard C function malloc does this: when malloc is unable to allocate the amount of memory that was requested there are a number of possible reasons for the failure, but malloc simply reports that it failed. That’s probably the right choice: giving the caller more information probably wouldn’t help in recovering from the error. On the other hand, if we were writing a function that took a file name as an argument, opened the file, allocated a 30 byte buffer, and read 30 bytes from the file into the buffer, it’s important to the caller to know whether a failure occurred because the function could not open the file, because it could not allocate the buffer, or because it couldn’t read 30 bytes from the file. If our function does not indicate which of these was the problem then our caller will have a much more difficult job if it has to try to fix the problem.
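Here is a minimal sketch of that 30-byte reader, written so that the caller can tell the three failures apart. The names read_header, read_status, and HEADER_SIZE are made up for this illustration; they don’t come from any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes: one for each step that can fail. */
enum read_status { READ_OK, READ_OPEN_FAILED, READ_ALLOC_FAILED, READ_SHORT };

#define HEADER_SIZE 30

/* Opens name, allocates a 30-byte buffer, and reads 30 bytes into it.
   On success stores the buffer in *out; the caller must free it. */
enum read_status read_header(const char *name, unsigned char **out)
    {
    FILE *fp;
    unsigned char *buf;
    size_t got;

    *out = NULL;
    if ((fp = fopen(name, "rb")) == NULL)
        return READ_OPEN_FAILED;
    if ((buf = malloc(HEADER_SIZE)) == NULL)
        {
        fclose(fp);
        return READ_ALLOC_FAILED;
        }
    got = fread(buf, 1, HEADER_SIZE, fp);
    fclose(fp);
    if (got != HEADER_SIZE)
        {
        free(buf);
        return READ_SHORT;
        }
    *out = buf;
    return READ_OK;
    }
```

A caller that only cares whether the read worked can compare the result with READ_OK. A caller that wants to recover can switch on the result and, say, prompt for a different file name when it sees READ_OPEN_FAILED.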

A word of caution, however: it’s easy to overdesign an error handling system, and end up producing something that’s so complicated that users of your code will be tempted to ignore the error handling rather than figure out how to use it correctly. Don’t give in to the temptation to produce the ultimate error handling system: that’s almost certainly more than you need. Instead, consider what information is needed by the caller to properly handle the error condition that you are reporting. Provide all the information that’s needed, and nothing more.

To see this more clearly, let’s look at malloc in a bit more detail. Its job is to allocate a block of memory of the requested size. It’s possible to implement malloc so that it first checks the requested size against some maximum allowed size, and fails if the request is too large. If the request is for an allowable size, malloc then looks through the memory blocks that it has available to find one that’s large enough to satisfy the request. If none is found, malloc fails. The specification for malloc does not distinguish between these two cases: both result in returning a null pointer. It would be possible to invent a more sophisticated error reporting mechanism for malloc, one that would indicate which of these cases caused the failure. From a user’s perspective, however, this is rarely important. What matters is simply that malloc was unable to allocate the amount of memory requested, and that’s all that should be reported to the caller [2].
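To make the two failure paths concrete, here is a toy allocator, not a real malloc. The name toy_alloc, the fixed pool, and the block sizes are all assumptions made for the sake of the example. Both failures come back to the caller as the same null pointer.

```c
#include <stddef.h>

#define BLOCK_SIZE  32   /* largest request this toy allocator accepts */
#define BLOCK_COUNT 4    /* number of blocks in the pool */

/* The union keeps each block suitably aligned for common types. */
union block { unsigned char bytes[BLOCK_SIZE]; long double ld; void *p; };

static union block pool[BLOCK_COUNT];
static int in_use[BLOCK_COUNT];

/* Hypothetical malloc-like function: returns a block of at least size
   bytes, or a null pointer if the request cannot be satisfied. */
void *toy_alloc(size_t size)
    {
    int i;

    if (size > BLOCK_SIZE)
        return NULL;            /* failure 1: request too large */
    for (i = 0; i < BLOCK_COUNT; ++i)
        if (!in_use[i])
            {
            in_use[i] = 1;
            return pool[i].bytes;
            }
    return NULL;                /* failure 2: no free block left */
    }
```

The caller cannot tell which return statement produced the null pointer, and for most purposes it has no reason to care.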

The key to designing an error handling system is understanding the context that it will be used in. Provide all the information that’s needed by the caller, and nothing more.

Once you’ve decided what information you want to transmit when an error occurs, you need to decide how to transmit that information. There are three basic techniques in common use: return a value that indicates that an error occurred, set a flag for the caller to check, or transfer execution to somewhere other than the normal return point. We’ll talk about the last of these next time around. This month we’re going to look at how to convey error information without disrupting the normal flow of the application.

By far the most common technique for indicating that an error occurred is for a function to return a value that the caller recognizes as an error indicator. Although this seems like a simple technique, it’s actually quite a bit more complicated than it looks.

Returning a Boolean Value

The minimal form of this technique is for a function to return a boolean value that simply indicates success or failure. The standard C function fclose does this: it returns 0 on success, and EOF on failure. If you’re rigorously checking for failures in your code, calls to fclose should look something like this:

if (fclose(fp))
    /* handle error here */

This technique is fairly easy to use. The only thing to watch out for is getting the sense of the test right: for fclose, a non-zero return value indicates failure; for some other function a zero return value might indicate failure, requiring that the result be negated in the test. In particular, in C++, a return type of bool as an indicator of success will require that the result be negated:

bool do_something();

if (!do_something())
    /* handle error here */

Don’t underestimate the possibility of coding mistakes with something as simple as this. Make sure that the documentation for your function clearly spells out what the error indication is. For example, in describing fclose, the C standard says

The fclose function returns zero if the stream was successfully closed, or EOF if any errors were detected.

The documentation for our hypothetical do_something should be equally clear:

The do_something function returns true if something was done, or false if nothing was done.
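One way to keep the sense of the test consistent across a code base is to wrap functions like fclose so that every success indicator reads the same way. The sketch below assumes a C99 compiler (for stdbool.h); close_stream is a made-up name, not a standard function.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper. Returns true if the stream was successfully
   closed, or false if any errors were detected. */
bool close_stream(FILE *fp)
    {
    return fclose(fp) == 0;     /* fclose: 0 is success, EOF is failure */
    }
```

With the wrapper in place, the test reads if (!close_stream(fp)), the same shape as the test for do_something.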

Returning a Special Value

Another common technique when a function returns a number of useful values is to designate a special value to indicate failure. Probably the most commonly used function that does this is malloc: it returns a pointer to the memory block that it allocated, or a null pointer if it was unable to satisfy the allocation request. Most of the time, interpreting the return value from such a function is no more complicated than interpreting the return value from a function that returns a boolean success indicator:

if ((ptr = malloc(100)) == NULL)
    /* handle error here */

There’s a danger in this approach, however: it’s possible that the special value that we’ve chosen is actually a legitimate value for the function to return. For example, the Win32 API defines a function, SetFilePointer, that sets a new read/write location in a file. Its prototype looks like this [3]:

unsigned SetFilePointer(HANDLE file,
    long dist, long *hdist, unsigned dir);

It takes a 64 bit value, in a slightly peculiar form, as the offset to seek to. The dist argument is the low 32 bits of the argument, and the hdist argument points to a long that holds the high 32 bits of the argument. If the hdist argument is a null pointer then the high 32 bits are all zeroes. If the function fails it returns 0xFFFFFFFF. If it succeeds it returns the low 32 bits of the resulting offset, and puts the high 32 bits of the resulting offset in the location pointed to by hdist if hdist is not NULL. The trouble with this is that 0xFFFFFFFF can also be the correct result for a successful call. That is, getting back a value of 0xFFFFFFFF does not mean that the seek failed, only that it might not have succeeded. You have to then call GetLastError to determine whether this value actually means that an error occurred. We’ll look at the idea behind GetLastError in a little more detail in a moment. For now, though, the point is that 0xFFFFFFFF does not absolutely indicate that an error occurred. On the other hand, getting a return value other than 0xFFFFFFFF tells you that the call did succeed.

In a case like this, where the special value that indicates that an error occurred isn’t really special, checking for errors is more complicated:

if (SetFilePointer(hnd, 128, NULL, FILE_BEGIN)
        == 0xFFFFFFFF &&
    GetLastError() != NO_ERROR)
    /* handle error here */

Aside from this complication, returning a special value is pretty much the same as returning a boolean value: it tells you whether an error occurred, but doesn’t tell you anything about what the error was. Just as with a boolean value, you must be sure that users of your code know what the return code for an error is, so they can test for it properly.
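The standard C library has the same wrinkle. strtol returns LONG_MAX or LONG_MIN when the converted value is out of range, but those are also perfectly good results, so the caller has to look at errno to tell the difference. Here is a sketch; the helper name to_long is made up, and it ignores malformed input for brevity.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to long. Returns 1 on success and
   stores the result in *out; returns 0 if the value is out of range. */
int to_long(const char *text, long *out)
    {
    long val;

    errno = 0;                          /* clear any stale error code */
    val = strtol(text, NULL, 10);
    if ((val == LONG_MAX || val == LONG_MIN) && errno == ERANGE)
        return 0;                       /* the special value was an error */
    *out = val;                         /* LONG_MAX itself can be valid */
    return 1;
    }
```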

Returning an Enumeration

A natural extension to both of these techniques is to define more than one error value, and return the value that most closely describes what actually went wrong. This isn’t much different from what we talked about earlier: instead of using a boolean value to indicate failure, use an integer. If you’re overloading the return value to provide actual data as well as an error indication, like malloc does, it’s a bit harder. You have to come up with a few more invalid values to use as error flags. That often doesn’t require doing anything tricky, but you have to think about it carefully. In the case of malloc, for example, many implementations align pointers returned by malloc to 8-byte boundaries. This means that the low three bits in the pointer are available for indicating bad values. I don’t recommend that you play this sort of game with pointers in your own code, however. The point is that with a bit of creativity you can find values that are easily distinguished from valid ones, and use those values to indicate that an error has occurred. On the other hand, in some cases it might be even easier to change the design a bit and use the return value solely to indicate success or failure, and give the actual results back to the caller in some other way.
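Here is a sketch of that last alternative, for a function whose every possible return value is legitimate. The function divide is hypothetical: any int can be a quotient, so the return value carries only success or failure and the result goes back through a pointer.

```c
#include <limits.h>

/* Hypothetical function: every int is a possible quotient, so no value
   is left over to signal failure. Returns 1 on success and stores the
   quotient in *result; returns 0 and leaves *result alone on failure. */
int divide(int num, int den, int *result)
    {
    if (den == 0)
        return 0;               /* division by zero */
    if (num == INT_MIN && den == -1)
        return 0;               /* quotient would overflow */
    *result = num / den;
    return 1;
    }
```

The cost is a slightly clumsier call, since the caller needs a variable to receive the result. The benefit is that there is no special value to misinterpret.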

From the caller’s perspective, calling a function that can return any of several values to indicate a failure is a bit more complicated than calling a function that has only one failure indicator. Instead of using an if statement as we did above, we may need to use a switch statement. For example, in the Java library that I’ve been working on for the past year, there’s a thread support package that’s written in C. A function like sleep that is supposed to pause execution of the calling thread for the time specified in its argument can return early if some other thread calls interrupt on the sleeping thread. In that case, sleep is supposed to throw an exception. This means that the C code that implements sleep has to be able to indicate that the call succeeded, that it failed for any of several reasons that aren’t relevant here, or that it failed to pause for the requested time because it was interrupted. The Java code that calls the C code interprets the return code and throws the appropriate exception:

public static void sleep(long millis, int nanos)
        throws InterruptedException, UnknownError
    {   // block calling thread for specified time
    switch (NativeThread.threadSleep(currentThread().mth,
        millis, nanos))
        {
        case INTERRUPTED:
            throw new InterruptedException();
        case UNKNOWN_ERROR:
            throw new UnknownError();
        }
    }

If you’re considering providing multiple error codes you should also try to provide a simple way for the calling code to simply determine that an error occurred, in case the details are not important to the caller. In the case of threadSleep, above, there is only one success code, so checking for success would be easy:

if (NativeThread.threadSleep(currentThread().mth,
    millis, nanos) != SUCCESS)
    /* handle error here */

If your function also has several possible success indicators things get a bit more complicated. In a function that simply counts items a valid count will never be less than zero, so you can use negative values to indicate errors:

if (count(data_items) < 0)
    /* handle error here */

Try to avoid making your return codes so detailed that they require your users to write complicated code to interpret them. If your documentation tells users that the values -1, -2, -3, 7, 14, and 19 indicate errors and that any other value indicates success, you’ll find your e-mail account flooded with complaints [4]. Make it as easy as you can for users of your code to recognize error reports.
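A sketch of the counting case might look like this. The error codes, the MAX_ITEMS limit, and the zero-terminated array convention are all assumptions made for the example. Because every error code is negative, one comparison separates success from failure, and the caller can look at the specific code only if it cares.

```c
#include <stddef.h>

/* Hypothetical error codes: all negative, so they can never be
   mistaken for a count. */
enum { COUNT_NULL_DATA = -1, COUNT_TOO_MANY = -2 };

#define MAX_ITEMS 1000

/* Counts the items before the terminating zero. Returns the count,
   or a negative error code. */
int count(const int *data_items)
    {
    int n = 0;

    if (data_items == NULL)
        return COUNT_NULL_DATA;
    while (data_items[n] != 0)
        if (++n > MAX_ITEMS)
            return COUNT_TOO_MANY;
    return n;
    }
```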

Returning Item Counts

A slightly more complicated technique for embedding error information in a return value is to return the number of items that were successfully handled. From the caller’s perspective, this means checking whether the return value is equal to the number of items that should have been handled. For example, fread, from the standard C library, works this way:

struct data_item
    {
    /* your data goes here */
    };

#define DSIZE 20
struct data_item data[DSIZE];
if (fread(data, sizeof(struct data_item), DSIZE, fp) != DSIZE)
    /* handle error here */

Many of the functions in the standard C library use this technique [5].
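A short count from fread says that something stopped the read, but not what. If the caller needs to know, the standard functions feof and ferror can tell end of file apart from a read error. The helper read_ints and its result codes below are hypothetical names used only for this sketch.

```c
#include <stdio.h>

enum read_result { READ_ALL, READ_EOF, READ_ERROR };

/* Hypothetical helper: tries to read count ints from fp into buf.
   *got receives the number of ints actually read. */
enum read_result read_ints(FILE *fp, int *buf, size_t count, size_t *got)
    {
    *got = fread(buf, sizeof *buf, count, fp);
    if (*got == count)
        return READ_ALL;
    return ferror(fp) ? READ_ERROR : READ_EOF;
    }
```

Even on a short read the items that were transferred are valid, which is why the count goes back to the caller along with the status.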

Setting Flags

Sometimes the best way to indicate that an error occurred is to set a flag for the caller to check later. For example, the PrintWriter class in Java does this.

PrintWriter out = new PrintWriter(
    new BufferedWriter(
    new FileWriter(FileDescriptor.out)));
out.print(3);

The last line converts the value 3 into a string and pushes the characters from that string one at a time into the BufferedWriter that writes blocks of characters to the FileWriter that sends them to stdout [6]. Both BufferedWriter and FileWriter throw exceptions if an attempted write fails. The print method in PrintWriter catches any exceptions that are thrown, and sets an internal flag to indicate that an error occurred. It does not rethrow the exception. If we want to find out whether we succeeded in writing 3 to stdout we have to ask the PrintWriter object:

if (out.checkError())
    /* handle error here */

Note that this separates error checking from the operation that might have caused the error. This lets you perform a series of operations without checking for errors, and simply check at the end whether they all succeeded. For routine operations such as writing information to stdout this is usually appropriate:

out.print("Hello, ");
out.println("world");
if (out.checkError())
    /* handle error here */

The call to checkError will indicate whether something went wrong in the calls to print and println, but it won’t indicate which one of them failed. In fact, most of us wouldn’t bother checking for success at all in this case. But the flag is there if we want to look at it [7].
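The same idea is easy to express in C. The sketch below is not how PrintWriter is implemented; it’s a made-up writer that collects text in a small fixed buffer and sets a sticky flag when a write doesn’t fit. All of the names are hypothetical.

```c
#include <string.h>

/* Hypothetical writer that collects text in a fixed-size buffer and
   remembers whether any write failed. */
struct writer
    {
    char text[16];
    size_t used;
    int failed;                 /* sticky: set on error, never cleared */
    };

void writer_init(struct writer *w)
    {
    w->used = 0;
    w->failed = 0;
    w->text[0] = '\0';
    }

/* Appends s. If s does not fit, sets the flag and discards s. */
void writer_print(struct writer *w, const char *s)
    {
    size_t len = strlen(s);

    if (len >= sizeof w->text - w->used)
        {
        w->failed = 1;
        return;
        }
    memcpy(w->text + w->used, s, len + 1);
    w->used += len;
    }

/* Returns non-zero if any earlier call to writer_print failed. */
int writer_check_error(const struct writer *w)
    {
    return w->failed;
    }
```

A caller can make any number of writer_print calls and then call writer_check_error once. As with checkError, the flag says that something failed, not which call it was.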

Another error flag that most of us are probably more familiar with is errno. Many of the functions in the standard C library indicate that an error occurred by setting errno to a non-zero value. You can check for errors by looking at errno, but there are a couple of things you have to watch out for. First, the value of errno is set to 0 when your program starts executing, but no function ever sets it back to 0. This means that when you’re going to call a function and you want to use errno to tell you whether the call succeeded, you must set errno to 0 before the call. Like this:

#include <errno.h>
#include <math.h>
#include <stdio.h>

void show_log(double val)
    {
    errno = 0;
    double d = log(val);
    if (errno != 0)
        perror("Log error");
    printf("%f\n", d);
    }

Note that I’ve used the standard function perror to display an error message when log sets errno to a non-zero value. You can also use strerror to get a C-style string describing the error that occurred.

errno works just like the error flag in PrintWriter that we looked at earlier, in that you can postpone checking for errors until you’ve done several function calls. If any one of them failed, errno will contain the error code for the last one that failed. Be careful if you do this, though: there’s a great deal of variation among compilers as to which functions set errno and which ones don’t. That’s because the specification for errno is a bit fuzzy in the C standard. It says that some functions must set errno when an error occurs, and the specifications for those functions indicate what values they will set errno to for the various errors that may occur. However, it also allows other functions to set errno, and it allows implementations to provide values in addition to those required by the standard. In practice, this means that you must check your compiler’s documentation if you’re going to use errno to detect errors. Otherwise you won’t know which functions can set it, and you won’t know what the various values can be [8].
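Here is a sketch of a postponed check, using strtod because the C standard requires it to set errno to ERANGE when the converted value overflows. The helper convert_three is a made-up name.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts three strings and checks errno once,
   after all of the calls. Returns 0 on success, or the errno value
   left behind by the last conversion that failed. */
int convert_three(const char *a, const char *b, const char *c,
    double out[3])
    {
    errno = 0;                  /* must be cleared before the calls */
    out[0] = strtod(a, NULL);
    out[1] = strtod(b, NULL);
    out[2] = strtod(c, NULL);
    return errno;               /* non-zero if any call set it */
    }
```

If the result is non-zero the caller can pass it to strerror to get a message, but there’s no way to tell from the result which of the three conversions went wrong.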

Providing Additional Information

A flag like errno can also be used to supplement the information that a return value provides. Rather than cobble up an incomprehensible scheme for compressing several dozen bits of error information into a return value, you might want to stick to using a single value to indicate an error, and allowing the caller to look elsewhere if more information is needed. By separating detection of the error from reporting the details of the error, you make it possible for callers of your function to write simpler code when they are not concerned with the details of what went wrong, without depriving callers who need more data of the information that they need.
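A minimal sketch of this separation follows. The names parse_port and port_last_error are hypothetical, and the static variable is an assumption that only works in a single-threaded program; a real version would need one such variable per thread.

```c
#include <stdlib.h>

/* Hypothetical detail codes, kept apart from the return value. */
enum port_error { PORT_NO_ERROR, PORT_NOT_A_NUMBER, PORT_OUT_OF_RANGE };

static enum port_error last_port_error = PORT_NO_ERROR;

/* Returns the reason for the most recent parse_port failure. */
enum port_error port_last_error(void)
    {
    return last_port_error;
    }

/* Parses a TCP port number. Returns 1 on success and stores the port
   in *port; returns 0 on failure and records the reason. */
int parse_port(const char *text, int *port)
    {
    char *end;
    long val = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        {
        last_port_error = PORT_NOT_A_NUMBER;
        return 0;
        }
    if (val < 1 || val > 65535)
        {
        last_port_error = PORT_OUT_OF_RANGE;
        return 0;
        }
    *port = (int)val;
    return 1;
    }
```

A caller that just wants to reject bad input writes if (!parse_port(text, &port)) and stops there. A caller that wants to print a specific complaint asks port_last_error for the reason.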

As we saw earlier, Win32 provides something similar with the function GetLastError. It’s a lot like errno, in that when you call it you get back a number that indicates what the most recent error was in the thread that you called it from. This gives the calling function a great deal of flexibility in handling errors reported by system calls. If it doesn’t need to try to recover from the error it doesn’t need to look at the detailed information:

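if ((hnd = GetStdHandle(STD_OUTPUT_HANDLE)) == INVALID_HANDLE_VALUE)
    /* handle error here */

If it needs to do more, it can:

if ((hnd = GetStdHandle(STD_OUTPUT_HANDLE)) == INVALID_HANDLE_VALUE)
    {
    void *msg;
    FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
        NULL, GetLastError(), 0, (LPSTR)&msg, 0, NULL);
    printf("%s\n", (char *)msg);
    LocalFree(msg);
    }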

In general, this is a good approach [9]. Make it as easy as you can for functions that call your code to recognize that an error occurred. This makes it much more likely that the writer of the calling function will take the time to check for errors. That, in turn, will make your application much more robust.

Coming Up

Next month we’ll bring this discussion of error handling to an end when we talk about callbacks, signals, longjmp, and exceptions.

1. Of course, none of us has ever run into this sort of problem with our software projects.

2. Some memory-intensive applications require more sophisticated error reporting for memory management problems. The point here is not that such information is never needed, but that it should not be provided unless it is needed. As specified, malloc may not be the right tool for memory-intensive applications. That does not mean that the specification for malloc should be changed. Rather, it means that memory-intensive applications should not rely solely on malloc for memory management.

3. I’ve removed the Windowsisms to make the code more recognizable to ordinary C programmers. In fact, SetFilePointer deals in HANDLEs, DWORDs, LONGs, and PLONGs.

4. An extreme example of this occurs in Microsoft’s OLE or COM or ActiveX (What do you want to call it today?). Many of the support functions return a 32-bit value which you then decode with a macro to figure out whether the call succeeded.

5. Without looking it up, what does the return value of printf tell you? Can you do anything useful with this?

6. It’s not actually necessary to go through all this to write to stdout in Java. You can call print on the object System.out instead.

7. Java requires that every function that calls a function that throws an exception must either enclose that call in a try...catch block to handle that exception, or declare itself as throwing that exception, too. That’s a major annoyance when all you’re trying to do is display a debugging message on the console, so PrintWriter wisely hides any exceptions from you when you use it.

8. There’s another issue with errno: since the C standard doesn’t deal with multi-threaded environments, errno is specified simply as a global value. When you have multiple threads this obviously doesn’t work: another thread can change the value of errno before you get a chance to look at it. The solution is equally obvious: there has to be a separate errno for each thread. That’s actually a minor issue: every compiler that I know of that supports multiple threads provides a per-thread errno.

9. That is, separating the error indicator itself from the details of the error. The amount of noise involved in using Win32, on the other hand, is a significant distraction.