I’d like to do a quick survey. Please raise your hand if you’d be willing to trust your life to the software that your company produces. Just as I expected -- I see a few hands from the folks who work on things like medical devices and nuclear plant controls, and most of the rest of you look like you just bit a lemon. Okay, you can put your hands down. Thanks.
Over the years, our industry has produced a convenient classification system based on how robust an application is expected to be. There are safety-critical applications, which must work correctly or fail safely, and there’s everything else. I call this convenient because it implicitly accepts the idea that making software that works right is hard and, therefore, expensive. Once we accept that, we start to accept the obvious practical consequence -- non-safety-critical applications don’t have to work right, because making them work right costs too much or takes too long. That gives us an easy excuse to avoid looking at the bad habits we’ve developed over the years, the habits that prevent us from producing software that’s reliable and easy to use.
At this point you may be saying that I’m being unfair -- that top management is responsible for poor quality as well, because they insist on producing products despite inadequate budgets and unrealistic ship dates. That’s often true. It’s also irrelevant. There are many things we can do to improve the quality of the code that we produce without increasing the time it takes to produce that code. And I’ll let you in on a little secret of project management: no manager knows how long any project will really take. They rely on us for the raw data for their time projections [1]. If you think that you’ll need some extra time to get some details right, add it in. If you expect that your manager won’t allow additional time for testing (one of the first things that harried managers try to cut when a deadline looms), then don’t mention testing. Increase your estimate of the development time that’s needed, and fold the testing into development. If nobody else in your department is spending time testing their work, you’ll eventually get a reputation for being slow but reliable. Sooner or later that pays off -- the number of failures traceable to your code will go down. If enough people do that, the cost of providing technical support for the product will also go down. Even managers who focus only on financial statements will see this change.
In his book "Quality is Free" [2], Philip Crosby expands on this idea. He’s writing about manufacturing industries, which have a somewhat different set of problems from ours, but with a bit of imagination, what he says can be applied to the software industry as well. If you have an assembly line producing cars, and suddenly the cars start coming off the line with their headlights missing, what do you do? Obviously, you fix the cars that don’t have headlights. But that’s expensive and error prone, so you also go to the work station where the headlights are supposed to be installed and watch for a while, to figure out what’s going wrong at that station. Once you know why the cars aren’t getting their headlights, you can fix the actual problem, and not have to continue treating its symptoms. Manufacturing costs can be reduced by fixing the production process instead of fixing the products.
Software developers also need to look at how to fix their development process. We typically produce something that we’re pretty sure works, and we hand it over to the testing group and hope that they don’t find any serious problems. Of course, they do, and we have to drop whatever it is that we’re now working on, and go back to that code that we thought we were finished with, figure out what we were thinking about when we wrote it, track down where the failures are coming from, fix the problems, and start the cycle again. Eventually we reach a point where everyone involved agrees that the software is good enough to ship, that is, that the problems we know about aren’t severe enough to embarrass us, and that we’re pretty sure that there aren’t any major problems that we’re not aware of still lurking in the software. At this point our software has spent a great deal of time going back and forth between development and testing, and nobody has tried to figure out why the headlights didn’t get installed.
Tom DeMarco, in his book "Controlling Software Projects3," has several suggestions for overcoming this problem. One of the simplest to implement is to stop referring to software problems as "bugs". Bugs are little things that come crawling in from the woodwork. They’re always around, and nobody can get rid of all of them. That’s why many companies today call software problems "defects" -- it’s much more personal. Defects are introduced by the person who wrote the code. Take that to heart: if your code doesn’t work right, it’s your fault. In addition to fixing the code, you need to figure out why the problem occurred, and fix the process. If you want to write code with fewer defects, change the way you write software.
For example, suppose you get a defect report that your application crashes when someone tries to print a report and there is no data in the database. You track the problem down to a loop that iterates through the records in the database:
do {
    showCurrentRecord();
    nextRecord();
} while (currentRecord != NULL);
The failure here, obviously, comes from trying to show the current record before checking whether it exists. That’s easy enough to fix, but once you’ve fixed it, you need to ask what you can do to prevent this sort of thing from happening in the future. One common approach is to write functions like showCurrentRecord so that they fail safely. That is, showCurrentRecord could be written like this:
void showCurrentRecord()
{
    if (currentRecord != NULL)
    {
        // display contents of current record
    }
}
In general, that sort of thing is a bad idea. If you’re tempted to write code like this, you ought to consider changing the name of the function as well, to something like showCurrentRecordIfItExists. Now, don’t misunderstand me, I don’t object to putting checks in critical pieces of code to ensure that they aren’t being misused. But if the reason for these checks is to catch coding errors then they should complain loudly when such errors occur:
void showCurrentRecord()
{
    assert(currentRecord != NULL);
    // display contents of current record
}
If the reason for these checks is to catch errors by the user of the program, then they should provide enough information for the user to fix the problem. In the case of coding errors, it’s more important to fix the problem at the point where it actually occurred than to recover from it quietly and have it surface later as contradictory information.

In the code above, you wrote a loop but got the entry conditions wrong. A good way to avoid this sort of problem is to develop the habit of always thinking about the edge conditions whenever you write a loop. If you always ask yourself "what’s the smallest number of times that this loop can execute" and "what’s the largest," and check that the code works right for both of these cases, you can avoid many looping problems. Once you have 0 records working correctly and you have the maximum allowable number of records working correctly, you can be fairly sure that values in between will also work correctly. So take the time to check: if there are no records in the database, does this loop work right? What if there are 100,000,000?
That doesn’t necessarily mean that you have to write test code that exercises this loop when there are no records and when there are 100,000,000 records. It does mean, though, that at a minimum you should mentally simulate the operation of the loop for each of those extremes. Think in terms of test cases: "if there are no records in the database, the loop will call showCurrentRecord when there is no current record. Hmm, that’s not right. I’d better rewrite it." Go through that process for every loop that you write. It takes a little longer to write code this way, but if you do this consistently you’ll spend far less time fixing mistakes.
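For the loop we started with, the rewrite is small. Here’s one way it might look -- just a sketch, using the same hypothetical record-cursor functions as before:
// With the test at the top of the loop, an empty database skips the
// body entirely, and showCurrentRecord is never called when there is
// no current record.
while (currentRecord != NULL)
{
    showCurrentRecord();
    nextRecord();
}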
DeMarco has another suggestion: developers shouldn’t use compilers. When they feel that the code is finished they turn it over to the testing folks, who begin testing by compiling it. If the code fails to compile it’s defective. Now, I suspect that he had his tongue in his cheek when he wrote this, but it makes a good point about our development processes. I tend to get a little sloppy when I write code, and I rely on the compiler to catch silly mistakes in capitalization and misspellings. That’s a bad habit, and it occasionally leads to defects that the compiler doesn’t catch. I do a better job when I check for and fix silly mistakes myself. It’s not the compiler’s job to make up for my sloppiness.
Another thing you can do to improve your code is to write your own test suite. It won’t be as comprehensive as the suite produced by the professionals in your test group, but it will give you a chance to catch serious errors before they consume other people’s time. In order to do this well, however, you have to understand the rudiments of testing. Testing is in some ways much more difficult than writing code. Some managers make the mistake of thinking that testing is something that junior engineers should do while they’re learning to be real developers, but most development managers understand that testing is a profession in its own right, and not something that can be done by harried developers in their spare time.
Developers have a hard time writing test code because they don’t have the right perspective. Their focus is on creating something that works. A good tester’s focus is on finding things that don’t work. It’s very hard to change attitudes in the middle of a job, so most developers write tests that exercise the main line of their code for a few simple cases, and if that works they figure they’re finished. That’s wrong. A good tester knows that the purpose of testing is to find problems. A test that doesn’t find problems is a failure. If you’re going to test your own code you must develop that attitude. One thing that helps here is to write your test code before you start writing the actual code. That helps you maintain your objectivity, since you don’t have any actual code at stake at the moment.
Writing good test cases is hard. There are a bunch of books available about testing, but many of them put a great deal of emphasis on techniques for integrating testing into the development process. That’s fine for a professional tester and for a manager of a development group, but for those of us whose primary responsibility is producing code, that’s not particularly important. We need to know how to write effective tests. I’ll give an overview of this subject here, and if you want to know more, take a look at "The Art of Software Testing," by Glenford Myers [4].
There are two general approaches to software testing: black box testing and white box testing. In black box testing we don’t look at the details of the software. We write tests that look for failure to meet the requirements in the specification. In white box testing, on the other hand, we look at the code that we wrote in order to identify things that are likely to fail. In both cases, we’re trying to develop a suite of tests that will identify failures in our code. The difference between the two is in how we decide what to look for. The two approaches are complementary. Test cases developed through black box testing identify missing and incorrectly implemented features. White box testing identifies coding errors, including those in undocumented features.
In using the black box approach to write test code for the memory managers that we looked at last month, we begin by looking at the specification, which calls for two functions, one to allocate a block of a predetermined size, and one to release it for possible reuse. That’s a pretty simple specification, and the simple test cases that I included with the code last month, which simply allocate a block and then free it, pretty much cover it, to the extent that it can be covered without additional information. I’d generally add a stress test to these simple touch tests, however, to see if the code fails under heavier use. For memory managers I like to create three intertwined linked lists, then free all of the elements of one list, recreate the list, and move on to the next list. If I were testing malloc and free the code would look something like this:
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEN 10000

struct node
{
    struct node *next;
    int id;
};

/* Build a list of MAX_LEN nodes, or as many as we can get. */
struct node *create(void)
{
    int i;
    struct node *head = NULL;
    for (i = 0; i < MAX_LEN; ++i)
    {
        struct node *new_node = malloc(sizeof(struct node));
        if (new_node == NULL)
            return head;
        new_node->id = i;
        new_node->next = head;
        head = new_node;
    }
    return head;
}

/* Free every node in the list. */
void destroy(struct node *list)
{
    while (list != NULL)
    {
        struct node *temp = list;
        list = list->next;
        free(temp);
    }
}

/* Add one node to the front of the list. */
struct node *add(struct node *list)
{
    static int count = 0;
    struct node *new_node = malloc(sizeof(struct node));
    if (new_node == NULL)
        return list;
    new_node->next = list;
    new_node->id = count++;
    return new_node;
}

int main()
{
    struct node *list1 = NULL;
    struct node *list2 = NULL;
    struct node *list3 = NULL;
    int i;
    /* Build three intertwined lists, allocating their nodes in
       alternation. */
    for (i = 0; i < MAX_LEN; ++i)
    {
        list1 = add(list1);
        list2 = add(list2);
        list3 = add(list3);
    }
    /* Tear each list down and rebuild it, over and over, until the
       allocator fails. */
    for (i = 0;; ++i)
    {
        destroy(list1);
        list1 = create();
        destroy(list2);
        list2 = create();
        destroy(list3);
        list3 = create();
        if (i % 100 == 0)
        {
            putchar('.');
            fflush(stdout);
        }
    }
    return 0;
}
The biggest drawback to a test like this is that when it detects a failure it doesn’t tell you what went wrong. It simply crashes. But there’s not a lot to work with in the spec that we have here, so this is about the best we can do from a black box perspective. If we were testing something like sprintf we’d test each of the format specifiers and we could produce much more useful error messages.
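For instance, a check of just the %d specifier, written to report exactly what it got and what it expected, might look something like this. It’s only a sketch, not part of last month’s code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
    char buf[64];
    /* exercise one format specifier and report exactly what differed */
    sprintf(buf, "%d", 42);
    if (strcmp(buf, "42") != 0)
    {
        printf("Error (sprintf %%d): got `%s', expected `42'\n", buf);
        return EXIT_FAILURE;
    }
    printf("No failure detected (sprintf %%d)\n");
    return EXIT_SUCCESS;
}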
When we do white box testing we have a great deal more structural information to work with, and we can often produce much more tightly focused tests. For example, one of the memory managers that we wrote last month imposed a limit of 1000 elements on its internal free list. When a long sequence of releases filled that free list, subsequent blocks would be returned to the underlying C library’s heap instead of being held by our memory manager. That gives us an obvious point to test: allocations and releases when there are about 1000 elements in the free list. The code to test it might look like this:
#include "mm.h"
int main()
{
list_element *blocks[1010];
int i;
for (i = 0; i < 1010; ++i)
if ((blocks[i] = alloc_element()) == NULL)
{
printf("Unable to allocate block %d\n", i);
exit(EXIT_FAILURE);
}
printf("Initial allocation complete\n");
for (i = 0; i < 1000; ++i)
free_block(blocks[i]);
printf("Freed 1000 blocks\n");
exhaust_memory();
printf("Exhausted available memory\n");
for (i = 0; i < 1000; ++i)
if ((blocks[i] = alloc_element()) == NULL)
{
printf("Unable to allocate block %d\n", i);
exit(EXIT_FAILURE);
}
printf("Test complete\n");
exit(EXIT_SUCCESS);
}
Now, you probably noticed that I’ve relied on a magic function named exhaust_memory here. That’s one of the complications that you run into when testing memory managers: they’re designed to be redundant, and in order to check for failures you have to eliminate that redundancy. That can be tricky to do without modifying the code that you’re testing. Professional testers know how to handle this sort of problem, but we don’t have their specialized tools, so we need to write our own. In this case I’d modify our memory manager to call malloc and free through two wrapper functions:
#include <stdlib.h>

static int exhausted = 0;

/* After this is called, test_malloc refuses all further requests. */
void exhaust_memory(void)
{
    exhausted = 1;
}

void *test_malloc(size_t sz)
{
    return exhausted ? NULL : malloc(sz);
}

void test_free(void *ptr)
{
    free(ptr);
}
Now we can test how our memory manager responds when the underlying system runs out of memory. We can also add code to these hook functions, as needed, for more sophisticated tests.
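How the memory manager’s calls get routed to these hooks depends on how you set up the test build. One simple approach -- and this is only a sketch, with a made-up header name -- is a test-only header that renames the calls with macros. Redefining standard library names is, strictly speaking, a grey area in C, but it’s a common enough trick in a build that exists only for testing:
// test_hooks.h -- a hypothetical header, included only when building
// the memory manager for testing, so that its calls to malloc and
// free go through the hooks above. Don't include it in the file that
// defines test_malloc and test_free.
#ifndef TEST_HOOKS_H
#define TEST_HOOKS_H

#include <stddef.h>

void exhaust_memory(void);
void *test_malloc(size_t sz);
void test_free(void *ptr);

#define malloc(sz) test_malloc(sz)
#define free(ptr)  test_free(ptr)

#endif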
Another thing that professional testers do but software developers often neglect is writing test code that tells the user what the result was. For example, if you haven’t given it much thought, you might write a simple test for the strcat function like this:
#include <string.h>
#include <stdio.h>

int main()
{
    char buf[30] = "Hello, ";
    printf("%s\n", strcat(buf, "world"));
    return 0;
}
You might know perfectly well what the correct result is when you write this code, but six months from now, will you know whether "Hello, world" indicates success or failure? Preserve what you know at the time you write the test case, and make it easy to interpret the result:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define EXPECTED "Hello, world"

int main()
{
    char buf[30] = "Hello, ";
    strcat(buf, "world");
    if (strcmp(buf, EXPECTED) != 0)
    {
        printf("Error (strcat): got `%s', expected `%s'\n",
            buf, EXPECTED);
        return EXIT_FAILURE;
    }
    else
    {
        printf("No failure detected (strcat)\n");
        return EXIT_SUCCESS;
    }
}
In addition to telling you the result of the test, this program returns an exit code that a batch file or shell script can use to determine whether an error occurred. That’s important because it allows us to automate our tests. If you have to run all of your tests by hand there will come a time when you feel overwhelmed by the amount of work you have to do, and you won’t bother doing all that boring hand testing. That will probably be the time when you’ve introduced a serious defect into the code, and you’ll turn your untested code over to the testing folks who will gleefully bounce it back to you with a handful of defect reports. Protect yourself from this by automating all of your tests, so that they are easy to run and the results are easy to interpret.
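You could drive the tests from a batch file or shell script; to stay in C, here’s a sketch of a simple driver that runs a list of test programs and counts the failures. The test names are made up, and it assumes -- as most environments allow -- that system returns zero when the command it runs exits successfully:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    /* hypothetical test programs, each returning EXIT_SUCCESS on success */
    static const char *tests[] = { "strcat_test", "mm_test", "stress_test" };
    int failures = 0;
    int i;
    for (i = 0; i < (int)(sizeof tests / sizeof tests[0]); ++i)
    {
        printf("Running %s\n", tests[i]);
        if (system(tests[i]) != 0)
        {
            printf("FAIL: %s\n", tests[i]);
            ++failures;
        }
    }
    printf("%d test(s) failed\n", failures);
    return failures == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}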
Professional testers have a large toolkit that they use in designing and implementing tests. They do coverage analysis to understand how much of the code has actually been tested, they have replacement memory managers to simulate low memory conditions, they track locations of defects in order to identify problem areas in the code [5], and so on. These are all things that we don’t need to do when we’re writing code. Still, we need to know something about their techniques so that we can use some of them ourselves, and so that we can understand what they are talking about when they tell us that their tests exercise 70% of the code in the application [6].
We’ve covered a lot of ground this month, from how to improve our work habits to writing good test cases. Next month we’ll get back to writing memory managers, and look at a few more specialized memory managers and the situations where they are appropriate. In the meantime, as you think about testing, remember that testing does not produce a high quality product. It only helps us recognize poor quality. We’re responsible for the quality of the software that we produce. We can improve the quality of our products by making them work correctly from the start.
1. An experienced manager learns from past experience how much each of us tends to underestimate the time that a task will take, and adjusts the projected schedule accordingly.
2. Mentor Books, 1992, ISBN: 0451625854.
3. Yourdon, 1986, ISBN: 0131717111.
4. John Wiley & Sons, 1979, ISBN: 0471043281.
5. A module that has had many problems in the past is likely to have many problems in the future. Sometimes the best thing to do with a module with a long history of defects is to have someone else write a replacement for it.
6. You might not think so, but 70% coverage is very good. It’s very hard to write thorough tests.
Copyright © 1999-2006 by Pete Becker. All rights reserved.