Looping and Iteration

The C/C++ Users Journal, November, 1999

I remember an evening about fifteen years ago, back when I wrote code only for recreation and not for a living, that I spent struggling to write what should have been a reasonably simple function. The function took a pointer to an array of message strings. If the array hadn’t already been created the function read a file and built the array from the text in the file. Then it looked up a message in the array. That way if the message array was never used my program wouldn’t incur the overhead of reading the file containing the messages. Further, by coupling initialization of the message array to the code that used the array I eliminated the need to call an initialization function from my program’s startup code, reducing overall coupling and making the program more robust. Unfortunately, I couldn’t get it to work.

The problem arose because a pointer to an array of message strings in C is a pointer to a pointer to a pointer to char. I needed to use malloc to create an array of pointers to char and save the result in the location that the pointer to pointer to pointer to char pointed to. Then for each message, I had to use malloc again to allocate enough space to hold the message, and save the result at the location indicated by the current index into the array that the pointer to pointer to pointer to char pointed to. Each time I tried it I got the dereferencing wrong, and after an hour or so of being hit by system crashes I began to feel like a boxer lying on the floor of the ring, seeing stars. I eventually figured out that three layers of indirection made things too complicated. I reorganized the code to reduce the amount of indirection, quickly got it to work, and vowed never to write code that used a pointer to a pointer to a pointer again. I kept that vow - until a few months ago.

I was working on a function that examined a machine-generated array of pointers to pointers to void. The size of this array could vary and I didn’t know how large it actually was, but the last entry in the array would always be a null pointer. My code was supposed to walk through the elements of the array, and if the pointer that the array element pointed to was not a null pointer call a function with that pointer as an argument1. I don’t like using an array index when I don’t know what the upper bound is, so I used a pointer, and wrote a loop that terminated when the pointer that it pointed to was null. The body of the loop checked whether the pointer that the pointer that the pointer to pointer to pointer pointed to was null, and if it wasn’t, called the function. Needless to say, it didn’t work. It only took about ten minutes for me to recall my old vow, and get rid of that pointer to pointer to pointer to void. It took another two minutes to write the function. It’s easy to read, and it works.

If you’re a regular reader of The Journeyman’s Shop you know that I emphasize developing good programming habits without becoming a slave to them. In this case I applied a habit that I have learned over the years for writing loops, without looking critically enough at its consequences. Fortunately, I recognized the problem fairly quickly, so I didn’t have to spend time on the phone with angry customers. However, despite having to change this particular piece of code, I still don’t like using an array index when I don’t know what the upper bound is. I don’t like it because it mixes two different ways of thinking about loops.

There are two different forms of loops: the ones where we know in advance how many times we’ll execute the body of the loop, and the ones where we don’t. Of course, there are also cases where some condition occurring in the body of the loop tells us that we need to exit prematurely. But when we’re writing the outermost control structure for the loop, those are the two cases we have to think about.

Writing a loop when we know in advance how many time its body will be executed is often very easy: we create a counter, initialize it to zero, increment it each time we execute the loop body, and stop when it reaches the repetition count. Most of us can write that for loop with our fingers, without having to involve our brain:


#include <stdio.h>

void show(const char *const *args, int count)
    {
    int i;
    for (i = 0; i < count; ++i)
        printf("%s\n", args[i]);
    }

int main(int argc, char *argv[])
    {
    show(argv, argc);
    return 0;
    }

If we don’t know how many time we’ll perform the loop body we have to have some sort of sentinel to tell us that we’re done. That’s the approach that the algorithms in the C++ standard library take: they operate on a range of values specified by a starting location and an ending location. For example:


#include <stdio.h>

void show(const char *const *begin, const char *const *end)
    {
    while (begin != end)
        {
        printf("%s\n", *begin);
        ++begin;
        }
    }

int main(int argc, char *argv[])
    {
    show(argv, argv + argc);
    return 0;
    }

Although the form of the loop here is different from the first example, the underlying concept is similar: both use a variable to keep track of the location in the container, and loop until that variable compares equal to a specified value that indicates that the iteration is complete. To see this more clearly, consider a variation of the control loop from the first example:


void show(const char *const *args, int end)
    {
    int begin = 0;
    while (begin != end)
        {
        printf("%s\n", args[begin]);
        ++begin;
        }
    }

Here we’re clearly looping a fixed number of times, indicated by count, but we’ve written the loop to look more like the style of an STL loop. The difference between the two is in how we get at the underlying data that begin refers to. In the STL loop begin is a pointer that we have to dereference; in the counted loop it’s an index that we have to apply to a pointer.

A different form of loop uses a specially chosen data value to indicate the end of the iteration. When the data we’re using consists of pointers, a null pointer is a convenient choice:


#include <stdio.h>

void show(const char *const *args)
    {
    while (*args != NULL)
        {
        printf("%s\n", *args);
        ++args;
        }
    }

int main(int argc, char *argv[])
    {
    show(argv);
    return 0;
    }

If we decide not to use a null pointer we need some other form of distinguished data value:


#include <stdio.h>
#include <string.h>

void show(const char *const *args)
    {
    while (strcmp(*args, "end") != 0)
        {
        printf("%s\n", *args);
        ++args;
        }
    }

int main(int argc, char *argv[])
    {
    show(argv);
    return 0;
    }

If you’re going to use a sentinel value like this that is simply a specialized value of your data you have to be absolutely certain that that value will never occur in actual data. Otherwise you’ll find that your code mysteriously fails2. In general, this approach is only safe in carefully controlled situations.

If you need to move backwards through a container, it’s easy to adapt code that uses a range specification: just start at the end and go the other way. When the range is specified by a count the code looks like this:


#include <stdio.h>

void show(const char *const *args, int count)
    {
    int i;
    for (i = count; 0 < i; --i)
        printf("%s\n", args[i - 1]);
    }

    int main(int argc, char *argv[])
    {
    show(argv, argc);
    return 0;
    }

Here, the loop index is always one greater than the current location in the container. That’s why we display the value of args[i - 1] rather than args[i], and why we end the loop when i reaches 0 rather than when it becomes less than 0. It would also be possible to write the loop with an initial value of count - 1, quitting when i becomes negative. That would avoid the extra subtraction to compute the array index. That approach has one drawback: if for some reason you change to an unsigned type for the index the loop will never end. I’d rather not have to remember two different styles for writing a backward loop, so I’ll stick with the index being one higher than the actual position. It works for both signed and unsigned index types.

Tracking a position one higher than the actual position also works well with the STL form of sentinel:


#include <stdio.h>

void show(const char *const *begin, const char *const *end)
    {
    while (begin != end)
        {
        printf("%s\n", *(end - 1));
        --end;
        }
    }

int main(int argc, char *argv[])
    {
    show(argv, argv + argc);
    return 0;
    }

Just as in the previous example, we move the end position backwards until it reaches the beginning, performing the appropriate operation on the element that is one location behind the iterator.

Moving backwards through a container whose end is marked with a data sentinel is more complicated, because you have to move forward through the container until you reach the sentinel in order to find the starting location. Since there is no sentinel to indicate the beginning of the container, you have to use one of the range-based techniques as you move backwards:


#include <stdio.h>

void show(const char *const *args)
    {
    const char *const *loc = args;
    while (*loc != 0)
        ++loc;
    while (loc != args)
        {
        printf("%s\n", *(loc - 1));
        --loc;
        }
    }

int main(int argc, char *argv[])
    {
    show(argv);
    return 0;
    }

As you’ve looked at the code for moving backwards through a container you might have noticed a minor optimization that could easily be done. Instead of subtracting one each time the iterator is used, then decrementing the iterator at the end of the loop, you might be tempted to decrement the iterator at the beginning of the loop to avoid having to subtract one each time you want to access the data. For example, let’s change our example program so that it writes each command line argument both to standard out and to a file:


#include <stdio.h>

void show(const char *const *begin, const char *const *end)
    {
    File *fp = fopen("demo.txt", "w");
    while (begin != end)
        {
        fprintf(fp, "%s\n", *(end - 1));
        printf("%s\n", *(end - 1));
        --end;
        }
    fclose(fp);
    }

    int main(int argc, char *argv[])
    {
    show(argv, argv + argc);
    return 0;
    }

This loop could be rewritten like this, avoiding two subtractions:


while (begin != end)
    {
    --end;
    fprintf(fp, "%s\n", *end);
    printf("%s\n", *end);
    }

My inclination is to stay away from this, although I wouldn’t criticize anyone who did it. The drawback with this approach is that it’s not as clear what end means in this loop. For example, if you had to break out of the loop partway through, then resume at the point where you left off, you’d have to increment end before resuming in order to restore its meaning:


while (begin != end)
    {
    --end;
    if (some_condition())
        break;
    fprintf(fp, "%s\n", *end);
    printf("%s\n", *end);
    }
++end;  /* restore end’s meaning */
/* some unrelated processing */
while (begin != end)
    {
    --end;
    fprintf(fp, "%s\n", *end);
    printf("%s\n", *end);
    }

By decrementing end before the loop body has executed, the first code in the first loop violates the loop invariant that end points to one past the next element in the container. That makes it harder to maintain.

Java uses a rather different approach to iteration: it provides an abstract class, Enumeration, with two methods. One method tells you whether there are any more elements to be dealt with, and the other gives you the next element, throwing an exception if there aren’t any more. For example:


import java.util.*;

public class Test
{
private static void showAll(Enumeration e)
    {
    while (e.hasMoreElements())
        System.out.println(e.nextElement());
    }
public static void main(String args[])
    {
    Vector v = new Vector();
    for (int i = 0; i < 10; ++i)
        v.addElement(new Integer(i));
    showAll(v.elements());
    }
}

The method Vector.elements returns an object of type Enumeration that will iterate through all of the elements in the vector.

One obvious weakness of this approach is that it doesn’t provide any way of moving backwards. If you wanted to display the elements in reverse order you’d have to save them all as you iterated forward:


private static void showAll(Enumeration e)
    {
    Vector v = new Vector();
    while (e.hasMoreElements())
        v.addElement(e.nextElement());
    for (int i = v.size(); 0 < i; --i)
        System.out.println(v.elementAt(i-1));
    }

This problem wouldn’t exist if the class Enumeration had another method, say, prevElement(), that moved backwards.

In a sense this is a misleading example. For an object of type Vector it’s obviously easy to move in both directions. For other forms of containers it’s not so easy. For example, if you’re reading text from a keyboard it’s very difficult to back up. Since Java’s solution to many design problems is to throw exceptions, it wouldn’t be too jarring to Java programmers for prevElement to throw a NoSuchElementException when it was called for a container that didn’t support reverse iteration. In C++ the solution is to not provide operator-- for iterators that do not support backing up. That way you get a compile-time error, rather than a runtime error, if you use an iterator incorrectly.

One thing that all the forms of iteration we’ve looked at so far have in common is that the loop is part of the code. This is sometimes referred to as "pull" iteration, because the code we write asks for more data when it needs it. This contrasts with "push" iteration, where we provide a function to be called for each data object. For example, Windows provides a function named EnumFontFamiliesEx that is used to build a list of available fonts. You call it with the name of a font family and a pointer to a function to be called, with descriptive data, for each font in that family. A simplified call to EnumFontFamiliesEx might look something like this:


int callback(
    ENUMLOGFONT *log_font,
    NEWTEXTMETRIC *phys_font,
    int type,
    LPARAM user_data)
    {
    /* your code goes here */
    }

    void get_fonts()
    {
    LOGFONT data;
    /* set contents of data */
    EnumFontFamiliesEx(hdc, &data, callback, NULL, 0);
    }

Note that there is no loop in the code that we wrote. Our callback function gets called once for each font. Our only control once we have started the iteration is through the value returned by our callback function: We can end the iteration prematurely by returning zero. If we return a non-zero value the iteration will continue with the next font.

This push form of iteration is easy to implement in object-oriented languages with an abstract class. For example, let’s write an iterator class to iterate through a Java Vector object:


import java.util.Vector;

abstract public class VectorIterator
{
public VectorIterator(Vector vv)
    {
    v = vv;
    }
public void iterate()
    {
    for (int i = 0; i < v.size(); ++i)
        if (!apply(v.elementAt(i)))
            break;
    }
abstract protected boolean apply(Object o);
private Vector v;
}

To use this class we need to derive from it and override apply:


public class VectorDisplay
    extends VectorIterator
{
VectorDisplay(Vector v)
    {
    super(v);
    }
protected boolean apply(Object o)
    {
    System.out.println(o);
    return true;
    }
public static void main(String args[])
    {
    Vector v = new Vector();
    for (int i = 0; i < 10; ++i)
        v.addElement(new Integer(i));
    VectorDisplay disp = new VectorDisplay(v);
    disp.iterate();
    }
}

If you’re thinking that this is a lot of code to have to write just to be able to look at the elements of a container, you’re right. The advantage of this approach is the abstraction that you get when you use it. Once you’ve written the underlying interface classes for the containers that you’re using, you can iterate through any container type in exactly the same way: derive a class from the appropriate iterator class, add the appropriate operation for each element, and you’re done. You don’t have to worry about how the details of moving through a linked list differ from the details of moving through a hash table. This lets you focus on what you’re trying to do without having to spend as much time on the underlying implementation. It also makes it much simpler to change the underlying container type if you later determine that a different form of container is better suited to your application.

There are times, though, when the interface presented by an abstraction like this is too simple, and you might find that it’s hard to do what you need to do. If you use this sort of abstraction and you find that you’re having to do things in odd ways because the underlying abstraction doesn’t do what you need, it’s time to fall back to writing your own loops, as we did at the beginning of this column. It may not be as object oriented, but it gives you more power and more flexibility. Even if it does occasionally make you think about pointers to pointers to pointers.

1. This is part of the root management code for a garbage collector.

2. Since you’re reading this, we must have survived the 9/9/99 bug, caused by all that old FORTRAN code that used the date 9/9/99 as a sentinel to indicate the end of the input data.