Casts and Conversions

The C/C++ Users Journal, April, 2000

Thirty years ago, when Spiro Agnew was nattering about nabobs of negativism and Richard Nixon was not a crook, one of the slogans from the other side of the wedge was "you haven’t converted a man just because you have silenced him." Here in The Journeyman’s Shop we have a similar rule for casts: you haven’t converted an object just because you have silenced the compiler. Or, a little less tersely and quite a bit more comprehensibly, before you add a cast make sure that you understand what you’re telling the compiler to do. A cast is a very powerful tool, one that many programmers are too quick to employ. In many cases a cast says to the compiler, "don’t worry, I know what I’m doing." That’s okay when it’s true. It’s disastrous when it’s not. This month we’re going to try to help avoid coding disasters by looking at pointer conversions and casts.

Terminology

A conversion occurs whenever the compiler changes the type of an expression from the type given to that expression by the language rules to a type required by the context where that expression is used. For example:


unsigned char ch = 3;

Here the expression 3 has type int. It is being used to initialize a variable of type unsigned char, so the compiler converts its type from int to unsigned char.

A conversion can result in a value that is different from the value of the original expression. For example:


unsigned char to_ch(int val)
    {
    unsigned char ch = val;
    return ch;
    }

Here, the expression val has type int. Just as in the previous example, it is being used to initialize a variable of type unsigned char, so the compiler converts its type from int to unsigned char. On a system where, for example, unsigned char is 8 bits and int is 32 bits, it is not possible to store every possible int value in an unsigned char variable. The rule in that case is that the value being stored is reduced modulo 2^n, where n is the number of bits in an unsigned char; in this example the value would be reduced modulo 256, so 1000 becomes 232. Calling to_ch shows the effect of this conversion:


#include <stdio.h>

int main()
    {
    printf("1000 --> %d\n", to_ch(1000));
    return 0;
    }
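To check the arithmetic, here is a minimal sketch that compares the implicit conversion with the same reduction done by hand. It assumes 8-bit bytes, and reduce_by_hand is a hypothetical helper introduced only for this illustration.

```cpp
#include <cassert>
#include <climits>

// Assumption for this sketch: unsigned char has 8 bits.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Same conversion as to_ch in the text: int to unsigned char.
unsigned char to_ch(int val)
{
    unsigned char ch = val;   // implicit conversion, reduced modulo 2^CHAR_BIT
    return ch;
}

// Hypothetical helper: performs the modulo-256 reduction explicitly,
// mapping negative inputs into the range 0..255 as the conversion does.
int reduce_by_hand(int val)
{
    const int modulus = 1 << CHAR_BIT;   // 256
    return ((val % modulus) + modulus) % modulus;
}
```

For any int, to_ch(val) and reduce_by_hand(val) should agree; for example, both give 232 for 1000 and 255 for -1.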

An implicit conversion is one that the compiler is allowed to do on its own. In the two code examples we’ve just looked at we’ve seen an implicit conversion from int to unsigned char. You probably noticed another implicit conversion, in the main function above: the call to to_ch returns an unsigned char, but the compiler must convert the value that it returns to an int [1] in order to pass the value to printf. But that’s not all. There are two more implicit conversions in main. First, printf returns an int, which the compiler converts to type void by ignoring it. Second, the format string must be converted from an array of const char to a pointer to const char [2].

An explicit conversion is one that the compiler is not allowed to do unless you tell it to. For example:


int *convert(void *ptr)
    {
    return ptr;
    }

This code is illegal in C++: there is no implicit conversion from a pointer to void to a pointer to any non-void type.

A cast is something that you write in your source code to tell the compiler to perform an explicit conversion [3]. To make the previous example legal, we could rewrite it, with a cast, like this:


int *convert(void *ptr)
    {
    return (int *)ptr;
    }

C-Style Casts and New-Style Casts

In C, casts are simple: you put the name of the type that you want the expression to be converted to in parentheses in front of the expression, just as in the preceding code example. That form of cast is now frowned upon in C++, because C++ provides four different forms of cast for four different kinds of conversions [4]: static_cast, const_cast, dynamic_cast, and reinterpret_cast. There’s a bit of overlap between the things these casts can do, but they let you say more clearly what it is that you intend to change by a conversion, which helps avoid mistakes. We’ll look at all four kinds of cast, as they apply to various kinds of pointers, in the rest of this column.

static_cast

Suppose you have a simple class hierarchy that looks like this:


class Base
{
};

class Derived : public Base
{
};

and you have some code that uses it like this:


void f(void *ptr)
    {
    // do something with ptr
    }

void g()
    {
    Derived d;
    f(&d);
    }

The type of the expression &d in g is pointer to Derived. We can pass it as an argument to f without a cast because there is an implicit conversion from any pointer type to void* [5].

A slightly more complicated case involves a function that takes a pointer to Base:


void f(Base *bp)
    {
    // do something with bp
    }

void g()
    {
    Derived d;
    f(&d);
    }

This, too, is legal, because there is an implicit conversion from pointer to Derived to pointer to Base. This conversion is only allowed in contexts where Base is accessible - if Derived inherited privately from Base instead of publicly, the implicit conversion would only be allowed in member functions and friends [6]. The conversion is also only allowed when it is unambiguous - we’ll see an example a little later of an ambiguous conversion, and how to handle it.

This conversion is considered safe, and therefore appropriate for an implicit conversion, because every object of type Derived contains all the member data and has all the member functions of an object of type Base. However, the designer of the class Base is responsible for giving users of the class a good shot at being able to use Base safely. This usually means making data private and providing virtual functions so that derived classes can override its behavior properly.

Conversions in the other direction are much more dangerous, and the compiler is not allowed to do them on its own.


void f(Derived *dp)
    {
    // do something with dp
    }

void g(void *ptr)
    {
    Base b;
    f(ptr);     // illegal
    f(&b);      // illegal
    }

Stated more formally, there is no implicit conversion from a void* to a pointer to any non-void type, and there is no implicit conversion from a pointer to Base to a pointer to Derived. The reason for this ought to be fairly evident: there is no guarantee that the data that a pointer to void or a pointer to Base points to has any connection whatsoever with the class Derived.

However, if you know that a pointer to Base in fact points to an object of type Derived, you can tell the compiler to do the conversion anyway. Use a static_cast when you know that the conversion will make sense:


void f(Derived *dp)
    {
    // do something with dp
    }

void g()
    {
    Derived d;
    void *ptr = &d;
    Base *bp = &d;
    f(static_cast<Derived*>(ptr));  // okay
    f(static_cast<Derived*>(bp));   // okay
    }

Here, we know that ptr and bp both point to an object of type Derived, so we know that it’s safe to use a static_cast to tell the compiler to do the conversion.

There is a limitation on this use of static_cast: you cannot use it to convert a pointer to a virtual base into a pointer to a derived type. We’ll look at this limitation later on, when we talk about dynamic_cast.

Now let’s look at another form of conversion, in a more complicated class hierarchy:


class B {};
class I1 : public B {};
class I2 : public B {};
class D : public I1, public I2 {};

D d;
I1 *i1p = &d;   // okay
I2 *i2p = &d;   // okay
B *bp = &d;     // illegal

There is an implicit conversion from a pointer to D to a pointer to I1, and from a pointer to D to a pointer to I2. There isn’t an implicit conversion from a pointer to D to a pointer to B, because an object of type D has two subobjects of type B. Rather than leave it up to the compiler to decide what the initialization in this example means, the language definition does not allow an implicit conversion here because the conversion is ambiguous. You can’t get rid of this ambiguity by telling the compiler to do it anyway -- instead, you have to give the compiler some guidance by telling it which of the two B subobjects you want to end up pointing to:


B *bp1 = static_cast<I1*>(&d);   // okay
B *bp2 = static_cast<I2*>(&d);  // okay

These conversions work because the I1 and I2 subobjects in D each have a single B subobject. The cast tells the compiler to convert the pointer into a pointer to I1 or a pointer to I2, and the compiler then implicitly converts the resulting pointer into a pointer to B. We could also have used the pointers i1p and i2p to initialize pointers to B:


B *b1p = i1p;   // okay: points to B
                // subobject of I1
B *b2p = i2p;   // okay: points to B
                // subobject of I2

Use static_cast to convert a pointer to base type into a pointer to derived type when you know that the conversion is safe, and to give the compiler a hint when you want to convert a pointer to derived type into a pointer to base type and there are multiple subobjects of the desired base type.
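As a rough sketch of both uses, the following code picks each of the two B subobjects of a D and then walks back down from one of them. The data member tag and the helper function names are assumptions added for this illustration; they are not part of the hierarchy shown above.

```cpp
#include <cassert>

struct B { int tag; };          // tag is an assumed member, for illustration
struct I1 : B {};
struct I2 : B {};
struct D : I1, I2 {};           // a D holds two distinct B subobjects

// The cast names the intermediate base; the step from there to B* is implicit.
B *b_via_i1(D *dp) { return static_cast<I1*>(dp); }
B *b_via_i2(D *dp) { return static_cast<I2*>(dp); }

// Going back down: the caller promises that bp points to the B inside the
// I1 part of a D. A direct static_cast<D*>(bp) would not compile, because
// B is an ambiguous base of D, so the path through I1 is spelled out.
D *d_from_i1_b(B *bp)
{
    return static_cast<D*>(static_cast<I1*>(bp));
}
```

The two helpers return different addresses for the same D, which is exactly why the language refuses to pick one for you.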

dynamic_cast

When you use static_cast to tell the compiler to convert a pointer to a base type into a pointer to a derived type you take responsibility for making sure that the conversion is safe. The compiler doesn’t check whether the object that the original pointer points to is an object of the derived type. It takes you at your word, and converts the pointer. If you don’t know whether the pointer actually points to an object of the derived type you shouldn’t use static_cast. Instead, you should use dynamic_cast, which checks the actual type of the object at runtime. If the conversion is valid dynamic_cast gives you the pointer, adjusted if necessary. If the conversion is not valid it gives you a null pointer [7]. For example:


class Base
{
public:
    virtual ~Base() {}
};

class Derived : public Base
{
};

class Other : public Base
{
};

Derived *convert(Base *bp)
    {
    return dynamic_cast<Derived*>(bp);
    }

void test()
    {
    Derived d;
    Other o;
    Base *bpd = &d;
    Base *bpo = &o;
    Derived *dpd = convert(bpd);
    Derived *dpo = convert(bpo);
    }

At the end of test the pointer dpd will point to d, and the pointer dpo will contain a null pointer.
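A self-contained version of this example might look like the sketch below. It also shows the reference form, which reports failure by throwing std::bad_cast instead of returning a null pointer; refers_to_derived is a hypothetical helper written for this illustration, and nullptr assumes a C++11 compiler.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

class Base
{
public:
    virtual ~Base() {}   // at least one virtual function is required
};

class Derived : public Base {};
class Other : public Base {};

// Pointer form: yields a null pointer when bp does not point to a Derived.
Derived *convert(Base *bp)
{
    return dynamic_cast<Derived*>(bp);
}

// Reference form (hypothetical helper): there is no null reference,
// so a failed conversion throws std::bad_cast.
bool refers_to_derived(Base &b)
{
    try
    {
        Derived &d = dynamic_cast<Derived&>(b);
        (void)d;   // only the success of the conversion matters here
        return true;
    }
    catch (const std::bad_cast &)
    {
        return false;
    }
}
```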

Note that the class Base in this example has a virtual function. That’s required: you cannot use dynamic_cast on a pointer to a class type that has no virtual functions. The reason behind this is pragmatic: runtime type checking requires the compiler to generate information about the actual types of program objects. Adding a pointer to a type description to a struct that merely contains two doubles that represent a complex number would bloat that structure, and would rarely be useful. Classes that have virtual functions already have such a pointer in typical implementations, so adding type conversion information doesn’t make the objects themselves any bigger.

I mentioned earlier that you can’t use a static_cast to convert from a pointer to a virtual base into a pointer to a derived type. That’s because the location of the virtual base subobject within the derived type depends on whether there is a more derived type and how that type is defined. For example:


class Vbase {int i;};
class I1 : public virtual Vbase {int j;};
class I2 : public virtual Vbase {int k;};

One fairly typical way for the compiler to lay out an object of type I1 in memory is like this:


Vbase *vbptr;
int j;  // I1’s data
int i;  // Vbase’s data

Similarly, since I2 is also derived virtually from Vbase, its data might look like this:


Vbase *vbptr;
int k;  // I2’s data
int i;  // Vbase’s data

In both cases, the Vbase subobject is located at the same offset from the beginning of the object. But look at what happens when we combine these two types in a derived class:


class Derived : public I1, public I2
{
    int m;
};

Remember, there is only one Vbase subobject in Derived; it usually looks something like this:


Vbase *vbptr1;  // I1’s virtual base pointer
int j;          // I1’s data
Vbase *vbptr2;  // I2’s virtual base pointer
int k;          // I2’s data
int m;          // Derived’s data
int i;          // Vbase’s data

If the compiler has to convert a pointer to Vbase into a pointer to I1 it has to know whether the Vbase subobject is part of an object of type I1 or part of an object of type Derived. In general there is no way to know this at compile time, so a conversion from a pointer to a virtual base into a pointer to a derived type can only be done with a dynamic_cast.
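Here is a small sketch of that conversion. Since dynamic_cast needs a polymorphic source type, this version of Vbase is given a virtual destructor, which is an assumption added for the illustration; to_i1 is likewise a hypothetical helper.

```cpp
#include <cassert>

struct Vbase { int i; virtual ~Vbase() {} };   // virtual destructor assumed
struct I1 : virtual Vbase { int j; };
struct I2 : virtual Vbase { int k; };
struct Derived : I1, I2 { int m; };

// Converts a pointer to the virtual base into a pointer to I1.
// static_cast<I1*>(vp) would be rejected by the compiler here, because
// Vbase is a virtual base of I1; the lookup has to happen at runtime.
I1 *to_i1(Vbase *vp)
{
    return dynamic_cast<I1*>(vp);
}
```

The same function works whether the Vbase subobject belongs to a stand-alone I1 or to a Derived, and it yields a null pointer when there is no I1 to be found, as with a stand-alone I2.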

You can also use dynamic_cast to convert a pointer to a type in another branch of a class hierarchy. For example:


class Base1
{
    virtual void f();
};

class Base2
{
    virtual void g();
};

class Derived : public Base1, public Base2
{
};

void test()
    {
    Derived d;
    Base1 *b1p = &d;
    Base2 *b2p = dynamic_cast<Base2*>(b1p);
    }

Just as before, dynamic_cast will return a null pointer if the conversion cannot be performed.

Finally, you can use dynamic_cast to get a pointer to the beginning of an object, with a dynamic_cast<void*>:


void test(Base2 *bp)
    {
    void *addr = dynamic_cast<void*>(bp);
    }

This can be used to determine whether two pointers to different base types point to subobjects of the same object:


bool same_object(Base1 *bp1, Base2 *bp2)
    {
    return dynamic_cast<void*>(bp1)
        == dynamic_cast<void*>(bp2);
    }
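A complete version of same_object could be sketched as follows. Note that dynamic_cast<void*> requires its operand to point to a class type with at least one virtual function, so both base classes here are given virtual destructors; that detail is an assumption of this sketch.

```cpp
#include <cassert>

struct Base1 { virtual ~Base1() {} };
struct Base2 { virtual ~Base2() {} };   // must be polymorphic as well
struct Derived : Base1, Base2 {};

// Each dynamic_cast<void*> yields the address of the complete object,
// so the pointers compare equal exactly when both base subobjects
// live inside the same complete object.
bool same_object(Base1 *bp1, Base2 *bp2)
{
    return dynamic_cast<void*>(bp1) == dynamic_cast<void*>(bp2);
}
```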

If you use a dynamic_cast to initialize a variable that is defined in an if statement you can tightly control the scope of the converted pointer:


void use(Base *bp)
    {
    if (Derived *dp = dynamic_cast<Derived*>(bp))
        {
        // code that uses dp
        }
    // code here cannot access dp
    }

If you have a small, well-defined set of derived types you can use a series of else if statements to perform a type-specific operation:


void use(Base *bp)
    {
    if (Der1 *dp1 = dynamic_cast<Der1*>(bp))
        {
        // code that uses dp1
        }
    else if (Der2 *dp2 = dynamic_cast<Der2*>(bp))
        {
        // code that uses dp2
        }
    else if (Der3 *dp3 = dynamic_cast<Der3*>(bp))
        {
        // code that uses dp3
        }
    else
        {
        // unknown type
        }
    }

Of course, it’s almost always more appropriate to use virtual functions than to explicitly check a pointer’s type in this way. But once in a while this sort of cascade can be the best solution.

Use dynamic_cast to get a runtime check of the validity of a conversion, to convert a pointer to a virtual base into a pointer to a derived type, and to get the address of the start of an object.

reinterpret_cast

Don’t ever use this. There is nothing portable you can do with it. If you really need to do bit twiddling on pointers read what the C++ standard says about reinterpret_cast and read your compiler’s documentation. Be warned: you’re playing with fire.

const, volatile, and const_cast

So far, none of the pointers we’ve looked at had const or volatile qualifiers in their type declarations. Now it’s time to consider the effect of const and volatile qualifiers on the validity of a cast expression. Rather than keep on repeating "const and volatile" or using the standardese cv-qualifier, I’ll simply talk about const. The same rules apply to volatile.

You can use any of the three cast operators that we looked at to change the constness of a type, provided you don’t cast away constness. This means that most of the conversions that we’re used to applying in const-correct code are valid as part of a cast. For example,


class Base {};
class Derived : public Base {};

Derived d;
Base *bp = &d;
const Derived *dp =
    static_cast<const Derived *>(bp);

Here, the conversion from pointer to Base into pointer to const Derived does not cast away constness, so we’re allowed to add the const qualifier as part of the cast [8].

If you need to cast away constness you can use a const_cast:


Derived d;
const Derived *cdp = &d;
Derived *dp = const_cast<Derived *>(cdp);

Here, the const_cast tells the compiler that the programmer is taking responsibility for the consequences if removing the const qualifier causes problems.
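One case where taking that responsibility is reasonable is when you know the underlying object was not defined const. The sketch below assumes exactly that; set_value and the member value are hypothetical names used only for illustration. If the object had been defined const, writing through the converted pointer would be undefined behavior.

```cpp
#include <cassert>

struct Derived { int value; };   // value is an assumed member

// Hypothetical helper: receives a pointer to const, but the caller
// guarantees that the object it points to was not defined const.
void set_value(const Derived *cdp, int v)
{
    Derived *dp = const_cast<Derived*>(cdp);
    dp->value = v;   // safe only because the underlying object is non-const
}
```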

The C++ standard has a precise definition of what it means to cast away constness, and if you want to see the formalism you can look there. Informally, once you add a const qualifier to a multi-level pointer you must add const qualifiers all the way up to the pointer type that you are converting to. For example,


int **ipp;
int const **cipp = ipp;         // illegal: casts away const
int const * const *ccipp = ipp; // okay

Here, ccipp is a pointer to a const pointer to const int. Because we added a const qualifier to the int part of the type, which the declaration of ipp does not have, we had to add a second const qualifier, moving up through the levels of pointers, in order to avoid casting away const.

Got that? Now let’s look at a pointer to a pointer to a pointer:


int ***ippp;
int const *** cippp = ippp;                 // illegal
int const * const ** ccippp = ippp;         // illegal
int const * const * const * cccippp = ippp; // okay

So far, this example looks pretty much like the previous one. We’ve added a const qualifier to the type of the underlying object that we’re pointing to, so the rule says that we have to add const qualifiers all the way up to the next to last pointer. But we don’t have to start adding const qualifiers at the underlying type. We can start part way through, just so long as we keep adding qualifiers on the way out:


int * const **xcippp = ippp;                // illegal: casts away const
int * const * const *xccippp = ippp;        // okay
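If you want the compiler to confirm these rules rather than take them on faith, one possible approach is to ask std::is_convertible about each case. This is only a sketch, and it assumes a C++11 compiler for static_assert and <type_traits>.

```cpp
#include <cassert>
#include <type_traits>

// Two levels of pointer.
static_assert(!std::is_convertible<int**, int const**>::value,
              "adding const only at the innermost level is rejected");
static_assert(std::is_convertible<int**, int const* const*>::value,
              "const added at every level above the change is accepted");

// Three levels of pointer.
static_assert(!std::is_convertible<int***, int const***>::value, "");
static_assert(!std::is_convertible<int***, int const* const**>::value, "");
static_assert(std::is_convertible<int***, int const* const* const*>::value, "");

// Starting part way through.
static_assert(!std::is_convertible<int***, int* const**>::value, "");
static_assert(std::is_convertible<int***, int* const* const*>::value, "");
```

Every assertion mirrors one of the declarations above; the code compiles only if the implicit conversions behave as described.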

Use const_cast to cast away constness when you know that it is safe to do so.

In General

All of these casts can be safely applied to a null pointer. If the cast requires changing the value of the pointer, the compiler will generate code that checks for a null pointer before doing any arithmetic on the pointer value.
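A quick sketch of the null pointer guarantee, using a hierarchy where the second base typically sits at a non-zero offset so that the conversion really does involve an adjustment. The class names and the helpers up and down are assumptions made for this illustration.

```cpp
#include <cassert>

struct A { int a; };
struct B { int b; };
struct D : A, B {};   // the B subobject usually follows the A subobject

// Derived-to-base: implicit, and adjusted only for non-null pointers.
B *up(D *dp) { return dp; }

// Base-to-derived: the caller promises bp is null or points into a D.
D *down(B *bp) { return static_cast<D*>(bp); }
```

A null pointer stays null in both directions, and a non-null pointer makes the round trip back to the original address.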

Although none of our examples so far have done it, these casts can be cascaded. This can be rather hard to read:


Derived d;
const Base *bp = &d;
Derived *dp = const_cast<Derived*>(
    static_cast<const Derived*>(bp));

That’s actually a good thing: it helps discourage use of casts. Most casts are used to get around the protection that C++’s strong typing provides. Occasionally there are good reasons for doing that, but often the need to violate the type system is a symptom of a design problem. When the compiler tells you that it can’t do an implicit conversion, make sure that you understand why the compiler is complaining. Then think about whether what you’re trying to do is appropriate, or whether a change in design can eliminate the need for violating the type system. After thinking hard about this, if you are convinced that you really do need to do the conversion, add a cast.

1. That is, on our (not very) hypothetical hardware architecture. On an architecture where an int cannot hold all possible values of type unsigned char the conversion would be to an unsigned int. This happens when unsigned char and int are the same size.

2. In C this involves two implicit conversions: the array of char is converted to a pointer to char, and the pointer to char is converted to a pointer to const char.

3. Programmers who anthropomorphise compilers sometimes muddle "conversion" and "cast" with statements like "the compiler casts the int value to an unsigned char." This is wrong. A cast is something you write in your source code. The compiler can convert values, but it cannot cast them. (It’s also arguably wrong because it introduces an outside entity, the compiler, into what ought to be a conversation between a programmer and the language definition, but that’s a subject for another column).

4. We may have missed an opportunity here: there should be a fifth conversion, of the form compiler_cast<T>(v), which tells the compiler to do exactly what it would have done in the absence of the cast, but not to generate a warning. That way we could suppress warnings on valid, well-defined implicit conversions such as int to unsigned char that compiler writers suspect we don’t understand well enough to be allowed to use.

5. We’ll look at the limits that are imposed by the qualifiers const and volatile a bit later.

6. In addition, an old-style cast can be used for this conversion, and it is required to produce a pointer to the correct subobject in the presence of multiple inheritance.

7. We haven’t said anything about references so far, but in general you can convert references in the same way as pointers. However, since null references do not exist in C++, dynamic_cast has to use a different mechanism to tell you that the conversion was not valid. When you use dynamic_cast to convert a reference and the conversion is not valid it throws an exception object of type std::bad_cast.

8. When these new-style casts were first introduced the rule was that you couldn’t add or remove const qualifiers. Somewhere along the way this was replaced with the present rule. I, for one, overlooked this sensible change, until I reread the rules for these conversions in preparation for writing this column.