Thirty years ago, when Spiro Agnew was nattering about nabobs of negativism and Richard Nixon was not a crook, one of the slogans from the other side of the wedge was "you haven’t converted a man just because you have silenced him." Here in The Journeyman’s Shop we have a similar rule for casts: you haven’t converted an object just because you have silenced the compiler. Or, a little less tersely and quite a bit more comprehensibly, before you add a cast make sure that you understand what you’re telling the compiler to do. A cast is a very powerful tool, one that many programmers are too quick to employ. In many cases a cast says to the compiler, "don’t worry, I know what I’m doing." That’s okay when it’s true. It’s disastrous when it’s not. This month we’re going to try to help avoid coding disasters by looking at pointer conversions and casts.
A conversion occurs whenever the compiler changes the type of an expression from the type given to that expression by the language rules to a type required by the context where that expression is used. For example:
unsigned char ch = 3;
Here the expression 3 has type int. It is being used to initialize a variable of type unsigned char, so the compiler converts its type from int to unsigned char.
A conversion can result in a value that is different from the value of the original expression. For example:
unsigned char to_ch(int val)
{
    unsigned char ch = val;
    return ch;
}
Here, the expression val has type int. Just as in the previous example, it is being used to initialize a variable of type unsigned char, so the compiler converts its type from int to unsigned char. On a system where, for example, unsigned char is 8 bits and int is 32, it is not possible to store every possible int value in an unsigned char variable. The rule in that case is that the value being stored is reduced modulo 2^n, where n is the number of bits in an unsigned char; in this example the value would be reduced modulo 256. Calling to_ch shows the effect of this conversion:
#include <stdio.h>
int main()
{
    printf("1000 --> %d\n", to_ch(1000));
    return 0;
}
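As a quick sanity check on the modulo rule, here is a minimal sketch that compares the implicit conversion with the same reduction done by hand. It assumes the common case of an 8-bit unsigned char; the helper names reduce_by_hand and check_modulo are ours, not part of the example above.

```cpp
#include <cassert>
#include <climits>

// Same conversion as to_ch in the text: int -> unsigned char.
unsigned char to_ch(int val)
{
    unsigned char ch = val;     // implicit conversion, reduced modulo 2^n
    return ch;
}

// Hypothetical helper: does the reduction by hand for a non-negative value.
int reduce_by_hand(int val)
{
    return val % (UCHAR_MAX + 1);   // UCHAR_MAX + 1 is 2^n (256 for 8 bits)
}

// Checks that the implicit conversion agrees with the hand computation.
void check_modulo()
{
    assert(to_ch(1000) == reduce_by_hand(1000));   // both are 232 when n is 8
    assert(to_ch(255) == 255);                     // fits, so unchanged
    assert(to_ch(256) == 0);                       // wraps around to zero
}
```

With an 8-bit unsigned char, 1000 is 3 * 256 + 232, so the program in the text should print 232.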
An implicit conversion is one that the compiler is allowed to do on its own. In the two code examples we’ve just looked at we’ve seen an implicit conversion from int to unsigned char. You probably noticed another implicit conversion, in the main function above: the call to to_ch returns an unsigned char, but the compiler must convert the value that it returns to an int [1] in order to pass the value to printf. But that’s not all. There are two more implicit conversions in main. First, printf returns an int, which the compiler converts to type void by ignoring it. Second, the format string must be converted from an array of const char to a pointer to const char [2].
An explicit conversion is one that the compiler is not allowed to do unless you tell it to. For example:
int *convert(void *ptr)
{
    return ptr;
}
This code is illegal in C++: there is no implicit conversion from a pointer to void to a pointer to any non-void type.
A cast is something that you write in your source code to tell the compiler to perform an explicit conversion [3]. To make the previous example legal, we could rewrite it, with a cast, like this:
int *convert(void *ptr)
{
    return (int *)ptr;
}
In C casts are simple: you put the name of the type that you want the expression to be converted to in parentheses in front of the expression, just as in the preceding code example. That form of cast is now frowned upon in C++, because C++ provides four different forms of cast for four different kinds of conversions [4]: static_cast, const_cast, dynamic_cast, and reinterpret_cast. There’s a bit of overlap between the things these casts can do, but they let you say more clearly what it is that you intend to change by a conversion, which helps avoid mistakes. We’ll look at all four kinds of cast, as they apply to various kinds of pointers, in the rest of this column.
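For comparison, here is how the convert function might be spelled with a new-style cast. This is only a sketch: the demo function and its variable names are ours, added to show the round trip from int* to void* and back.

```cpp
#include <cassert>

// Same conversion as the old-style cast, spelled with static_cast.
int *convert(void *ptr)
{
    return static_cast<int *>(ptr);
}

// Hypothetical driver showing the round trip.
void demo()
{
    int value = 42;
    void *vp = &value;          // implicit: int* -> void*
    int *ip = convert(vp);      // explicit: void* -> int*
    assert(ip == &value);       // we get the original pointer back
    assert(*ip == 42);
}
```

The static_cast does the same job as (int *)ptr here, but it says which kind of conversion we meant.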
Suppose you have a simple class hierarchy that looks like this:
class Base
{
};
class Derived : public Base
{
};
and you have some code that uses it like this:
void f(void *ptr)
{
    // do something with ptr
}
void g()
{
    Derived d;
    f(&d);
}
The type of the expression &d in g is pointer to Derived. We can pass it as an argument to f without a cast because there is an implicit conversion from any pointer type to void* [5].
A slightly more complicated case involves a function that takes a pointer to Base:
void f(Base *bp)
{
    // do something with bp
}
void g()
{
    Derived d;
    f(&d);
}
This, too, is legal, because there is an implicit conversion from pointer to Derived to pointer to Base. This conversion is only allowed in contexts where Base is accessible - if Derived inherited privately from Base instead of publicly, the implicit conversion would only be allowed in member functions and friends [6]. The conversion is also only allowed when it is unambiguous - we’ll see an example a little later of an ambiguous conversion, and how to handle it.
This conversion is considered safe, and therefore appropriate for an implicit conversion, because every object of type Derived contains all the member data and has all the member functions of an object of type Base. However, the designer of the class Base is responsible for giving users of the class a good shot at being able to use Base safely. This usually means making data private and providing virtual functions so that derived classes can override its behavior properly.
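The accessibility rule is easier to see in code. The following sketch assumes a variant of the hierarchy in which Derived inherits privately; the member functions id and as_base and the demo function are hypothetical names added for illustration.

```cpp
#include <cassert>

class Base
{
public:
    int id() const { return 1; }
};

class Derived : private Base      // private: Base is inaccessible to outsiders
{
public:
    // Inside a member function Base is accessible, so the implicit
    // conversion from Derived* to Base* is allowed here.
    Base *as_base() { return this; }
};

void demo()
{
    Derived d;
    // Base *bp = &d;             // would not compile: Base is inaccessible here
    Base *bp = d.as_base();       // okay: the conversion happens inside Derived
    assert(bp->id() == 1);
}
```

Uncommenting the first initialization of bp should produce a diagnostic about an inaccessible base.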
Conversions in the other direction are much more dangerous, and the compiler is not allowed to do them on its own.
void f(Derived *dp)
{
    // do something with dp
}
void g(void *ptr)
{
    Base b;
    f(ptr); // illegal
    f(&b); // illegal
}
Stated more formally, there is no implicit conversion from a void* to a pointer to any non-void type, and there is no implicit conversion from a pointer to Base to a pointer to Derived. The reason for this ought to be fairly evident: there is no guarantee that the data that a pointer to void or a pointer to Base points to has any connection whatsoever with the class Derived.
However, if you know that a pointer to Base in fact points to an object of type Derived, you can tell the compiler to do the conversion anyway. Use a static_cast when you know that the conversion will make sense:
void f(Derived *dp)
{
    // do something with dp
}
void g()
{
    Derived d;
    void *ptr = &d;
    Base *bp = &d;
    f(static_cast<Derived*>(ptr)); // okay
    f(static_cast<Derived*>(bp)); // okay
}
Here, we know that ptr and bp both point to an object of type Derived, so we know that it’s safe to use a static_cast to tell the compiler to do the conversion.
There is a limitation on this use of static_cast: you cannot use it to convert a pointer to a virtual base into a pointer to a derived type. We’ll look at this limitation later on, when we talk about dynamic_cast.
Now let’s look at another form of conversion, in a more complicated class hierarchy:
class B {};
class I1 : public B {};
class I2 : public B {};
class D : public I1, public I2 {};
D d;
I1 *i1p = &d; // okay
I2 *i2p = &d; // okay
B *bp = &d; // illegal
There is an implicit conversion from a pointer to D to a pointer to I1, and from a pointer to D to a pointer to I2.
There isn’t an implicit conversion from a pointer to D to a pointer to B, because an object of type D has two subobjects of type B.
Rather than leave it up to the compiler to decide what the initialization in this example means, the language definition does not allow an implicit conversion here because the conversion is ambiguous. You can’t get rid of this ambiguity by telling the compiler to do it anyway -- instead, you have to give the compiler some guidance by telling it which of the two B subobjects you want to end up pointing to:
B *bp1 = static_cast<I1*>(&d); // okay
B *bp2 = static_cast<I2*>(&d); // okay
These conversions work because the I1 and I2 subobjects in D each have a single B subobject. The cast tells the compiler to convert the pointer into a pointer to I1 or a pointer to I2, and the compiler then implicitly converts the resulting pointer into a pointer to B. We could also have used the pointers i1p and i2p to initialize pointers to B:
B *b1p = i1p; // okay: points to B subobject of I1
B *b2p = i2p; // okay: points to B subobject of I2
Use static_cast to convert a pointer to base type into a pointer to derived type when you know that the conversion is safe, and to give the compiler a hint when you want to convert a pointer to derived type into a pointer to base type and there are multiple subobjects of the desired base type.
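To see that the two hints really do select different subobjects, here is a small sketch built on the same B, I1, I2, D hierarchy. The demo function is ours; the check relies on the rule that two distinct subobjects of the same type cannot share an address.

```cpp
#include <cassert>

class B {};
class I1 : public B {};
class I2 : public B {};
class D : public I1, public I2 {};

void demo()
{
    D d;
    B *bp1 = static_cast<I1 *>(&d);   // the B inside the I1 subobject
    B *bp2 = static_cast<I2 *>(&d);   // the B inside the I2 subobject
    assert(bp1 != bp2);               // two different B subobjects
}
```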
When you use static_cast to tell the compiler to convert a pointer to a base type into a pointer to a derived type you take responsibility for making sure that the conversion is safe. The compiler doesn’t check whether the object that the original pointer points to is an object of the derived type. It takes you at your word, and converts the pointer. If you don’t know whether the pointer actually points to an object of the derived type you shouldn’t use static_cast. Instead, you should use dynamic_cast, which checks the actual type of the object at runtime. If the conversion is valid dynamic_cast gives you the pointer, adjusted if necessary. If the conversion is not valid it gives you a null pointer [7]. For example:
class Base
{
public:
    virtual ~Base() {}
};
class Derived : public Base
{
};
class Other : public Base
{
};
Derived *convert(Base *bp)
{
    return dynamic_cast<Derived*>(bp);
}
void test()
{
    Derived d;
    Other o;
    Base *bpd = &d;
    Base *bpo = &o;
    Derived *dpd = convert(bpd);
    Derived *dpo = convert(bpo);
}
At the end of test the pointer dpd will point to d, and the pointer dpo will contain a null pointer.
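Here is a checkable version of that claim, as a sketch. The classes mirror the ones in the example; demo and refers_to_derived are hypothetical names, and the second function illustrates the reference form, which reports failure by throwing std::bad_cast because there is no null reference.

```cpp
#include <cassert>
#include <typeinfo>

class Base
{
public:
    virtual ~Base() {}
};
class Derived : public Base {};
class Other : public Base {};

Derived *convert(Base *bp)
{
    return dynamic_cast<Derived *>(bp);   // checked at runtime
}

// Reference form: failure is reported with an exception.
bool refers_to_derived(Base &b)
{
    try
    {
        Derived &d = dynamic_cast<Derived &>(b);
        (void)d;                 // only success or failure matters here
        return true;
    }
    catch (const std::bad_cast &)
    {
        return false;
    }
}

void demo()
{
    Derived d;
    Other o;
    assert(convert(&d) == &d);        // really a Derived: pointer comes back
    assert(convert(&o) == nullptr);   // not a Derived: null pointer
    assert(refers_to_derived(d));
    assert(!refers_to_derived(o));
}
```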
Note that the class Base in this example has a virtual function. That’s required: you cannot use dynamic_cast on a pointer to a class type that has no virtual functions. The reason behind this is pragmatic: runtime type checking requires the compiler to generate information about the actual types of program objects. Adding a pointer to a type description to a struct that merely contains two doubles that represent a complex number would bloat that structure, and would rarely be useful. Classes that have virtual functions already have such a pointer in typical implementations, so adding type conversion information doesn’t make the objects themselves any bigger.
I mentioned earlier that you can’t use a static_cast to convert from a pointer to a virtual base into a pointer to a derived type. That’s because the location of the virtual base subobject within the derived type depends on whether there is a more derived type and how that type is defined. For example:
class Vbase {int i;};
class I1 : public virtual Vbase{int j;};
class I2 : public virtual Vbase{int k;};
One fairly typical way for the compiler to lay out an object of type I1 in memory is like this:
Vbase *vbptr;
int j; // I1’s data
int i; // Vbase’s data
Similarly, since I2 is also derived virtually from Vbase, its data might look like this:
Vbase *vbptr;
int k; // I2’s data
int i; // Vbase’s data
In both cases, the Vbase subobject is located at the same offset from the beginning of the object. But look at what happens when we combine these two types in a derived class:
class Derived : public I1, public I2
{
    int m;
};
Remember, there is only one Vbase subobject in Derived; it usually looks something like this:
Vbase *vbptr1; // I1’s virtual base pointer
int j; // I1’s data
Vbase *vbptr2; // I2’s virtual base pointer
int k; // I2’s data
int m; // Derived’s data
int i; // Vbase’s data
If the compiler has to convert a pointer to Vbase into a pointer to I1 it has to know whether the Vbase subobject is part of an object of type I1 or part of an object of type Derived, because the distance from the Vbase subobject back to the I1 subobject is different in the two layouts. In general there is no way to know this at compile time, so a conversion from a pointer to a virtual base into a pointer to a derived type can only be done with a dynamic_cast.
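A sketch of that conversion follows. It assumes a variant of Vbase that has a virtual destructor, since dynamic_cast needs a class with virtual functions; the function to_i1 and the demo driver are hypothetical names.

```cpp
#include <cassert>

class Vbase
{
public:
    virtual ~Vbase() {}      // assumption: makes Vbase usable with dynamic_cast
    int i;
};
class I1 : public virtual Vbase { public: int j; };
class I2 : public virtual Vbase { public: int k; };
class Derived : public I1, public I2 { public: int m; };

I1 *to_i1(Vbase *vp)
{
    // return static_cast<I1 *>(vp);   // would not compile: Vbase is a virtual base
    return dynamic_cast<I1 *>(vp);     // the I1 subobject is located at runtime
}

void demo()
{
    I1 alone;
    Derived combined;
    Vbase *vp1 = &alone;       // Vbase inside a complete I1
    Vbase *vp2 = &combined;    // Vbase inside a Derived
    assert(to_i1(vp1) == &alone);
    assert(to_i1(vp2) == static_cast<I1 *>(&combined));
}
```

The same call works for both objects even though the Vbase subobject may sit at a different distance from the I1 subobject in each.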
You can also use dynamic_cast to convert a pointer to a type in another branch of a class hierarchy. For example:
class Base1
{
    virtual void f();
};
class Base2
{
};
class Derived : public Base1, public Base2
{
};
void test()
{
    Derived d;
    Base1 *b1p = &d;
    Base2 *b2p = dynamic_cast<Base2*>(b1p);
}
Just as before, dynamic_cast will return a null pointer if the conversion cannot be performed.
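Both outcomes of a cross-hierarchy conversion can be sketched like this. The class OnlyBase1 and the functions cross and demo are hypothetical additions, and we give both bases virtual destructors to keep the example simple.

```cpp
#include <cassert>

class Base1 { public: virtual ~Base1() {} };
class Base2 { public: virtual ~Base2() {} };
class Derived : public Base1, public Base2 {};
class OnlyBase1 : public Base1 {};     // hypothetical: has no Base2 part

Base2 *cross(Base1 *b1p)
{
    return dynamic_cast<Base2 *>(b1p);
}

void demo()
{
    Derived d;
    OnlyBase1 o;
    assert(cross(&d) == static_cast<Base2 *>(&d));   // sibling subobject found
    assert(cross(&o) == nullptr);                    // no Base2 in this object
}
```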
Finally, you can use dynamic_cast to get a pointer to the beginning of an object, with a dynamic_cast<void*>:
void test(Base1 *bp)
{
    void *addr = dynamic_cast<void*>(bp);
}
This can be used to determine whether two base pointers point to subobjects of the same object. The operand of a dynamic_cast must point to a type that has virtual functions, so this assumes that Base2 has been given one as well:
bool same_object(Base1 *bp1, Base2 *bp2)
{
    return dynamic_cast<void*>(bp1)
        == dynamic_cast<void*>(bp2);
}
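Here is a sketch that exercises same_object. It assumes both Base1 and Base2 have virtual functions (virtual destructors here); the demo driver is ours.

```cpp
#include <cassert>

class Base1 { public: virtual ~Base1() {} };
class Base2 { public: virtual ~Base2() {} };
class Derived : public Base1, public Base2 {};

bool same_object(Base1 *bp1, Base2 *bp2)
{
    // Each cast yields the address of the complete object.
    return dynamic_cast<void *>(bp1) == dynamic_cast<void *>(bp2);
}

void demo()
{
    Derived d1, d2;
    Base1 *b1 = &d1;
    Base2 *b2 = &d1;          // different subobject, same object
    Base2 *other = &d2;       // subobject of a different object
    assert(same_object(b1, b2));
    assert(!same_object(b1, other));
}
```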
If you use a dynamic_cast to initialize a variable that is defined in an if statement you can tightly control the scope of the converted pointer:
void use(Base *bp)
{
    if (Derived *dp = dynamic_cast<Derived*>(bp))
    {
        // code that uses dp
    }
    // code here cannot access dp
}
If you have a small, well-defined set of derived types you can use a series of else if statements to perform a type-specific operation:
void use(Base *bp)
{
    if (Der1 *dp1 = dynamic_cast<Der1*>(bp))
    {
        // code that uses dp1
    }
    else if (Der2 *dp2 = dynamic_cast<Der2*>(bp))
    {
        // code that uses dp2
    }
    else if (Der3 *dp3 = dynamic_cast<Der3*>(bp))
    {
        // code that uses dp3
    }
    else
    {
        // unknown type
    }
}
Of course, it’s almost always more appropriate to use virtual functions than to explicitly check a pointer’s type in this way. But once in a while this sort of cascade can be the best solution.
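For contrast, here is roughly what the virtual function alternative might look like. This is a sketch under our own assumptions: the member function name and the strings it returns are invented for illustration, and Der3 is omitted for brevity.

```cpp
#include <cassert>
#include <string>

class Base
{
public:
    virtual ~Base() {}
    // Default behavior, playing the role of the final else branch.
    virtual std::string name() const { return "unknown type"; }
};
class Der1 : public Base
{
public:
    std::string name() const { return "Der1"; }
};
class Der2 : public Base
{
public:
    std::string name() const { return "Der2"; }
};

// No casts and no cascade: the object picks the right code itself.
std::string use(const Base *bp)
{
    return bp->name();
}

void demo()
{
    Der1 d1;
    Der2 d2;
    Base b;
    assert(use(&d1) == "Der1");
    assert(use(&d2) == "Der2");
    assert(use(&b) == "unknown type");
}
```

Adding a new derived type then means writing one new override rather than finding and extending every cascade.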
Use dynamic_cast to get a runtime check of the validity of a conversion, to convert a pointer to a virtual base into a pointer to a derived type, and to get the address of the start of an object.
Don’t ever use reinterpret_cast. There is nothing portable you can do with it. If you really need to do bit twiddling on pointers, read what the C++ standard says about reinterpret_cast and read your compiler’s documentation. Be warned: you’re playing with fire.
So far, none of the pointers we’ve looked at had const or volatile qualifiers in their type declarations. Now it’s time to consider the effect of const and volatile qualifiers on the validity of a cast expression.
Rather than keep on repeating ‘const and volatile’ or using the standardese cv-qualifier, I’ll simply talk about const. The same rules apply to volatile.
You can use any of the three cast operators that we looked at to change the constness of a type, provided you don’t cast away constness. This means that most of the conversions that we’re used to applying in const-correct code are valid as part of a cast. For example,
class Base {};
class Derived : public Base {};
Derived d;
Base *bp = &d;
const Derived *dp =
static_cast<const Derived *>(bp);
Here, the conversion from pointer to Base into pointer to const Derived does not cast away constness, so we’re allowed to add the const qualifier as part of the cast [8].
If you need to cast away constness you can use a const_cast:
Derived d;
const Derived *cdp = &d;
Derived *dp = const_cast<Derived *>(cdp);
Here, the const_cast tells the compiler that the programmer is taking responsibility for the consequences if removing the const qualifier causes problems.
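What counts as a problem? The case to worry about is writing through the converted pointer when the object itself was defined const, which is undefined behavior. The sketch below shows the safe case and leaves the unsafe one commented out; this stand-in Derived with a data member, and the names set_value and demo, are ours.

```cpp
#include <cassert>

class Derived
{
public:
    int value;      // hypothetical data member, so there is something to modify
};

void set_value(const Derived *cdp, int v)
{
    // Only safe when the object cdp points to was not defined const.
    Derived *dp = const_cast<Derived *>(cdp);
    dp->value = v;
}

void demo()
{
    Derived d;                 // not a const object
    d.value = 1;
    const Derived *cdp = &d;
    set_value(cdp, 2);         // okay: d itself is modifiable
    assert(d.value == 2);

    // const Derived cd = Derived();
    // set_value(&cd, 3);      // undefined behavior: cd really is const
}
```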
The C++ standard has a precise definition of what it means to cast away constness, and if you want to see the formalism you can look there. Informally, once you add a const qualifier to a multi-level pointer you must add const qualifiers all the way up to the pointer type that you are converting to. For example,
int **ipp;
int const **cipp = ipp; // illegal: casts away const
int const * const *ccipp = ipp; // okay
Here, ccipp is a pointer to a const pointer to const int. Because we added a const qualifier to the int part of the type of ipp, we had to add a second const qualifier, moving up through the levels of pointers, in order to avoid casting away const.
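It may help to see why the first conversion is forbidden. If it were allowed, we could end up modifying a const object without ever writing a cast. The sketch below keeps the offending lines in comments, since they must not compile; the variable names fixed and value are ours.

```cpp
#include <cassert>

void demo()
{
    int value = 5;
    int *ip = &value;
    int **ipp = &ip;

    // const int fixed = 1;
    // int const **cipp = ipp;   // illegal; but if it were allowed...
    // *cipp = &fixed;           // ...this would quietly make ip point at fixed
    // *ip = 2;                  // ...and this would modify a const object

    int const *const *ccipp = ipp;   // okay: the extra const blocks *ccipp = ...
    assert(*ccipp == ip);
    assert(**ccipp == 5);
}
```

The second const makes the intermediate pointer unassignable through ccipp, which closes the hole.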
Got that? Now let’s look at a pointer to a pointer to a pointer:
int ***ippp;
int const *** cippp = ippp; // illegal
int const * const ** ccippp = ippp; // illegal
int const * const * const * cccippp = ippp; // okay
So far, this example looks pretty much like the previous one. We’ve added a const qualifier to the type of the underlying object that we’re pointing to, so the rule says that we have to add const qualifiers all the way up to the next to last pointer. But we don’t have to start adding const qualifiers at the underlying type. We can start part way through, just so long as we keep adding qualifiers on the way out:
int * const **xcippp = ippp; // illegal: casts away const
int * const * const *xccippp = ippp; // okay
Use const_cast to cast away constness when you know that it is safe to do so.
All of these casts can be safely applied to a null pointer. If the cast requires changing the value of the pointer, the compiler will generate code that checks for a null pointer before doing any arithmetic on the pointer value.
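A small sketch of the null pointer guarantee, using a hierarchy where the conversion typically does require adjusting the pointer. The classes and the demo function here are hypothetical and separate from the earlier examples.

```cpp
#include <cassert>

class Base1 { public: int a; };
class Base2 { public: int b; };
class Derived : public Base1, public Base2 {};

void demo()
{
    Derived d;
    Base2 *bp = &d;                               // typically adjusted past Base1
    assert(static_cast<Derived *>(bp) == &d);     // and adjusted back again

    Derived *null_dp = nullptr;
    Base2 *null_bp = null_dp;                     // implicit conversion of null
    assert(null_bp == nullptr);                   // still null: no offset added
    assert(static_cast<Derived *>(null_bp) == nullptr);
}
```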
Although none of our examples so far have done it, these casts can be cascaded. This can be rather hard to read:
Derived d;
const Base *bp = &d;
Derived *dp = const_cast<Derived*>(
static_cast<const Derived*>(bp));
That’s actually a good thing: it helps discourage use of casts. Most casts are used to get around the protection that C++’s strong typing provides. Occasionally there are good reasons for doing that, but often the need to violate the type system is a symptom of a design problem. When the compiler tells you that it can’t do an implicit conversion, make sure that you understand why the compiler is complaining. Then think about whether what you’re trying to do is appropriate, or whether a change in design can eliminate the need for violating the type system. After thinking hard about this, if you are convinced that you really do need to do the conversion, add a cast.
1. That is, on our (not very) hypothetical hardware architecture. On an architecture where an int cannot hold all possible values of type unsigned char the conversion would be to an unsigned int. This happens when unsigned char and int are the same size.
2. In C this involves two implicit conversions: the array of char is converted to a pointer to char, and the pointer to char is converted to a pointer to const char.
3. Programmers who anthropomorphise compilers sometimes muddle "conversion" and "cast" with statements like "the compiler casts the int value to an unsigned char." This is wrong. A cast is something you write in your source code. The compiler can convert values, but it cannot cast them. (It’s also arguably wrong because it introduces an outside entity, the compiler, into what ought to be a conversation between a programmer and the language definition, but that’s a subject for another column).
4. We may have missed an opportunity here: there should be a fifth conversion, of the form compiler_cast<T>(v), which tells the compiler to do exactly what it would have done in the absence of the cast, but not to generate a warning. That way we could suppress warnings on valid, well-defined implicit conversions such as int to unsigned char that compiler writers suspect we don’t understand well enough to be allowed to use.
5. We’ll look at the limits that are imposed by the qualifiers const and volatile a bit later.
6. In addition, an old-style cast can be used for this conversion, and it is required to produce a pointer to the correct subobject in the presence of multiple inheritance.
7. We haven’t said anything about references so far, but in general you can convert references in the same way as pointers. However, since null references do not exist in C++, dynamic_cast has to use a different mechanism to tell you that the conversion was not valid. When you use dynamic_cast to convert a reference and the conversion is not valid it throws an exception object of type std::bad_cast.
8. When these new-style casts were first introduced the rule was that you couldn’t add or remove const qualifiers. Somewhere along the way this was replaced with the present rule. I, for one, overlooked this sensible change, until I reread the rules for these conversions in preparation for writing this column.
Copyright © 2000-2006 by Pete Becker. All rights reserved.