gcjx uses a simple version of the Visitor pattern for
code generation. I’ve been thinking about this a bit lately, as
experience with gcjx and random discussions with Graydon have been
tweaking my interest in language design.
For those who don’t know, visitors are basically a way to achieve
dispatch on the dynamic type of an argument to a method. This is
very handy for doing things like walking the model of a program that
is built up inside a compiler.
In gcjx this takes a very simple form. There is an abstract
visitor base class which has one abstract method for each object in
the model, like:
class visitor {
virtual void visit_block (model_block *,
const std::list<ref_stmt> &) = 0;
...
};
The arguments here are ad hoc, according to the particular object
being visited (it need not be done this way, but it was convenient
for gcjx).
Then each class in the model has its own visit
method:
class model_block {
void visit (visitor *v) {
v->visit_block (this, statements);
}
};
As you can see this results in a straightforward way to achieve
multiple dispatch. You simply call the visit
method on
any element of the model, and the appropriate method in your visitor
will be called.
One nice thing about this approach is that the compiler will tell
you if your visitor is incomplete, since that can only happen if you
didn’t implement some abstract method. This also means it is easy to
add a new class to the model — all existing visitors will break,
making it simple to figure out where to add new methods.
The downside of this approach is that it is inflexible in a few
ways. For instance, consider the tree-generating back end in gcjx.
When compiling to trees, we want to build a new GCC tree object
representing each object in model of the program. So, the obvious
way to do that would be to have the visit
method
return a tree
.
This is unsatisfactory, though, because it means you have to
modify every class in the model to allow this. This in turn means
that the declaration of tree
must be visible globally —
it can no longer be segregated to a single back end. Of course this
could be worked around; e.g., visit
could return
void*
… but then you lose type safety and have to add
casts all over.
Another approach to this problem is multi-methods, which means
doing dispatch on the runtime type of the arguments. This way you can
use generic functions instead of visitors, and then easily add new
kinds of visitors without modifying the classes in the model.
C++ doesn’t directly support this, though apparently it can
be done. One drawback I do see here is that it doesn’t seem
possible to determine when you haven’t written a method. The
compiler, seemingly, can’t tell you… a classic sort of
static/dynamic tradeoff. I’m not really all that familiar with
existing multimethod implementations, maybe there is some nice way to
inform compilers of one’s intent here.
A third approach, taken in GCC, is to simply switch
on the type of the object. One advantage of this approach is that it
is often simpler to keep track of local state — you can write
iterative code instead of recursive code in some places, you don’t
have to invert a lot of logic to put things in separate functions,
etc. This also suffers from the problems that arise if you add a new
class.
Coding styles that substitute programmer discipline for compiler
errors don’t seem to work that well for me. The ideal approach would
look somewhat like multimethods, but would let me have the compiler
check self-imposed constraints about which methods must exist.