C++ compilation

I’ve been spending a lot of time thinking about C and C++
development lately. In particular I’ve been thinking along the lines
of, “how can we make C++ development as easy as Java development?”.
This turns out to touch a lot of different things, and I’m going to
blog more fully about it later. Today I just wanted to share some
thoughts about one particular problem: building gcjx.

I wrote about this earlier, noting the incredible performance
difference between g++ and jikes.
Today I went a little bit deeper and built gcjx with the
-ftime-report flag. Then I wrote a little Perl script to
summarize the results (I’m only showing the top 4 here; the rest are
all 3% or less):

Pass             User Time (s)   % Total
parser                  113.62      44.6
name lookup              34.34      13.5
symout                   26.59      10.4
tree gimplify            18.95       7.4
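
The Perl script itself isn’t shown here. As a rough idea of what such
a summary involves, here is a minimal C++ sketch. It assumes the
per-pass user times have already been extracted from the
-ftime-report output as (pass name, seconds) pairs, one per pass per
compilation, and it treats the sum over all passes as the total, which
may not match how the percentages above were computed. The names
PassSummary and summarize are made up for illustration; they are not
part of gcc or gcjx.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one row of the summary table.
struct PassSummary {
    std::string pass;
    double user_seconds;      // user time summed over every compilation
    double percent_of_total;  // share of the summed time of all passes
};

// Sums user time per pass across all compilations, then ranks the
// passes from most to least expensive.
// Assumption: `samples` was already extracted from -ftime-report
// output; parsing that output is deliberately left out of this sketch.
std::vector<PassSummary> summarize(
    const std::vector<std::pair<std::string, double>>& samples) {
    std::map<std::string, double> totals;
    double grand_total = 0.0;
    for (const auto& sample : samples) {
        assert(sample.second >= 0.0);  // a negative time means bad input
        totals[sample.first] += sample.second;
        grand_total += sample.second;
    }

    std::vector<PassSummary> rows;
    for (const auto& entry : totals) {
        double percent =
            grand_total > 0.0 ? 100.0 * entry.second / grand_total : 0.0;
        rows.push_back({entry.first, entry.second, percent});
    }

    // Most expensive pass first.
    std::sort(rows.begin(), rows.end(),
              [](const PassSummary& a, const PassSummary& b) {
                  return a.user_seconds > b.user_seconds;
              });
    return rows;
}
```

Feeding it the samples from every file in the build and printing the
first few rows would give a table shaped like the one above.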

I think the underlying problem here is that each compilation of a
gcjx source file re-scans many of the gcjx headers and also a fair
chunk of libstdc++, which hugely inflates the amount of work done.
For instance, mangle.cc is a mere 386 lines of code, but run it
through cpp and out comes a whopping 51,997 lines, roughly 135 times
as much. Like most sane programs, gcjx doesn’t play odd cpp tricks
like including files multiple times, so the same headers expand to
the same text in every source file that includes them, and much of
this work is simply redundant.

The fix here is to move away from purely textual preprocessing to a
more sophisticated model, one that eliminates redundant work. This is
actually not as hard as you might first believe. For the most part it
is an application of better data structures to the problem, coupled
with the observation that, since header files typically do not
interfere with each other via macro tricks, you can gain efficiency
by caching lookup contexts during preprocessing. Perhaps I ought to
write this up more fully; this description is pretty sketchy.
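
To make the caching idea a little more concrete, here is a toy C++
sketch of one way it might look. This is only an illustration under
stated assumptions, not the design of gcc, gcjx, or any existing
tool: every name in it (MacroState, CachedHeader, HeaderCache) is
hypothetical, and lines_scanned stands in for whatever parsed result
a real compiler would keep. The assumption is that a header’s result
can be reused whenever every macro it consulted has the same value,
or is still undefined, as when the header was first processed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of preprocessor state: macro name -> definition.
using MacroState = std::map<std::string, std::string>;

// What was learned the first time a header was processed.
struct CachedHeader {
    MacroState seen_defined;               // consulted macros and their values
    std::set<std::string> seen_undefined;  // consulted macros that were unset
    std::size_t lines_scanned;             // stand-in for the parsed result
};

// A cached result is reusable only if every macro the header looked at
// is in the same state now as it was then. Macros the header never
// consulted cannot affect it, so they are ignored.
bool still_valid(const CachedHeader& cached, const MacroState& now) {
    for (const auto& seen : cached.seen_defined) {
        auto it = now.find(seen.first);
        if (it == now.end() || it->second != seen.second) return false;
    }
    for (const auto& name : cached.seen_undefined) {
        if (now.count(name) != 0) return false;
    }
    return true;
}

class HeaderCache {
public:
    // Returns a reusable result for `path`, or nullptr on a miss.
    const CachedHeader* find(const std::string& path,
                             const MacroState& now) const {
        auto range = entries_.equal_range(path);
        for (auto it = range.first; it != range.second; ++it) {
            if (still_valid(it->second, now)) return &it->second;
        }
        return nullptr;
    }

    // Records one processed variant of `path`. A header can have several
    // variants, one per distinct macro context it was processed under.
    void insert(const std::string& path, CachedHeader header) {
        assert(!path.empty());
        entries_.emplace(path, std::move(header));
    }

private:
    std::multimap<std::string, CachedHeader> entries_;
};
```

If headers rarely depend on macros set by other headers, nearly every
lookup after the first should be a hit, and a header would be scanned
about once per build rather than once per source file. Headers that
do play macro tricks simply miss and are processed again in the old
way.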

I know of two attempts at this already. One is the gcc compiler
server branch. Unfortunately this project seems to have
been canceled by Apple, as there has not been progress on it in quite
some time. There was little documentation for this project and, I
think, nobody outside Apple really understood what it was about back
when it was active.

The other project in this area is Doug Schaefer’s PDOM work in
the Eclipse CDT. Unfortunately this is for indexing only — as far as
I know there is no code generation planned. (Remedying this is one of
my major prescriptions for improving C++ tools… but you have to wait
for the bigger entry on this topic.)
