C++ noodling

A while back I spent a little time playing with g++, trying to understand compilation performance. For this experiment I tried to measure how much speedup a “model-based” compiler could expect to achieve.

I compiled my test program a few different ways. I timed compilation of a plain build (more or less a “make”) both with and without PCH. And, I timed compilation of the “all.cc” approach — I’ve heard many times over the years that C++ shops will cat their sources together into one big compilation unit, and that this reduces overall build times.
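
For concreteness, here is roughly what the “all.cc” trick looks like (the file names below are hypothetical). Whether you literally cat the sources together or #include them from a wrapper file, the effect is the same: every shared header is parsed once instead of once per translation unit.

    // all.cc -- one big compilation unit built from the individual sources.
    // Either generate it with "cat *.cc > all.cc" or just #include the
    // sources from a wrapper file like this one.
    #include "lexer.cc"
    #include "parser.cc"
    #include "semant.cc"
    #include "codegen.cc"
    // ...and so on for the remaining .cc files.
    // The whole program is then compiled with a single invocation:
    //   g++ -c all.cc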

So far I’ve only done this experiment by compiling gcjx, a moderately sized, fairly ordinary C++ program with which I am familiar. I plan to redo this with a couple of other programs as well (send suggestions).

The results:

Approach        Time
Plain build     13m50s
Plain + PCH     8m05s
all.cc          3m18s
all.cc + PCH    3m17s

This basically conforms with things I’ve found previously, but I was a little surprised that PCH is not a bigger win than it is, especially considering that I intentionally wrote gcjx with one big header file that includes most things, precisely to make better use of this feature. (I’ve heard from other folks that in some situations PCH is a net loss and they disable it. Any experiences out there?)
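
For reference, the big-header setup I mean looks roughly like this (the names are illustrative): one master header that pulls in everything, precompiled once, and then included first by every source file.

    // all.h -- a single master header that includes (nearly) everything.
    #include <vector>
    #include <map>
    #include <string>
    #include "defs.hh"   // project headers; names illustrative

    // Precompile it once with something like:
    //   g++ -x c++-header all.h
    // which produces all.h.gch.  As long as each .cc file begins with
    //   #include "all.h"
    // g++ will use all.h.gch instead of re-parsing the headers.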

Anyway, I consider this promising data for a compile server approach, since what I’m thinking about essentially models the “all.cc” approach in the compiler.

8 Comments

  • Wow, the all.cc trick results in a surprisingly massive speedup! Although incremental builds will be slower (no individual .o files for each .cc file are generated).

  • Thank you, Tom, for the accurate analysis…

    I was also surprised when I introduced PCH in my software and the gain in compilation time was only marginal. But at the same time I don’t much like the all.cc approach. It can be a big win when you are compiling the full code (say, for a new installation), but during development you rebuild the full code very rarely, and what “make” does for you is rebuild only the little things you modified.

    cheers…

  • I’ve heard of an all.h approach, where all the project’s includes are put into a master include, which can then be made into a pch, but not an all.cc approach. For the all.h approach, pch seems like an obvious win and a minor perturbation to the existing build machinery.

    I would be surprised to find out that the all.cc approach is a common build strategy, especially for projects that do template instantiations. Also, wouldn’t this muck with dependency chains for shared libs, where essentially all objects get pulled in, all the time?

    Although your timings are certainly interesting…

  • Benjamin — yeah, this trick is something Anthony told me about a long time ago. But I’ve also heard it occasionally from other folks in the intervening years. I don’t know about the shared libs thing; but for template instantiations this ought to be faster than the ordinary style — since really all this does is eliminate any redundant #include parsing overhead.

    One problem with this technique is that file-local objects aren’t guaranteed to have unique names across the entire project. So if you make extensive use of static you may not be able to do this. (There’s a small illustration of this after the comments.)

    And of course, as Hugo and hilbert point out, this is not ideal for incremental development. But for me the point of this experiment is to try to understand whether changing the compiler to a different model makes sense — and this, I think, shows that it does.

  • Interesting stuff!
    Wondering if you have any idea where the advantage of the all.cc approach comes from. One way to find out would be to look at the results of /usr/bin/size on the object files produced. Maybe in the plain case some templates are instantiated over and over and compiled into a number of object files. Also, -ftime-report numbers would be interesting for both approaches. Another interesting comparison would be all.cc versus “g++ *.cxx --combine …”

  • @d: I looked at the time-report output of an ordinary build a while back:
    http://tromey.com/blog/?p=39

    From these rough figures we can see that parsing and semantic analysis are the killers. And, based on the all.cc experiment, we can hypothesize that it is related to the explosion of source lines due to header file inclusion.

    FWIW my recollection is that g++ does not yet support --combine — that it was only implemented for the C compiler.

    More on this topic soon. Next I think I want to learn more about potential downsides of my “model-based” approach. And I want to look at memory use a bit.

  • Tom, it’s great that you are thinking deeply about build-related issues again.

    The all.cc build timings continue to interest me. I suppose I just separated out all the template instantiations because of memory usage issues… but I have not re-thought this in many years, and perhaps gcc/g++ has changed behavior. Forcing all of this stuff into one file will most definitely make peak memory usage increase substantially, no? I suppose I could just test this out myself and cat all the src/*-inst.cc files together in libstdc++. Hmmm.

    Re: perturbations in the force, and static changing meaning, this will be an issue, although with gcc-4.2 and later anonymous namespaces are the preferred idiom for specifying local storage. Maybe this is why Geoff Keating has been so keen on static mangling issues for darwin?

    Re: PCH expectations not living up to reality. Is this something to do with the gcc implementation, or something else? Do other compilers do PCH better? I find that PCH on g++ gives about a 30% improvement, but more if you are repeatedly compiling small files with the same set of includes (i.e., stl.h, the usual C++ name for all.h). My thinking on this has been warped by my usual use case: libstdc++ regression testing sees a big win, but using PCH for the actual build is not a win.

  • Yeah, I think this will increase peak memory use, though in my particular example I suspect the effect will not be severe. I’m building a new gcc to try to measure this (it turns out that on Linux getrusage doesn’t return peak memory stats — how did I never run into this before?). A small workaround sketch appears after the comments.

    The “static” thing is really just a problem if you’re hacking around trying this. For the compile server I’m planning it won’t be an issue, since we’ll just modify the compiler to do the right thing.

    I don’t know what other compilers do for PCH. In my case I got 40% — which is pretty good but still not enough to make the compiler feel responsive. BTW I heard somewhere that some distros don’t ship the libstdc++ PCH files since they can slow down compilation in some situations :-(. I didn’t try to measure this myself.

    I really need to write up some more of this. If the model-based idea doesn’t work out there are still fallback plans: the old compile server branch, token-diffing incremental compilation, or perhaps a more flexible PCH design of some kind.
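
To make the “static” pitfall from the comments concrete, here is a tiny made-up example; the same collision happens with anonymous namespaces once the files end up in a single translation unit.

    // parser.cc
    static int lookup (const char *name) { /* ... */ return 0; }

    // codegen.cc
    static int lookup (const char *name) { /* ... */ return 1; }

    // Each file compiles fine on its own: "static" keeps lookup private to
    // its translation unit.  But once the two files are pasted (or
    // #included) into one all.cc, both definitions land in the same
    // translation unit and g++ rejects the redefinition.  An anonymous
    // namespace has the same problem here, since the combined file has
    // only one anonymous namespace.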
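
And on measuring peak memory: since (as noted above) getrusage doesn’t return peak memory stats on Linux, one workaround is to read the VmPeak/VmHWM lines from /proc/self/status. A minimal sketch, with no error handling:

    // Minimal sketch: read peak memory figures from /proc/self/status.
    // VmPeak is the peak virtual size, VmHWM the peak resident set size.
    // To measure the compiler itself you would read /proc/<pid>/status for
    // the cc1plus process, or hook something like this into its exit path.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main ()
    {
      std::ifstream status ("/proc/self/status");
      std::string line;
      while (std::getline (status, line))
        if (line.compare (0, 6, "VmPeak") == 0
            || line.compare (0, 5, "VmHWM") == 0)
          std::cout << line << '\n';
    }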
