I just managed to get through an ordinary make of a real program (zenity, which for some reason is my standard test) using gcc as a server. Yay! It is, as predicted, about twice as fast as an ordinary make.
Now pile on the caveats.
I’ve been working steadily on the incremental compiler project these last couple of months, but not blogging about it. So, time for a status report.
I made a branch in the GCC repository, cleaned up my existing patches, and checked it all in. There’s a short web page describing the project on the GCC wiki (branch details there).
I’ve also written a rough draft of the support code for “decl smashing”, and changed the parser to use “per-declaration hunks”.
The idea behind per-declaration hunks is that, for code written by the user, we get better incremental behavior if we consider each hunk to hold a single declaration or definition, rather than the previous rule of “everything between two file change events”. The former is better because it means that a change is isolated to its containing declaration and uses of that declaration (and we can do even better here, perhaps, with a bit more analysis); the latter is better for system headers and the like because it has lower overhead.
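To make the distinction concrete, here is a minimal sketch of the two granularities and the rule for choosing between them. The types and the helper are hypothetical, not code from the branch:

```c
/* Hypothetical illustration of the two hunk granularities.  */
enum hunk_kind
{
  HUNK_PER_FILE_CHANGE,  /* everything between two file-change events;
                            lower overhead, good for system headers */
  HUNK_PER_DECLARATION   /* a single declaration or definition; a change
                            is isolated to this hunk and its users */
};

/* Pick the granularity for a token run: per-declaration for user code,
   per-file-change for system headers.  */
static enum hunk_kind
choose_hunk_kind (int in_system_header)
{
  return in_system_header ? HUNK_PER_FILE_CHANGE : HUNK_PER_DECLARATION;
}
```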
Finally, I’ve started server-izing GCC. This part should be ready for real testing pretty soon.
Comcast had a trial offer where you could get a DVR machine rent-free for a year, so we dropped by their office and picked one up. These machines are as nice as people say — much, much friendlier than plain TV or VCRs. It only took a couple of weeks for it to change how I watch TV.
While playing with it, though, I’m reminded once again why I first turned to free software all those years ago. I’d like to be able to hack the machine a little… say, upload my favorite DVDs, or add more disk space, or get a second one and be able to share videos between the two.
Maybe I should get a Neuros OSD box. Anybody try one of these? Or of course I could set up MythTV, though that seems more expensive (given that I have basically no usable hardware).
The 2007 GCC Summit just ended; all that’s left is the wrap party. Lately I haven’t been too interested in writing up comprehensive trip reports (but if you liked that style, post a comment — I’d like to know), and instead I’m just going to write about the bits I found particularly interesting. Overall the summit was good; I think the proceedings are online if you want to read the papers.
I had a few personal goals for the summit and I think I’ve accomplished all of them. I wanted to meet up with the various GCC hackers I’ve met over the years and sort of get them used to the idea that I might be around in GCC-land a bit more than previously. This was the easy part — I talked to tons of people and met many more.
I also wanted to pitch my incremental compiler to anybody who would listen, and get feedback about the idea in general and about a few specific potential problem areas. Thanks here go to Ian Taylor and Alan Modra for some assembler and linker advice; and to Mark Mitchell and Nathan Sidwell for telling me some awful truths about C++.
Nobody really complained much about my plans, not even the evil part where GCC will generate a single object per top level definition, or the even eviler part where code generation is done by fork()ing the server and doing the work in the child, to avoid having to clean up a million global variables. Now I’m left hoping I’m not in a tragedy of manners, where everyone holds back their knowledge of impending doom.
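For the curious, the fork() part could look something like this minimal sketch. The function names are invented, not actual server code; the point is only that the child can clobber GCC’s many global variables freely, because they die with it:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run code generation for one translation unit in a child process so
   the server never has to reset the globals it dirties.  */
static int
generate_code_in_child (void (*generate) (void *), void *translation_unit)
{
  pid_t pid = fork ();
  if (pid == 0)
    {
      generate (translation_unit);  /* code generation; globals get dirty */
      _exit (0);
    }
  if (pid < 0)
    return -1;
  int status;
  waitpid (pid, &status, 0);
  return (WIFEXITED (status) && WEXITSTATUS (status) == 0) ? 0 : -1;
}
```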
Diego, appropriately, gave the opening remarks. He showed how GCC’s SpecINT performance has remained flat over the last five years, inducing widespread depression. SpecFP looks a bit better, though, and plus everyone can agree that GCC’s internals are much nicer than they used to be.
Most of the talks were about optimizations. This is fun stuff but not my current area of interest. There were a few talks touching on the idea of writing plugins for GCC — one paper discussing a custom version of Lisp, and another having impressive amounts of working code including Python bindings and a GIMPLE visualizer. I’m hoping one or both of these go into GCC eventually; I especially like the idea of writing custom code analysis passes in Lisp.
A couple talks were about debugging. I missed one of these but the other had Daniel Jacobowitz demonstrating gdb stepping backward in time. This is awesome stuff! … though at the moment there are several restrictions and not all the patches are available. Lots of people are concerned about GCC’s debuginfo, which I hear from many sources is rather sucky.
It’s a bit strange to embark on a project that seems so unrelated to the rest of the work in GCC. I keep doing mental management on myself so as not to obsessively worry about this. Anyway the GCC crowd is pretty nice and I’m not worrying about how people will react any more.
Several people asked me about gcj of course. I tried to be clear about the fact that the situation is murky: my own gcj hacking has dropped off precipitously, though I still review patches, and what to do about gcj really depends on the level of involvement of other parties in the future (it is hard to pick a time period here but I would say at least two major releases).
I’m headed to the GCC Summit this week, with a trip to the office in Toronto thrown in.
In preparation for the trip, I posted a short note about the incremental compiler project to the GCC list. These posts are always a bit stressful, since I spend a few hours checking email and waiting anxiously for flames. But, as usual, little happened; you can check the (non-) responses for yourself. Wouldn’t it be nice to have the wisdom not to worry?
I also prepared by upgrading my laptop to Fedora 7. The pre-preparation for this was a pain — but it was also all my fault, since I’d installed an ill-advised mishmash of random things either for development testing (new Eclipse and gcj) or for my own enjoyment (Emacs 22).
Fedora 7 is quite nice. The fonts look prettier. The icons are prettier (except, strangely, the rhythmbox icon). powertop is cool. This time around I couldn’t find a CD download (only DVD), and I didn’t want to risk a yum upgrade, so I downloaded the network install CD, booted it, and did an upgrade that way. This went smoothly; my only complaint was that I picked the wrong mirror at first (one seemingly without the necessary tree) and spent some time checking and rechecking URLs before catching on. This could perhaps be done a bit more smoothly somehow.
I haven’t had a chance yet to use Fedora 7 in anger, but so far I’m happy with the upgrade. It went smoothly and I already see benefits; for instance madwifi seems to work with NetworkManager now.
Andrew MacLeod had an interesting idea yesterday — make the compile server interact nicely with distcc. I’ve been thinking about this a bit, and I don’t think there are any major problems with the idea. It would be pretty simple to run a compile server on multiple machines. Maybe it would make sense to ensure that a given file is always sent to the same server, but even this isn’t strictly necessary.
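One simple way to get the “same file, same server” affinity would be to hash the file name. This is just a sketch with invented names, not anything Andrew or I have written, and not part of GCC or distcc:

```c
/* Pin each source file to one of NSERVERS compile servers so its
   cached hunks stay warm on that server.  */
static unsigned int
hash_file_name (const char *name)
{
  unsigned int h = 5381;
  while (*name)
    h = h * 33 + (unsigned char) *name++;
  return h;
}

static int
server_for_file (const char *name, int nservers)
{
  return (int) (hash_file_name (name) % (unsigned int) nservers);
}
```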
One possible problem is that this would not interact well with any indexing feature we add to the server. Also it would not allow cross-module inlining in all cases, or, worse, it might choose to inline or not depending on which jobs end up on which server — something would have to be done about that.
The point behind doing this is to try to scale to very large programs — larger than will fit in memory on your desktop.
Another idea in this space is to somehow serialize hunks to disk, GC “rarely used” hunks when memory is low, and then read hunks back in as needed. Whether this would actually be a win would depend, I think, on the speed of deserialization — it would have to be reliably faster than simply re-parsing (this is more likely if GCC adopts Ian’s proposed pointer-free “PC-rel-like” IR — but I don’t know how likely that is). I think I’ll wait for the LTO project to be a bit further along before investigating this any more; hopefully I’d be able to use LTO’s tree-reading and -writing capabilities to save and restore hunk contents.
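If I were to experiment with this, the eviction side might look roughly like the following. This is purely a sketch; the structure, fields, and serialize hook are all invented:

```c
#include <stddef.h>

/* Invented structure for illustration only.  */
struct hunk_entry
{
  void *trees;              /* in-memory contents, NULL once spilled */
  const char *spill_path;   /* on-disk serialized form, if any */
  unsigned long last_used;  /* timestamp for "rarely used" detection */
};

/* When memory is low, serialize hunks that have not been used recently
   and drop their in-memory trees; they are read back (or re-parsed) on
   demand later.  */
static void
spill_idle_hunks (struct hunk_entry *hunks, size_t n, unsigned long now,
                  unsigned long idle_threshold,
                  void (*serialize) (struct hunk_entry *))
{
  for (size_t i = 0; i < n; i++)
    if (hunks[i].trees != NULL && now - hunks[i].last_used > idle_threshold)
      {
        serialize (&hunks[i]);  /* expected to set spill_path */
        hunks[i].trees = NULL;
      }
}
```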
Anyway, a distributed compile server is an interesting idea for the future. I’ll revisit it once the server is actually working.
I added “prerequisite” support to my front end prototype recently — this is code that ensures that a hunk’s contents are only re-used if all the things that the hunk depends on are also re-used. This turned out to be much simpler to implement than I had anticipated.
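In outline, the check is simple, which is probably why the implementation turned out that way too. Here is a hedged sketch with invented types, not the prototype’s actual data structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Invented type for illustration.  */
struct saved_hunk
{
  bool reused_this_parse;       /* bindings were mapped in again */
  struct saved_hunk **prereqs;  /* hunks whose declarations this one uses */
  size_t n_prereqs;
};

/* A hunk's contents may be re-used only if everything it depends on
   was also re-used in the current parse.  */
static bool
hunk_prereqs_satisfied (const struct saved_hunk *h)
{
  for (size_t i = 0; i < h->n_prereqs; i++)
    if (!h->prereqs[i]->reused_this_parse)
      return false;
  return true;
}
```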
With this I still can’t compile the C front end with my compiler, because GCC has many ordering dependencies in its headers (ugly stuff — you should avoid this) and so ends up requiring the decl smashing fix. The decl smashing fix looks like a tricky project, especially given the multi-threading constraint, so I haven’t done it yet.
Meanwhile I took a simpler approach to trying to get some results from this test case: I reordered all the #include statements in the C front end to ensure consistency. Yay hacks!
Even with the additional prerequisites bookkeeping, the results are quite nice, though a bit more modest than the zenity case:
| Compiler  | Time (M:S) | Memory at exit |
| --------- | ---------- | -------------- |
| Prototype | 0:49       | 43M            |
| Trunk     | 1:29       | 83M            |
The other day Roland asked me why PCH is not a bigger win. While discussing this I realized that I didn’t try to time the very best case for PCH. So, I looked at a couple more things.
First, I spent some time munging the output of -ftime-report and making graphs with gnuplot. I thought it might be interesting to visualize the problem areas. All this told me, though, is that in my earlier tests, the PCH build still spent an inordinate amount of time in the parser.
Back when I wrote gcjx (my test case here), I tried to make it PCH-friendly. I made a master file, "typedefs.hh", which declared most things and which included a number of other files. But, it turns out, it didn’t include absolutely everything — which the "all.cc" approach does. So, I made a .gch file that did include everything (not entirely trivial, as there are ordering dependencies between headers to consider) and re-ran my test. This did speed up the PCH case — but the all.cc case is still much faster. A look at the graph here shows the same thing, namely that the PCH build is spending a lot of time in the parser.
I didn’t look much deeper than this. I’ve learned that PCH is not excellent even in its best case, but I don’t know why. Given PCH’s many restrictions, I don’t think it is too interesting to pursue this further. The approach I’m investigating is faster and does not have as many limitations as PCH. Also it will not require changes to your Makefiles.
Today I’ve been thinking about a problem I’ve run across in my compiler front end work.
The problem is a technical issue having to do with how the C compiler is implemented. Currently if there are multiple declarations for an object, the compiler might modify one of the declarations in place. This causes problems when re-using declarations. What if the second declaration is visible in one compilation unit but not another? Remember that since we’re saving object pointers here, other declarations that refer to the multiply-declared object will point to the first declaration — the one that gets modified.
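A tiny and entirely ordinary bit of C shows the shape of the problem; this is just an illustration, not GCC source:

```c
/* Two compatible declarations of the same function.  */
int f ();      /* first declaration; other DECLs may already point here */
int f (int);   /* second declaration: the front end merges its extra
                  information into the first DECL in place, so a
                  compilation unit that never saw this prototype would
                  still observe the modified DECL if hunks are shared */
```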
I haven’t been able to think of a nice way to handle this that also lets us do multi-threaded parsing. We could redefine the various DECL tree accessors to forward requests to a per-compiler (aka thread) “current declaration” — but this is fairly nasty and, due to the nature of trees, probably also error-prone.
Any ideas? I mean ones that don’t involve redesigning GCC? The old compile server branch planned to solve this via an undo buffer, but that is not very nice for multi-threaded use (something I plan to do that the compile server did not).
Sometimes I think this project would be better if I went ahead with the more ambitious project of eliminating trees from the front end… but that would both be more work and, I suspect, harder to get approved. But, at least that way I could design in a fix, since I’d be touching every line in the front end anyway.
I added the needed multi-hunk processing to my front end hack, to try to be able to compile parts of GCC. What this means is that if a construct (say, an enum) spans multiple file changes, the parser will now notice this and record it properly. If a subsequent parse sees the same starting hunk, it will check to see if subsequent hunks are the same as the previous chain, and if so, will map in all the bindings.
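The bookkeeping amounts to comparing a chain of hunks rather than a single one, roughly like this (a sketch; the type and names are invented):

```c
#include <stdbool.h>
#include <stddef.h>

/* Invented type: one hunk in the chain a construct spans.  */
struct hunk_link
{
  unsigned int checksum;    /* identifies this hunk's token stream */
  struct hunk_link *next;   /* the following hunk, if the construct
                               continues past a file-change event */
};

/* Re-use the saved bindings only if every hunk in the saved chain
   matches the corresponding hunk seen in the current parse.  */
static bool
hunk_chain_matches (const struct hunk_link *saved,
                    const struct hunk_link *current)
{
  for (; saved && current; saved = saved->next, current = current->next)
    if (saved->checksum != current->checksum)
      return false;
  return saved == NULL && current == NULL;
}
```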
Unfortunately this was not the only problem — GCC also has an enum defined in two different header files, meaning that the token stream actually seen depends on the order of header inclusion. Gross stuff! Anyway, handling this was always on the agenda; but it means writing more code and so I am putting it off a little bit.
Meanwhile I compared my hacked compiler with a roughly equivalent unmodified version. I compiled zenity (a Gnome program; the part I compiled is about 5KLOC according to wc — but 475KLOC after preprocessing) with both, using the same command line arguments, including --combine.
The results: the modifications are a big win:
| Compiler | Time     | Memory (used at exit) |
| -------- | -------- | --------------------- |
| Hacked   | 11.98sec | 17M                   |
| Trunk    | 23.87sec | 62M                   |
The hacked compiler with --combine is faster than running a make on the same program (though to be fair I didn’t try to measure any of the build overhead — but it would have to be more than 4 seconds of overhead, since the normal build took 16.55sec).