Archive for the ‘software’ Category

Incremental Code Generation

In an earlier post I said that I’d eventually explain how incremental code generation works.

First, to recap: I’ve already explained how incremental compilation works in the front end. Stopping here would still yield some speedups — I regularly see 20% compile-time improvements on ordinary programs; sometimes much larger; and for C++ I expect the effect will be bigger.

But, for many compilations, particularly when compiling C using optimization, it is the back end which is the major time sink. And since we already know from the front end which functions have changed (i.e., have been re-parsed) and which have not, it seems like it should be easy to also run the back-end incrementally.

There are a few plausible ways to implement this. Essentially, we can preserve the compiled form of the program at any point in the pipeline. By choosing a point after most of the work is done, we can minimize the time spent needlessly recompiling.

My current plan is to do the work at the very latest point possible, by writing out object files. The idea here is that each top-level definition in a translation unit — each global variable or function — will get its own tiny object file. Then, these files will be linked together to form the object file that the user actually asked for.

Of course, the current linker is slow and not at all suitable for this sort of thing. So, we’ll use a new, fast, incremental linker — which we’ll need anyway, in order to preserve incrementality through the entire compile cycle. (It isn’t much good to have a fast incremental compiler if you still end up waiting for the linker all the time.)

This approach has the nice property that the gcc work is relatively easy, for the most part. I don’t know yet whether this outweighs the work that must be done elsewhere, so I’m going to implement a prototype and find out. In fact, this is the next major milestone (though it will be a while, because I’m in a bug-fixing phase).

Another fun possibility here is that we can try to be smart about managing the cache of small object files. In particular, we may be able to preserve some benefits of incrementality even across restarts of the server. For C++, this could include template instantiations.

One drawback of the plan is that it may make dealing with debug info somewhat tricky. This ties in to how we compute hunks; if we don’t include source location in the hunk checksum, then we can end up with objects that lie to the debugger about where their corresponding source appears. We’d want to fix that up somehow, but that is not simple. I’m still thinking about this particular sub-problem.

Another oddity is handling inlining. In my planned implementation, gcc would fork and run code generation in a subprocess. (This lets me avoid excessive modifications to gcc.) But, if we inline a function then the compile server needs to know about this in order to properly re-compile when the inlined function is changed.

Future posts should include information about computing hunk boundaries (pretty short, but maybe I’ll touch on ideas for C++), multi-processor support, and maybe some speculation about scalability.

Replacing Automake

Inspired by Ian, this past weekend I took a look at replacing the auto tools with GNU make code.

Replacing automake turns out to be very easy. And, given that GNU make is everywhere now, I think it is well past time to do this.

It took about 250 lines of code to get C and C++ compilation — with dependency tracking — install, clean, libraries, and programs working.

Of course, this is just the beginning. There’s still shared libraries (ugh), dist, distcheck, and a host of smaller things.

Still, the result is quite nice. The resulting user Makefiles are pretty similar — but, unfortunately, not identical. With a bit of help it could be ready for real use by Christmas. Write me if you’re interested.

The long term goal, of course, is to unify all three tools. I’ve also got some prototype GNU make code to do some configure-style checking, based on some earlier prototyping by Ben Elliston. This is less fully baked, and there are some problems I haven’t solved. But, I think the idea is sound.

In my view the plan would be to finish automake and libtool functionality in the new code, then move on to adding configuration support. The latter can be done incrementally. In the end, we will have replaced all the tools with something better, without undue disruption.

The most important decision — what to name it — is still open. Submit your ideas.

Anti-dependencies

Last week I implemented anti-dependencies in the C front end. This wasn’t difficult enough to write about, but it provides an excuse to describe dependencies and incrementalism in general.

The fundamental idea in speeding up the front end is that it should avoid parsing whenever possible. If we’ve already parsed a function definition in some earlier compilation, it is faster to reuse that definition than to try to parse it again. (At least, that is true if noticing that we can reuse it is faster than parsing. But it always is. Note also that this will be a much bigger win for C++ than for C.)

Reuse works by computing a checksum of a run of tokens (one of these runs is called a “hunk” in the implementation, a name that vaguely embarrasses me). If we’ve seen the checksum before, we reuse the parsed results, which we store in a table.

Of course, the situation is more complicated than that. A bit of code may depend on other bits of code, for instance:

typedef int myint;
extern myint variable;

Here we can’t reuse variable in isolation — we can only reuse it if we’ve already reused the previous line. So, each hunk carries with it some dependency information. Before last week, this was just a list of all the checksums of hunks on which the current hunk depends.

Even this dependency tracking is not enough for proper operation. Suppose we already compiled this program:

struct x;
struct x { int field; };

… and then try to compile:

struct x;
struct x { int bug; };
struct x { int field; };

With a naive dependency scheme, this would compile without error, because we would erroneously reuse the second definition. After all, all its dependencies have been met.

Anti-dependencies are the fix for this. When considering a hunk for reuse, we check not only that its dependencies have been met, but we check that any declarations that we might reuse have the correct “prior value”. In fact, these two checks amount to the same thing; rather than record the checksums of prerequisite hunks, now we simply record declarations.

Given all this, how will incremental compilation work? My current plan is not to introduce any sort of special case. Instead, the parser will notify the backend when it parses a function. A function which comes from a reused hunk will simply not be compiled again.

If you think about it a bit, you’ll see that the prerequisites form a dependency model of a compilation unit. If the developer changes a declaration, anything using that declaration will be invalidated, and this invalidation will itself propagate outward. So, the amount of recompilation will be proportional to the “importance” of the change. Change a central typedef, and many things may need recompilation. Change a comment, and nothing will (though handling debug info properly may be a challenge).

A future refinement is the idea of a “semantic dependency”. The idea here is that, rather than check prerequisite declarations by identity, check them by ABI compatibility. There are a few tricks here, so this is a bit lower on my priority list for the moment.

You might have wondered how we compute the boundaries of a hunk, or how we can actually avoid recompiling functions. I’ll answer both of these in future posts.

Fedora 8

Anthony was in town this past Thursday, and after talking to him I was inspired to follow his example and upgrade to Fedora 8. Like him, I did a live upgrade; but I only upgraded my laptop, which was running Fedora 7. My main machine still runs 6… I’m more cautious about upgrading it, but after playing with Fedora 8 a bit, I can feel the pressure rising.

The biggest change I’ve noticed so far is the inclusion of IcedTea. I’m happily using the browser plugin for the all-important chess applet.

Speaking of which… I’ve been looking at other brower plugins lately. Any reports on the flash support in Fedora 8? I’m using the (boo hiss) proprietary flash plugin right now. Also, any experiences with Firebug or Adblock?

I’m impressed with the Fedora maintainers. They manage to release a solid operating system every six months… not an easy task. Upgrades for me have always gone pretty smoothly; I’ve done a mix of upgrades from CD and yum upgrades, and haven’t been burnt (a couple minor singes, but always known bugs with simple fixes). Also, each release has had some compelling benefit making me want to upgrade.

GCC 4.3 Hacking

I haven’t talked about the incremental compiler in a couple of weeks — first I was out of town, and then I was sick. And then yesterday, I put it off… I don’t want to be that way, but the truth is for the last couple of weeks I haven’t been working on this project much.

Instead, I’ve been doing a bit of work fixing bugs on the trunk, to help make it so that GCC 4.3 can be released in a timely way. I don’t really know much about most of GCC, though, so I’ve pretty much been working on only the bugs in parts I do know: more or less the C front end and the preprocessor.

Working on bugs this way is a humbling experience. Last week I think I fixed four bugs. Looking through the ChangeLog, I think Jakub fixed twenty. Hmm… I don’t even know how he can do that many bootstrap-and-check cycles in a week.

I also put a bit of work into making GCC emit better column number information — there’s some new location code that is more memory efficient (saves about 5%) and that enables column number output. Unfortunately the parsers and constant folders and debug output generators know nothing about columns; and fixing this is a big enough job that it won’t happen in 4.3.

The more I work on parts of GCC like this, the more I realize how awesome gcjx really was, if I may say so myself. GCC has several bad design decisions baked into things as basic as location handling. Sad. It will be a lot of work to clean this up… and when I look at GCC I think my eyes are bigger than my stomach. Or in other words, I need help.

I did do a little work on the incremental compiler, starting this week: I did a few cleanups to undo breakage I introduced earlier. My plan is to merge what I’ve got to the trunk during the next Stage 1. So, I’m cleaning up the little details I intentionally ignored during the experimentation phase.

My thinking behind the merge is that, first, the old compile server partly failed due to being on a branch too long, a mistake I don’t want to repeat; and second, even though this code does not do everything, it is reaching a good milestone, and at the very least it greatly speeds up --combine builds. I’ve heard that the Linux kernel uses combine; benchmarking this is on my to-do list.

Conkeror

Tom Fitzsimmons showed me Conkeror while I was up in Toronto this past week. It is chrome for firefox that acts like Emacs. It features Emacs-style key bindings, including M-x and prefix arguments, numbered links, and buffer-style browsing. Developer quote from the web site: “To make sure Conkeror remains pure, I do not own a mouse.”

I’ve been using this on my laptop for the last few days. It is a bit unpolished. And, I find I do still miss tabbed browsing — but hardly anything else. It is extremely simple to install… so come on, you know you want to.

Threaded GCC

This past week I spent playing around with making the C front end multi-threaded. What this means in particular is that the compile server makes a new thread for each incoming compilation request; the threads are largely independent but do share some results of parsing.

Parts of this work are, I think, reasonably clean. Making gengtype and the GC understand threads was pretty simple. The code to lock the global hunk map (this is where parsed bits of files are registered) was easy.

What is ugly is dealing with GCC’s many global variables. Cleaning this up is a multi-person, multi-year project — so I took the easy route and marked many variables (831 at last count) with __thread.

This works semi-well. There are some bugs, especially with higher -j levels, which means data races somewhere. Since this is not my most pressing project, I’m going to put this patch off to the side for a while.

Also I’ve been wondering whether the other GCC maintainers will really want to see this. For the most part the damage is isolated — the GC, the server, and the pieces of code that explicitly want to share things (for now this just means the C front end) need to know about threads. However, the thread-local variables are pretty ugly… if you’re a GCC maintainer, let me know what you think.

Meanwhile it is back to the grindstone of debugging C front end regressions that I’ve introduced. Also I’m looking at changing the debug output module so that I can defer writing any assembly output until fairly late in compilation. This is one step toward the incremental code generation plan.

External Refactoring

Many years ago I looked at changing Emacs to have an incremental garbage collector. An incremental GC requires a write barrier, which means that I wanted to insert some instructions at every point that mutated a lisp object. However, Emacs’ style at the time used macros to access fields of lisp objects, and these macros were used as both rvalues and lvalues. So, XCAR(x) would extract the car of a cons, but XCAR(x) = y would act as setcar.

Once you have code like this in C, pretty much your only choice is brute force: find every assignment and change it to use a new macro. One nice trick you can use to make the job simpler is to get the compiler to tell you the locations of all the assignments; you can do this by redefining XCAR to yield an invalid lvalue. Either way, though, you’re still in for a lot of typing.

In my current GCC project I’m running into a somewhat similar need. I want to make parts of GCC run multi-threaded. However, GCC has many global variables, which interact poorly with multiple threads, so something must be done about the globals.

The ideal solution would be to move globals into structures and change GCC to be a bit more object-oriented. Again, though, this is a lot of editing — for instance it would require adding a "this" argument to just about every function.

Problems like these are one reason I generally prefer C++ to C. In C++ you have options that do not involve massive editing.

In C++ the XCAR solution is simple: change the macro to return a new “car reference” object, ensure that this object has a conversion to a lisp object, and define an operator= which calls the incremental GC mark function in addition to modifying the car. With modern C++ compilers this should be as efficient as a macro, much less typing to implement, and (IMO) just as clear and maintainable.

The GCC solution still involves a fair amount of typing. Here I would turn existing functions into methods in a class, and move their global state into the class. The this argument will be invisibly supplied by the compiler, but in some cases I would still have to update calls to provide an object. Still, this is less work than updating every function definition and call, and parts (adding "classname::" to the definitions) can be mostly automated.

The idea behind this is that some C++ features, notably operator overloading, let you change the meaning of a piece of source text. I usually think of this as an ugly cousin of refactoring, useful when making large changes to existing code bases.