Archive for October, 2007


Tom Fitzsimmons showed me Conkeror while I was up in Toronto this past week. It is chrome for firefox that acts like Emacs. It features Emacs-style key bindings, including M-x and prefix arguments, numbered links, and buffer-style browsing. Developer quote from the web site: “To make sure Conkeror remains pure, I do not own a mouse.”

I’ve been using this on my laptop for the last few days. It is a bit unpolished. And, I find I do still miss tabbed browsing — but hardly anything else. It is extremely simple to install… so come on, you know you want to.

Threaded GCC

This past week I spent playing around with making the C front end multi-threaded. What this means in particular is that the compile server makes a new thread for each incoming compilation request; the threads are largely independent but do share some results of parsing.

Parts of this work are, I think, reasonably clean. Making gengtype and the GC understand threads was pretty simple. The code to lock the global hunk map (this is where parsed bits of files are registered) was easy.

What is ugly is dealing with GCC’s many global variables. Cleaning this up is a multi-person, multi-year project — so I took the easy route and marked many variables (831 at last count) with __thread.

This works semi-well. There are some bugs, especially with higher -j levels, which means data races somewhere. Since this is not my most pressing project, I’m going to put this patch off to the side for a while.

Also I’ve been wondering whether the other GCC maintainers will really want to see this. For the most part the damage is isolated — the GC, the server, and the pieces of code that explicitly want to share things (for now this just means the C front end) need to know about threads. However, the thread-local variables are pretty ugly… if you’re a GCC maintainer, let me know what you think.

Meanwhile it is back to the grindstone of debugging C front end regressions that I’ve introduced. Also I’m looking at changing the debug output module so that I can defer writing any assembly output until fairly late in compilation. This is one step toward the incremental code generation plan.


Last night I had a long dream that I was working on the TV pilot for “Criss Angel Investor MindFreak”. As I recall he was rather difficult to deal with. I woke up tired.

External Refactoring

Many years ago I looked at changing Emacs to have an incremental garbage collector. An incremental GC requires a write barrier, which means that I wanted to insert some instructions at every point that mutated a lisp object. However, Emacs’ style at the time used macros to access fields of lisp objects, and these macros were used as both rvalues and lvalues. So, XCAR(x) would extract the car of a cons, but XCAR(x) = y would act as setcar.

Once you have code like this in C, pretty much your only choice is brute force: find every assignment and change it to use a new macro. One nice trick you can use to make the job simpler is to get the compiler to tell you the locations of all the assignments; you can do this by redefining XCAR to yield an invalid lvalue. Either way, though, you’re still in for a lot of typing.

In my current GCC project I’m running into a somewhat similar need. I want to make parts of GCC run multi-threaded. However, GCC has many global variables, which interact poorly with multiple threads, so something must be done about the globals.

The ideal solution would be to move globals into structures and change GCC to be a bit more object-oriented. Again, though, this is a lot of editing — for instance it would require adding a "this" argument to just about every function.

Problems like these are one reason I generally prefer C++ to C. In C++ you have options that do not involve massive editing.

In C++ the XCAR solution is simple: change the macro to return a new “car reference” object, ensure that this object has a conversion to a lisp object, and define an operator= which calls the incremental GC mark function in addition to modifying the car. With modern C++ compilers this should be as efficient as a macro, much less typing to implement, and (IMO) just as clear and maintainable.

The GCC solution still involves a fair amount of typing. Here I would turn existing functions into methods in a class, and move their global state into the class. The this argument will be invisibly supplied by the compiler, but in some cases I would still have to update calls to provide an object. Still, this is less work than updating every function definition and call, and parts (adding "classname::" to the definitions) can be mostly automated.

The idea behind this is that some C++ features, notably operator overloading, let you change the meaning of a piece of source text. I usually think of this as an ugly cousin of refactoring, useful when making large changes to existing code bases.

Friday GCC Update

I haven’t been blogging much lately, at least partly because I’ve been having fun hacking on GCC. Mark pointed out, though, that I should blog regularly about my progress so people know what is happening.

I’ve got the basic compile server running. Recently I’ve been fixing bugs in it. I’m not seeing the mind-blowing 2x speedup I once was; I need to look into that a bit. This doesn’t matter, though, because the real benefits will come from the next step, which is incremental code generation.

This week, to get away from the bugs a bit, I made GCC’s garbage collector thread-aware. Following the implementation methodology known as “Biting Off More Than You Can Chew”, I also dove in to making the whole front end work multi-threaded. So far this has yielded an evil-looking patch that marks a couple hundred global variables with __thread.

My current plan is to try to merge what I’ve got during the next Stage 1. My thinking is, the current code provides benefits, even if it not every planned benefit. It would be better to merge as frequently as possible — not just for testing but also because smaller patches are simpler to understand, and an early merge would get other GCC maintainers used to the new stuff.

Actually… why don’t I just post my road-map here

  1. Fix bugs. Make it bootstrap again.
  2. Clean up the hacks I put in. Implement the things I skipped (anti-dependencies, maybe semantic dependencies, fix PCH).
  3. Test on bigger programs.
  4. Make sure --enable-intermodule bootstrap works. Measure performance improvement.
  5. Stage 1 merge.

At this point there are multiple options for what to do next. I’m not sure what order I will tackle them — as the thread experiment shows, I’m prone to following what looks fun, in the absence of other constraints.

  1. Incremental code generation. May require linker changes.
  2. C++ front end. (I’m thinking we should make the front ends into shared libraries that gcc will dlopen.)
  3. Finish multi-threading work (if I don’t do this next week).
  4. If there isn’t one yet, finish incremental linker.

After this comes other fun stuff, like adding a way to query the server for information about your program. I’m thinking http…

I’ve also gotten interested in other aspects of GCC’s C front end. For example, it is not ideal for some kinds of static analysis because it folds too early. It might be fun to fix this.