Archive for the ‘software’ Category

Gold is released

Ian Taylor checked in the long-awaited “gold”. Gold is a new ELF-only linker written in C++. It is designed for performance and is much faster than the current binutils ld.

I’m very happy about this for a couple of reasons. First, we’ve needed a new linker for a long, long time. Second, this will help the incremental compiler.

I looked through the gold sources a bit. I wish everything in the GNU toolchain were written this way. It is very clean code, nicely commented, and easy to follow. It shows pretty clearly, I think, the ways in which C++ can be better than C when it is used well.

Congratulations, Ian!

Compile Server Scalability

There are a few aspects to compile server scalability that are important to address.

First, and most obviously, memory use. Because we want to be able to send real programs through the compile server, and because we want it to remain live for relatively long periods of time, it is important that memory use be “acceptably bounded”. Naturally, the server process will grow with each additional compilation unit. At least in the straightforward implementation, there’s no way around that (but see below). However, it is important that the server not leak memory, and that recompilations generally not increase memory use. Also, ideally, all that work on decl sharing will keep memory use in check.

For the most part, this did not take any effort to achieve. GCC has a built-in garbage collector, and most nontrivial data structures are allocated using the GC. This is not a silver bullet, of course, but it has yielded good results with little effort in practice.

In the case of recompilation, we employ a simple heuristic — we store all parsed hunks keyed off the name of the requested object file (note: not the input file; it is common for a project to compile a given source file multiple times, but it is rare to see the same object file name more than once). When recompiling an object, we assume that there will be a lot of reuse against the object’s previous version, so we store those hunks temporarily, but then discard the old ones at the end of compilation. This way, we reuse, but we can also free hunks which are no longer in use.
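
To make that concrete, here is a minimal sketch of the heuristic in C++. The names (Hunk, lookup_hunk, and so on) are made up for illustration and are not the actual compile-server data structures; the point is just the per-object-file map and the swap at the end of a compilation.

#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a parsed hunk: a run of top-level declarations.
struct Hunk { /* parsed trees, dependency links, ... */ };

using HunkKey = std::string;   // some checksum that is stable across compilations
using HunkMap = std::map<HunkKey, std::shared_ptr<Hunk>>;

// One cache per requested *object file* name, not per input file.
static std::map<std::string, HunkMap> object_caches;

// While parsing: reuse a hunk from the previous build of this object,
// or return null so the caller parses the text afresh.
std::shared_ptr<Hunk> lookup_hunk (const std::string &object_name, const HunkKey &key)
{
  auto it = object_caches.find (object_name);
  if (it == object_caches.end ())
    return nullptr;
  auto h = it->second.find (key);
  return h == it->second.end () ? nullptr : h->second;
}

// At the end of a compilation: keep only the hunks this compilation produced or
// reused, so stale hunks from the previous build of the same object can be freed.
void finish_compilation (const std::string &object_name, HunkMap &new_hunks)
{
  object_caches[object_name].swap (new_hunks);
}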

Results from a few tests are very encouraging here. I compiled gdb with the compile server, then deleted the object files and re-compiled. Memory use (as reported by -fmem-report) stayed flat at around 51M — meaning that recompilation doesn’t grow the image, and the collection approach is working as desired.

I also built gdb using the compiler in “normal” mode, and looked at the -fmem-report totals. If you sum them up, which I naively expect gives a rough idea of how much memory --combine would use, you get 1.2G. Or, in other words, decl sharing appears to make a huge difference (I’m not completely confident in this particular number).

If memory use does become a problem for very large compiles, we could look at scaling another way: writing out hunks and reading them back in. Maybe we could use machinery from the LTO project to do this. This would only be useful if it is cheaper to read decls via LTO than it is to parse the source; if this is not cheaper then we could instead try to flush out (and force re-parsing of) objects which are rarely re-used. One special case of this is getting rid of non-inlineable function bodies — when we have incremental code-generation, we’ll never compile a function like that more than once anyway.

Another scalability question is how to exploit multiple processors, either multi-core machines or compile farms. In an earlier post, I discussed making the compile server multi-threaded. However, that interacts poorly with our code generation approach (fork and do the work in the child), so I am probably not going to pursue it. Instead, for the multi-core case, it looks straightforward to simply run multiple servers — in other words, you would just invoke “gcc --server -j5”. Something similar can be done for compile farms.

An ideal result for this project would be for small changes to result in compilation times beneath your perceptual threshold. I doubt that will happen, but the point is that the absolute turnaround time matters. (This is not really a question of scalability, but I felt like talking about it anyway.)

In the current code, though, we always run the preprocessor for any change. So, even once incremental code generation is implemented, the turnaround time will be bounded below by the time it takes to preprocess the source. This might turn out to be a problem.

In an earlier design (and in some other designs I have heard of), this is handled by making a model of compilation that includes preprocessing. That seems too complicated to me, though, and instead I think that it should be possible to also make an incremental preprocessor (say, one that uses inotify to decide what work must be re-done), and then use it without excessive cooperation from the parser.
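
To give a flavor of the inotify half of that idea, here is a rough sketch, assuming Linux. The inotify calls are real API, but the stale-file bookkeeping that such an incremental preprocessor would consult is entirely invented for this example.

#include <sys/inotify.h>
#include <unistd.h>

#include <map>
#include <set>
#include <string>

static int inotify_fd = -1;
static std::map<int, std::string> watch_to_file;   // watch descriptor -> path
static std::set<std::string> stale_files;          // changed since the last compile

// Watch one file that an earlier preprocessing run depended on.
void watch_dependency (const std::string &path)
{
  if (inotify_fd < 0)
    inotify_fd = inotify_init1 (IN_NONBLOCK);
  int wd = inotify_add_watch (inotify_fd, path.c_str (),
                              IN_MODIFY | IN_MOVE_SELF | IN_DELETE_SELF);
  if (wd >= 0)
    watch_to_file[wd] = path;
}

// Drain pending events; afterwards stale_files names every watched file that
// changed, so only the work that depends on those files needs to be redone.
void poll_changes ()
{
  alignas (struct inotify_event) char buf[4096];
  ssize_t len;
  while ((len = read (inotify_fd, buf, sizeof buf)) > 0)
    for (char *p = buf; p < buf + len; )
      {
        struct inotify_event *ev = (struct inotify_event *) p;
        auto it = watch_to_file.find (ev->wd);
        if (it != watch_to_file.end ())
          stale_files.insert (it->second);
        p += sizeof (struct inotify_event) + ev->len;
      }
}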

Python and Gdb

Recently I’ve been hacking on integrating Python scripting support into gdb. For years now I’ve been wanting better scripting in gdb, but until I saw Volodya’s patch I never did anything about it. So, the other night I made a git repository (thanks gitorious!) and started hacking away. Thiago Bauermann did some nice updates on Volodya’s value-inspecting code, too.

A decent number of things are working. See the wiki page for details on cloning the repository.

Since I basically live in Emacs nowadays, I wanted to install the Python documentation in info form. Am I the only person who still loves info? It isn’t beautiful, to be sure, but it is amazingly convenient inside Emacs: simple to navigate, call up, and dismiss; with info-lookup it can serve as low-rent context-sensitive help; and there is no messy fussing with the mouse.

Anyway, I couldn’t find this readily available anywhere, so in the end I checked out Python myself and built the docs. That was sort of a pain… I’m half considering making an ELPA package out of the info pages. Come to think of it, there are probably a number of potential info-only packages out there.

Tools Happenings

There are some fun things going on in the tools arena right now.

Do you read Taras Glek’s blog? He’s working on GCC Dehydra, which lets you write GCC passes in JavaScript. I think his work is one of the most interesting developments in the GCC world today.

There are a few similar projects in the works right now. The plugin branch lets you write GCC plugins; the authors of this branch have a Python plugin, but as far as I’m aware this is not publicly available.

On a slightly more idiosyncratic note, Basile Starynkevitch made a branch for his MELT project. This is also a plugin system, but it uses a lisp dialect for which he’s written his own lisp-to-C translator. I’m kind of partial to this idea — I think it would be fun to write parts of GCC in lisp, at least if it compiled down to something reasonable.

I’m quite interested in pushing GCC more into source analysis and refactoring. Right now the front ends have some problems that make this difficult, but I think these are surmountable without too much trouble. Maybe when I finish this pesky incremental compiler…

With all this going on I wonder why GCC-XML is not in the tree, at least on a branch.

Vladimir Prus recently made available his prototype which integrates Python into gdb. This is promising work — we’ve needed something like this for years. Maybe we can finally print complex data structures in sensible ways.

Finally, Benjamin Kosnik has checked in a restructuring of the libstdc++ documentation. I browsed the new stuff a bit and I found it much simpler to navigate. I’m very happy about this; good, clear documentation is critical to the success of free software.

Using Quagmire

Over the last two weeks I’ve spent some time hacking on Quagmire. I’ve tried to add the features I think are most commonly needed, and I think now it is ready for early adopters to come on board. It isn’t at feature parity with Automake, but it does implement a large subset of Automake’s functionality:

  • Build and install C/C++ programs and static libraries, with Automake-style dependency tracking.
  • Initial (i.e., gcc-only) shared library support. (This is still fairly lame… “libtool in two lines of code”. My current plan here is to do something much more minimal than what libtool provides.)
  • Automatic support for the standard clean targets.
  • Install and uninstall, including DESTDIR.
  • dist and distcheck support (a bit incomplete; and FWIW this is just ripped right out of Automake).
  • Support for some minor (but standard) targets: TAGS, texinfo stuff, rebuilding configure-generated files.

It also has some features that Automake does not. One long-requested Automake feature is to be able to turn off its verbose output; Quagmire does this dynamically depending on whether -s was passed to make.

Quagmire also has some initial support for build-time function and header checks, and for pkg-config. This is not fully baked, but it is fun to play with. One way this is nicer than using configure is that if you add a new function to the list, the next invocation of make will run only that new test.

There’s some example code in the repository that shows how to use most of the features.

Currently Quagmire does some things differently from Automake. For instance, it does not use directory prefixes for things like PROGRAMS. However, I recently figured out that it could (and in some situations, like _DATA, this is really what you want), so I’m wondering whether I should change all this to follow Automake more closely. Let me know what you think.

Also I’ve been wondering about whether it would be appropriate to post an announcement of Quagmire on the Automake mailing list.

Project Quagmire

This weekend I finally registered Project Quagmire with Google’s code hosting service. I’ve checked in all my initial .mk files, plus a little bit of documentation.

I’m a little nervous about posting this… Quagmire is still very immature, and I don’t want expectations set too high. There is still a lot to do! But, the good news is, you can easily help set the direction.

I suppose I will set up a Google group for Quagmire eventually. What do you think? Meanwhile, edit the Quagmire wiki with your questions, comments, and ideas.

Emacs Tip: gtk-look.el

One thing I like about Emacs is that when I ask for information, the default case is that it is displayed in a section (which Emacs calls a “window”) of the window (which Emacs calls a “frame”) that I’m already focused on. What’s nice about this is that if I ask for, say, help on a symbol in C mode, the information will show up, but I don’t have to look away, or worry about a new window stealing the focus, or move the mouse around. Also, I can easily dismiss the display without moving my hands from the keyboard.

Do you know about help on symbol in C mode? I didn’t know about this until last week. If you are in a C mode buffer, you can type C-h S, aka info-lookup-symbol, to look up help for a given C symbol. This works by digging through the libc info pages… this isn’t perfect, but it is pretty handy. (And, yes, I’m one of the few people who really loves info and uses it every day. I wish all the documentation on my system were available in this format — plus one or two others, yet to be invented, which would allow better IDE-ish features in Emacs.)

On a related note, a couple of days ago Kevin Ryde sent me the latest version of gtk-look.el for uploading into ELPA. I’ve been meaning to try this for a while, so I finally installed it. gtk-lookup-symbol is similar to info-lookup-symbol, except that it uses the devhelp documentation libraries as its source. Great stuff!

Naturally, I could just use devhelp, which is a very nice tool. Unfortunately, it is a very nice tool with one major drawback: it makes a new window, with all the attendant problems. And, by default, gtk-look suffers from this problem as well; the docs show up in Firefox.

Luckily, Emacs is absurdly configurable, and this problem is solved by installing the w3m-el package (on Fedora) and configuring browse-url-browser-function to display devhelp files in w3m:

;; Open local gtk-doc pages in w3m inside Emacs; hand everything else to Firefox.
(setq browse-url-browser-function
      '(("file:.*/usr/local/share/gtk-doc/html" . w3m-browse-url)
        ("file:.*/usr/share/gtk-doc/html" . w3m-browse-url)
        ("." . browse-url-firefox)))

With this I can easily view the help for gtk, glib, and friends in the Emacs-y way that I prefer. If this sounds good to you, you’re only seconds away from installing gtk-look via ELPA.

Now what I need is c-lookup-symbol-dwim, which searches all the databases. This shouldn’t be too hard…

Hunk Boundaries

How do we decide the boundaries of reuse in the incremental compiler?

The crucial thing about reuse is that the signature you choose must be stable across compilations. It is also better if the boundaries are cheap to compute and roughly line up with declaration boundaries. In the prototype implementation, I met these criteria by using file change events (conveniently noted by libcpp) as hunk boundaries. File change boundaries are roughly stable (more so for system headers), are free (libcpp already computes them), and very frequently align with declaration boundaries. This last criterion is typically only broken by constructs where a file is included in the middle of a declaration — for instance, in gcc itself, by including a .def file in the middle of an enum to populate it. I added a special case to the reuse code to account for these situations.
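
As a simple illustration (this is not the actual gcc code), the identity of a hunk can be as little as a hash of the tokens that fall between two file-change events:

#include <cstdint>
#include <string>
#include <vector>

struct Token { std::string spelling; };

// FNV-1a over the token spellings between two file-change events.  Source
// locations are left out on purpose, so the signature stays stable when
// unrelated text above the hunk is edited.
uint64_t hunk_signature (const std::vector<Token> &tokens)
{
  uint64_t h = 1469598103934665603ull;
  for (const Token &t : tokens)
    for (unsigned char c : t.spelling)
      {
        h ^= c;
        h *= 1099511628211ull;
      }
  return h;
}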

Aside from the convenience — which is very important for prototypes, but much less so for production — there was a historical reason that I chose file boundaries. When I first started thinking about writing an incremental compiler, I focused primarily on header files, and on having a single model which would encompass both the preprocessor and the compiler. After working through a number of cases, with the help of the prototype, I decided that this was “Java thinking”; in C the primary unit of reuse is the declaration, not the file. And it is much, much simpler to reason about C if you look at it after preprocessing.

In earlier posts I described the dependency computation that is needed for sharing declarations. One consequence of this is that the bigger the unit of reuse, the less often it will actually be reusable. At one extreme is gcc’s current PCH implementation, which requires a single header file to be included first in all compilations. With the file-boundary scheme, it turns out that even relatively ordinary programs fail to reuse in many situations. For instance, some headers in gcc have conditional declarations — so, change the order of includes, and an entire header’s worth of declarations must be re-parsed.
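
Here is a contrived example of the kind of header construct that defeats reuse; the macro name is made up, but the pattern is common in gcc’s own headers:

/* Whether this declaration exists depends on what was defined before this
   header was reached, so the hunk for the whole header depends on that
   context and cannot be reused when the include order changes.
   (ENABLE_EXTRA_CHECKS is a made-up macro, standing in for the real
   conditionals in gcc's headers.)  */
#ifdef ENABLE_EXTRA_CHECKS
extern void validate_all_decls (void);
#endif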

Note that failure to reuse is not a bug. Reuse is, after all, a performance and memory optimization. The worst consequence is that gcc will run as slowly, and will use as much memory, as it does now. Still, reuse is desirable. The more we reuse, and the smaller our units of reuse, the less work we have to do when recompiling.

So, this leads to the second implementation, which tries to guess declaration boundaries without parsing. This is fast, if somewhat evil-looking, and it yields good results. It still isn’t perfect, though (it does not handle K&R-style definitions properly), so I will probably rewrite it again to do something closer to a quick parse.
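
In spirit, the guesser is little more than bracket counting. A toy version, written here as if the input had already been reduced to single-character punctuator tokens, might look like this; the real code is, of course, more involved:

#include <cstddef>
#include <vector>

// Track nesting and cut at each top-level ';' or at a '}' closing a
// top-level brace pair.  A K&R-style definition ("int f (a) int a; { ... }")
// fools this, because the parameter declarations look like complete
// top-level declarations of their own.
std::vector<std::size_t> guess_boundaries (const std::vector<char> &toks)
{
  std::vector<std::size_t> cuts;
  int depth = 0;
  for (std::size_t i = 0; i < toks.size (); ++i)
    switch (toks[i])
      {
      case '{': case '(': case '[':
        ++depth;
        break;
      case ')': case ']':
        --depth;
        break;
      case '}':
        if (--depth == 0)
          cuts.push_back (i + 1);   // end of a function or aggregate body
        break;
      case ';':
        if (depth == 0)
          cuts.push_back (i + 1);   // end of a simple declaration
        break;
      }
  return cuts;
}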

I’m not sure whether this same approach makes sense for C++. I assume I will start here, because I understand it and know that it will work at least minimally. But, C++ brings its own problems. For instance, suppose you have an inline function defined in a class definition. In this case, if you change the body of the function, you’d like to be able to reuse the class definition, if possible — otherwise, many kinds of small changes will yield costly compilations. Tom Fitzsimmons suggested that perhaps the hunk computation can be made recursive in some way, meaning that perhaps it could operate inside of classes and namespaces, and not just at the top level. I’ll definitely be researching this once I start on the C++ front end.
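
The inline-function situation mentioned above looks something like this (a made-up class, just to show the shape of the problem):

#include <cstddef>

// Editing only the body of size () ideally should not force re-parsing
// (and re-checksumming) the entire class definition, yet with whole-class
// hunks that is exactly what happens.
class buffer
{
public:
  std::size_t size () const { return (std::size_t) (last - first); }

private:
  char *first;
  char *last;
};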

The split between front and back end incremental processing means that each front end is free to implement its own approach to incrementality. I’m only looking at C and C++, but other people are free to hack on the other front ends. The actual changes to the middle-end interface are quite small, as it turns out.

What next? I think future posts will cover scalability, multi-processor support, and perhaps what to do about the C preprocessor. If there’s anything you want to know more about, post a comment and I will address it.

Incremental Code Generation

In an earlier post I said that I’d eventually explain how incremental code generation works.

First, to recap: I’ve already explained how incremental compilation works in the front end. Stopping there would still yield some speedups — I regularly see 20% compile-time improvements on ordinary programs, sometimes much larger, and for C++ I expect the effect will be bigger.

But, for many compilations, particularly when compiling C with optimization, it is the back end which is the major time sink. And since we already know from the front end which functions have changed (i.e., have been re-parsed) and which have not, it seems like it should be easy to run the back end incrementally as well.

There are a few plausible ways to implement this. Essentially, we can preserve the compiled form of the program at any point in the pipeline. By choosing a point after most of the work is done, we can minimize the time spent needlessly recompiling.

My current plan is to do the work at the very latest point possible, by writing out object files. The idea here is that each top-level definition in a translation unit — each global variable or function — will get its own tiny object file. Then, these files will be linked together to form the object file that the user actually asked for.

Of course, the current linker is slow and not at all suitable for this sort of thing. So, we’ll use a new, fast, incremental linker — which we’ll need anyway, in order to preserve incrementality through the entire compile cycle. (It isn’t much good to have a fast incremental compiler if you still end up waiting for the linker all the time.)
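
To show the shape of the plan, here is a very rough sketch of the per-definition object cache. Everything in it is hypothetical: the cache layout, the paths, and the use of “ld -r” as a stand-in for the eventual fast incremental linker.

#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Map a definition's signature (say, the checksum of its hunk and of the
// hunks it depends on) to the tiny object file generated for it last time.
static std::map<std::string, std::string> object_cache;

// Return an object file for one top-level definition, regenerating it only
// when the signature has changed.
std::string object_for (const std::string &signature)
{
  auto it = object_cache.find (signature);
  if (it != object_cache.end ())
    return it->second;                          // unchanged: reuse the old object
  std::string path = "objcache/" + signature + ".o";
  /* ... run code generation for just this definition into `path' ... */
  object_cache[signature] = path;
  return path;
}

// Combine the per-definition objects into the object file the user asked
// for.  "ld -r" is only a placeholder for the real incremental linker.
void link_requested_object (const std::vector<std::string> &pieces,
                            const std::string &output)
{
  std::string cmd = "ld -r -o " + output;
  for (const std::string &p : pieces)
    cmd += " " + p;
  std::system (cmd.c_str ());
}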

This approach has the nice property that the gcc work is relatively easy, for the most part. I don’t know yet whether this outweighs the work that must be done elsewhere, so I’m going to implement a prototype and find out. In fact, this is the next major milestone (though it will be a while, because I’m in a bug-fixing phase).

Another fun possibility here is that we can try to be smart about managing the cache of small object files. In particular, we may be able to preserve some benefits of incrementality even across restarts of the server. For C++, this could include template instantiations.

One drawback of the plan is that it may make dealing with debug info somewhat tricky. This ties in to how we compute hunks; if we don’t include source location in the hunk checksum, then we can end up with objects that lie to the debugger about where their corresponding source appears. We’d want to fix that up somehow, but that is not simple. I’m still thinking about this particular sub-problem.

Another oddity is handling inlining. In my planned implementation, gcc would fork and run code generation in a subprocess. (This lets me avoid excessive modifications to gcc.) But, if we inline a function then the compile server needs to know about this in order to properly re-compile when the inlined function is changed.
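
Here is a minimal sketch of that fork-based arrangement, with a placeholder for the actual call into code generation:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run back-end work in a child so the server's parsed state is untouched.
// `generate' stands in for the call into gcc's code generation.
int generate_code_in_child (void (*generate) (void))
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;                  // fork failed; the caller could fall back to in-process
  if (pid == 0)
    {
      generate ();              // child: works on a copy of the server's state
      _exit (0);                // never return into the server's loop
    }
  int status = 0;
  waitpid (pid, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}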

Future posts should include information about computing hunk boundaries (pretty short, but maybe I’ll touch on ideas for C++), multi-processor support, and maybe some speculation about scalability.