Archive for the ‘software’ Category

Sharing Parsed Headers

One component of the incremental compiler is re-using parsed header files. In fact, for C++ this is where I expect to gain performance improvements in the “normal compilation” case. A couple weeks ago I realized that I could easily try this idea, for C, by implementing parts of it and then testing it using --combine. I took a break from removing globals from the C front end and wrote a trial implementation this week.

This code works by conceiving of a C compilation unit as a sequence of “hunks”, currently defined as “tokens between two file change events”. Then it computes the checksum of the tokens in each hunk. (In order to do this I had to change the C compiler to lex the whole compilation unit at once — this was easy and makes it better match the C++ compiler.)

While parsing a hunk for the first time, we note all its top-level declarations by hooking into bind(). We save these for later.

Then in the top-level parsing loop, if we notice that the next hunk matches a hunk in our map, we simply copy in the bindings we saved, set the token pointer to the end of the hunk, and carry on.

There’s some extra bookkeeping here — we only want “top-level” hunks, not ones where we saw a file change event in the middle of a function or structure or something; and we can only reuse a hunk if the declarations used by the code in the hunk were also reused. I haven’t implemented the latter check yet, but I’ll have to, since I want to test this on GCC itself, and GCC uses constructs like this.
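The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not GCC’s actual code — the class and function names here are invented, and the real implementation works on lexed token structures rather than strings:

```python
# Hypothetical sketch of the "hunk" reuse idea: checksum the tokens in
# each hunk, save the top-level bindings from the first parse, and copy
# them back in (skipping the parse) when an identical hunk reappears.
import hashlib

class HunkCache:
    def __init__(self):
        self.bindings_by_checksum = {}  # checksum -> saved declarations
        self.tokens_skipped = 0

    def checksum(self, hunk_tokens):
        h = hashlib.sha1()
        for tok in hunk_tokens:
            h.update(tok.encode())
            h.update(b"\0")  # separator, so ("ab","c") != ("a","bc")
        return h.hexdigest()

    def parse_hunk(self, hunk_tokens, parse_fn):
        """Return the hunk's top-level bindings, reusing a saved parse
        when a hunk with the same checksum was seen before."""
        key = self.checksum(hunk_tokens)
        if key in self.bindings_by_checksum:
            # Matching hunk: copy in the saved bindings, skip parsing.
            self.tokens_skipped += len(hunk_tokens)
            return self.bindings_by_checksum[key]
        # First time: really parse, recording what bind() produced.
        bindings = parse_fn(hunk_tokens)
        self.bindings_by_checksum[key] = bindings
        return bindings
```

In this sketch, parsing the same header hunk from two compilation units calls `parse_fn` only once; the second occurrence just restores the saved bindings and advances past the tokens, which is where the skipped-token count for zenity comes from.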

I was able to run it on a smaller program (zenity). We skipped parsing 1.4 million tokens in this program. Naturally, whether or not this is a performance win (I didn’t look at that yet, and honestly for C I am not expecting it) depends on the overhead of the bookkeeping. However, it should reduce memory use, making IMA a bit more practical. And, it gives a good idea of the kind of textual redundancy common in C programs.

Note that this is experimental. I don’t like --combine all that much, since it requires users to change their Makefiles, and because it doesn’t work with C++. However, it provided a simple testbed.

idutils

I love hearing the odd details of how other people work. Usually I’m hoping to learn something that will help me be more productive, and, if not, to hear about something delightfully awful, convoluted, and obscure.

Talking to Roland at JavaOne yielded a nice mix of both. One interesting thing is that he uses full-text search on his source and basically (AIUI) doesn’t run grep from inside Emacs. I, on the other hand, was a long-time igrep user, switching to rgrep (new in Emacs 22) recently.

I decided to give Roland’s approach a try.

First I took a look at namazu, since I knew it had some sort of Emacs integration, and because it is packaged in Fedora. Unfortunately it doesn’t seem to understand how to index source code.

So then I tried GNU idutils. I had heard of this years ago, and even tried it once or twice, but somehow it didn’t end up in my repertoire. Anyway, it turns out that this is a pretty nice tool. It isn’t packaged for Fedora, but building it only took a couple of minutes. The Emacs support code needed some tweaks (email me if you want this), but it works well enough out of the box.

The result is pretty nice. For small projects it may not matter much, but searches over the code I’m indexing (GCC) are much, much faster. This is especially nice since the work I’m doing on GCC at the moment requires repeated searches for all the users of various global variables.

Idutils could use a little love — right now there’s no way to incrementally update the index, which is a pain. If it affects me enough I suppose I’ll set up a cron job or something using inotify. This is just gloss, though. The important thing is that it has removed a pointless wait from my process.

Front End Stuff

I’ve been digging through GCC’s C front end the last couple of weeks, trying to understand how it works and how feasible some parts of my project are.

The parser itself is nice. It is a hand-written recursive descent parser, which means that it is “just plain code” and thus simple to debug. It is written in a typical object-oriented style, where every function in the parser takes a “c_parser” object pointer as its first parameter.
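To illustrate the style (in Python rather than C, and with invented names — this is not GCC’s actual API): every parse function is free-standing and takes the parser object as its explicit first argument, so the whole thing is just plain recursive calls you can step through.

```python
# Illustrative sketch of the recursive descent style described above:
# each parse function takes the parser object as its first parameter,
# mirroring how GCC's C parser functions all take a c_parser pointer.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

def peek(parser):
    return parser.tokens[parser.pos] if parser.pos < len(parser.tokens) else None

def next_token(parser):
    tok = peek(parser)
    parser.pos += 1
    return tok

def parse_primary(parser):
    # primary := NUMBER | '(' expr ')'
    tok = next_token(parser)
    if tok == "(":
        value = parse_expr(parser)
        assert next_token(parser) == ")", "expected ')'"
        return value
    return int(tok)

def parse_expr(parser):
    # expr := primary ('+' primary)*
    value = parse_primary(parser)
    while peek(parser) == "+":
        next_token(parser)
        value = value + parse_primary(parser)
    return value
```

Because the grammar maps one-to-one onto function calls, a breakpoint in any `parse_*` function shows you exactly where in the input you are — which is what makes this kind of parser “just plain code” to debug.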

Unfortunately the rest of the C front end is not so nice. There are many (around 200) global variables, and even though the parser itself is quite clean, it implicitly relies on these via functions elsewhere in the front end. This means that making the front end reentrant is going to be a fair amount of work. While much of this will be mechanical, there are some ugly things, like places where lang hooks (a sort of virtual function in the compiler) are designed in a way that implicitly requires a global variable.

I haven’t looked deeply into the C++ front end yet, but a quick look with nm shows that there are about 100 globals there as well.

Making the parser re-entrant is sort of a side plot for me: it will help with part of my project, but it isn’t on the critical path. Still, I’d like to see the C and C++ front ends eventually be written in “gcjx style”: thread-aware reusable libraries, written in an OO style, separated from the back ends.

So why work on this at all right now? I didn’t know the C front end at all, and I thought this would be a way to learn about it without getting bored just reading code. This plan is working pretty well. I’ve also spent a bit of time simply stepping through various parts of the compiler with gdb — another useful technique for learning a foreign code base.

Meanwhile I’ve been working on the design of the more crucial parts of the plan — namely, re-using parsed headers and providing incremental recompilation. My current hope is to have something ready for critique before the GCC Summit.

Word Swirl

Here’s Word Swirl, a dictionary-based game you can play in your web browser. You get points for every word you can find, and if you find a word that uses all the letters you will be able to go to the next round. You can use the space bar to rearrange your letters.

My sister says this doesn’t work in IE — sorry about that. I don’t have Windows here and so I don’t know a way to debug it. If you know what is wrong, let me know. I suppose real web developers must run Windows, or else have 2-3 machines on their desks at any given time.

It may take a little while to download, since it downloads its entire dictionary at once. Word Swirl is GPL.

EMMS

This weekend I installed EMMS, the Emacs multimedia system. I also packaged it for ELPA, but I haven’t uploaded it yet — expect it soon.

EMMS is fairly primitive, but that doesn’t matter to me. I barely use any features of existing music players; typically I just play a lot of songs in a row and ignore the UI. So why move this into Emacs? Just so I can start and stop music without moving my hands to the mouse. At this point I’m almost completely living in Emacs, to the extent that I’ve started trying to migrate all my shell use there as well (changing a 15 year habit of having a couple terminals open — not easy).

I did hook EMMS to my hacked zenity (aka Emacs notification area) code. This, I suppose, goes against the “mouse-free” approach. But first, these notification area hacks are fun (and now more than half of the notification icons on my desktop are Emacs-related — haha), and second, sometimes my hand is already on the mouse anyway, and it is nice to be able to pause that way as well.

Emacs 22 is officially released, in case you didn’t hear. It has an absurd number of new features and new packages — a couple IRC clients, a new gdb UI, Tramp (a way to access files via ssh or — very useful — sudo), Calc (by the legendary Dave Gillespie), and a bunch more.

I haven’t published my “project.el” (per-project settings done in an easy way) yet, but that will come soon. I also started hacking on a keyring for Emacs, to save all those pesky Tramp and ange-ftp passwords. The base code works but I need to wire it in to all the places that ask for a password — I suppose this should really go into the Emacs core. ELPA-wise, work continues and I upload a couple packages a week. I started looking at uploading DVC — I think I want to start getting larger, more useful packages into the repository.

Web 2.0.3

Every time I read about a new Web 2.0 application, or Y Combinator boot camp, or anything ending in “oogle”, I lose a little piece of my mind. Over time this adds up and I start thinking strange, crazy thoughts, like “maybe I should rewrite Automake as a set of GNU Make include files”.

Occasionally in these moods I brainstorm web site ideas.

For instance, how about vbeer.com, for all your virtual beer-owing needs? Instead of saying “I owe you a beer”, say it via our new social networking site, dedicated to beer redistribution. It has an IRC bot so you can /msg your thanks as well. vbeer connects to the free software calendar of events, and will show you a printable chart of how many rounds you have to buy, for whom, at the events you’ll be attending. Version 2 will include some code for (optional) transitive beer forwarding.

More seriously, I keep wishing there were a site for “sensitive programmers” (titled perhaps after Elaine Aron’s book, which btw is pretty good) — say a sort of support group with ideas on how to avoid excessive emotional bruising in a field that, in my view, over-tolerates assholes.

Finally, I know this is crazy, but what about a web site dedicated to making it easy to download and install random bits of Emacs Lisp code? Oh, wait, I did that. (And if you’re into this at all, there’s an amusing and somewhat strange discussion of package.el on the Emacs devel list.)

C++ noodling

A while back I spent a little time playing with g++, trying to understand compilation performance. For this experiment I tried to measure how much speedup a “model-based” compiler could expect to achieve.

I compiled my test program a few different ways. I timed compilation of a plain build (more or less a “make”) both with and without PCH. And, I timed compilation of the “all.cc” approach — I’ve heard many times over the years that C++ shops will cat their sources together into one big compilation unit, and that this reduces overall build times.

So far I’ve only done this experiment by compiling gcjx, a moderately sized, fairly ordinary, C++ program with which I am familiar. I plan to redo this with a couple other programs as well (send suggestions).

The results:

Approach        Time
Plain build     13m50s
Plain + PCH      8m05s
all.cc           3m18s
all.cc + PCH     3m17s
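Put another way, here is the same table converted to speedups over the plain build (just arithmetic on the timings above, nothing measured beyond them):

```python
# Convert the wall-clock timings above into speedup ratios
# relative to the plain (no-PCH, per-file) build.
def seconds(minutes, secs):
    return 60 * minutes + secs

timings = {
    "Plain build":  seconds(13, 50),
    "Plain + PCH":  seconds(8, 5),
    "all.cc":       seconds(3, 18),
    "all.cc + PCH": seconds(3, 17),
}

baseline = timings["Plain build"]
speedups = {name: baseline / t for name, t in timings.items()}
# PCH alone is roughly a 1.7x win; all.cc is better than 4x.
```

So PCH by itself recovers well under half of what the single-compilation-unit approach gets, which is the surprise discussed below.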

This basically conforms with things I’ve found previously, but I was a little surprised that PCH is not a bigger win than it is, especially considering that I intentionally wrote gcjx with one big header file that includes most things, precisely to make better use of this feature. (I’ve heard from other folks that in some situations PCH is a net loss and they disable it. Any experiences out there?)

Anyway, I consider this promising data for a compile server approach, since what I’m thinking about essentially models the “all.cc” approach in the compiler.

GCC 4.2

GCC 4.2 was released a few days ago. Here’s the list of changes. Congratulations to everybody who worked on this, and especially to Mark Mitchell for once again doing the release engineering work.

Different parts of GCC evolve at different rates, and due to some quirks of timing, it turns out that the gcj parts of 4.2 are already obsolete. For Fedora we back-ported the 4.3 gcj bits to our base GCC release; I think the other distros are doing the same. Basically what happened is that all our Java 1.5 work came in after 4.2 had closed for major changes.

Post Java One

I’m back from Java One — what a crazy week!

I’m a bit wary of thanking people; there are many people deserving appreciation and I would hate to leave someone out. I suppose I must try, and bear the consequences. So, special thanks to Tom Marble, Onno Kluyt, Rich Sands, Mark Reinhold, Simon Phipps, and Bruno Souza — each of them made a special effort to include us free software hackers in the cool behind-the-scenes goings-on.

I was also happy to meet and talk to many people while there: Per Bothner, Chris Oliver, Casey Marshall (finally! But I’m sad to say there are still a few gcj developers I have not met), Glynn Foster (nice guy — I meant to meet him for a beer at this one party but failed), Jeff Bailey, David Daney, Ean Schussler (always fun!), Erinn Clark, Jacob Applebaum, Ian Taylor, Tom Baxter (long time no see!), Roland McGrath, David Herron, Peter Ahé, Andre Burgoyne, and Johanna Neaderhouser. Whew. Actually I was introduced to many more people (Eben Moglen, James Gosling, John Cage, Neal Gafter, … to drop some names) but the former are the folks I actually had conversations with. And, yes, it was as overwhelming as you might imagine, especially if you are essentially an introvert, as I am.