Archive for October, 2009

Declare + The Jennifer Morgue

I read Declare and The Jennifer Morgue a few months ago, due to responses to my post about The Atrocity Archives.

First, I was expecting Declare to inhabit the same intersection of genres as the others — hackers plus Cthulhu plus spies.  I was mistaken.  It is a supernatural spy novel, but there are no hackers and it is not set in an obviously Lovecraftian world.

One funny thing is that some characters appear in both books.  Since I’m largely ignorant of British spy history, I wasn’t aware that these were real figures.  That’s too bad, I think that sort of knowledge makes the books a bit richer.

The Jennifer Morgue has a great name and is a typically fast-paced Stross book.  However, I generally dislike books that go meta, and the whole geas thing was too cutesy for me.  I finished this book mostly out of stubbornness,.  Also, I find that the frenetic style of Stross or MacLeod wears on me after a while.

Declare was a much different book — slower paced, more detailed, with more history and character development.  It was a bit repetitive at times and perhaps a bit long; but I thought the ending was rather clever and I enjoyed it overall.  For some reason the title Declare has stuck with me and I think of it often.

Fun with rewriting

There’s a fun source rewriting trick that I’ve wanted to try out for a long time — and I finally got a chance to do it while working on the multi-threading patch for Emacs.

The Problem

In the multi-threaded Emacs, a let binding must be thread-local, because this is really the only way to manage dynamic binding in the presence of threads.  Emacs also has a notion of a buffer-local variable, and furthermore some buffer-local variables are stored directly in the internal struct buffer — that is, assignments to the variable in lisp are transformed by the lisp implementation into a field assignment in C.  These fields are freely used elsewhere in the C code.

Our implementation of thread-locals, though, is an alist mapping a thread object to the variable’s value.  So, to keep the C code working properly, we need to rewrite every field access to use a function that finds the proper per-thread value.

The Idea

The idea, of course, is automated rewriting.  However, like many other GNU programs, Emacs is heavily macroized, and furthermore may be the last program in the whole distro that uses K&R-style function definitions.  For these reasons I assumed that existing refactoring tools would not work well.

Luckily, though, this problem doesn’t require a very sophisticated refactoring tool.  Really all we need to do is find the location of each field reference, and then find the start of the left-hand-side, and then rewrite that into the new form.

The Hack

All we really need is to find a series of locations — the rest we can handle with some straightforward elisp scripting.  And what simpler way is there to get locations than to get the compiler to give them to us?

I wrote a batch script in elisp to automate the whole procedure.  Why elisp?  Not only is it a natural, perhaps even required, fit when hacking on Emacs, it also has some nice “sexp” functions which allow skipping over properly-parenthesized expressions.  This means I could do without a whole parser.  And why automate the whole process?  I expected it wouldn’t work properly the first time; having a single script let me git reset after each test run and simply re-run from scratch.

This elisp script first edits struct buffer to rename each field.  Then it runs make to rebuild Emacs.  This causes the compiler to emit an error message for each bad field access.

A critical point here is that I used GCC svn trunk.  Only recent versions of GCC emit correct column numbers in error messages .  GCC 4.4 might have worked, I am not sure — and in the end I needed a small libcpp patch to deal with a certain macro case.

The elisp script reads the output of make and pulls out the error messages.  For each error on a given line, it works in reverse order (so that multiple fixes on one line will work properly without the bother of inserting markers), rewriting the field accesses.  I wrote a bit of ad hoc code to back up to the start of the left-hand-side of the field access; doing this well is a bit funny, like writing a parser that works backwards, but in my case I knew I could get away with something relatively simple (I think this little sub-hack caused the script to miss less than 10 rewrites, i.e., tolerable).

I would guess that this script got 90% of the field accesses.  I had to fix up a few by hand, mostly in macro definitions in header files.  And, I had to revert a few changes as well, mostly in the garbage collector (which wants to see the real underlying alist, not the per-thread value).  Still, diffstat says: 49 files changed, 1305 insertions(+), 1021 deletions(-) — in other words, not something you’d want to do by hand.

So, ok, this is horrible.  But fun!  I think I will end up doing it again, for frame- and keyboard-local variables.  Maybe someday I’ll finish my patch to make libcpp properly track locations through macros, and then the script can even fix up macro definitions for me.

I’m not extremely interested in Eclipse-style refactoring — where the tool provides a couple dozen refactorings for you.  Instead, I think I want my refactoring tool to answer queries for me, so I can feed that information to a customized rewriting script.

Another way I could have done this was writing a GCC plugin with treehydra or MELT, but unfortunately my free time is so limited that I haven’t managed to even build either one yet.  Once plugins are in the Fedora GCC, I think it would be very worthwhile to package up treehydra…

package.el and Emacs

It looks like package.el is going to go into Emacs after all.  I’m psyched!

I made a git branch (still local) to hack on this.

Of course, now I realize that there are bugs to fix and features to add and cleanups to make before this is really feasible…