Fun with rewriting

There’s a fun source rewriting trick that I’ve wanted to try out for a long time — and I finally got a chance to do it while working on the multi-threading patch for Emacs.

The Problem

In the multi-threaded Emacs, a let binding must be thread-local, because this is really the only way to manage dynamic binding in the presence of threads.  Emacs also has a notion of a buffer-local variable, and furthermore some buffer-local variables are stored directly in the internal struct buffer — that is, assignments to the variable in lisp are transformed by the lisp implementation into a field assignment in C.  These fields are freely used elsewhere in the C code.

Our implementation of thread-locals, though, is an alist mapping a thread object to the variable’s value.  So, to keep the C code working properly, we need to rewrite every field access to use a function that finds the proper per-thread value.

The Idea

The idea, of course, is automated rewriting.  However, like many other GNU programs, Emacs is heavily macroized, and furthermore may be the last program in the whole distro that uses K&R-style function definitions.  For these reasons I assumed that existing refactoring tools would not work well.

Luckily, though, this problem doesn’t require a very sophisticated refactoring tool.  Really all we need to do is find the location of each field reference, and then find the start of the left-hand-side, and then rewrite that into the new form.

The Hack

All we really need is to find a series of locations — the rest we can handle with some straightforward elisp scripting.  And what simpler way is there to get locations than to get the compiler to give them to us?

I wrote a batch script in elisp to automate the whole procedure.  Why elisp?  Not only is it a natural, perhaps even required, fit when hacking on Emacs, it also has some nice “sexp” functions which allow skipping over properly-parenthesized expressions.  This means I could do without a whole parser.  And why automate the whole process?  I expected it wouldn’t work properly the first time; having a single script let me git reset after each test run and simply re-run from scratch.

This elisp script first edits struct buffer to rename each field.  Then it runs make to rebuild Emacs.  This causes the compiler to emit an error message for each bad field access.

A critical point here is that I used GCC svn trunk.  Only recent versions of GCC emit correct column numbers in error messages .  GCC 4.4 might have worked, I am not sure — and in the end I needed a small libcpp patch to deal with a certain macro case.

The elisp script reads the output of make and pulls out the error messages.  For each error on a given line, it works in reverse order (so that multiple fixes on one line will work properly without the bother of inserting markers), rewriting the field accesses.  I wrote a bit of ad hoc code to back up to the start of the left-hand-side of the field access; doing this well is a bit funny, like writing a parser that works backwards, but in my case I knew I could get away with something relatively simple (I think this little sub-hack caused the script to miss less than 10 rewrites, i.e., tolerable).

I would guess that this script got 90% of the field accesses.  I had to fix up a few by hand, mostly in macro definitions in header files.  And, I had to revert a few changes as well, mostly in the garbage collector (which wants to see the real underlying alist, not the per-thread value).  Still, diffstat says: 49 files changed, 1305 insertions(+), 1021 deletions(-) — in other words, not something you’d want to do by hand.

So, ok, this is horrible.  But fun!  I think I will end up doing it again, for frame- and keyboard-local variables.  Maybe someday I’ll finish my patch to make libcpp properly track locations through macros, and then the script can even fix up macro definitions for me.

I’m not extremely interested in Eclipse-style refactoring — where the tool provides a couple dozen refactorings for you.  Instead, I think I want my refactoring tool to answer queries for me, so I can feed that information to a customized rewriting script.

Another way I could have done this was writing a GCC plugin with treehydra or MELT, but unfortunately my free time is so limited that I haven’t managed to even build either one yet.  Once plugins are in the Fedora GCC, I think it would be very worthwhile to package up treehydra…

6 Comments

  • For these kinds of tasks, Coccinele (http://coccinelle.lip6.fr) might be useful as well — it’s a ‘semantic patchers’, which allows you to write expressions describing the changes with a bit more intelligence than ‘sed’.

  • Yeah, I tried coccinelle for something else, on gdb. It worked ok, though my very first experiment found a bug in it :-)

    Due to the macros I was not sure whether it would work for the Emacs hack. My impression, perhaps wrong(?), is that coccinelle does not work very well in the presence of heavy macro use.

  • [...] Continue reading here: Fun with rewriting [...]

  • > and furthermore may be the last program in the whole distro that uses K&R-style function definitions

    It’s not. Parts of z88dk also contain that ugly stuff (the preprocessor is written entirely in that obsolete style, the zcc.c compiler driver has some K&R declarations and some standard prototypes mixed). There are probably other packages still using K&R declarations as well.

  • > Emacs is heavily macroized, and furthermore may be the last program in the whole distro that uses K&R-style function definitions.

    The X server still has some, actually. Point taken, though. :)

  • If your software project use “weird” macros, such as FOO(int) as a type in a variable declaration, Coccinelle needs some information about that. If you run Coccinelle on your software project with the option -parse_c, you can find out about how much of the project it was able to parse successfully, and what the common problems were. There is not very much information about how to give information about macro definitions, but examples are in the file standard.h. You can make your own version of standard.h and then use it with the option -macro_file.

    On the other hand, I’m not sure what is the status of K&R style function definitions…

Join the Discussion

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>