There’s a fun source rewriting trick that I’ve wanted to try out for a long time — and I finally got a chance to do it while working on the multi-threading patch for Emacs.
In the multi-threaded Emacs, a let binding must be thread-local, because this is really the only way to manage dynamic binding in the presence of threads. Emacs also has a notion of a buffer-local variable, and furthermore some buffer-local variables are stored directly in the internal struct buffer — that is, assignments to the variable in lisp are transformed by the lisp implementation into a field assignment in C. These fields are freely used elsewhere in the C code.
Our implementation of thread-locals, though, is an alist mapping a thread object to the variable’s value. So, to keep the C code working properly, we need to rewrite every field access to use a function that finds the proper per-thread value.
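To make the data structure concrete, here is a minimal model of such a slot. It is written in Python purely for illustration (the real code is C inside Emacs), and the class name `ThreadLocalSlot`, its methods, and the fallback to a default value when a thread has no entry are all assumptions of this sketch, not details from the patch.

```python
import threading


class ThreadLocalSlot:
    """Toy model of one field whose value is an alist of (thread, value) pairs."""

    def __init__(self, default):
        # Assumption: threads without their own entry see a shared default.
        self.default = default
        self.alist = []

    def get(self, thread=None):
        """Return the value bound for `thread` (the current thread by default)."""
        if thread is None:
            thread = threading.current_thread()
        for owner, value in self.alist:
            if owner is thread:
                return value
        return self.default

    def set(self, value, thread=None):
        """Bind `value` for `thread`, replacing any existing entry."""
        if thread is None:
            thread = threading.current_thread()
        for index, (owner, _) in enumerate(self.alist):
            if owner is thread:
                self.alist[index] = (thread, value)
                return
        self.alist.append((thread, value))
```

A plain field read can no longer give the right answer with this layout, since the field now holds the whole alist; every access has to go through something like `get`, which is why each field reference in the C code needs rewriting.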
The idea, of course, is automated rewriting. However, like many other GNU programs, Emacs is heavily macroized, and furthermore may be the last program in the whole distro that uses K&R-style function definitions. For these reasons I assumed that existing refactoring tools would not work well.
Luckily, though, this problem doesn’t require a very sophisticated refactoring tool. Really all we need to do is find the location of each field reference, and then find the start of the left-hand-side, and then rewrite that into the new form.
In other words, the only hard part is getting a series of locations; the rest we can handle with some straightforward elisp scripting. And what simpler way is there to get locations than to get the compiler to give them to us?
I wrote a batch script in elisp to automate the whole procedure. Why elisp? Not only is it a natural, perhaps even required, fit when hacking on Emacs, it also has some nice “sexp” functions which allow skipping over properly-parenthesized expressions. This means I could do without a whole parser. And why automate the whole process? I expected it wouldn’t work properly the first time; having a single script let me git reset after each test run and simply re-run from scratch.
This elisp script first edits struct buffer to rename each field. Then it runs make to rebuild Emacs. This causes the compiler to emit an error message for each bad field access.
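The renaming step could look roughly like the following. This is a Python sketch rather than the original elisp, and the function name `rename_fields`, the trailing-underscore naming scheme, and the assumption that each field declaration ends with `name;` are all hypothetical.

```python
import re


def rename_fields(struct_source, fields, suffix="_"):
    """Rename the given fields in the text of a struct definition.

    Only a whole identifier directly followed by ';' is renamed, so a
    field called `name` does not disturb one called `filename`.
    """
    for name in fields:
        pattern = r"\b" + re.escape(name) + r"\b(?=\s*;)"
        struct_source = re.sub(pattern, name + suffix, struct_source)
    return struct_source
```

After this edit, every remaining use of the old field name in the C sources no longer compiles, which is exactly what turns the compiler into a location finder.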
A critical point here is that I used GCC svn trunk. Only recent versions of GCC emit correct column numbers in error messages. GCC 4.4 might have worked, I am not sure; in the end I also needed a small libcpp patch to deal with a certain macro case.
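Pulling the locations out of the build log is mostly a matter of matching the `file:line:column: error:` prefix that GCC prints. The sketch below is Python, not the original elisp; the function name `collect_error_locations` is made up, and it assumes every error in the log comes from a renamed field, which a real script would want to check against the message text.

```python
import re
from collections import defaultdict

# Matches diagnostics of the form "file:line:column: error: message".
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\s][^:]*):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def collect_error_locations(make_output):
    """Map (file, line) to the error columns on that line, rightmost first."""
    locations = defaultdict(list)
    for text in make_output.splitlines():
        match = DIAGNOSTIC.match(text)
        if match:
            key = (match["file"], int(match["line"]))
            locations[key].append(int(match["col"]))
    # Descending order lets the caller fix a line from right to left.
    return {key: sorted(set(cols), reverse=True) for key, cols in locations.items()}
```

Warnings and make's own status lines do not match the pattern and are ignored.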
The elisp script reads the output of make and pulls out the error messages. For each line with errors, it processes the errors in reverse column order (so that multiple fixes on one line work properly without the bother of inserting markers), rewriting the field accesses. I wrote a bit of ad hoc code to back up to the start of the left-hand side of the field access; doing this well is a bit funny, like writing a parser that works backwards, but in my case I knew I could get away with something relatively simple (I think this little sub-hack caused the script to miss fewer than 10 rewrites, i.e., tolerable).
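Here is a rough idea of what the right-to-left rewrite and the backwards scan might look like. It is a Python sketch, not the elisp I actually wrote, and it makes several assumptions: the reported column is 1-based and points at the first character of the field name, only `->` accesses are handled, and the accessor name `PER_THREAD` is hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_0-9]")
CLOSERS = ")]"
OPENERS = "(["


def skip_space_back(text, i):
    """Move i left over spaces and tabs."""
    while i > 0 and text[i - 1] in " \t":
        i -= 1
    return i


def group_start(text, i):
    """text[i - 1] is a closing bracket; return the index of its opener."""
    depth = 0
    while i > 0:
        i -= 1
        if text[i] in CLOSERS:
            depth += 1
        elif text[i] in OPENERS:
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced brackets")


def operand_start(text, end):
    """Scan backwards from `end` to the start of the expression ending there."""
    i = skip_space_back(text, end)
    while i > 0:
        if text[i - 1] in CLOSERS:
            i = group_start(text, i)
            # A call or subscript such as `XBUFFER (obj)`: keep the name too.
            j = skip_space_back(text, i)
            if j > 0 and (IDENT.match(text[j - 1]) or text[j - 1] in CLOSERS):
                i = j
                continue
            break
        if IDENT.match(text[i - 1]):
            while i > 0 and IDENT.match(text[i - 1]):
                i -= 1
            # Follow chains such as `a->b->c` or `a.b->c`.
            j = skip_space_back(text, i)
            if text[j - 2:j] == "->":
                i = skip_space_back(text, j - 2)
                continue
            if text[j - 1:j] == ".":
                i = skip_space_back(text, j - 1)
                continue
        break
    return i


def rewrite_line(text, columns, accessor="PER_THREAD"):
    """Rewrite `obj->field` as `accessor (obj, field)` at each given column."""
    for col in sorted(columns, reverse=True):  # rightmost first
        field_start = col - 1
        field_end = field_start
        while field_end < len(text) and IDENT.match(text[field_end]):
            field_end += 1
        op_end = skip_space_back(text, field_start)
        if text[op_end - 2:op_end] != "->":
            continue  # not a pointer access; leave it for manual fixing
        start = operand_start(text, op_end - 2)
        obj = text[start:op_end - 2].strip()
        field = text[field_start:field_end]
        text = f"{text[:start]}{accessor} ({obj}, {field}){text[field_end:]}"
    return text
```

Working from the rightmost column means an edit never shifts the position of a column that is still to be processed. The backwards scan is deliberately naive: a keyword directly in front of a parenthesized operand, as in `return (p)->name`, would be swallowed into the operand, which is the kind of case that ends up needing a manual fix.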
I would guess that this script got 90% of the field accesses. I had to fix up a few by hand, mostly in macro definitions in header files. I also had to revert a few changes, mostly in the garbage collector (which wants to see the real underlying alist, not the per-thread value). Still, diffstat says: 49 files changed, 1305 insertions(+), 1021 deletions(-). In other words, not something you’d want to do by hand.
So, ok, this is horrible. But fun! I think I will end up doing it again, for frame- and keyboard-local variables. Maybe someday I’ll finish my patch to make libcpp properly track locations through macros, and then the script can even fix up macro definitions for me.
I’m not extremely interested in Eclipse-style refactoring — where the tool provides a couple dozen refactorings for you. Instead, I think I want my refactoring tool to answer queries for me, so I can feed that information to a customized rewriting script.
Another way I could have done this was writing a GCC plugin with treehydra or MELT, but unfortunately my free time is so limited that I haven’t managed to even build either one yet. Once plugins are in the Fedora GCC, I think it would be very worthwhile to package up treehydra…