External Refactoring

Many years ago I looked at changing Emacs to have an incremental garbage collector. An incremental GC requires a write barrier, which means that I wanted to insert some instructions at every point that mutated a lisp object. However, Emacs’ style at the time used macros to access fields of lisp objects, and these macros were used as both rvalues and lvalues. So, XCAR(x) would extract the car of a cons, but XCAR(x) = y would act as setcar.

Once you have code like this in C, pretty much your only choice is brute force: find every assignment and change it to use a new macro. One nice trick you can use to make the job simpler is to get the compiler to tell you the locations of all the assignments; you can do this by redefining XCAR to yield an invalid lvalue. Either way, though, you’re still in for a lot of typing.
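A rough sketch of that trick, written as C++ rather than C and using a made-up `Cons` type and a hypothetical `XSETCAR` macro (neither is the real Emacs definition). It assumes the field is a pointer, so that unary plus can turn the expression into an rvalue; in plain C a different rvalue-producing expression would be needed.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a cons cell; not the real Emacs layout.
struct Cons {
    Cons *car;
    Cons *cdr;
};

// Original style: expands to an lvalue, so reads and writes both compile.
#define XCAR_LVALUE(x) ((x)->car)

// Temporary redefinition: unary plus on a pointer yields an rvalue in C++,
// so reads still compile but `XCAR(x) = y` becomes a compile error that
// points at each assignment site.
#define XCAR(x) (+(x)->car)

// Hypothetical replacement macro for the assignments the compiler reports.
#define XSETCAR(x, v) ((x)->car = (v))

// The redefined macro no longer produces an lvalue.
static_assert(
    !std::is_lvalue_reference<decltype((XCAR(std::declval<Cons *>())))>::value,
    "XCAR should be read-only after the redefinition");
```

Each error the compiler reports is then one assignment to rewrite by hand as `XSETCAR(x, y)`.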

In my current GCC project I’m running into a somewhat similar need. I want to make parts of GCC run multi-threaded. However, GCC has many global variables, which interact poorly with multiple threads, so something must be done about the globals.

The ideal solution would be to move globals into structures and change GCC to be a bit more object-oriented. Again, though, this is a lot of editing — for instance it would require adding a "this" argument to just about every function.

Problems like these are one reason I generally prefer C++ to C. In C++ you have options that do not involve massive editing.

In C++ the XCAR solution is simple: change the macro to return a new “car reference” object, ensure that this object has a conversion to a lisp object, and define an operator= which calls the incremental GC mark function in addition to modifying the car. With modern C++ compilers this should be as efficient as a macro, much less typing to implement, and (IMO) just as clear and maintainable.
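Here is a minimal sketch of such a "car reference" object. The names `Cons`, `LispObject`, `CarRef`, and `gc_write_barrier` are all invented for illustration, and the barrier merely counts calls where a real collector would mark or record the cell.

```cpp
// Hypothetical stand-ins for Emacs' types; not the real definitions.
struct Cons;
typedef Cons *LispObject;

struct Cons {
    LispObject car;
    LispObject cdr;
};

// Placeholder write barrier: a real incremental GC would mark the cell here.
static int barrier_calls = 0;

static void gc_write_barrier(Cons *cell) {
    (void)cell;
    ++barrier_calls;
}

// Proxy returned by XCAR: reads convert to LispObject, writes hit the barrier.
class CarRef {
public:
    explicit CarRef(Cons *cell) : cell_(cell) {}
    CarRef(const CarRef &) = default;

    // Rvalue use: XCAR(x) where a LispObject is expected.
    operator LispObject() const { return cell_->car; }

    // Lvalue use: XCAR(x) = y.
    CarRef &operator=(LispObject value) {
        gc_write_barrier(cell_);
        cell_->car = value;
        return *this;
    }

    // XCAR(a) = XCAR(b) must copy the car value, not rebind the proxy.
    CarRef &operator=(const CarRef &other) {
        return *this = static_cast<LispObject>(other);
    }

private:
    Cons *cell_;
};

// The outer parentheses keep `XCAR(x) = y;` from parsing as a declaration.
#define XCAR(x) (CarRef(x))
```

With this in place, existing code such as `XCAR(x) = y` and `z = XCAR(x)` compiles unchanged, but every store now goes through `gc_write_barrier`.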

The GCC solution still involves a fair amount of typing. Here I would turn existing functions into methods in a class, and move their global state into the class. The this argument will be invisibly supplied by the compiler, but in some cases I would still have to update calls to provide an object. Still, this is less work than updating every function definition and call, and parts (adding "classname::" to the definitions) can be mostly automated.
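As a small illustration of that transformation, here is a sketch with invented names (`Diagnostics`, `report_error`, `check_name`); none of them come from GCC. The former globals become private members, and the function bodies stay the same apart from the `Diagnostics::` prefix on their definitions.

```cpp
#include <string>
#include <vector>

// Before, in C style (shown only as a comment):
//   static int error_count;
//   void report_error(const char *msg) { ++error_count; }
//   void check_name(const char *name) { if (!*name) report_error("empty name"); }

// After: the globals live in an object, so each thread can own its own copy.
class Diagnostics {
public:
    void report_error(const std::string &msg);
    void check_name(const std::string &name);
    int error_count() const { return error_count_; }

private:
    int error_count_ = 0;                // was a global counter
    std::vector<std::string> messages_;  // was a global list
};

void Diagnostics::report_error(const std::string &msg) {
    ++error_count_;
    messages_.push_back(msg);
}

void Diagnostics::check_name(const std::string &name) {
    if (name.empty())
        report_error("empty name");  // call is unchanged; `this` is implicit
}
```

Calls between methods need no edits, since the compiler passes `this` along. Only callers outside the class have to be updated to name an object, as in `diag.check_name(name)`.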

The idea behind this is that some C++ features, notably operator overloading, let you change the meaning of a piece of source text. I usually think of this as an ugly cousin of refactoring, useful when making large changes to existing code bases.


  • It is possible to do this kind of refactoring with my Elsa work. The macro LHS case is a little tricky since one has to resolve macros, but it is probably doable; the "this" one is a pretty straightforward transform.

    Now wouldn’t it be nice if such refactoring tools were scriptable, polished and apt-getable :)

  • You can also do this with clang. Clang can tell you about the macro expansion and we already have code rewriting support. You’d just hack together your own driver and a special-purpose rewriter. Supporting refactoring is an explicit goal of clang, so why not use it to hack on gcc? :)

  • Defactoring…

  • This post reminds me of program slicing. You effectively want to edit the slice where all those assignment side effects are meaningful. Even if you produced the slice, that’s probably also a lot of typing, and I am not completely sure how this relates to program slicing. But I somehow feel like there might be a solution to this that involves slicing and keeps you from too much typing. :)
