This is a followup to my earlier post on converting the Emacs C code into Common Lisp. This one is a bit more technical, diving into some specifics of the conversion process.

Basics

One important fact is that we do not need to convert an arbitrary C program to Common Lisp. This might or might not be efficiently possible, but we do not care. We only need to convert Emacs. This is simpler for two reasons. First, we can just ignore any C construct that Emacs does not use. If the translator barfs after some new update, we can fix it then. Second, Emacs itself is already written in a relatively Lispy style, being a Lisp implementation itself. We further exploit this by allowing the translator to know some details about Emacs. As a trivial example, all the Smumble globals created by the DEFUN macro need not be translated into Common Lisp as structure constants: they are an artifact of the implementation, and will show up directly in the generated defuns instead.

What to ignore

A good portion of Emacs is simply redundant in the CL world. There are a few types (cons, vector, integers, functions) that are shareable; in fact, sharing these is part of the goal of this effort. A number of functions are effectively identical to their CL counterparts. Entire modules, like the garbage collector or the bytecode interpreter, are redundant as well.

The question is how to have the translator differentiate between what is useful and what is not, without breaking builds of future versions of Emacs.

I don’t currently think there is a high road to solving this problem. For modules like the GC, I plan to have ad hoc translator rules for the particular source files. For functions and data types, I’m adding new GCC attributes that I can use to mark the ignorable definitions.
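As a rough sketch of how the two kinds of rules could combine inside the translator: the attribute name `lisp_ignore`, the `Decl` record, and the file list below are all hypothetical stand-ins, not the names the real plugin uses, and the real code would read this information from GCC's declaration objects instead.

```python
from dataclasses import dataclass, field

# Hypothetical ad hoc rule: source files skipped wholesale
# (here the allocator/GC and the bytecode interpreter).
IGNORED_FILES = {"alloc.c", "bytecode.c"}

# Hypothetical attribute name used to mark ignorable definitions.
IGNORE_ATTRIBUTE = "lisp_ignore"


@dataclass
class Decl:
    """Stand-in for a function or type declaration seen by the translator."""
    name: str
    source_file: str
    attributes: set = field(default_factory=set)


def should_translate(decl):
    """Return True if the declaration should be emitted as Common Lisp."""
    if decl.source_file in IGNORED_FILES:
        return False
    return IGNORE_ATTRIBUTE not in decl.attributes


# Illustrative input only.
decls = [
    Decl("Fforward_point", "cmds.c"),
    Decl("Fcons", "alloc.c"),
    Decl("Fcar", "data.c", {"lisp_ignore"}),
]
kept = [d.name for d in decls if should_translate(d)]
```

Keeping the per-file list and the attribute check in one predicate means a new Emacs release only needs a new attribute or a new file entry, not a change to the translation logic.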

Types

There are two type-related issues that arise when translating the source.

First, how should Emacs-specific types be represented? Primarily these types are structures, like struct buffer or struct string (we cannot use the CL string type, because Emacs adds properties directly to the string, and Emacs has its own idiosyncratic character handling). My answer here is to just straightforwardly translate them to defstruct.
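To make the "straightforward" translation concrete, here is a minimal sketch of emitting a defstruct from a C structure. The helper names are made up for illustration, the field list is only a small subset of struct buffer, and slots are left untyped.

```python
def lisp_name(c_name):
    """Turn a C identifier such as 'pt_byte' into 'pt-byte'."""
    return c_name.replace("_", "-")


def emit_defstruct(struct_name, field_names):
    """Return a defstruct form with one untyped slot per C field."""
    lines = ["(defstruct " + lisp_name(struct_name)]
    lines.extend("  " + lisp_name(f) for f in field_names)
    return "\n".join(lines) + ")"


# Illustrative subset of the fields of struct buffer.
form = emit_defstruct("buffer", ["pt", "begv", "zv"])
print(form)
```

Because defstruct derives accessor names from the structure and slot names, a structure named buffer with a slot named pt yields an accessor called buffer-pt with no extra work in the translator.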

The other question is when translating a C function, what do we do with the types of local variables? For the most part I am pretending that they don’t exist. This works fine except for local arrays and structures, but these are easily handled by initializing variables properly. My rationale is that while this is slower, it lets me get something working more quickly, and we can always update the translator to emit CL type declarations later on.

This simple approach doesn’t actually cover all the needed cases. For example, there is code in Emacs that takes the address of a local variable and passes it somewhere. This is easy to deal with; much of the remaining work is just digging through the code looking for special cases to clean up.

I’m similarly omitting type declarations from the generated structures. One possible nice side effect of this approach is that it will make it easier to lift Emacs’ file-size restrictions, because there will no longer be any code assuming that the size is a fixnum.

Macros

Many low-level details of the Emacs implementation are hidden in macros. For example, Emacs stuffs some type information into the low-order bits of pointers. It uses macros to add or remove this information. For this build, I redefine these macros to do nothing. This makes the GCC Gimple representation much closer to the abstract meaning of the program, and thus simpler to translate.

There are also some macros that are useful to redefine so that we can more easily hook into them from the translator. For example, Emacs has a C macro INTEGERP that is used to check whether its argument is an integer. Normally this macro uses bit twiddling to get its answer, but I redefine it like so:

#undef INTEGERP
extern Lisp_Object *INTEGERP (Lisp_Object)
    __attribute__((lisp_form("integerp")));
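On the translator side, using such an attribute could be as simple as a table lookup when emitting a call. The sketch below assumes the attribute values have already been collected into a dictionary; the table and the function name are hypothetical.

```python
# Hypothetical table: C function name -> value of its lisp_form attribute.
LISP_FORMS = {"INTEGERP": "integerp"}


def emit_call(c_function, args):
    """Emit a Lisp call, using the lisp_form name when one is declared."""
    operator = LISP_FORMS.get(c_function, c_function)
    return "(" + " ".join([operator] + list(args)) + ")"


direct = emit_call("INTEGERP", ["n"])
fallback = emit_call("wrong_type_argument", ["Qintegerp.316", "n"])
```

Functions without the attribute fall through unchanged, so only the calls that have a direct CL equivalent need to be annotated.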

Example

The translator is not nearly complete, but it can already do a fair job at translating simple functions. For example, here is “forward-point” from the Emacs C code:

DEFUN ("forward-point", Fforward_point, Sforward_point, 1, 1, 0,
       doc: /* Return buffer position N characters after (before if N negative) point.  */)
  (Lisp_Object n)
{
  CHECK_NUMBER (n);

  return make_number (PT + XINT (n));
}

Here is what the translator comes up with:

(defun Fforward_point (n)
  (let (
    temp-var-0
    Qintegerp.316
    temp-var-1
    current_buffer.317
    temp-var-2
    )
    (block nil (tagbody
      bb-0
        ; no gimple here
      bb-1
        ; no gimple here
      bb-2
        (setf temp-var-0 (integerp n))
        (if (== temp-var-0 nil)
          (go bb-3)
          (go bb-4))
      bb-3
        (setf Qintegerp.316 Qintegerp)
        (wrong_type_argument Qintegerp.316 n)
      bb-4
        (setf current_buffer.317 current_buffer)
        (setf temp-var-2 (buffer-pt current_buffer.317))
        (setf temp-var-1 (+ temp-var-2 n))
        (return temp-var-1)
  ))))

(defun elisp:forward-point (arg0)
  (Fforward_point arg0))

The output looks pretty weird, because the translator works after GCC’s CFG is built, and so the most straightforward translation is to use this mess with tagbody. I doubt this matters much, but in any case the translator is readily hackable: it is still fewer than 400 lines of Python, including comments.
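The shape of that output follows directly from walking the basic blocks in order. Here is a simplified sketch of the idea, not the actual translator code: it assumes each block has already been reduced to a label and a list of Lisp forms rendered as strings, and the function name and input format are invented for illustration.

```python
def emit_function(name, params, local_vars, blocks):
    """Emit a defun whose body is a tagbody with one label per basic block.

    blocks is a list of (label, statements) pairs in CFG order; each
    statement is a Lisp form already rendered as a string.
    """
    out = ["(defun %s (%s)" % (name, " ".join(params))]
    out.append("  (let (%s)" % " ".join(local_vars))
    out.append("    (block nil (tagbody")
    for label, statements in blocks:
        out.append("      " + label)
        if not statements:
            # Empty blocks still need a label so that jumps can reach them.
            out.append("        ; no gimple here")
        for statement in statements:
            out.append("        " + statement)
    out.append("  ))))")
    return "\n".join(out)


text = emit_function(
    "Fexample", ["n"], ["temp-var-0"],
    [("bb-0", []),
     ("bb-1", ["(setf temp-var-0 n)", "(return temp-var-0)"])],
)
print(text)
```

Wrapping the tagbody in (block nil ...) is what lets a C return statement become a plain (return ...) form, since tagbody itself always evaluates to nil.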

One thing to note is the translation of “PT”. This is actually a macro that refers to the current buffer:

#define PT (current_buffer->pt + 0)

The translator properly turns this into a reference to “buffer-pt”.

Another detail is the handling of packages. My plan is to put the Emacs implementation into one package, and then any elisp into a second package called “elisp”. A DEFUN in the C code will actually generate two functions: the internal one, and the elisp-visible one; hence the “elisp:” in the translation.
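Generating the elisp-visible function is mechanical once the Lisp name, the C name, and the argument count from the DEFUN are known. This is a minimal sketch with a made-up helper name; it only handles a fixed number of arguments and ignores the distinction between DEFUN's minimum and maximum argument counts.

```python
def emit_elisp_wrapper(lisp_name, c_name, arg_count):
    """Emit the elisp-visible function that forwards to the internal one."""
    args = ["arg%d" % i for i in range(arg_count)]
    call = " ".join([c_name] + args)
    return "(defun elisp:%s (%s)\n  (%s))" % (lisp_name, " ".join(args), call)


wrapper = emit_elisp_wrapper("forward-point", "Fforward_point", 1)
print(wrapper)
```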

Next Steps

There’s still a good amount of work to be done. The converter punts on various constructs; type translation is implemented but not actually wired up to anything; the translator should emit definitions for alien functions; and plenty more.

11 Comments

I wonder, are you going to make the source available anywhere soon?

This sounds like a good project to hack on.
- January 25, 2012 6:03 PM
- By Joel McCracken
I find it amusing — or ironic, or something — that you’re using Python to translate C into Lisp :-).
- January 25, 2012 6:55 PM
- By Scott Turner
And if you publish code, choose the version control system that the potential contributors want to use: Git.
- January 25, 2012 6:58 PM
- By dtw
Haha, yeah, the Python thing is funny. The reason is just that the Python plugin for GCC is the best one:

http://tromey.com/blog/?p=714

I did try MELT once but couldn’t even get it to build; plus — yet another idiosyncratic lisp dialect, no thanks, life’s too short.

I’ll make the code available soon.
- January 25, 2012 7:23 PM
- By tom
This is awesome. I eagerly await the results.
- January 25, 2012 11:09 PM
- By Paul Legato
I sincerely applaud you, sir.
- January 28, 2012 7:22 AM
- By Samium Gromoff
So, if this is a repeatable translation is the idea that no CL solution could ever be accepted into the Emacs mainline, so better to periodically sync with mainline?

If this were to be acceptable to GNU, then I assume a GNU CL implementation would be required? The only downside I see there is that most of the benefits of a CL rebase (faster GC, threading, etc…) seem to be limited to non-GNU (e.g., SBCL) common lisps.

I guess I’m asking what do you see as the long term future, and how will this not end up as another branch off the Emacs tree which is eventually abandoned (sorry to sound pessimistic as I would like to see something like this succeed).
- January 30, 2012 5:26 PM
- By Eric Schulte
>> So, if this is a repeatable translation is the idea that no CL solution could ever be accepted into the Emacs mainline, so better to periodically sync with mainline?

Yes. I don’t expect GNU will want to switch to Common Lisp. But, if translation is automated, then it won’t matter: CL is just another way to build Emacs.

>> If this were to be acceptable to GNU, then I assume a GNU CL implementation would be required?

That is for GNU to say. I’m planning to target SBCL, I think.

The requirements for the translation will be a CL system, with FFI and FFI callbacks. I’m not even 100% sure that SBCL has the latter, so I may be looking into other implementations. I’m still just working on the translator.
- January 30, 2012 6:34 PM
- By tom
Is there any movement on this project? Could you post your code? I really want to help.
- May 31, 2012 11:03 AM
- By hash
