Difficulties of elisp

The thesis that underlies my project to translate the Emacs C code to Common Lisp is that Emacs Lisp is close enough to Common Lisp that the parts of the Emacs C code that implement Lisp can be dropped in favor of the generally superior CL implementation.  This is generally true, but there are a few difficult bits.

Symbols

The primary problem is the translation of symbols when used as variable references.  Consider this code:

(defvar global 73)
(defun function (argument)
  (let ((local (something-else))
    (+ local argument global)))

More is going on here than meets the eye.

First, Emacs Lisp uses dynamic binding by default (optional lexical binding is a new feature in Emacs 24).  This applies to function arguments as well as other bindings.  So, you might think you could translate this straightforwardly to:

(defvar global 73)
(declare (special global))
(defun function (argument)
  (declare (special argument))
  (let ((local (something-else))
    (declare (special local))
    (+ local argument global)))

This was the approach taken by elisp.lisp; it defined macros for let and let* (but forgot defun) to do the dirty work:

(defmacro el::let* ((&rest vars) &rest forms)
  "Emacs-Lisp version of `let*' (everything special)."
  `(let* ,vars (declare (special ,@(mapcar #'from-list vars))) ,@forms))

But not so fast!  Emacs also has buffer-local variables.  These are variables where the value is associated with the current buffer; switching buffers makes a different binding visible to Lisp.  These require no special syntax, and a variable can be made buffer-local at any time.  So, we can break the above translation simply by evaluating:

(make-local-variable 'global)
(setq global 0)

Whoops!  Now the function will return the wrong result — the translation will have no way to know that is should refer to the buffer-local value.  (Well, ok, pretend that the setq magically worked somehow…)

My idea for implementing this is pretty convoluted.  Actually I have two ideas, one “user” and one “kernel”:

User

I think it is possible to use define-symbol-macro on all symbols that come from Elisp, so that we can tell the CL compiler about the real implementation.  However, a symbol can either be treated as a variable, or it can be treated as a symbol-macro — not both at the same time.  So, we will need a second location of some kind to store the real value.  Right now I’m thinking a symbol in another package, but maybe a cons or some other object would work better. In either case, we’d need a macro, a setf method for its expansion, and some extra-tricky redefinitions of let and defun to account for this change.

This would look something like:

(define-symbol-macro global (elisp:get-elisp-value 'global))
(defsetf elisp:get-elisp-value elisp:set-elisp-value))
;; Details left as an exercise for the reader.

This solution then has to be applied to buffer-, keyboard-, and frame-local variables.

Kernel

The kernel method is a lot simpler to explain: hack a Common Lisp implementation to directly know about buffer-locals.  SMOP!  But on the whole I think this approach is to be less preferred.

Other Problems

Emacs Lisp also freely extends other typical data types with custom attributes.  I consider this part of the genius of Emacs; a more ordinary program would work within the strictures of some defined, external language, but Emacs is not so cautious or constrained.  (Emacs is sort of a case study in breaking generally accepted rules of programming; which makes one wonder whether those rules are any good at all.)

So, for example, strings in Emacs have properties as a built-in component.  The solution here is simple — we will just translate the Emacs string data type as a whole, something we probably have to do anyway, because Emacs also has its own idiosyncratic approach to different encodings.

In elisp, aref can be used to access elements of other vector-like objects, not just arrays; there are some other odd little cases like this.  This is also easily handled; but it left me wondering why things like aref aren’t generic methods in CL. It often seems to me that a simpler, more orthogonal language lies inside of CL, struggling to get free. I try not to think these thoughts, though, as that way lies Scheme and the ridiculous fragmentation that has left Lisp unpopular.

6 Comments

  • Why can’t you store the buffer-local information in the symbol property list?

    Another solution is to create a new package for each buffer, although for some reason it feels a bit ugly.

  • I think the problem with putting it on the property list is that this would then be visible to elisp. I would rather keep it hidden, so that even weird elisp programs would not change their meaning.

    I don’t see how a new package per buffer would work. If you have an idea, I’d like to hear it.

    Today I’m thinking I will just have a hash table that maps from the elisp symbol to an instance of a particular defstruct; and this will store the needed bits, much like Emacs’s struct Lisp_Symbol.

  • Consider buffer locals like thread locals and treat them simply as dynamic variables whose storage changes on context switch. If a variable is buffer local, then it goes on a list of buffer locals that change on context switch, which in this case is buffer switch. So when you switch buffers, save the values of all bound buffer locals in this buffer in a map for this buffer, and restore the ones for the buffer you’re switching to. This increases buffer-switch cost at a reduction in cost of all variable references. It’s a classic thread-switch idea, but applied to buffers. And you won’t have to know in advance that code is referring to a buffer local.

  • Hello, Tom! What is the current state of your translating Emacs from C to Common Lisp project? I’m very interested in result.

  • Hello Tom! Please publish the code for this.

    This is really a amazing project and we have not heard from you for a while, so
    If you don’t have the time for completing it yourself, please publish the code, so others can work on this.

  • A `defvar’ed variable should definitely turn into two things:

    - an “elisp-var” object. That object should probably not be a symbol.

    - a symbol-macro that expands references into operations on that object.

Join the Discussion

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>