February — 2018 — The Cliffs of Inanity

There have been a few efforts at writing an Emacs JIT — the original one, Burton Samograd’s, and also Nick LLoyd’s. So, what else to do except write my own?

Like the latter two, I based mine on GNU libjit. I did look at a few other JIT libraries: LLVM, gcc-jit, GNU Lightning, MyJit. libjit seemed like a nice middle ground between a JIT with heavy runtime costs (LLVM, GCC) and one that is too lightweight (Lightning).

All of these Emacs JITs work by compiling bytecode to native code. Now, I don’t actually think that is the best choice — it’s just the easiest — but my other project to do a more complete job in this area isn’t really ready to be released. So bytecode it is.

Emacs implements a somewhat weird stack-based bytecode. Many ordinary things are there, but seemingly obvious stack operations like “swap” do not exist; and there are bytecodes for very specialized Emacs operations like forward-char or point-max.

Samograd describes his implementation as “compiling down the spine”. What he means by this is that the body of each opcode is implemented by some C function, and the JIT compiler emits, essentially, a series of subroutine calls. This used to be called “jsr threading” in the olden days, though maybe it has some newer names by now.

Of course, we can do better than this, and Lloyd’s JIT does. His emits instructions for the bodies of most bytecodes, deferring only a few to helper functions. This is a better approach because many of these operations are only one or two instructions.

However, his approach takes a wrong turn by deferring stack operations to the compiled code. For example, in this JIT, the Bdiscard opcode, which simply drops some items from the stack, is implemented as:

 CASE (Bdiscard):
 {
   JIT_NEED_STACK;
   JIT_INC (ctxt.stack, -sizeof (Lisp_Object));
   JIT_NEXT;
   NEXT;
 }

It turns out, though, that this isn’t needed — at least, for the bytecode generated by the Emacs byte-compiler, the stack depth at any given PC is a constant. This means that the stack adjustments can be done at compile time, not runtime, leading to a performance boost. So, the above opcode doesn’t need to emit code at all.

(And, if you’re worried about hand-crafted bytecode, it’s easy to write a little bytecode verifier to avoid JIT-compiling invalid things. Though of course you shouldn’t worry, since you can already crash Emacs with bad bytecode.)

So, naturally, my implementation does not do this extra work. And, it inlines more operations besides.

Caveat

I’ve only enabled the JIT for bytecode that uses lexical binding. There isn’t any problem enabling it everywhere, I just figured it probably isn’t that useful, and so I didn’t bother.

Results

The results are pretty good. First of all, I have it set up to automatically JIT compile every function, and this doesn’t seem any slower than ordinary Emacs, and it doesn’t crash.

Using the “silly-loop” example from the Emacs Lisp manual, with lexical binding enabled, I get these results:

Mode	Time
Interpreted	4.48
Byte compiled	0.91
JIT	0.26

This is essentially the best case for this JIT, though.

Future Directions

I have a few ideas for how to improve the performance of the generated code. One way to look at this is to look at Emacs’ own C code, to see what advantages it has over JIT-compiled code. There are really three: cheaper function calls, inlining, and unboxing.

Calling a function in Emacs Lisp is quite expensive. A call from the JIT requires marshalling the arguments into an array, then calling Ffuncall; which then might dispatch to a C function (a “subr”), the bytecode interpreter, or the ordinary interpreter. In some cases this may require allocation.

This overhead applies to nearly every call — but the C implementation of Emacs is free to call various primitive functions directly, without using Ffuncall to indirect through some Lisp symbol.

Now, these direct calls aren’t without a cost: they prevent the modification of some functions from Lisp. Sometimes this is a pain (it might be handy to hack on load from Lisp), but in many cases it is unimportant.

So, one idea for the JIT is to keep a list of such functions and then emit direct calls rather than indirect ones.

Even better than this would be to improve the calling convention so that all calls are less expensive. However, because a function can be redefined with different arguments, it is tricky to see how to do this efficiently.

In the Emacs C code, many things are inlined that still aren’t inlined in the JIT — just look through lisp.h for all the inline functions (and/or macros, lisp.h is “unusual”). Many of these things could be done in the JIT, though in some cases it might be more work than it is worth.

Even better, but also even more difficult, would be inlining from one bytecode function into another. High-performance JITs do this when they notice a hot spot in the code.

Finally, unboxing. In the Emacs C code, it’s relatively normal to type-check Lisp objects and then work solely in terms of their C analogues after that point. This is more efficient because it hoists the tag manipulations. Some work like this could be done automatically, by writing optimization passes for libjit that work on libjit’s internal representation of functions.

Getting the Code

The code is on the libjit branch in my Emacs repository on github. You’ll have to build your own libjit, too, and if you want to avoid hacking on the Emacs Makefile, you will need my fork of libjit that adds pkg-config files.

The Cliffs of Inanity

Archive for February, 2018

JIT Compilation for Emacs

Caveat

Results

Future Directions

Getting the Code