JIT Compilation for Emacs

There have been a few efforts at writing an Emacs JIT — the original one, Burton Samograd’s, and also Nick Lloyd’s. So, what else to do except write my own?

Like the latter two, I based mine on GNU libjit. I did look at a few other JIT libraries: LLVM, gcc-jit, GNU Lightning, and MyJIT.  libjit seemed like a nice middle ground between a JIT with heavy runtime costs (LLVM, GCC) and one that is too lightweight (Lightning).

All of these Emacs JITs work by compiling bytecode to native code.  Now, I don’t actually think that is the best choice — it’s just the easiest — but my other project to do a more complete job in this area isn’t really ready to be released.  So bytecode it is.

Emacs implements a somewhat weird stack-based bytecode.  Many ordinary things are there, but seemingly obvious stack operations like “swap” do not exist; and there are bytecodes for very specialized Emacs operations like forward-char or point-max.

Samograd describes his implementation as “compiling down the spine”.  What he means by this is that the body of each opcode is implemented by some C function, and the JIT compiler emits, essentially, a series of subroutine calls.  This used to be called “jsr threading” in the olden days, though maybe it has some newer names by now.
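
To make that concrete, here is a rough sketch of what “compiling down the spine” looks like with libjit.  This is not code from Samograd’s JIT (or mine): the helper table and opcode handling are made up for illustration, the libjit calls are real, and real bytecode also has operands, which the sketch ignores.

  #include <jit/jit.h>

  /* Hypothetical: one C helper per opcode, each operating on some
     interpreter context.  */
  typedef void (*op_helper) (void *ctxt);

  static void
  compile_down_the_spine (jit_function_t func, jit_value_t ctxt,
                          const unsigned char *bytecode, size_t len,
                          op_helper *helpers)
  {
    /* Signature of a helper: void helper (void *ctxt).  */
    jit_type_t params[1] = { jit_type_void_ptr };
    jit_type_t helper_sig
      = jit_type_create_signature (jit_abi_cdecl, jit_type_void,
                                   params, 1, 1);

    for (size_t pc = 0; pc < len; pc++)
      {
        /* "jsr threading": the compiled function is just a series of
           subroutine calls, one per opcode.  */
        jit_value_t args[1] = { ctxt };
        jit_insn_call_native (func, "op_helper",
                              (void *) helpers[bytecode[pc]],
                              helper_sig, args, 1, JIT_CALL_NOTHROW);
      }
  }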

Of course, we can do better than this, and Lloyd’s JIT does: it emits instructions for the bodies of most bytecodes, deferring only a few to helper functions.  This is a better approach because many of these operations are only one or two instructions.

However, his approach takes a wrong turn by deferring stack-pointer adjustments to runtime, in the compiled code.  For example, in this JIT, the Bdiscard opcode, which simply drops some items from the stack, is implemented as:

 CASE (Bdiscard):
 {
   JIT_NEED_STACK;                               /* make sure the runtime stack pointer is available */
   JIT_INC (ctxt.stack, -sizeof (Lisp_Object));  /* emit a runtime decrement of the stack pointer */
   JIT_NEXT;
   NEXT;
 }

It turns out, though, that this isn’t needed — at least, for the bytecode generated by the Emacs byte-compiler, the stack depth at any given PC is a constant.  This means that the stack adjustments can be done at compile time, not runtime, leading to a performance boost.  So, the above opcode doesn’t need to emit code at all.
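
Concretely, the compiler can mirror the bytecode stack with a compile-time array of libjit values.  Here is a sketch of the idea; the structure and function are hypothetical, not my actual implementation, though Bdiscard and Bdup are real Emacs opcodes and jit_insn_load is real libjit API.

  /* Compile-time model of the bytecode stack: each slot holds the
     libjit value currently at that depth, and the depth itself is a
     plain C integer known while compiling.  */
  struct compile_stack
  {
    jit_value_t slots[512];   /* 512 is an arbitrary bound for the sketch */
    int depth;
  };

  static void
  compile_stack_op (jit_function_t func, struct compile_stack *st, int op)
  {
    switch (op)
      {
      case Bdiscard:
        /* Dropping the top of stack is pure bookkeeping: no code is
           emitted at all.  */
        st->depth--;
        break;

      case Bdup:
        /* Duplicating the top of stack copies the value into a new
           temporary; there is still no stack-pointer arithmetic at
           runtime.  */
        st->slots[st->depth] = jit_insn_load (func, st->slots[st->depth - 1]);
        st->depth++;
        break;

        /* ...other opcodes push and pop st->slots[] the same way...  */
      }
  }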

(And, if you’re worried about hand-crafted bytecode, it’s easy to write a little bytecode verifier to avoid JIT-compiling invalid things.  Though of course you shouldn’t worry, since you can already crash Emacs with bad bytecode.)

So, naturally, my implementation does not do this extra work.  And it inlines more operations besides.

Caveat

I’ve only enabled the JIT for bytecode that uses lexical binding.  There isn’t any problem with enabling it everywhere; I just figured it probably isn’t that useful, and so I didn’t bother.

Results

The results are pretty good.  First of all, I have it set up to automatically JIT-compile every function; this doesn’t seem any slower than ordinary Emacs, and it doesn’t crash.

Using the “silly-loop” example from the Emacs Lisp manual, with lexical binding enabled, I get these results:

Mode            Time (seconds)
Interpreted     4.48
Byte-compiled   0.91
JIT             0.26

This is essentially the best case for this JIT, though.

Future Directions

I have a few ideas for how to improve the performance of the generated code.  One way to approach this is to look at Emacs’ own C code and see what advantages it has over JIT-compiled code.  There are really three: cheaper function calls, inlining, and unboxing.

Calling a function in Emacs Lisp is quite expensive.  A call from the JIT requires marshalling the arguments into an array and then calling Ffuncall, which might in turn dispatch to a C function (a “subr”), the bytecode interpreter, or the ordinary interpreter.  In some cases this may require allocation.
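
Written out by hand, the work behind a single Lisp-level call such as (foo a b) looks roughly like this.  It is only an illustration of what the emitted code effectively does; Ffuncall is the real Emacs entry point, and call_foo is made up.

  Lisp_Object
  call_foo (Lisp_Object foo, Lisp_Object a, Lisp_Object b)
  {
    /* Marshal the function and its arguments into an array.  */
    Lisp_Object args[3] = { foo, a, b };

    /* Ffuncall inspects args[0] and may dispatch to a C subr, to the
       bytecode interpreter, or to the ordinary interpreter, possibly
       allocating along the way.  */
    return Ffuncall (3, args);
  }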

This overhead applies to nearly every call — but the C implementation of Emacs is free to call various primitive functions directly, without using Ffuncall to indirect through some Lisp symbol.

Now, these direct calls aren’t without a cost: if a function is redefined from Lisp, callers that were compiled to call it directly won’t see the new definition.  Sometimes this is a pain (it might be handy to hack on load from Lisp), but in many cases it is unimportant.

So, one idea for the JIT is to keep a list of such functions and then emit direct calls rather than indirect ones.
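
Purely as a sketch, that could look like the following.  The table and its fields are hypothetical; Fcar, Fcdr, and Fcons are real Emacs subrs.

  /* Hypothetical table of primitives that are safe to call directly.
     At compile time, if the callee is found here, the JIT can emit a
     direct call to the C function instead of marshalling arguments
     and going through Ffuncall.  */
  struct direct_call_entry
  {
    const char *name;    /* Lisp symbol name */
    void *c_function;    /* address of the C implementation */
    int nargs;           /* fixed argument count */
  };

  static const struct direct_call_entry direct_calls[] =
  {
    { "car",  (void *) Fcar,  1 },
    { "cdr",  (void *) Fcdr,  1 },
    { "cons", (void *) Fcons, 2 },
  };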

Even better than this would be to improve the calling convention so that all calls are less expensive.  However, because a function can be redefined to take different arguments, it is tricky to see how to do this efficiently.

In the Emacs C code, many things are inlined that still aren’t inlined in the JIT — just look through lisp.h for all the inline functions (and/or macros, lisp.h is “unusual”).  Many of these things could be done in the JIT, though in some cases it might be more work than it is worth.
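
For a sense of scale: NILP, for example, boils down to a single word comparison against Qnil.  A sketch of emitting that inline (emit_nilp and the value names are hypothetical; jit_insn_eq is real libjit API):

  /* Sketch: emitting the body of NILP (a comparison against Qnil)
     inline with libjit, instead of calling a helper.  'val' holds a
     Lisp_Object and 'qnil' is a constant created earlier.  */
  static jit_value_t
  emit_nilp (jit_function_t func, jit_value_t val, jit_value_t qnil)
  {
    /* One compare instruction, versus a full function call.  */
    return jit_insn_eq (func, val, qnil);
  }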

Even better, but also even more difficult, would be inlining from one bytecode function into another.  High-performance JITs do this when they notice a hot spot in the code.

Finally, unboxing.  In the Emacs C code, it’s relatively normal to type-check Lisp objects and then work solely in terms of their C analogues after that point.  This is more efficient because it hoists the tag manipulations.  Some work like this could be done automatically, by writing optimization passes for libjit that work on libjit’s internal representation of functions.
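
The hand-written pattern in the Emacs C code looks roughly like the following (simplified; the macro names follow current lisp.h, and older Emacs versions spell them differently).  In principle an optimization pass could recover the same structure in JIT-compiled loops.

  /* Unboxing by hand: check the tag once, do the arithmetic on plain
     C integers, and re-tag only at the end.  */
  Lisp_Object
  sum_below (Lisp_Object limit)
  {
    CHECK_FIXNUM (limit);              /* type check up front */
    EMACS_INT n = XFIXNUM (limit);     /* strip the tag */

    EMACS_INT sum = 0;
    for (EMACS_INT i = 0; i < n; i++)  /* untagged arithmetic in the loop */
      sum += i;

    return make_fixnum (sum);          /* re-tag the result once */
  }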

Getting the Code

The code is on the libjit branch in my Emacs repository on GitHub.  You’ll have to build your own libjit, too, and if you want to avoid hacking on the Emacs Makefile, you will need my fork of libjit that adds pkg-config files.

Comments

  • This looks great! I wonder if you could get this added to evm so that people testing Emacs packages on Travis could add it to their CI tests.

  • Did you consider LuaJIT’s DynASM http://luajit.org/dynasm.html ?

  • I looked at DynASM briefly but didn’t really get further than the preprocessor thing. I will give it another look.

  • Hi – interesting post.  I think the problem with libjit and other simple JITs is the lack of optimizations performed by the JIT; in particular, I understand that register allocation is local to a basic block.  This kills performance, as I found when trying to use a similar small and simple JIT, NanoJIT.

    DynASM is really an assembler, not a JIT in the same way as LLVM, NanoJIT, or libjit (i.e. each of those performs register allocation and does not require you to write assembly code).

    Regards
    Dibyendu

  • Once there was a fork of libjit that did linear scan register allocation, but I don’t know if this was ever merged.

    It’s possible to do optimizations on the libjit IR, but I haven’t tried this yet, as the problems I’m facing are much more basic things, like not having a terrible calling convention.

    Probably if I had to move to something heavier, it would be gcc-jit, since it is also a GNU project and thus probably pretty easy to get in.

  • It was not merged, but for sure it is possible to implement a global register allocation for libjit.

  • I can’t figure out how to build emacs with libjit. Here’s what I did:


    cd /usr/local/src
    git clone https://github.com/tromey/libjit.git
    cd libjit
    ./bootstrap
    ./configure
    make
    make install

    cd /usr/local/src
    git clone --branch libjit https://github.com/tromey/emacs.git
    cd emacs
    ./autogen.sh
    ./configure

    The output of ‘configure’ contains:

    ...
    checking for LIBJIT... no
    ...
    Does Emacs have lisp JIT support? no

    What am I missing? Thanks.

  • You may need to set PKG_CONFIG_PATH to point to the libjit install directory containing the libjit “.pc” file.

  • I meant to mention in this post, but forgot, that I took a few bits from Nick Lloyd’s branch. I think it was a helper function or two, plus the configure code.

  • Thank you. I was missing ‘PKG_CONFIG_PATH=/usr/local/lib/pkgconfig’ (on my Arch Linux system), and I only got ‘libjit.pc’ when I cloned the pkg-config branch of libjit: git clone --branch pkg-config https://github.com/tromey/libjit.git.

    Thanks again.

  • What do you think about the effort to host Emacs on the Guile VM? My understanding is that it is slower at present, but is it likely that Guile Emacs would eventually speed up execution of elisp?

  • I am not a fan of guile and I hope it doesn’t get merged. Partly that’s because I have an irrational dislike of Scheme, but also there are aspects of Guile I don’t like, and I think that introducing more languages into Emacs is likely to make it worse, not better.

    Based on discussions on #emacs and elsewhere, I’m pretty much alone in this view.

  • How can one tell whether the JIT is being used?  ‘ldd’ indicates that the executable links against the correct libjit.

    (defun silly-loop (n)...)

    (silly-loop 50000000)
    5.673033714294434

    (byte-compile 'silly-loop)
    ...

    (silly-loop 50000000)
    1.2770330905914307  # with or without the JIT?  Probably not, since the times are the same
    # as in a no-JIT build of Emacs.

  • In your example maybe you didn’t enable lexical binding. I put the defun into a .el that has “;; -*- lexical-binding: t -*-” at the top, then re-visit the file, then eval the defun.

    One way to see if something is compiled is to try to disassemble it, like “(jit-disassemble (symbol-function 'silly-loop))”.

  • You’re right: I did not enable lexical bindings. Once I did, silly-loop, bless its heart, executed 4 times faster (roughly).

    Thanks.

  • Your social buttons make this post extremely difficult to read on mobile because they overlap the text.
