JIT Compilation for Emacs

There have been a few efforts at writing an Emacs JIT — the original one, Burton Samograd’s, and also Nick Lloyd’s. So, what else to do except write my own?

Like the latter two, I based mine on GNU libjit. I did look at a few other JIT libraries: LLVM, gcc-jit, GNU Lightning, MyJit.  libjit seemed like a nice middle ground between a JIT with heavy runtime costs (LLVM, GCC) and one that is too lightweight (Lightning).

All of these Emacs JITs work by compiling bytecode to native code.  Now, I don’t actually think that is the best choice — it’s just the easiest — but my other project to do a more complete job in this area isn’t really ready to be released.  So bytecode it is.

Emacs implements a somewhat weird stack-based bytecode.  Many ordinary things are there, but seemingly obvious stack operations like “swap” do not exist; and there are bytecodes for very specialized Emacs operations like forward-char or point-max.

Samograd describes his implementation as “compiling down the spine”.  What he means by this is that the body of each opcode is implemented by some C function, and the JIT compiler emits, essentially, a series of subroutine calls.  This used to be called “jsr threading” in the olden days, though maybe it has some newer names by now.

Of course, we can do better than this, and Lloyd’s JIT does.  He emits instructions for the bodies of most bytecodes, deferring only a few to helper functions.  This is a better approach because many of these operations are only one or two instructions.

However, his approach takes a wrong turn by deferring stack operations to the compiled code.  For example, in this JIT, the Bdiscard opcode, which simply drops some items from the stack, is implemented as:

 CASE (Bdiscard):
 {
   JIT_NEED_STACK;
   JIT_INC (ctxt.stack, -sizeof (Lisp_Object));
   JIT_NEXT;
   NEXT;
 }

It turns out, though, that this isn’t needed — at least, for the bytecode generated by the Emacs byte-compiler, the stack depth at any given PC is a constant.  This means that the stack adjustments can be done at compile time, not runtime, leading to a performance boost.  So, the above opcode doesn’t need to emit code at all.

(And, if you’re worried about hand-crafted bytecode, it’s easy to write a little bytecode verifier to avoid JIT-compiling invalid things.  Though of course you shouldn’t worry, since you can already crash Emacs with bad bytecode.)

So, naturally, my implementation does not do this extra work.  And, it inlines more operations besides.

Caveat

I’ve only enabled the JIT for bytecode that uses lexical binding.  There isn’t any problem with enabling it everywhere; I just figured it probably isn’t that useful, so I didn’t bother.

Results

The results are pretty good.  First of all, I have it set up to automatically JIT compile every function, and this doesn’t seem any slower than ordinary Emacs, and it doesn’t crash.

Using the “silly-loop” example from the Emacs Lisp manual, with lexical binding enabled, I get these results:

Mode            Time (s)
Interpreted     4.48
Byte compiled   0.91
JIT             0.26

This is essentially the best case for this JIT, though.

Future Directions

I have a few ideas for how to improve the performance of the generated code.  One way to look at this is to look at Emacs’ own C code, to see what advantages it has over JIT-compiled code.  There are really three: cheaper function calls, inlining, and unboxing.

Calling a function in Emacs Lisp is quite expensive.  A call from the JIT requires marshalling the arguments into an array, then calling Ffuncall; which then might dispatch to a C function (a “subr”), the bytecode interpreter, or the ordinary interpreter.  In some cases this may require allocation.

This overhead applies to nearly every call — but the C implementation of Emacs is free to call various primitive functions directly, without using Ffuncall to indirect through some Lisp symbol.

Now, these direct calls aren’t without a cost: they prevent the modification of some functions from Lisp.  Sometimes this is a pain (it might be handy to hack on load from Lisp), but in many cases it is unimportant.

So, one idea for the JIT is to keep a list of such functions and then emit direct calls rather than indirect ones.

Even better than this would be to improve the calling convention so that all calls are less expensive.  However, because a function can be redefined with different arguments, it is tricky to see how to do this efficiently.

In the Emacs C code, many things are inlined that still aren’t inlined in the JIT — just look through lisp.h for all the inline functions (and/or macros, lisp.h is “unusual”).  Many of these things could be done in the JIT, though in some cases it might be more work than it is worth.

Even better, but also even more difficult, would be inlining from one bytecode function into another.  High-performance JITs do this when they notice a hot spot in the code.

Finally, unboxing.  In the Emacs C code, it’s relatively normal to type-check Lisp objects and then work solely in terms of their C analogues after that point.  This is more efficient because it hoists the tag manipulations.  Some work like this could be done automatically, by writing optimization passes for libjit that work on libjit’s internal representation of functions.

Getting the Code

The code is on the libjit branch in my Emacs repository on github.  You’ll have to build your own libjit, too, and if you want to avoid hacking on the Emacs Makefile, you will need my fork of libjit that adds pkg-config files.

29 Comments

  • This looks great! I wonder if you could get this added to evm so that people testing Emacs packages on Travis could add it to their CI tests.

  • Did you consider LuaJIT’s DynASM http://luajit.org/dynasm.html ?

  • I looked at DynASM briefly but didn’t really get further than the preprocessor thing. I will give it another look.

  • Hi – interesting post. I think the problem with libjit and other simple JITs is the lack of optimizations performed by the JIT – especially I understand that register allocation is local to a basic block. This kills performance as I found when trying to use a similar small and simple JIT – NanoJIT.

    Dynasm is an assembler really – not a JIT in the same way as LLVM, NanoJIT or libjit (i.e. each of these perform register allocation, and do not require you to write assembly code).

    Regards
    Dibyendu

  • Once there was a fork of libjit that did linear scan register allocation, but I don’t know if this was ever merged.

    It’s possible to do optimizations on the libjit IR, but I haven’t tried this yet, as the problems I’m facing are much more basic things, like not having a terrible calling convention.

    Probably if I had to move to something heavier, it would be gcc-jit, since it is also a GNU project and thus probably pretty easy to get in.

  • It was not merged, but for sure it is possible to implement a global register allocation for libjit.

  • I can’t figure out how to build emacs with libjit. Here’s what I did:


    cd /usr/local/src
    git clone https://github.com/tromey/libjit.git
    cd libjit
    ./bootstrap
    ./configure
    make
    make install

    cd /usr/local/src
    git clone --branch libjit https://github.com/tromey/emacs.git
    cd emacs
    ./autogen.sh
    ./configure

    The output of ‘configure’ contains:

    ...
    checking for LIBJIT... no
    ...
    Does Emacs have lisp JIT support? no

    What am I missing? Thanks.

  • You may need to set PKG_CONFIG_PATH to point to the libjit install directory containing the libjit “.pc” file.

  • I meant to mention in this post, but forgot, that I took a few bits from Nick Lloyd’s branch. I think it was a helper function or two, plus the configure code.

  • Thank you. I was missing ‘PKG_CONFIG_PATH=/usr/local/lib/pkgconfig’ (on my Arch Linux system), and I only got ‘libjit.pc’ when I cloned the pkg-config branch of libjit: git clone --branch pkg-config https://github.com/tromey/libjit.git.

    Thanks again.

  • What do you think about the effort to host Emacs on the Guile VM? My understanding is that it is slower at present, but is it likely that Guile Emacs would eventually speed up execution of elisp?

  • I am not a fan of guile and I hope it doesn’t get merged. Partly that’s because I have an irrational dislike of Scheme, but also there are aspects of Guile I don’t like, and I think that introducing more languages into Emacs is likely to make it worse, not better.

    Based on discussions on #emacs and elsewhere, I’m pretty much alone in this view.

  • […] JIT Compilation for Emacs (Reddit, Hacker News) […]

How can one tell if one is using libjit? ‘ldd’ indicates that the executable links against the correct libjit.

    (defun silly-loop (n)...)

    (silly-loop 50000000)
    5.673033714294434

    (byte-compile 'silly-loop)
    ...

    (silly-loop 50000000)
    1.2770330905914307  # with or without jit? Probably not, since the times are the same
    # as a no-jit version of emacs.

  • In your example maybe you didn’t enable lexical binding. I put the defun into a .el that has “;; -*- lexical-binding: t -*-” at the top, then re-visit the file, then eval the defun.

    One way to see if something is compiled is to try to disassemble it, like “(jit-disassemble (symbol-function 'silly-loop))”.

  • You’re right: I did not enable lexical bindings. Once I did, silly-loop, bless its heart, executed 4 times faster (roughly).

    Thanks.

  • Your social buttons make this post extremely difficult to read on mobile because they overlap the text.

  • […] been working a bit more on my Emacs JIT, in particular on improving function calling.  This has been a fun project so I thought I’d talk […]


  • Some things…

    First, one small typo: His emits -> He emits.

    Second, I’ve started documenting elisp bytecode, or for just the PDF

    Please feel free to add or correct to this. It needs help from the community. Next up will probably be a section on existing elisp bytecode optimization.

    If you look at that, you’ll see that the bulk of the opcodes are in fact just calls to C built-in Emacs functions (symbolp, listp, consp,…). So JIT’ing could just be inlining the C functions that are already there and then improving the rest, e.g. removing stack management by allocating registers.

  • With respect to my last comment about “inlining C functions that are already there and then improving the rest”: I see that path is somewhat trodden when I follow the links to other JITs and read the emacs-devel threads. However, in doing this I don’t have a clear sense of how each one differs from the others. The pros and cons.

    I’ll be giving a little talk at our local NYC Emacs group in April; perhaps I’ll sort this out by then and discuss. But if this has already been done…

  • > So JIT’ing could just be inlining the C functions that are already there

    In some cases inlining these is not likely to be a win overall — the underlying C function is just too big. In other cases, though, inlining does make sense. What my JIT does is inline a subset of these calls (things like symbolp or car, which are short) and then for others, it emits a direct call to the C primitive; this at least avoids the overhead of a full funcall.

    > I don’t have a clear sense of how each one differs from the others. The pros and cons.

    I thought I covered that OK in the first part of this post; but if you have specific questions I can try to answer them. Short form: mine gives the best performance.

  • Ah, OK. I reread and now sort of understand, although the devil is in the details.

    So in sum, a more direct lead-off would be that you looked at other JITs, thought there was room for improvement, and then dived in. Early results show there could be a speedup of around 3x over byte-compiled code, although silly-loop is hardly a benchmark.

    Looking at the ORM talk mentioned somewhere, one takeaway from that is that to make JITs work really well, it is desirable to add support from the runtime to note when interesting events have occurred.

    So I imagine there would be a fair amount of work just for that. Any thoughts there?

  • > So I imagine there would be a fair amount of work just for that. Any thoughts there?

    I wonder what sorts of events were meant.

    One thing I saw in an ORM talk — maybe this is what you’re referring to? — is that it is good if the JIT can see the code of the VM itself. This enables better optimization.

    For example, in Emacs, it would be good if a JIT could see into, say, mapcar. Then maybe it could do a couple of levels of inlining and type specialization and turn it into an ordinary loop.

    My long-term goal for that is maybe el-compilador, or maybe the JIT — but in either case, rewriting functions from C to elisp. This is reasonable if elisp is fast enough, so the JIT is a core piece.

    Another part of the puzzle here is moving as much code as possible to use lexical binding.

  • Here are some thoughts for consideration.

    First, an observation: some people like myself live inside Emacs.

    So that means the opportunities for profiling and code improvement are legion. And it means that there are often times when Emacs is very idle and could in the background do incremental code improvement and refactoring.

    And that means if there are profile counts, those could get very large. (Floating point numbers are useful here, but when the numbers get very large incrementing by one doesn’t work. So a trick Rob Pike proposed a while ago was to use fixed-size integers but increment them randomly, with probability based on the log of the count.)

    Without even a JIT, here are some simple things: look for functions that haven’t been byte-compiled and compile them. Sometimes this happens when I am debugging a function and then turn off debugging that function. A system might also warn me about such things, or it might just do it.

    Now onto things that the ORM talk did mention. The observation is that in the beginning there is instability as Ruby modules get loaded. Perhaps particular classes and constants and methods get redefined a bit. In Ruby, he specifically talks about constants. So he was hoping that Ruby would have a way to indicate transitioning from instability to stability and vice versa.

    In Emacs, something analogous would be running .emacs. So there could be a function call that says: hey, I’m about to load .emacs, so assume some instability. And then at the end of .emacs: hey, I’ve finished loading .emacs, so assume greater stability. Even without a hint in the .emacs code, one could delay JIT’ing for the first n seconds based on a defcustom variable.

    And individual large packages might be instrumented to add such hints too.

    And wouldn’t it be nice if the profiling information (and/or JIT code) were saved to disk on exit so it could be used in future or parallel emacs sessions?

    As with many things, it’s easy and fun to make up ideas. Getting it done, however, is a totally different matter. However, I do think some thought should be given to what an API should look like. The simplest and most primitive one I can think of is:

    1. “turn on/off heavy profiling now”, just as there is something like that for garbage collection.

    2. Set a threshold for idle times for when to do deep analysis

    3. Set optimization/JIT options

    4. Set a warning level on certain kinds of optimization. For example, before automatic byte compilation of LISP functions there might be a flag to warn or query me first. For JIT, I’m not sure if this would be done, but there could be certain kinds of things where warning and or selection might make sense.

  • I am not too worried about execution counts getting very large. I’m mildly worried about the instability problem.

    Some functions are most likely only executed once — setup-type things. So I plan to do the simple thing that many JITs do, and only compile a function after it has been run N times. I’ll probably make N tweakable.

    After the initial compilation I am thinking I will insert a call counter at call sites that are inlining candidates.  This is just a subset of function calls; but it may also make sense to limit this to loops.  Anyway, if this triggers and we do more optimization, my plan is to just turn off further work for that function.  At least for the foreseeable future there aren’t really many more optimizations, so there’s no point doing any more work.

    On the whole I’d rather not expose anything to elisp programmers. Maybe it will prove necessary, I don’t know; but I think the more knobs there are, the harder it will be to debug.

    If you have a link to the ORM talk(s) I would be very interested.

    Saving information to disk is difficult. The JITted code could be saved — but then the JIT has to generate PIC code, which is slower. And, when re-loading, you have to verify that you’re loading the correct thing. At this point the JIT is fast enough though that I wouldn’t bother with that. Instead, to write more of Emacs in elisp, we can use el-compilador as a build step.

  • Have any part of this work been upstreamed?

  • > Have any part of this work been upstreamed?

    Just found out: turns out, it has been sent to the ML.¹ I’m still reading, but it seems it hasn’t been accepted, due to libjit not supporting many platforms, nor having much maintenance.

    1: https://lists.gnu.org/archive/html/emacs-devel/2018-08/msg00393.html
