Skip to content

import gdb

Occasionally I see questions about how to import gdb from the ordinary Python interpreter.  This turns out to be surprisingly easy to implement.

First, a detour into PIE and symbol visibility.

“PIE” stands for “Position Independent Executable”.  It uses essentially the same approach as a shared library, except it can be applied to the executable.  You can easily build a PIE by compiling the objects with the -fPIE flag, and then linking the resulting executable with -pie.  Normally PIEs are used as a security feature, but in our case we’re going to compile gdb this way so we can have Python dlopen it, following the usual Python approach: we install it as _gdb.so and add a a module initialization function, init_gdb. (We actually name the module “_gdb“, because that is what the gdb C code creates; the “gdb” module itself is already plain Python that happens to “import _gdb“.)

Why install the PIE rather than make a true shared library?  It is just more convenient — it doesn’t require a lot of configure and Makefile hacking, and it doesn’t slow down the build by forcing us to link gdb against a new library.

Next, what about all those functions in gdb?  There are thousands of them… won’t they possibly cause conflicts at dlopen time?  Why yes… but that’s why we have symbol visibility.  Symbol visibility is an ELF feature that lets us hide all of gdb’s symbols from any dlopen caller.  In fact, I found out during this process that you can even hide main, as ld.so seems to ignore visibility bits for this function.

Making this work is as simple as adding -fvisibility=hidden to our CFLAGS, and then marking our Python module initialization function with __attribute__((visibility("default"))).  Two notes here.  First, it’s odd that “default” means “public”; just one of those mysterious details.  Second, Python’s PyMODINIT_FUNC macro ought to do this already, but it doesn’t; there’s a Python bug.

Those are the low-level mechanics.  At this point gdb is a library, albeit an unusual one that has a single entry point.  After this I needed a few tweaks to gdb’s startup process in order to make it work smoothly.  This too was no big deal.  Now I can write scripts from Python to do gdb things:

#!/usr/bin/python
import gdb
gdb.execute('file ./install/bin/gdb')
print 'sizeof = %d' % gdb.lookup_type('struct minimal_symbol').sizeof

Then:

$ python zz.py
72

Soon I’ll polish all the patches and submit this upstream.

Quick Multi-process Debugging Update

In my last post I mentioned that setting breakpoints is a pain when debugging multiple processes in GDB. While there are some bugs here (we’re actively working on them), it isn’t hard to make the basic case work.  In fact, there’s nothing to it.  Some background…

Starting with GDB 7.4, we changed how basic breakpoint specifiers (called “linespecs”) work.  Previously, a linespec applied somewhat randomly to the first matching symbol found in your code.  This behavior probably made sense in 1989, when all you had were statically linked executables; but nowadays it is much more common to have dozens of shared libraries, with the attendant name clashes.

So, instead of having GDB guess which symbol you meant, now a breakpoint just applies to all of them.  Our idea is that we’ll start supplying ways to narrow down exactly which spots you meant to name, say by adding syntax like “break libwhatever.so:function“, or whatever.

Anyway, this new work also applies across inferiors.  Here’s an example of debugging “make“, then setting a breakpoint on a function in libcpp (which itself is linked into a sub-process of gcc):

(gdb) b _cpp_lex_direct
Function "_cpp_lex_direct" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_cpp_lex_direct) pending.
(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 8761]
[New process 8761]
process 8761 is executing new program: /usr/bin/gcc
[New inferior 8762]
[New process 8762]
process 8762 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1

Breakpoint 1, 0x0000000000b156a0 in _cpp_lex_direct ()

The remaining issues have to do with breakpoint re-setting not doing the right thing with running inferiors. This causes some scary warnings when running, but I think for the time being you can just ignore those.

Well, I should say those are the known issues.  This feature hasn’t had as much use as I would like (judging from the low bug rate — I can’t tell if that is a good insight or a horrible realization).  So, try it out and report problems to GDB Bugzilla.  We’ll be making it work for you.

Debugging multiple programs at once

Consider this Makefile:

all: runit

runit: crasher
	./crasher

crasher: crasher.c
	gcc -g -o crasher crasher.c

And, here is the program it is building:

int *x = 0;

int main ()
{
  *x = 52;
}

Now, if you run “make“, eventually you will see a crash.  But how to debug the crash?

Well, obviously, this is a trivial example so you’d just debug the program.  But what if you had a complex script involving extensive and obscure initialization?  Say, in your test suite?  The traditional answer is logging plus cut and paste into gdb; or perhaps hacking an invocation of gdb --args into your script.  Nowadays you can do better, though.

Let’s start by debugging make:

$ gdb -quiet make
Reading symbols from /usr/bin/make...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install make-3.82-8.fc16.x86_64

Now set things up for multi-inferior debugging:

(gdb) set detach-on-fork off
(gdb) set target-async on
(gdb) set non-stop on
(gdb) set pagination off

(Yes, it is silly how many settings you have to tweak; and yes, we’re going to fix this.)

Now do it:

(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 9694]
[New process 9694]
process 9694 is executing new program: /usr/bin/gcc
[New inferior 9695]
[New process 9695]
process 9695 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1
Missing separate debuginfos, use: debuginfo-install gcc-4.6.2-1.fc16.x86_64
[Inferior 3 (process 9695) exited normally]
[Inferior 9695 exited]
Missing separate debuginfos, use: debuginfo-install cpp-4.6.2-1.fc16.x86_64
(gdb) [New inferior 9696]
[New process 9696]
process 9696 is executing new program: /usr/bin/as
[Inferior 4 (process 9696) exited normally]
[Inferior 9696 exited]
[New inferior 9697]
[New process 9697]
process 9697 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/collect2
Missing separate debuginfos, use: debuginfo-install binutils-2.21.53.0.1-6.fc16.x86_64
[New inferior 9698]
[New process 9698]
process 9698 is executing new program: /usr/bin/ld.bfd
Missing separate debuginfos, use: debuginfo-install gcc-4.6.2-1.fc16.x86_64
[Inferior 6 (process 9698) exited normally]
[Inferior 9698 exited]
[Inferior 5 (process 9697) exited normally]
[Inferior 9697 exited]
[Inferior 2 (process 9694) exited normally]
[Inferior 9694 exited]
./crasher
[New inferior 9699]
[New process 9699]
process 9699 is executing new program: /tmp/crasher
Missing separate debuginfos, use: debuginfo-install binutils-2.21.53.0.1-6.fc16.x86_64

Program received signal SIGSEGV, Segmentation fault.
0x000000000040047f in main () at crasher.c:5
5      *x = 52;

Cool stuff.  Now you can inspect the crashed program:

(gdb) info inferior
Num  Description       Executable
  7    process 9699      /tmp/crasher
* 1    process 9691      /usr/bin/make
(gdb) inferior 7
[Switching to inferior 7 [process 9699] (/tmp/crasher)]
[Switching to thread 7 (process 9699)]
#0  0x000000000040047f in main () at crasher.c:5
5      *x = 52;

There is still a lot of work to do here — it is still a bit too slow, setting breakpoints is still a pain, etc. These are all things we’re going to be cleaning up in the coming year.

Difficulties of elisp

The thesis that underlies my project to translate the Emacs C code to Common Lisp is that Emacs Lisp is close enough to Common Lisp that the parts of the Emacs C code that implement Lisp can be dropped in favor of the generally superior CL implementation.  This is generally true, but there are a few difficult bits.

Symbols

The primary problem is the translation of symbols when used as variable references.  Consider this code:

(defvar global 73)
(defun function (argument)
  (let ((local (something-else))
    (+ local argument global)))

More is going on here than meets the eye.

First, Emacs Lisp uses dynamic binding by default (optional lexical binding is a new feature in Emacs 24).  This applies to function arguments as well as other bindings.  So, you might think you could translate this straightforwardly to:

(defvar global 73)
(declare (special global))
(defun function (argument)
  (declare (special argument))
  (let ((local (something-else))
    (declare (special local))
    (+ local argument global)))

This was the approach taken by elisp.lisp; it defined macros for let and let* (but forgot defun) to do the dirty work:

(defmacro el::let* ((&rest vars) &rest forms)
  "Emacs-Lisp version of `let*' (everything special)."
  `(let* ,vars (declare (special ,@(mapcar #'from-list vars))) ,@forms))

But not so fast!  Emacs also has buffer-local variables.  These are variables where the value is associated with the current buffer; switching buffers makes a different binding visible to Lisp.  These require no special syntax, and a variable can be made buffer-local at any time.  So, we can break the above translation simply by evaluating:

(make-local-variable 'global)
(setq global 0)

Whoops!  Now the function will return the wrong result — the translation will have no way to know that is should refer to the buffer-local value.  (Well, ok, pretend that the setq magically worked somehow…)

My idea for implementing this is pretty convoluted.  Actually I have two ideas, one “user” and one “kernel”:

User

I think it is possible to use define-symbol-macro on all symbols that come from Elisp, so that we can tell the CL compiler about the real implementation.  However, a symbol can either be treated as a variable, or it can be treated as a symbol-macro — not both at the same time.  So, we will need a second location of some kind to store the real value.  Right now I’m thinking a symbol in another package, but maybe a cons or some other object would work better. In either case, we’d need a macro, a setf method for its expansion, and some extra-tricky redefinitions of let and defun to account for this change.

This would look something like:

(define-symbol-macro global (elisp:get-elisp-value 'global))
(defsetf elisp:get-elisp-value elisp:set-elisp-value))
;; Details left as an exercise for the reader.

This solution then has to be applied to buffer-, keyboard-, and frame-local variables.

Kernel

The kernel method is a lot simpler to explain: hack a Common Lisp implementation to directly know about buffer-locals.  SMOP!  But on the whole I think this approach is to be less preferred.

Other Problems

Emacs Lisp also freely extends other typical data types with custom attributes.  I consider this part of the genius of Emacs; a more ordinary program would work within the strictures of some defined, external language, but Emacs is not so cautious or constrained.  (Emacs is sort of a case study in breaking generally accepted rules of programming; which makes one wonder whether those rules are any good at all.)

So, for example, strings in Emacs have properties as a built-in component.  The solution here is simple — we will just translate the Emacs string data type as a whole, something we probably have to do anyway, because Emacs also has its own idiosyncratic approach to different encodings.

In elisp, aref can be used to access elements of other vector-like objects, not just arrays; there are some other odd little cases like this.  This is also easily handled; but it left me wondering why things like aref aren’t generic methods in CL. It often seems to me that a simpler, more orthogonal language lies inside of CL, struggling to get free. I try not to think these thoughts, though, as that way lies Scheme and the ridiculous fragmentation that has left Lisp unpopular.

Emacs and Common Lisp, Part 2

This is a followup to my earlier post on converting the Emacs C code into Common Lisp.  This one is a bit more technical, diving into some specifics of the conversion process.

Basics

One important fact is that we do not need to convert an arbitrary C program to Common Lisp.  This might or might not be efficiently possible — but we do not care.  We only need to convert Emacs.  This is simpler for two reasons.  First, we can just ignore any C construct that Emacs does not use.  If the translator barfs after some new update, we can fix it then.  Second, Emacs itself is already written in a relatively Lispy style, being a Lisp implementation itself.  We further exploit this by allowing the translator to know some details about Emacs.  As a trivial example, all the Smumble globals created by the DEFUN marco need not be translated into Common Lisp as structure constants — they are an artifact of the implementation, and will show up directly in the generated defuns instead.

What to ignore

A good portion of Emacs is simply redundant in the CL world.  There are a few types (cons, vector, integers, functions) that are shareable — in fact, sharing these is part of the goal of this effort.  There are also a number of functions which are effectively identical.  There are also entire redundant modules, like the garbage collector, or the bytecode interpreter.

The question is how to have the translator differentiate between what is useful and what is not, without breaking builds of future versions of Emacs.

I don’t currently think there is a high road to solving this problem.  For modules like the GC, I plan to have ad hoc translator rules for the particular source files.  For functions and data types, I’m adding new GCC attributes that I can use to mark the ignorable definitions.

Types

There are two type-related issues that arise when translating the source.

First, how should Emacs-specific types be represented?  Primarily these types are structures, like struct buffer or struct string (we cannot use the CL string type, because Emacs adds properties directly to the string, and Emacs has its own idiosyncratic character handling).  My answer here is to just straightforwardly translate them to defstruct.

The other question is when translating a C function, what do we do with the types of local variables?  For the most part I am pretending that they don’t exist.  This works fine except for local arrays and structures, but these are easily handled by initializing variables properly. My rationale is that while this is slower, it lets me get something working more quickly, and we can always update the translator to emit CL type declarations later on.

This simple approach doesn’t actually cover all the needed cases.  For example, there is code in Emacs that takes the address of a local variable and passes it somewhere.  This is easy to deal with; much of the remaining work is just digging through the code looking for special cases to clean up.

I’m similarly omitting type declarations from the generated structures.  One possible nice side effect of this approach is that it will make it easier to lift Emacs’ file-size restrictions, because there will no longer be any code assuming that the size is a fixnum.

Macros

Many low-level details of the Emacs implementation are hidden in macros.  For example, Emacs stuffs some type information into the low-order bits of pointers.  It uses macros to add or remove this information.  For this build, I redefine these macros to do nothing.  This makes the GCC Gimple representation much closer to the abstract meaning of the program, and thus simpler to translate.

There are also some macros that are useful to redefine so that we can more easily hook into them from the translator.  For example, Emacs has a C macro INTEGERP that is used to check whether its argument is an integer.  Normally this macro uses bit twiddling to get its answer, but I redefine it like so:

#undef INTEGERP
extern Lisp_Object *INTEGERP (Lisp_Object)
    __attribute__((lisp_form("integerp")));

Example

The translator is not nearly complete, but it can already do a fair job at translating simple functions.  For example, here is “forward-point” from the Emacs C code:

DEFUN ("forward-point", Fforward_point, Sforward_point, 1, 1, 0,
       doc: /* Return buffer position N characters after (before if N negative) point.  */)
  (Lisp_Object n)
{
  CHECK_NUMBER (n);

  return make_number (PT + XINT (n));
}

Here is what the translator comes up with:

(defun Fforward_point (n)
  (let (
    temp-var-0
    Qintegerp.316
    temp-var-1
    current_buffer.317
    temp-var-2
    )
    (block nil (tagbody
      bb-0
        ; no gimple here
      bb-1
        ; no gimple here
      bb-2
        (setf temp-var-0 (integerp n))
        (if (== temp-var-0 nil)
          (go bb-3)
          (go bb-4))
      bb-3
        (setf Qintegerp.316 Qintegerp)
        (wrong_type_argument Qintegerp.316 n)
      bb-4
        (setf current_buffer.317 current_buffer)
        (setf temp-var-2 (buffer-pt current_buffer.317))
        (setf temp-var-1 (+ temp-var-2 n))
        (return temp-var-1)
  ))))

(defun elisp:forward-point (arg0)
  (Fforward_point arg0))

The output looks pretty weird, because the translator works after GCC’s CFG is built, and so the most straightforward translation is to use this mess with tagbody.  I doubt this matters much, but in any case the translator is readily hackable — it is still less than 400 lines of Python, including comments.

One thing to note is the translation of “PT“.  This is actually a macro that refers to the current buffer:

#define PT (current_buffer->pt + 0)

The translator properly turns this into a reference to “buffer-pt“.

Another detail is the handling of packages.  My plan is to put the Emacs implementation into one package, and then any elisp into a second package called “elisp“.  A DEFUN in the C code will actually generate two functions: the internal one, and the elisp-visible one; hence the “elisp:” in the translation.

Next Steps

There’s still a good amount of work to be done.  The converter punts on various constructs; type translation is implemented but not actually wired up to anything; the translator should emit definitions for alien functions; and plenty more.

Valgrind and GDB

Valgrind 3.7.0 now includes an embedded gdbserver, which is wired to the valgrind innards in the most useful way possible.  What this means is that you can now run valgrind in a special mode (simply pass --vgdb-error=0), then attach to it from gdb, just as if you were attaching to a remote target.  Valgrind will helpfully tell you exactly how to do this.  Then you can debug as usual, and also query valgrind’s internal state as you do so.  Valgrind will also cause the program to stop if it hits some valgrind event, like a use of an uninitialized value.

For example, consider this incorrect program, e.c:

#include 
int main ()
{
  int x;
  x = x > 0 ? x : x + 1;
  return x;
}

After compiling it (calling it /tmp/e), we can start valgrind:

$ valgrind --vgdb-error=0 err
==20836== Memcheck, a memory error detector
==20836== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==20836== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==20836== Command: /tmp/e
==20836== 
==20836== (action at startup) vgdb me ... 
==20836== 
==20836== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==20836==   /path/to/gdb /tmp/e
==20836== and then give GDB the following command
==20836==   target remote | vgdb --pid=20836
==20836== --pid is optional if only one valgrind process is running
==20836== 

Now, in Emacs (or another console if you insist) we start gdb on /tmp/e and enter the command above. Valgrind has paused our program at the first instruction. Now we can “continue” to let it run:

Reading symbols from /tmp/e...done.
(gdb) target remote | vgdb --pid=20836
Remote debugging using | vgdb --pid=20836
relaying data between gdb and process 20836
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.14.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Switching to Thread 20836]
0x0000003a15c016b0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.

Now the inferior stops, because we hit a use of an uninitialized value:

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000000040047c in main () at e.c:5
5	  x = x > 0 ? x : x + 1;
(gdb) 

If we look back at the valgrind window, we see:

==20836== Conditional jump or move depends on uninitialised value(s)
==20836==    at 0x40047C: main (e.c:5)

(It would be nice if this showed up in gdb; I’m not sure why it doesn’t.)

Valgrind also provides ways to examine what is happening, via gdb’s monitor command. This is helpfully documented online:

(gdb) monitor help
general valgrind monitor commands:
  help [debug]             : monitor command help. With debug: + debugging commands
[... lots more here ...]

A few improvements are possible; e.g., right now it is not possible to start a new program using valgrind from inside gdb. This would be a nice addition (I think something like “target valgrind“, but other maintainers have other ideas).

I think this is a major step forward for debugging. Thanks to Philippe Waroquiers and Julian Seward for making it happen.

GCC and Python

When writing an earlier post, I realized I haven’t yet written about the Python plugin for GCC.

This is awesome! It is by far the simplest way to write a GCC plugin. The primary reason is that the author, the amazing David Malcolm, has put a lot of effort into the polish: this plugin is the simplest one to build (“make” works for me, with the Fedora 15 system GCC) and also the one with the best documentation.

Why would you want to write a plugin? Pretty much every program — and especially every C program, as C has such bad metaprogramming support — has rules which cannot be expressed directly in the language. One usually resorts to various tricks, and in extremis patch review by all-knowing maintainers, to preserve these. The plugin offers another way out: write a Python script to automate the checking.

I’ve already written a couple of custom checkers for use on GDB (which is a very idiosyncratic C program, basically written in an oddball dialect of C++), which have found real bugs.  These checkers cover things that no generic static analysis tool would ever correctly check, e.g., for the proper use of GDB’s exception handling system.  The exception checker, which we use to verify that we’re correctly bridging between Python’s exception system and GDB’s, took less than a day to write.

13. Breakpoints

Phil Muldoon added support for breakpoints to the Python API in gdb this past year.  While work here is ongoing, you can already use it to do neat things which can’t be done from the gdb CLI.

The interface to breakpoints is straightforward.  There is a new Breakpoint class which you can instantiate.  Objects of this type have various attributes and methods, corresponding roughly to what is available from the CLI — with one nice exception.

The new bit is that you can subclass Breakpoint and provide a stop method.  This method is called when the breakpoint is hit and gets to determine whether the breakpoint should cause the inferior to stop.  This lets you implement special breakpoints that collect data, but that don’t interfere with other gdb operations.

If you are a regular gdb user, you might think that this is possible by something like:

break file.c:73
commands
  silent
  python collect_some_data()
  cont
end

Unfortunately, however, this won’t work — if you try to “next” over this breakpoint, your “next” will be interrupted, and the “cont” will cause your inferior to start running free again, instead of stopping at the next line as you asked it to.  Whoops!

Here’s some example code that adds a new “lprintf” command.  This is a “logging printf” — you give it a location and (gdb-style) printf arguments, and it arranges to invoke the printf at that location, without ever interrupting other debugging.

This code is a little funny in that the new breakpoint will still show up in “info break“.  Eventually (this is part of the ongoing changes) you’ll be able to make new breakpoints show up there however you like; but meanwhile, it is handy not to mark these as internal breakpoints, so that you can easily delete or disable them (or even make them conditional) using the normal commands.

import gdb

class _LPrintfBreakpoint(gdb.Breakpoint):
    def __init__(self, spec, command):
        super(_LPrintfBreakpoint, self).__init__(spec, gdb.BP_BREAKPOINT,
                                                 internal = False)
        self.command = command

    def stop(self):
        gdb.execute(self.command)
        return False

class _LPrintfCommand(gdb.Command):
    """Log some expressions at a location, using 'printf'.
    Usage: lprintf LINESPEC, FORMAT [, ARG]...
    Insert a breakpoint at the location given by LINESPEC.
    When the breakpoint is hit, do not stop, but instead pass
    the remaining arguments to 'printf' and continue.
    This can be used to easily add dynamic logging to a program
    without interfering with normal debugger operation."""

    def __init__(self):
        super(_LPrintfCommand, self).__init__('lprintf',
                                              gdb.COMMAND_DATA,
                                              # Not really the correct
                                              # completer, but ok-ish.
                                              gdb.COMPLETE_SYMBOL)

    def invoke(self, arg, from_tty):
        (remaining, locations) = gdb.decode_line(arg)
        if remaining is None:
            raise gdb.GdbError('printf format missing')
        remaining = remaining.strip(',')
        if locations is None:
            raise gdb.GdbError('no matching locations found')

        spec = arg[0:- len(remaining)]
        _LPrintfBreakpoint(spec, 'printf ' + remaining)

_LPrintfCommand()

12. Events

There have been many new Python scripting features added to gdb since my last post on the topic.  The one I want to focus on today is event generation.

I wrote a little about events in gdb-python post #9 — but a lot has changed since then.  A Google SoC student, Oguz Kayral, wrote better support for events in 2009.  Then, Sami Wagiaalla substantially rewrote it and put it into gdb.

In the new approach, gdb provides a number of event registries.  An event registry is just an object with connect and disconnect methods.  Your code can use connect to register a callback with a registry; the callback is just any callable object.  The event is passed to the callable as an argument.

Each registry emits specific events — “emitting” an event just means calling all the callables that were connected to the registry.  For example, the gdb.events.stop registry emits events when an inferior or thread has stopped for some reason.  The event describes the reason for the stop — e.g., a breakpoint was hit, or a signal was delivered.

Here’s a script showing this feature in action.  It arranges for a notification to pop up if your program stops unexpectedly — if your program exits normally, nothing is done.  Something like this could be handy for automating testing under gdb; you could augment it by having gdb automatically exit if gdb.events.exited fires.  You could also augment it by setting a conditional breakpoint to catch a rarely-seen condition; then just wait for the notification to appear.

To try this out, just “source” it into gdb.  Then, run your program in various ways.

import gdb, threading, Queue, gtk, glib, os, pynotify

(read_pipe, write_pipe) = os.pipe()

event_queue = Queue.Queue()

def send_to_gtk(func):
    event_queue.put(func)
    os.write(write_pipe, 'x')

def on_stop_event(event):
    n = pynotify.Notification('Your program stopped in gdb')
    n.show()

class GtkThread(threading.Thread):
    def handle_queue(self, source, condition):
        global event_queue
        os.read(source, 1)
        func = event_queue.get()
        func()

    def run(self):
        global read_pipe
        glib.io_add_watch(read_pipe, glib.IO_IN, self.handle_queue)
        gtk.main()

gdb.events.stop.connect(on_stop_event)

gtk.gdk.threads_init()

pynotify.init('gdb')

t = GtkThread()
t.setDaemon(True)
t.start()

Emacs and Common Lisp

Recently I’ve been thinking about how to rebase Emacs on Common Lisp.

First, why rebase?  Performance is the biggest reason.  Emacs Lisp is a very basic lisp implementation.  It has a primitive garbage collector and basic execution model, and due to how it is written, it is quite hard to improve this in place.

Seccond, why Common Lisp?  Two reasons: first, Emacs Lisp resembles Common Lisp in many ways; elisp is like CL’s baby brother.  Second, all of the hard problems in Lisp execution have already been solved excellently by existing, free-software CL implementations.  In particular, the good CL implementations have much better garbage collectors, native compilation, threads, and FFI; we could expose the latter two to elisp in a straightforward way.

By “rebase” I mean something quite ambitious — rewrite the C source of Emacs into Common Lisp.  I think this can largely be automated via a GCC plugin (e.g., written using David Malcolm’s Python plugin).  Full automation would let the CL port be just another way to build Emacs, following the upstream development directly until all the Emacs maintainers can be convinced to drop C entirely (cough, cough).

Part of the rewrite would be dropping code that can be shared with CL.  For example, we don’t need to translate the Emacs implementation of “cons“, we can just use the CL one.

Some CL glue would be needed to make this all work properly.  These days it can’t be quite as small as elisp.lisp, but it still would not need to be very big.  The trickiest problem is dealing with buffer-local variables; but I think that can be done by judicious use of define-symbol-macro in the elisp reader.

Emacs might be the only program in the world that would see a performance improvement from rewriting in CL :-).  The reason for this is simple: Emacs’ performance is largely related to how well it executes lisp code, and how well the GC works.