Archive for the ‘gdb’ Category

Shaggy Dogs and SpiderMonkey Unwinders

A year or so ago I was asked to debug a crash in the Firefox devtools.  Crashes are easy!  I fired up gdb and reproduced the crash… which turned out to be in some code JITted by SpiderMonkey.  I was immediately lost; even a simple bt did not work.  Someone more familiar with the JIT — hi Shu — had to dig out the answer :-(.

I did take the opportunity to get some information from him about how he found the result, though.  He pointed me to the code responsible for laying out JIT stack frames.  It turned out that gdb could not unwind through JIT frames, but it could be done by hand — so I resolved then to eventually fix this.

Phase One

I knew from my gdb hacking that gdb has a JIT unwinding API.  Actually — and isn’t this the way most programs end up working? — it has two.

The first JIT API requires some extra work on the part of the JIT: it constructs an object file, typically ELF and DWARF, in memory, then calls a hook.  GDB sets a breakpoint on this hook and, when hit, it reads the data from the inferior.  This lets the JIT provide basically any kind of information — but it’s pretty heavy.

So, I focused my attention on the second API.  In this mode, the JIT author would provide a shared library that used some callbacks to inform gdb of the details of what was going on.  The set of callbacks was much more limited, but could at least describe how to unwind the registers.  So, I figured that this is what I would do.

But… I didn’t really want to write this in C.  That would be a real pain!  C is fiddly and hard to deal with, and it would mean constant rebuilding of the shared library while debugging, and SpiderMonkey already had a reasonable number of gdb-python scripts — surely this could be done in Python.

So I took the quixotic approach, namely writing a shared library that used the second gdb JIT API but only to expose this API to Python.

Of course, this turned out to be Rube Goldbergian.  Various parts of the gdb Python API could not be called from the JIT shared library, because those bits depended on other state in gdb, which wasn’t set properly when the JIT library was being called.  So, I had gdb calling into my shared library, which called my Python code, which then invoked a new gdb command (written in Python and supplied by my package) — that existed solely for the purpose of setting this internal state properly — and that in turn invoked the code I wanted to run, say to fetch memory or a register or something.

Computer Science!

Well, that took a while.  But it sort of worked!  And maybe I could just keep it in github and not put it in Mozilla Central and avoid learning about the Firefox build system and copying in some gdb header file and license review and whatnot.

So I started writing the actual Python code… OMG.  And see below since you will totally want to know about this.  But meanwhile…

… while I was hacking away on this crazy idea, someone implemented the much more sane idea of just exposing gdb’s unwinder API to gdb’s Python layer.

Hmm… why didn’t I do that?  Well, I left gdb under a bit of a cloud, and didn’t really want to be that involved at the time.  Plus, you know, gdb is a high quality project; which means that if you write a giant patch to expose the unwinding API, you have to be prepared for 17 rounds of patch review (this really happened once), plus writing documentation and tests.  Sometimes it’s just easier to channel one’s inner Rube.

Phase Two

The integrated Python API was a great development.  Now I could delete my shared library and my insane trampoline hacks, and focus on my insane unwinding code.

A lot of this work was straightforward, in the sense that the general outline was clear and just the details remained.  The details amount to things like understanding the SpiderMonkey frame descriptor (which partly describes the previous frame and partly the new frame; there’s one comment explaining this that somehow eluded me for quite a while); duplicating the SpiderMonkey JIT unwinding code in Python; and of course carefully reading the SpiderMonkey code that JITs the “entry frame” code to understand how registers are spilled.

Naturally, while doing this it turned out that I was maybe the first person to use these gdb APIs in anger.  I found some gdb crashes, oops!  The docs would have been impenetrable, except I already knew the underlying C APIs on which they were based… whew!  The Python API was unexpectedly picky in other areas, too.

But then there was also some funny business, one part in gdb, and one part in SpiderMonkey.

GDB is probably more complicated than you realize.  In this case, the complexity is that, in gdb, each stack frame can have its own architecture.  This seemingly weird functionality is actually used; I think it was invented for the SPU, but some other chips have multiple modes as well.  But what this means is that the question “what architecture is this program?” is not well-defined, and anyway gdb’s Python layer doesn’t provide you a way to find whatever approximation it is that would make sense in your specific case.  However, when writing the SpiderMonkey unwinder, it kind of actually is well-defined and we’d like to know the answer to know which unwinder to choose.

For this problem I settled on the probably terrible idea of checking whether a given register is available.  That is, if you see “$rip“, you can guess it’s x86-64.

The other problem here is that gdb thinks that, since you wrote an unwinder, it should get the first stab at unwinding.  That’s very polite!  But for SpiderMonkey, deciding “hey, is this PC in some code the JIT emitted?” is actually a real pain, or at least outside the random bits of it I learned in order to make all this work.

Aha!  I know, there’s probably a Python API to say “is this address associated with some shared library?”  I remembered reading and/or reviewing a patch… but no, gdb.solib_name is close but doesn’t do the right thing for addresses in the main executable.  WAT.

I tried several tricks without success, and in the end I went with parsing /proc/maps to get the mappings to decide whether a given frame should be handled by this unwinder or by gdb.  Horrible.  And fails with remote debugging.

Luckily, nobody does remote debugging.

Remote Debugging

Oh, wait, people do remote debugging at Mozilla all the time.  They don’t call it “remote debugging” though — they call it “using RR“, which while it runs locally, appears to be remote to gdb; and, importantly, during replay mode fakes the PID, and does other deep magic, though not deep enough to extend to making a fake map file that could be read via gdb’s remote get command.

By the way, you should be using RR.  It’s the best advance in debugging since, well, gdb.  It’s a process record-and-replay program, but unlike gdb’s built-in reverse debugging, it handles threads properly and has decent performance.

Oh Well

Oh well.  It just won’t work remotely.  Or at least not until fellow Mozillian (this always seems like it should be “Mozillan” to me, but it’s not, there really is that extra “i”) and all-star Nicolas Pierron wrote some additional Python to read some SpiderMonkey tables to make the decision in a more principled way.  Now it will all work!

Though looking now I wonder if I dreamed this, because the code isn’t checked in.  I know he had a patch but my memory is a bit fuzzy — maybe in the end it didn’t work, because RR didn’t implement the qGetTLSAddr packet, which gdb uses to read thread-local storage.  Did I mention the thread-locals?

The Real Start of the Story

So, way back at the beginning, during my initial foray into this code, I found that a crucial bit of information — the appropriately-named TlsPerThreadData — was stashed away in a thread-local variable.  Information stored here is needed by the unwinder in order to unwind from a C++ frame into a JIT frame.

Only, Firefox didn’t use “real” thread-local variables, the things that so many glibc and gcc hackers put so much effort into micro-optimizing.  No, it just used a template class that wrapped pthread_setspecific and friends in a relatively ergonomic way.

Naturally, for an unwinder this is a disaster.  Why?  Unwinding is basically the dissection of the stack; but in order to compute the value of one of these thread-local-storage objects, the unwinder would have to make some function calls in the inferior (in fact this prevents it from working on OSX).  But these would affect the stack, and also potentially let other inferior code (in other threads — remember, gdb is complicated and you can exert various unusual kinds of control like this) run as well.

So I neglected to mention the very first step: changing Firefox to use __thread.  (Ok, I didn’t really neglect to mention it, I was just being lazy and anyway it’s a shaggy dog story.)

Do Not Use libthread_db

RR did not implement qGetTLSAddr, which we needed, because  lots of people at Mozilla use RR.  So I set out to implement that.  This meant a foray into the dangerous world of libthread_db.

For reasons I do not know, and suspect that I do not want to know, glibc has historically followed many Solaris conventions.  One such Solaris innovation was libthread_db — a library that debuggers use to find certain information from libc, information like the address of a thread-local variable

On the surface this seems like a great idea: don’t bake the implementation details of the C library into the debugger.  Instead, let the debugger use a debugging library that comes with the C library.  And, if you designed it that way, it would be a good idea.

Sadly, though, libthread_db was not designed that way.  Oh no.

For example, libthread_db has a callback interface.  The calling program — gdb or rr — must provide some functions that libthread_db can call, to do some simple things like “read some memory”; or some very complicated things like “find the address of a symbol given its name”.  Normal C programmers might implement these callbacks using a structure containing function pointers.  But not libthread_db!  Instead it uses fixed symbol names that must be provided by the calling application.  Not all of these are required for it to work (you get to figure out which, yay!), but some definitely are.  And, you have to dlopen a libthread_db that matches the libc of the inferior that you’re debugging (or link against it, but that’s also obviously bad).

Wait, you say.  Doesn’t that mess up cross-debugging?  Why yes!  Yes it does!  Which is why qGetTLSAddr has to be in the gdb remote serial protocol to start with.

Hey, maybe the Linux vendors should fix this.  They are — see Gary Benson’s Infinity project — but unfortunately that’s still in development and I wanted RR to work sooner.

Ok, so whew.  I wrote qGetTLSAddr support for RR.  This was a small patch in the end, but an unusual pain in an already painful series.  Hopefully this won’t spill out into other programs.


Hahaha, you are so funny.  Of course it spills out: remember how you have to define a bunch of functions with specific names in your program in order to use libthread_db?  Well, how do you know you got the types correct?

Yeah, you include <proc_service.h> (a name deliberately chosen to confuse, I suppose, why not, it doesn’t bear any obvious relationship to the library).  Only, that was never installed by glibc.  Instead, gdb just copied it into the source tree.

So naturally I went and fixed this in glibc.  And, even more naturally, this broke the gdb build, which was autoconf’d to check for a file that never existed in the past.  LOL.

Thank You Cthulhu

At this point I figured it was only a matter of time until I had to patch the kernel.  Thankfully this hasn’t been necessary yet.

It Says What

In gdb the actual unwinding and the display of frames are separate concerns.

And let me digress here to say that gdb’s unwinder design is excellent.  I believe it was redone by Andrew Cagney (this was well before my active time in gdb, so apologies if you’re reading this and you did it and I’ve misattributed it).  Like much of gdb, many of the details are bizarre and take one back to the byte-counting days of 1987; but the high level design is very solid and has endured with, I think, just one significant change (to support inline functions) in the intervening 15 or so years.  I’ve long thought that this is a remarkable accomplishment in the programming world.

So, yes.  It’s not enough to just unwind.  Simply having an unwinder yields backtraces with lines like:

#5 0xfeefee ???

Better than nothing!  But not yet great.

The second part of the SpiderMonkey unwinder is, therefore, a gdb “frame filter”.  This is an object that takes raw frames and decorates them with information like a function name, or a file name, or arguments.

Work to add this information is ongoing — I landed one patch just yesterday, and another one, to add more information about interpreted frames, is still in the works.  And there are two more bugs filed… maybe this project, like this blog post, will never conclude.  It will just scroll endlessly.

But now, with all the code in place, bt can show something like:

#6 0x00007ffff7ff20f3 in <<JitFrame_BaselineJS "f1">> (this=JSVAL_VOID, arg1=$jsval(4700))

This is the call f1(4700).

Let’s Just Have One More

Of course we still couldn’t enable this unwinder by default.  You have to enable it by hand.

And by the way, in the first release of gdb’s Python unwinder feature, enabling or disabling an unwinder didn’t flush the frame cache, so it wouldn’t actually take effect until some invisible-to-the-user state change took place.  I fixed this bug, but here Pedro Alves also taught me the secret gdb command flushregs, which in fact just flushes the frame cache. (I’m going to go out on a limb and guess that this command predates the already ancient maint prefix command, hence its weird name.)

Anyway, you have to enable it by hand because the unwinder itself doesn’t work properly if the outermost frame is in JIT code.  The JIT, in the interest of performance, doesn’t maintain a frame pointer.  This means that in the outermost frame, there’s no reliable way to find the object that describes this frame and links to the previous frame.

Now, normally in this case gdb would either resort to debug info (not available here), or in extremis its encyclopedic suite of prologue analyzers (yes, gdb can analyze common function prologues for all architectures developed in the last 25 years to figure out stuff) — but naturally JIT compilers go their own way here as well.

Humans, like Shu back at the start of this story, can do this by dumping parts of the stack and guessing which bytes represent the frame header.

But, I’ve been reluctant and a bit afraid to hack a heuristic into the unwinder.

To sum up — in case you missed it — this means that all the code written during this entire saga would still not have helped with my original bug.

The End

GDB Preattach

In firefox development, it’s normal to do most development tasks via the mach command. Build? Use mach. Update UUIDs? Use mach. Run tests? Use mach. Debug tests? Yes, mach mochitest --debugger gdb.

Now, normally I run gdb inside emacs, of course. But this is hard to do when I’m also using mach to set up the environment and invoke gdb.

This is really an Emacs bug. GUD, the Emacs interface to all kinds of debuggers, is written as its own mode, but there’s no really great reason for this. It would be way cooler to have an adaptive shell mode, where running the debugger in the shell would magically change the shell-ish buffer into a gud-ish buffer. And somebody — probably you! — should work on this.

But anyway this is hard and I am lazy. Well, sort of lazy and when I’m not lazy, also unfocused, since I came up with three other approaches to the basic problem. Trying stuff out and all. And these are even the principled ways, not crazy stuff like screenify.

Oh right, the basic problem.  The basic problem with running gdb from mach is that then you’re just stuck in the terminal. And unless you dig the TUI, which I don’t, terminal gdb is not that great to use.

One of the ideas, in fact the one this post is about, since this post isn’t about the one that I couldn’t get to work, or the one that is also pretty cool but that I’m not ready to talk about, was: hey, can’t I just attach gdb to the test firefox? Well, no, of course not, the test program runs too fast (sometimes) and racing to attach is no fun. What would be great is to be able to pre-attach — tell gdb to attach to the next instance of a given program.

This requires kernel support. Once upon a time there were some gdb and kernel patches (search for “global breakpoints”) to do this, but they were never merged. Though hmm! I can do some fun kernel stuff with SystemTap…

Specifically what I did was write a small SystemTap script to look for a specific exec, then deliver a SIGSTOP to the process. Then the script prints the PID of the process. On the gdb side, there’s a new command written in Python that invokes the SystemTap script, reads the PID, and invokes attach. It’s a bit hacky and a bit weird to use (the SIGSTOP appears in gdb to have been delivered multiple times or something like that). But it works!

It would be better to have this functionality directly in the kernel. Somebody — probably you! — should write this. But meanwhile my hack is available, along with a few other gdb scxripts, in my gdb helpers github repository.

import gdb

Occasionally I see questions about how to import gdb from the ordinary Python interpreter.  This turns out to be surprisingly easy to implement.

First, a detour into PIE and symbol visibility.

“PIE” stands for “Position Independent Executable”.  It uses essentially the same approach as a shared library, except it can be applied to the executable.  You can easily build a PIE by compiling the objects with the -fPIE flag, and then linking the resulting executable with -pie.  Normally PIEs are used as a security feature, but in our case we’re going to compile gdb this way so we can have Python dlopen it, following the usual Python approach: we install it as and add a a module initialization function, init_gdb. (We actually name the module “_gdb“, because that is what the gdb C code creates; the “gdb” module itself is already plain Python that happens to “import _gdb“.)

Why install the PIE rather than make a true shared library?  It is just more convenient — it doesn’t require a lot of configure and Makefile hacking, and it doesn’t slow down the build by forcing us to link gdb against a new library.

Next, what about all those functions in gdb?  There are thousands of them… won’t they possibly cause conflicts at dlopen time?  Why yes… but that’s why we have symbol visibility.  Symbol visibility is an ELF feature that lets us hide all of gdb’s symbols from any dlopen caller.  In fact, I found out during this process that you can even hide main, as seems to ignore visibility bits for this function.

Making this work is as simple as adding -fvisibility=hidden to our CFLAGS, and then marking our Python module initialization function with __attribute__((visibility("default"))).  Two notes here.  First, it’s odd that “default” means “public”; just one of those mysterious details.  Second, Python’s PyMODINIT_FUNC macro ought to do this already, but it doesn’t; there’s a Python bug.

Those are the low-level mechanics.  At this point gdb is a library, albeit an unusual one that has a single entry point.  After this I needed a few tweaks to gdb’s startup process in order to make it work smoothly.  This too was no big deal.  Now I can write scripts from Python to do gdb things:

import gdb
gdb.execute('file ./install/bin/gdb')
print 'sizeof = %d' % gdb.lookup_type('struct minimal_symbol').sizeof


$ python

Soon I’ll polish all the patches and submit this upstream.

Quick Multi-process Debugging Update

In my last post I mentioned that setting breakpoints is a pain when debugging multiple processes in GDB. While there are some bugs here (we’re actively working on them), it isn’t hard to make the basic case work.  In fact, there’s nothing to it.  Some background…

Starting with GDB 7.4, we changed how basic breakpoint specifiers (called “linespecs”) work.  Previously, a linespec applied somewhat randomly to the first matching symbol found in your code.  This behavior probably made sense in 1989, when all you had were statically linked executables; but nowadays it is much more common to have dozens of shared libraries, with the attendant name clashes.

So, instead of having GDB guess which symbol you meant, now a breakpoint just applies to all of them.  Our idea is that we’ll start supplying ways to narrow down exactly which spots you meant to name, say by adding syntax like “break“, or whatever.

Anyway, this new work also applies across inferiors.  Here’s an example of debugging “make“, then setting a breakpoint on a function in libcpp (which itself is linked into a sub-process of gcc):

(gdb) b _cpp_lex_direct
Function "_cpp_lex_direct" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_cpp_lex_direct) pending.
(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 8761]
[New process 8761]
process 8761 is executing new program: /usr/bin/gcc
[New inferior 8762]
[New process 8762]
process 8762 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1

Breakpoint 1, 0x0000000000b156a0 in _cpp_lex_direct ()

The remaining issues have to do with breakpoint re-setting not doing the right thing with running inferiors. This causes some scary warnings when running, but I think for the time being you can just ignore those.

Well, I should say those are the known issues.  This feature hasn’t had as much use as I would like (judging from the low bug rate — I can’t tell if that is a good insight or a horrible realization).  So, try it out and report problems to GDB Bugzilla.  We’ll be making it work for you.

Debugging multiple programs at once

Consider this Makefile:

all: runit

runit: crasher

crasher: crasher.c
	gcc -g -o crasher crasher.c

And, here is the program it is building:

int *x = 0;

int main ()
  *x = 52;

Now, if you run “make“, eventually you will see a crash.  But how to debug the crash?

Well, obviously, this is a trivial example so you’d just debug the program.  But what if you had a complex script involving extensive and obscure initialization?  Say, in your test suite?  The traditional answer is logging plus cut and paste into gdb; or perhaps hacking an invocation of gdb --args into your script.  Nowadays you can do better, though.

Let’s start by debugging make:

$ gdb -quiet make
Reading symbols from /usr/bin/make...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install make-3.82-8.fc16.x86_64

Now set things up for multi-inferior debugging:

(gdb) set detach-on-fork off
(gdb) set target-async on
(gdb) set non-stop on
(gdb) set pagination off

(Yes, it is silly how many settings you have to tweak; and yes, we’re going to fix this.)

Now do it:

(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 9694]
[New process 9694]
process 9694 is executing new program: /usr/bin/gcc
[New inferior 9695]
[New process 9695]
process 9695 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1
Missing separate debuginfos, use: debuginfo-install gcc-4.6.2-1.fc16.x86_64
[Inferior 3 (process 9695) exited normally]
[Inferior 9695 exited]
Missing separate debuginfos, use: debuginfo-install cpp-4.6.2-1.fc16.x86_64
(gdb) [New inferior 9696]
[New process 9696]
process 9696 is executing new program: /usr/bin/as
[Inferior 4 (process 9696) exited normally]
[Inferior 9696 exited]
[New inferior 9697]
[New process 9697]
process 9697 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/collect2
Missing separate debuginfos, use: debuginfo-install binutils-
[New inferior 9698]
[New process 9698]
process 9698 is executing new program: /usr/bin/ld.bfd
Missing separate debuginfos, use: debuginfo-install gcc-4.6.2-1.fc16.x86_64
[Inferior 6 (process 9698) exited normally]
[Inferior 9698 exited]
[Inferior 5 (process 9697) exited normally]
[Inferior 9697 exited]
[Inferior 2 (process 9694) exited normally]
[Inferior 9694 exited]
[New inferior 9699]
[New process 9699]
process 9699 is executing new program: /tmp/crasher
Missing separate debuginfos, use: debuginfo-install binutils-

Program received signal SIGSEGV, Segmentation fault.
0x000000000040047f in main () at crasher.c:5
5      *x = 52;

Cool stuff.  Now you can inspect the crashed program:

(gdb) info inferior
Num  Description       Executable
  7    process 9699      /tmp/crasher
* 1    process 9691      /usr/bin/make
(gdb) inferior 7
[Switching to inferior 7 [process 9699] (/tmp/crasher)]
[Switching to thread 7 (process 9699)]
#0  0x000000000040047f in main () at crasher.c:5
5      *x = 52;

There is still a lot of work to do here — it is still a bit too slow, setting breakpoints is still a pain, etc. These are all things we’re going to be cleaning up in the coming year.

Valgrind and GDB

Valgrind 3.7.0 now includes an embedded gdbserver, which is wired to the valgrind innards in the most useful way possible.  What this means is that you can now run valgrind in a special mode (simply pass --vgdb-error=0), then attach to it from gdb, just as if you were attaching to a remote target.  Valgrind will helpfully tell you exactly how to do this.  Then you can debug as usual, and also query valgrind’s internal state as you do so.  Valgrind will also cause the program to stop if it hits some valgrind event, like a use of an uninitialized value.

For example, consider this incorrect program, e.c:

int main ()
  int x;
  x = x > 0 ? x : x + 1;
  return x;

After compiling it (calling it /tmp/e), we can start valgrind:

$ valgrind --vgdb-error=0 err
==20836== Memcheck, a memory error detector
==20836== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==20836== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==20836== Command: /tmp/e
==20836== (action at startup) vgdb me ... 
==20836== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==20836==   /path/to/gdb /tmp/e
==20836== and then give GDB the following command
==20836==   target remote | vgdb --pid=20836
==20836== --pid is optional if only one valgrind process is running

Now, in Emacs (or another console if you insist) we start gdb on /tmp/e and enter the command above. Valgrind has paused our program at the first instruction. Now we can “continue” to let it run:

Reading symbols from /tmp/e...done.
(gdb) target remote | vgdb --pid=20836
Remote debugging using | vgdb --pid=20836
relaying data between gdb and process 20836
Reading symbols from /lib64/ symbols from /usr/lib/debug/lib64/
Loaded symbols for /lib64/
[Switching to Thread 20836]
0x0000003a15c016b0 in _start () from /lib64/
(gdb) c

Now the inferior stops, because we hit a use of an uninitialized value:

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000000040047c in main () at e.c:5
5	  x = x > 0 ? x : x + 1;

If we look back at the valgrind window, we see:

==20836== Conditional jump or move depends on uninitialised value(s)
==20836==    at 0x40047C: main (e.c:5)

(It would be nice if this showed up in gdb; I’m not sure why it doesn’t.)

Valgrind also provides ways to examine what is happening, via gdb’s monitor command. This is helpfully documented online:

(gdb) monitor help
general valgrind monitor commands:
  help [debug]             : monitor command help. With debug: + debugging commands
[... lots more here ...]

A few improvements are possible; e.g., right now it is not possible to start a new program using valgrind from inside gdb. This would be a nice addition (I think something like “target valgrind“, but other maintainers have other ideas).

I think this is a major step forward for debugging. Thanks to Philippe Waroquiers and Julian Seward for making it happen.

13. Breakpoints

Phil Muldoon added support for breakpoints to the Python API in gdb this past year.  While work here is ongoing, you can already use it to do neat things which can’t be done from the gdb CLI.

The interface to breakpoints is straightforward.  There is a new Breakpoint class which you can instantiate.  Objects of this type have various attributes and methods, corresponding roughly to what is available from the CLI — with one nice exception.

The new bit is that you can subclass Breakpoint and provide a stop method.  This method is called when the breakpoint is hit and gets to determine whether the breakpoint should cause the inferior to stop.  This lets you implement special breakpoints that collect data, but that don’t interfere with other gdb operations.

If you are a regular gdb user, you might think that this is possible by something like:

break file.c:73
  python collect_some_data()

Unfortunately, however, this won’t work — if you try to “next” over this breakpoint, your “next” will be interrupted, and the “cont” will cause your inferior to start running free again, instead of stopping at the next line as you asked it to.  Whoops!

Here’s some example code that adds a new “lprintf” command.  This is a “logging printf” — you give it a location and (gdb-style) printf arguments, and it arranges to invoke the printf at that location, without ever interrupting other debugging.

This code is a little funny in that the new breakpoint will still show up in “info break“.  Eventually (this is part of the ongoing changes) you’ll be able to make new breakpoints show up there however you like; but meanwhile, it is handy not to mark these as internal breakpoints, so that you can easily delete or disable them (or even make them conditional) using the normal commands.

import gdb

class _LPrintfBreakpoint(gdb.Breakpoint):
    def __init__(self, spec, command):
        super(_LPrintfBreakpoint, self).__init__(spec, gdb.BP_BREAKPOINT,
                                                 internal = False)
        self.command = command

    def stop(self):
        return False

class _LPrintfCommand(gdb.Command):
    """Log some expressions at a location, using 'printf'.
    Usage: lprintf LINESPEC, FORMAT [, ARG]...
    Insert a breakpoint at the location given by LINESPEC.
    When the breakpoint is hit, do not stop, but instead pass
    the remaining arguments to 'printf' and continue.
    This can be used to easily add dynamic logging to a program
    without interfering with normal debugger operation."""

    def __init__(self):
        super(_LPrintfCommand, self).__init__('lprintf',
                                              # Not really the correct
                                              # completer, but ok-ish.

    def invoke(self, arg, from_tty):
        (remaining, locations) = gdb.decode_line(arg)
        if remaining is None:
            raise gdb.GdbError('printf format missing')
        remaining = remaining.strip(',')
        if locations is None:
            raise gdb.GdbError('no matching locations found')

        spec = arg[0:- len(remaining)]
        _LPrintfBreakpoint(spec, 'printf ' + remaining)


12. Events

There have been many new Python scripting features added to gdb since my last post on the topic.  The one I want to focus on today is event generation.

I wrote a little about events in gdb-python post #9 — but a lot has changed since then.  A Google SoC student, Oguz Kayral, wrote better support for events in 2009.  Then, Sami Wagiaalla substantially rewrote it and put it into gdb.

In the new approach, gdb provides a number of event registries.  An event registry is just an object with connect and disconnect methods.  Your code can use connect to register a callback with a registry; the callback is just any callable object.  The event is passed to the callable as an argument.

Each registry emits specific events — “emitting” an event just means calling all the callables that were connected to the registry.  For example, the registry emits events when an inferior or thread has stopped for some reason.  The event describes the reason for the stop — e.g., a breakpoint was hit, or a signal was delivered.

Here’s a script showing this feature in action.  It arranges for a notification to pop up if your program stops unexpectedly — if your program exits normally, nothing is done.  Something like this could be handy for automating testing under gdb; you could augment it by having gdb automatically exit if fires.  You could also augment it by setting a conditional breakpoint to catch a rarely-seen condition; then just wait for the notification to appear.

To try this out, just “source” it into gdb.  Then, run your program in various ways.

import gdb, threading, Queue, gtk, glib, os, pynotify

(read_pipe, write_pipe) = os.pipe()

event_queue = Queue.Queue()

def send_to_gtk(func):
    os.write(write_pipe, 'x')

def on_stop_event(event):
    n = pynotify.Notification('Your program stopped in gdb')

class GtkThread(threading.Thread):
    def handle_queue(self, source, condition):
        global event_queue, 1)
        func = event_queue.get()

    def run(self):
        global read_pipe
        glib.io_add_watch(read_pipe, glib.IO_IN, self.handle_queue)



t = GtkThread()


Here’s a homework problem for you: design a static probe point API that:

  • Consists of a single header file,
  • Works for C, C++, and assembly,
  • Allows probes to have arguments,
  • Does not require any overhead for computing the arguments if they are already live,
  • Does not require debuginfo for debug tools to extract argument values,
  • Has overhead no greater than a single nop when no debugger is attached, and
  • Needs no dynamic relocations.

I wouldn’t have accepted this task, but Roland McGrath, in a virtuoso display of ELF and GCC asm wizardy, wrote <sys/sdt.h> for SystemTap.  Version 3 has all the properties listed above.  I’m pretty amazed by it.

This past year, Sergio Durigan Junior and I added support for this to gdb.  It is already in Fedora, of course, and it will be showing up in upstream gdb soon.

The way I think about these probes is that they let you name a place in your code in a way that is relatively independent of source changes.  gdb can already address functions nicely ("break function") or lines ("break file.c:73") — but sometimes I’d like a stable breakpoint location that is not on a function boundary; but using line numbers in a .gdbinit or other script is hard, because line numbers change when I edit.

We’ve also added probes to a few libraries in the distro, for gdb to use internally.  For example, we added probes to the unwind functions in libgcc, so that gdb can properly “next” over code that throws exceptions.  And, we did something similar for longjmp in glibc.  You can dump the probes from a library with readelf -n, or with “info probes” in gdb.

The probes were designed to be source-compatible with DTrace static probes.  So, if you are already using those, you can just install the appropriate header from SystemTap.  Otherwise, adding the probes is quite easy… see the instructions, but be warned that they focus a bit too much on DTrace compability; you probably don’t want the .d file and the semaphore, that just slows things down.  Instead I recommend just including the header and using the macros directly.

11. The End

We’ve covered many of the features of python-gdb:

  • Writing new commands
  • Convenience functions
  • Pretty-printing
  • Auto-loading of Python code
  • Scripting gdb from Python
  • Bringing up a GUI

In fact, that is probably all of the user-visible things right now.  There are classes and methods in the Python API to gdb that we have not covered, but you can read about those when you need to use them.

What next?  There are a few things to do.  There are probably bugs.  As we saw in some earlier sections, support for I/O redirection is not there.  We need better code for tracking the inferior’s state.  Barring the unexpected, all this will be done in the coming months.

Now is an exciting time to be working on gdb.  There are a number of very interesting projects underway:

  • Reversible debugging is being developed.  The idea here is that gdb can record what your program does, and then you can step backward in time to find the bug.
  • Sérgio Durigan Júnior, at IBM, has been working on syscall tracing support.  This will let us do strace-like tracing in gdb.  What’s nice about this is that all the usual gdb facilities will also be available: think of it as a Python-enabled strace, with stack dump capability.
  • The excellent folks at Code Sourcery (I would name names, but I’m afraid of leaving someone out) are working on multi-process support for gdb.  This is the feature I am most looking forward to.  In the foreseeable future, gdb will be able to trace both the parent and the child of a fork.  The particular “wow” use-case is something I read on the frysk web site: run “make check” in gdb, and have the CLI fire up whenever any program SEGVs.  No more futzing with setting up the debug environment!  In fact, no more figuring out how to get past libtool wrapper scripts — we could add a little hack so that you can just run them in gdb and the right thing will happen.

Naturally, we’ll be wiring all this up to Python, one way or another.

I’ve also got some longer-term plans for the Python support.  I’m very interested in extending gdb to debug interpreted languages.  As with most computer problems, this means inserting a layer of indirection in a number of places: into expression parsing, into symbol lookup, into breakpoints, into watchpoints, etc.  The goal here is to be able to write support for, say, debugging Python scripts, as a Python extension to gdb.  Then, users could switch back and forth between “raw” (debugging the C implementation) and “cooked” (debugging their script) views easily.

I have two basic models I use when thinking about python-gdb: valgrind and emacs.

Emacs is a great example of managing the split between the core implementation and scripts.  Emacs developers prefer to write in elisp when possible; the core exists, more or less, to make this possible for a wide range of uses.  I’m trying to steer gdb in this direction.  That is, push Python hooks into all the interesting places in gdb, and then start preferring Python over C.  (Mozilla might have been another good example here — but I am more familiar with Emacs.)

Naturally, we’ll pursue this with extraordinary wisdom and care.  Cough cough.  Seriously, there are many areas of gdb which are not especially performance sensitive.  For example, consider the new commands we wrote during this series.  Even support for a new language would not require anything that could not be comfortably — and excellently — done in Python.

Valgrind taught me the Field of Dreams model: even a fairly esoteric area of programming can attract a development community, provided that you build the needed infrastructure.  In other words, just look at all those cool valgrind skins.  This library orientation, by the way, is something I would like to see GCC pursue more vigorously.

I’m very interested to hear your feedback.  Feel free to post comments here, or drop us a line on the Archer list.

We’ve come to the end of this series of posts.  I’m sad to see it end, but now it is time to stop writing about python-gdb features, and to go back to writing the features themselves.  I’ll write more when there is more to be said.