Archive for November, 2016

Shaggy Dogs and SpiderMonkey Unwinders

A year or so ago I was asked to debug a crash in the Firefox devtools.  Crashes are easy!  I fired up gdb and reproduced the crash… which turned out to be in some code JITted by SpiderMonkey.  I was immediately lost; even a simple bt did not work.  Someone more familiar with the JIT — hi Shu — had to dig out the answer :-(.

I did take the opportunity to get some information from him about how he found the result, though.  He pointed me to the code responsible for laying out JIT stack frames.  It turned out that gdb could not unwind through JIT frames, but it could be done by hand — so I resolved then to eventually fix this.

Phase One

I knew from my gdb hacking that gdb has a JIT unwinding API.  Actually — and isn’t this the way most programs end up working? — it has two.

The first JIT API requires some extra work on the part of the JIT: it constructs an object file, typically ELF and DWARF, in memory, then calls a hook.  GDB sets a breakpoint on this hook and, when hit, it reads the data from the inferior.  This lets the JIT provide basically any kind of information — but it’s pretty heavy.

So, I focused my attention on the second API.  In this mode, the JIT author would provide a shared library that used some callbacks to inform gdb of the details of what was going on.  The set of callbacks was much more limited, but could at least describe how to unwind the registers.  So, I figured that this is what I would do.

But… I didn’t really want to write this in C.  That would be a real pain!  C is fiddly and hard to deal with, and it would mean constant rebuilding of the shared library while debugging, and SpiderMonkey already had a reasonable number of gdb-python scripts — surely this could be done in Python.

So I took the quixotic approach, namely writing a shared library that used the second gdb JIT API but only to expose this API to Python.

Of course, this turned out to be Rube Goldbergian.  Various parts of the gdb Python API could not be called from the JIT shared library, because those bits depended on other state in gdb, which wasn’t set properly when the JIT library was being called.  So, I had gdb calling into my shared library, which called my Python code, which then invoked a new gdb command (written in Python and supplied by my package) — that existed solely for the purpose of setting this internal state properly — and that in turn invoked the code I wanted to run, say to fetch memory or a register or something.

Computer Science!

Well, that took a while.  But it sort of worked!  And maybe I could just keep it in github and not put it in Mozilla Central and avoid learning about the Firefox build system and copying in some gdb header file and license review and whatnot.

So I started writing the actual Python code… OMG.  And see below since you will totally want to know about this.  But meanwhile…

… while I was hacking away on this crazy idea, someone implemented the much more sane idea of just exposing gdb’s unwinder API to gdb’s Python layer.

Hmm… why didn’t I do that?  Well, I left gdb under a bit of a cloud, and didn’t really want to be that involved at the time.  Plus, you know, gdb is a high quality project; which means that if you write a giant patch to expose the unwinding API, you have to be prepared for 17 rounds of patch review (this really happened once), plus writing documentation and tests.  Sometimes it’s just easier to channel one’s inner Rube.

Phase Two

The integrated Python API was a great development.  Now I could delete my shared library and my insane trampoline hacks, and focus on my insane unwinding code.

A lot of this work was straightforward, in the sense that the general outline was clear and just the details remained.  The details amount to things like understanding the SpiderMonkey frame descriptor (which partly describes the previous frame and partly the new frame; there’s one comment explaining this that somehow eluded me for quite a while); duplicating the SpiderMonkey JIT unwinding code in Python; and of course carefully reading the SpiderMonkey code that JITs the “entry frame” code to understand how registers are spilled.

Naturally, while doing this it turned out that I was maybe the first person to use these gdb APIs in anger.  I found some gdb crashes, oops!  The docs would have been impenetrable, except I already knew the underlying C APIs on which they were based… whew!  The Python API was unexpectedly picky in other areas, too.

But then there was also some funny business, one part in gdb, and one part in SpiderMonkey.

GDB is probably more complicated than you realize.  In this case, the complexity is that, in gdb, each stack frame can have its own architecture.  This seemingly weird functionality is actually used; I think it was invented for the SPU, but some other chips have multiple modes as well.  But what this means is that the question “what architecture is this program?” is not well-defined, and anyway gdb’s Python layer doesn’t provide you a way to find whatever approximation it is that would make sense in your specific case.  However, when writing the SpiderMonkey unwinder, it kind of actually is well-defined and we’d like to know the answer to know which unwinder to choose.

For this problem I settled on the probably terrible idea of checking whether a given register is available.  That is, if you see “$rip”, you can guess it’s x86-64.
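The register probe can be sketched roughly as follows.  This is a minimal illustration, not the unwinder's actual code: FakePendingFrame, SENTINEL_REGISTERS, and guess_architecture are hypothetical names, and the fake frame stands in for gdb's gdb.PendingFrame, on the assumption that its read_register method raises ValueError for a register name the architecture does not have.

```python
# Hypothetical table: a register name expected to exist only on that architecture.
# Order matters if a later name could also be readable on an earlier architecture.
SENTINEL_REGISTERS = [
    ("x86-64", "rip"),
    ("i386", "eip"),
]


class FakePendingFrame:
    """Stand-in for gdb.PendingFrame so the sketch runs outside gdb."""

    def __init__(self, registers):
        self._registers = dict(registers)

    def read_register(self, name):
        # Assumed to mirror gdb: unknown register names raise ValueError.
        if name not in self._registers:
            raise ValueError("Bad register")
        return self._registers[name]


def guess_architecture(pending_frame):
    """Return the first architecture whose sentinel register can be read."""
    for arch, reg in SENTINEL_REGISTERS:
        try:
            pending_frame.read_register(reg)
        except ValueError:
            continue  # register unknown here, try the next candidate
        return arch
    return None


print(guess_architecture(FakePendingFrame({"rip": 0x7FFFF7FF20F3})))
```

Returning None for an unrecognized target lets the caller decline to unwind and leave the frame to gdb.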

The other problem here is that gdb thinks that, since you wrote an unwinder, it should get the first stab at unwinding.  That’s very polite!  But for SpiderMonkey, deciding “hey, is this PC in some code the JIT emitted?” is actually a real pain, or at least outside the random bits of it I learned in order to make all this work.

Aha!  I know, there’s probably a Python API to say “is this address associated with some shared library?”  I remembered reading and/or reviewing a patch… but no, gdb.solib_name is close but doesn’t do the right thing for addresses in the main executable.  WAT.

I tried several tricks without success, and in the end I went with parsing /proc/PID/maps to get the mappings to decide whether a given frame should be handled by this unwinder or by gdb.  Horrible.  And it fails with remote debugging.
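A rough sketch of that kind of check, assuming the usual Linux maps line layout (address range, permissions, offset, device, inode, optional path).  The function names and the sample data are made up for illustration, and treating "not backed by a file" as "maybe JIT code" is a heuristic assumed here, not necessarily the rule the real unwinder used.

```python
SAMPLE = """\
00400000-0040c000 r-xp 00000000 08:01 131 /usr/bin/js
7f0000000000-7f0000010000 rwxp 00000000 00:00 0
7ffff7a00000-7ffff7bc0000 r-xp 00000000 08:01 977 /lib/x86_64-linux-gnu/libc.so.6
"""


def parse_maps(text):
    """Parse maps-style text into sorted (start, end, path) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        # Anonymous mappings have no sixth field.
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append((int(start_s, 16), int(end_s, 16), path))
    mappings.sort()
    return mappings


def is_file_backed(mappings, address):
    """True if address lies in a mapping whose path is a real file."""
    for start, end, path in mappings:
        if start <= address < end:
            # Pseudo-paths such as [heap] or [stack] do not start with "/".
            return path.startswith("/")
    return False


maps = parse_maps(SAMPLE)
print(is_file_backed(maps, 0x400100))        # inside /usr/bin/js
print(is_file_backed(maps, 0x7F0000000100))  # inside the anonymous mapping
```

An address that is not file-backed is only a candidate for JIT code; the unwinder still has to make sense of the frame.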

Luckily, nobody does remote debugging.

Remote Debugging

Oh, wait, people do remote debugging at Mozilla all the time.  They don’t call it “remote debugging” though; they call it “using RR”, which, while it runs locally, appears to be remote to gdb.  Importantly, during replay mode it fakes the PID and does other deep magic, though not deep enough to extend to making a fake map file that could be read via gdb’s remote get command.

By the way, you should be using RR.  It’s the best advance in debugging since, well, gdb.  It’s a process record-and-replay program, but unlike gdb’s built-in reverse debugging, it handles threads properly and has decent performance.

Oh Well

Oh well.  It just won’t work remotely.  Or at least not until fellow Mozillian (this always seems like it should be “Mozillan” to me, but it’s not, there really is that extra “i”) and all-star Nicolas Pierron wrote some additional Python to read some SpiderMonkey tables to make the decision in a more principled way.  Now it will all work!

Though looking now I wonder if I dreamed this, because the code isn’t checked in.  I know he had a patch but my memory is a bit fuzzy — maybe in the end it didn’t work, because RR didn’t implement the qGetTLSAddr packet, which gdb uses to read thread-local storage.  Did I mention the thread-locals?

The Real Start of the Story

So, way back at the beginning, during my initial foray into this code, I found that a crucial bit of information — the appropriately-named TlsPerThreadData — was stashed away in a thread-local variable.  Information stored here is needed by the unwinder in order to unwind from a C++ frame into a JIT frame.

Only, Firefox didn’t use “real” thread-local variables, the things that so many glibc and gcc hackers put so much effort into micro-optimizing.  No, it just used a template class that wrapped pthread_setspecific and friends in a relatively ergonomic way.

Naturally, for an unwinder this is a disaster.  Why?  Unwinding is basically the dissection of the stack; but in order to compute the value of one of these thread-local-storage objects, the unwinder would have to make some function calls in the inferior (in fact this prevents it from working on OSX).  But these would affect the stack, and also potentially let other inferior code (in other threads — remember, gdb is complicated and you can exert various unusual kinds of control like this) run as well.

So I neglected to mention the very first step: changing Firefox to use __thread.  (Ok, I didn’t really neglect to mention it, I was just being lazy and anyway it’s a shaggy dog story.)

Do Not Use libthread_db

RR did not implement qGetTLSAddr, which we needed, because lots of people at Mozilla use RR.  So I set out to implement that.  This meant a foray into the dangerous world of libthread_db.

For reasons I do not know, and suspect that I do not want to know, glibc has historically followed many Solaris conventions.  One such Solaris innovation was libthread_db, a library that debuggers use to find certain information from libc, information like the address of a thread-local variable.

On the surface this seems like a great idea: don’t bake the implementation details of the C library into the debugger.  Instead, let the debugger use a debugging library that comes with the C library.  And, if you designed it that way, it would be a good idea.

Sadly, though, libthread_db was not designed that way.  Oh no.

For example, libthread_db has a callback interface.  The calling program — gdb or rr — must provide some functions that libthread_db can call, to do some simple things like “read some memory”; or some very complicated things like “find the address of a symbol given its name”.  Normal C programmers might implement these callbacks using a structure containing function pointers.  But not libthread_db!  Instead it uses fixed symbol names that must be provided by the calling application.  Not all of these are required for it to work (you get to figure out which, yay!), but some definitely are.  And, you have to dlopen a libthread_db that matches the libc of the inferior that you’re debugging (or link against it, but that’s also obviously bad).

Wait, you say.  Doesn’t that mess up cross-debugging?  Why yes!  Yes it does!  Which is why qGetTLSAddr has to be in the gdb remote serial protocol to start with.

Hey, maybe the Linux vendors should fix this.  They are — see Gary Benson’s Infinity project — but unfortunately that’s still in development and I wanted RR to work sooner.

Ok, so whew.  I wrote qGetTLSAddr support for RR.  This was a small patch in the end, but an unusual pain in an already painful series.  Hopefully this won’t spill out into other programs.


Hahaha, you are so funny.  Of course it spills out: remember how you have to define a bunch of functions with specific names in your program in order to use libthread_db?  Well, how do you know you got the types correct?

Yeah, you include <proc_service.h> (a name deliberately chosen to confuse, I suppose, why not, it doesn’t bear any obvious relationship to the library).  Only, that was never installed by glibc.  Instead, gdb just copied it into the source tree.

So naturally I went and fixed this in glibc.  And, even more naturally, this broke the gdb build, which was autoconf’d to check for a file that never existed in the past.  LOL.

Thank You Cthulhu

At this point I figured it was only a matter of time until I had to patch the kernel.  Thankfully this hasn’t been necessary yet.

It Says What

In gdb the actual unwinding and the display of frames are separate concerns.

And let me digress here to say that gdb’s unwinder design is excellent.  I believe it was redone by Andrew Cagney (this was well before my active time in gdb, so apologies if you’re reading this and you did it and I’ve misattributed it).  Like much of gdb, many of the details are bizarre and take one back to the byte-counting days of 1987; but the high level design is very solid and has endured with, I think, just one significant change (to support inline functions) in the intervening 15 or so years.  I’ve long thought that this is a remarkable accomplishment in the programming world.

So, yes.  It’s not enough to just unwind.  Simply having an unwinder yields backtraces with lines like:

#5 0xfeefee ???

Better than nothing!  But not yet great.

The second part of the SpiderMonkey unwinder is, therefore, a gdb “frame filter”.  This is an object that takes raw frames and decorates them with information like a function name, or a file name, or arguments.
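The decorating idea can be sketched like this.  Inside gdb, a frame filter is an object with name, priority, and enabled attributes plus a filter method that takes an iterator of frame decorators and returns another one, and decorators usually subclass gdb.FrameDecorator.FrameDecorator.  The sketch below uses plain classes so it runs outside gdb; every class name and the address-to-name table are hypothetical, and the real SpiderMonkey filter gets its names from JIT data structures instead.

```python
class RawFrame:
    """Stand-in for an undecorated frame: it knows only its PC."""

    def __init__(self, pc):
        self._pc = pc

    def address(self):
        return self._pc

    def function(self):
        return None  # no symbol known for this address


class JitFrameDecorator:
    """Wraps a frame and supplies a function name for a JIT address."""

    def __init__(self, base, name):
        self._base = base
        self._name = name

    def address(self):
        return self._base.address()

    def function(self):
        return self._name


class JitFrameFilter:
    def __init__(self, jit_names):
        # gdb frame filters carry these three attributes.
        self.name = "jit-sketch"
        self.priority = 100
        self.enabled = True
        self._jit_names = jit_names  # hypothetical table: pc -> display name

    def filter(self, frame_iter):
        # Decorate frames we recognize, pass the rest through untouched.
        for frame in frame_iter:
            name = self._jit_names.get(frame.address())
            yield JitFrameDecorator(frame, name) if name else frame


frames = [RawFrame(0xFEEFEE), RawFrame(0x1000)]
decorated = JitFrameFilter({0xFEEFEE: "f1"}).filter(iter(frames))
print([f.function() for f in decorated])
```

Because the filter yields lazily, it decorates only as many frames as the backtrace actually asks for.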

Work to add this information is ongoing — I landed one patch just yesterday, and another one, to add more information about interpreted frames, is still in the works.  And there are two more bugs filed… maybe this project, like this blog post, will never conclude.  It will just scroll endlessly.

But now, with all the code in place, bt can show something like:

#6 0x00007ffff7ff20f3 in <<JitFrame_BaselineJS "f1">> (this=JSVAL_VOID, arg1=$jsval(4700))

This is the call f1(4700).

Let’s Just Have One More

Of course we still couldn’t enable this unwinder by default.  You have to enable it by hand.

And by the way, in the first release of gdb’s Python unwinder feature, enabling or disabling an unwinder didn’t flush the frame cache, so it wouldn’t actually take effect until some invisible-to-the-user state change took place.  I fixed this bug, but here Pedro Alves also taught me the secret gdb command flushregs, which in fact just flushes the frame cache. (I’m going to go out on a limb and guess that this command predates the already ancient maint prefix command, hence its weird name.)

Anyway, you have to enable it by hand because the unwinder itself doesn’t work properly if the outermost frame is in JIT code.  The JIT, in the interest of performance, doesn’t maintain a frame pointer.  This means that in the outermost frame, there’s no reliable way to find the object that describes this frame and links to the previous frame.

Now, normally in this case gdb would either resort to debug info (not available here), or in extremis its encyclopedic suite of prologue analyzers (yes, gdb can analyze common function prologues for all architectures developed in the last 25 years to figure out stuff) — but naturally JIT compilers go their own way here as well.

Humans, like Shu back at the start of this story, can do this by dumping parts of the stack and guessing which bytes represent the frame header.

But, I’ve been reluctant and a bit afraid to hack a heuristic into the unwinder.

To sum up — in case you missed it — this means that all the code written during this entire saga would still not have helped with my original bug.

The End