Archive for the ‘gdb’ Category

Faster Faster GDB Startup

A while ago, I wrote about my work to speed up GDB’s DWARF reader. I thought I’d write again with a few updates.

Sharding

Back then, I wrote: “maybe GDB could trade memory for performance and shard the resulting index and do separate canonicalizations in each worker thread”.

I did end up doing this. Recall that the canonicalization step goes through all the discovered DWARF entries of interest — basically, all the objects in the program that both have a name and are not in a function scope (except in some languages, there is always an exception with DWARF) — and ensures the names are in a normal form. For Ada, this step includes synthesizing the package hierarchy (something that should probably be done for Go as well, except nobody really works on the Go support in GDB).

As an aside, sometime in the last few years we realized that this canonicalization has to be done for C as well, because in C there are multiple spellings of types like “short”. This is also implemented.

Because GDB already reads DWARF CUs in chunks in separate threads, the sharding idea is that we can speed up canonicalization a bit by doing this separately. Previously, GDB combined all the results before processing. Sharding means that lookups are a little more complicated; but it turns out not to be too hard, because the number of shards is typically low (for reasons I haven’t yet investigated, the reader doesn’t scale past 8 threads or so).

Background Reading

The other major change I made is to do all the DWARF reading in the background. This is a trick to make gdb feel faster to users. The basic idea here is that in many cases, gdb does not immediately need the DWARF from the various files. So, if we push the reading into worker threads, maybe it will be completely read in by the time gdb does need it.

This also somewhat benefits the situation where several shared libraries are loaded at once into the inferior. In this case, gdb already defers breakpoint re-setting until all the DWARF has been read — and with this change, all that work will be done in parallel.

Making this work wasn’t entirely straightforward. The main issue here is that gdb determines the initial language and location for “list” (et al) based on the debug info. The patches arrange to set these things lazily as well. I also had to add some rudimentary thread-safety to BFD.

Now, this can be defeated in a few ways. If you have a .gdbinit that sets a breakpoint, then that will cause the familiar pause, because setting a breakpoint will wait for the workers to complete. Or, if you debug a large executable and type very quickly, you may have to wait for the parsing to finish.

However, when it does work, it feels like gdb starts instantly.

DWARF Abbrevs Use Too Much Space

I was curious about DWARF abbrev table efficiency the other day, so I instrumented gdb to record some simple stats about abbrevs: how many are seen, how many duplicates are seen, and how many bytes are used.

Running gdb on itself, I discovered that abbrevs are largely redundant. In particular, removing redundant abbrevs will remove 95% of abbrevs (10238 unique of 230714 total). Similarly the size of the abbrev tables reduces similarly (230714 bytes needed for the de-duplicated abbrevs, compared to 3848152 as seen in the executable).

Something to think about when you consider the effort DWARF puts in to save a single byte in .debug_info, say by using a 1-byte form rather than a uleb.

I was considering having gdb intern abbrevs and pre-reading all abbrevs so that later steps wouldn’t have to re-read these; but interning turns out to be too slow and so re-reading on demand seems like the way to go.

Faster GDB Startup

After literally years of false starts and failed attempts, last week I finally checked in a series of patches that speed up GDB’s DWARF reader. The speedup for ordinary C++ code is dramatic — I regularly see a 7x performance improvement. For example, on this machine, startup on gdb itself drops from 2.2 seconds to 0.3 seconds. This seems representative, and I’ve seen even better increases on my work machine, which has more cores. Startup on Ada programs is perhaps the worst case for the current code, due to some oddities in Ada debuginfo, but even there it’s a respectable improvement.

GDB Startup

GDB, essentially, had two DWARF readers. They actually shared a surprisingly small amount of code (which was an occasional source of bugs). For example, while abbrev lookup and name generation (more on that later) was shared, the actual DIE data structures were not.

The first DWARF reader created “partial symbols”, which held a name and some associated, easy-to-compute data, like the kind of symbol (variable, function, struct tag, etc). The second DWARF reader (which is still there now) is called when more information was needed about a particular symbol — say, its type. This reader reads all the DIEs in a DWARF compilation unit and expands them into gdb’s symbol table, block, and type data structures.

Both of these scans were slow, but for the time being I’ve only rewritten the first scan, as it was the one that was first encountered and most obviously painful. (I’ve got a plan to fix up the CU expansion as well, but that’s a lengthy project of its own.)

What Was Slow

The partial symbol reader had several slow points. None of them seemed obviously slow if you looked with a profiler, but each one performed unnecessary work, and they combined in an unfortunate way.

The partial DIE cache. GDB did a scan and saved certain DIEs in a cache. There were some helpful comments that I believe were true at one point that explained why this was useful. However, I instrumented GDB and found that less than 10% of the cached DIEs were ever re-used. Computing and allocating them was largely a waste, just to support a few lookups. And, nearly every DIE that was ever looked up was done so on behalf of a single call — so the cache was nearly useless.
Name canonicalization. DWARF says that C++ names should follow the system demangler. The idea here is to provide some kind of normal form without having to really specify it — this matters because there are multiple valid ways to spell certain C++ names. Unfortunately, GCC has never followed this part of DWARF. And, because GDB wants to normalize user input, so that any spelling will work, the partial reader normalized C++ names coming from the DWARF as well. This area has a whole horrible history (for example, the demangler is crash-prone so GDB installs a SEGV handler when invoking it), but the short form here is that the partial symtab reader first constructed a fully-qualified name, and only then normalized it. This meant that any class or namespace prefix (and there are a lot of them) was re-normalized over and over while constructing names.
The bcache. The partial symbol reader made heavy use of a data structure in GDB called a bcache. This is like a string interner, but it works on arbitrary memory chunks. The bcache was used to intern both the names coming from canonicalization, as well as the partial symbols themselves. This in itself isn’t a problem, except that it requires a lock if you want to use it from multiple threads.

The New Reader

The new reader fixes all the above problems, and implements some other optimizations besides.

There is no more partial DIE cache. Instead, GDB simply scans the DWARF and immediately processes what it finds. While working on this, I realized that whether a given DIE is interesting or not is, largely, a static property of its abbrev. For example, if a DIE does not have a name and does not refer back to another DIE (either via “specification” or “origin” — DWARF is weird), then it can simply be skipped without trying to understand it at all. So, in the new reader, this property is computed once per abbrev and then simply consulted in the scanner, avoiding a lot of repeated checks.

The entire scanner is based on the idea of not trying to form the fully qualified name of a symbol. Now, while the rest of GDB wants the fully-qualified name, there’s no need to store it. Instead, the conversion is handled by the name-lookup code, which splits the searched-for name into components. The scanner creates an index data structures that’s similar to what is described by DWARF 5 (modulo bugs in the standard).

As part of this non-qualifying approach, only the “local” name is stored in each entry. Name canonicalization must still be done for C++ (and a more complicated process for Ada), but this is done on much shorter strings. A form of string interning is still used, but it takes advantage of the fact that the original string comes from the DWARF string table, and so simple pointer comparisons can be done (normally the linker combines identical strings, and if not, this just wastes a little memory). Furthermore, the interning is all done in a worker thread, so in most cases the GDB prompt will return before the work is fully complete — this makes an illusion of speed, and a nicer experience as a user.

Speaking of threads, GDB also now scans all DWARF compilation units in parallel. Specifically, GDB has a parameter that sets the number of worker threads, and it uses a parallel for-each to split the list of compilation units into N groups, and each thread works on a group. I experimented a bit and found that setting N to the number of CPUs on the system works well, at least on the machines I have available.

There’s probably still some room to speed things up some more. Maybe there are some micro-optimizations to be done. Maybe GCC could canonicalize C++ names, and we could eliminate an entire step; or maybe GDB could trade memory for performance and shard the resulting index and do separate canonicalizations in each worker thread.

There’s still an unfortunate amount of hair in there to deal with all the peculiarities of DWARF. DWARF is nicely flexible, but sometimes much too flexible, and actively difficult to read. Also, each version of DWARF yields new modes, which complicate the design. In addition to ordinary DWARF, GDB also deals with split DWARF (two or maybe three kinds), dwz-compressed DWARF (which is standard but has very many inter-CU references, where ordinary compiler-generated DWARF has none), the multi-file dwz extension, and the old debug_types section. Each of these needed special code in the new reader.

Future Work

Full CU expansion is still slow. You don’t see this (much) during GDB startup, but if you’ve ever done a ‘next’ or ‘print’ and then waited interminably — congratulations, you’ve found a bad CU expansion case. Normally these occur when GDB encounters some truly enormous CU… in my experience, most CUs are small, but there are some bogglingly huge outliers.

This is probably the next thing to fix.

The current code still shares less code with the second DWARF reader than you may think. For example, the full symbol reader constructs fully-qualified names according to its own, different algorithm.

My current plan here is to reuse the existing index to construct a sort of skeleton symbol table. Then, we’d further change GDB to fill in the bodies of individual symbols on demand — eliminating the need to ever do a full expansion. (Perhaps this could be extended to types as well, but internally in GDB that may be trickier.) As part of this, the fully-qualified names would be constructed from the index itself, which is also much cheaper than re-computing and re-canonicalizing them.

Summary

GDB is a lot faster to start now. This was done through a combination of removing useless work, smarter data structures, and exploiting the wide availability of multi-core machines.

Warning and Sanitizer Retrospective

One of my hobbies in GDB is cleaning things up. A lot of this is modernizing and C++-ifying the code, but I’ve also enabled a number of warnings and other forms of code checking in the last year or two. I thought it might be interesting to look at the impact, on GDB, of these things.

So, I went through my old warning and sanitizer patch series (some of which are still in progress) to see how many bugs were caught.

This list is sorted by least effective first, with caveats.

-fsanitize=undefined; Score: 0 or 10

You can use -fsanitize=undefined when compiling to have GCC detect undefined behavior in your code. This series hasn’t landed yet (it is pending some documentation updates).

We have a caveat already! It’s not completely fair to put UBsan at the top of the list — the point of this is that it detects situations where the compiler might do something bad. As far as I know, none of the undefined behavior that was fixed in this series caused any visible problem (so from this point of view the score is zero); however, who knows what future compilers might do (and from this point of view it found 10 bugs). So maybe UBSan should be last on the list.

Most of the bugs found were due to integer overflow, for example decoding ULEB128 in a signed type. There were also a couple cases of passing NULL to memcpy with a length of 0, which is undefined but should probably just be changed in the standard.

-Wsuggest-override; Score: 0

This warning will fire if you have a method that could have been marked override, but was not. This did not catch any gdb bugs. It does still have value, like everything on this list, because it may prevent a future bug.

-Wduplicated-cond; Score: 1

This warning detects duplicated conditions in an if-else chain. Normally, I suppose, these would arise from typos or copy/paste in similar conditions. The one bug this caught in GDB was of that form — two identical conditions in an instruction decoder.

GCC has a related -Wduplicated-branches warning, which warns when the arms of an if have identical code; but it turns out that there are some macro expansions in one of GDB’s supporting libraries where this triggers, but where the code is in fact ok.

-Wunused-variable; Score: 2

When I added this warning to the build, I thought the impact would be removing some dead code, and perhaps a bit of fiddling with #ifs. However, it caught a couple of real bugs: cases where a variable was unused, but should have been used.

-D_GLIBCXX_DEBUG; Score: 2

libstdc++ has a debug mode that enables extra checking in various parts of the C++ library. For example, enabling this will check the irreflexivity rule for operator<. While the patch to enable this still hasn’t gone in — I think, actually, it is still pending some failure investigation on some builds — enabling the flag locally has caught a couple of bugs. The fixes for these went in.

-Wimplicit-fallthrough; Score: 3

C made a bad choice in allowing switch cases to fall through by default. This warning rectifies this old error by requiring you to explicitly mark fall-through cases.

Apparently I tried this twice; the first time didn’t detect any bugs, but the second time — and I don’t recall what, if anything, changed — this warning found three bugs: a missing break in the process recording code, and two in MI.

-Wshadow=local; Score: 3

Shadowing is when a variable in some inner scope has the same name as a variable in an outer scope. Often this is harmless, but sometimes it is confusing, and sometimes actively bad.

For a long time, enabling a warning in this area was controversial in GDB, because GCC didn’t offer enough control over exactly when to warn, the canonical example being that GCC would warn about a local variable named “index“, which shadowed a deprecated C library function.

However, now GCC can warn about shadowing within a single function; so I wrote a series (still not checked in) to add -Wshadow=local.

This found three bugs. One of the bugs was found by happenstance: it was in the vicinity of an otherwise innocuous shadowing problem. The other two bugs were cases where the shadowing variable caused incorrect behavior, and removing the inner declaration was enough to fix the problem.

-fsanitize=address; Score: 6

The address sanitizer checks various typical memory-related errors: buffer overflows, use-after-free, and the like. This series has not yet landed (I haven’t even written the final fix yet), but meanwhile it has found 6 bugs in GDB.

Conclusion

I’m generally a fan of turning on warnings, provided that they rarely have false positives.

There’s been a one-time cost for most warnings — a lot of grunge work to fix up all the obvious spots. Once that is done, though, the cost seems small: GDB enables warnings by default when built from git (not when built from a release), and most regular developers use GCC, so build failures are caught quickly.

The main surprise for me is how few bugs were caught. I suppose this is partly because the analysis done for new warnings is pretty shallow. In cases like the address sanitizer, more bugs were found; but at the same time there have already been passes done over GDB using Valgrind and memcheck, so perhaps the number of such bugs was already on the low side.

FOSDEM, Rust, and Debugging

I’ve recently switched groups at Mozilla to start working full-time on improving Rust debugging. To kick this off and to meet people from the various projects I’m working on — the Rust compiler, lldb, llvm, gdb, and (eventually) the DWARF standard — I will speak about this work at FOSDEM. If you’re going and want to meet up, drop me a line.

Shaggy Dogs and SpiderMonkey Unwinders

A year or so ago I was asked to debug a crash in the Firefox devtools. Crashes are easy! I fired up gdb and reproduced the crash… which turned out to be in some code JITted by SpiderMonkey. I was immediately lost; even a simple bt did not work. Someone more familiar with the JIT — hi Shu — had to dig out the answer :-(.

I did take the opportunity to get some information from him about how he found the result, though. He pointed me to the code responsible for laying out JIT stack frames. It turned out that gdb could not unwind through JIT frames, but it could be done by hand — so I resolved then to eventually fix this.

Phase One

I knew from my gdb hacking that gdb has a JIT unwinding API. Actually — and isn’t this the way most programs end up working? — it has two.

The first JIT API requires some extra work on the part of the JIT: it constructs an object file, typically ELF and DWARF, in memory, then calls a hook. GDB sets a breakpoint on this hook and, when hit, it reads the data from the inferior. This lets the JIT provide basically any kind of information — but it’s pretty heavy.

So, I focused my attention on the second API. In this mode, the JIT author would provide a shared library that used some callbacks to inform gdb of the details of what was going on. The set of callbacks was much more limited, but could at least describe how to unwind the registers. So, I figured that this is what I would do.

But… I didn’t really want to write this in C. That would be a real pain! C is fiddly and hard to deal with, and it would mean constant rebuilding of the shared library while debugging, and SpiderMonkey already had a reasonable number of gdb-python scripts — surely this could be done in Python.

So I took the quixotic approach, namely writing a shared library that used the second gdb JIT API but only to expose this API to Python.

Of course, this turned out to be Rube Goldbergian. Various parts of the gdb Python API could not be called from the JIT shared library, because those bits depended on other state in gdb, which wasn’t set properly when the JIT library was being called. So, I had gdb calling into my shared library, which called my Python code, which then invoked a new gdb command (written in Python and supplied by my package) — that existed solely for the purpose of setting this internal state properly — and that in turn invoked the code I wanted to run, say to fetch memory or a register or something.

Computer Science!

Well, that took a while. But it sort of worked! And maybe I could just keep it in github and not put it in Mozilla Central and avoid learning about the Firefox build system and copying in some gdb header file and license review and whatnot.

So I started writing the actual Python code… OMG. And see below since you will totally want to know about this. But meanwhile…

… while I was hacking away on this crazy idea, someone implemented the much more sane idea of just exposing gdb’s unwinder API to gdb’s Python layer.

Hmm… why didn’t I do that? Well, I left gdb under a bit of a cloud, and didn’t really want to be that involved at the time. Plus, you know, gdb is a high quality project; which means that if you write a giant patch to expose the unwinding API, you have to be prepared for 17 rounds of patch review (this really happened once), plus writing documentation and tests. Sometimes it’s just easier to channel one’s inner Rube.

Phase Two

The integrated Python API was a great development. Now I could delete my shared library and my insane trampoline hacks, and focus on my insane unwinding code.

A lot of this work was straightforward, in the sense that the general outline was clear and just the details remained. The details amount to things like understanding the SpiderMonkey frame descriptor (which partly describes the previous frame and partly the new frame; there’s one comment explaining this that somehow eluded me for quite a while); duplicating the SpiderMonkey JIT unwinding code in Python; and of course carefully reading the SpiderMonkey code that JITs the “entry frame” code to understand how registers are spilled.

Naturally, while doing this it turned out that I was maybe the first person to use these gdb APIs in anger. I found some gdb crashes, oops! The docs would have been impenetrable, except I already knew the underlying C APIs on which they were based… whew! The Python API was unexpectedly picky in other areas, too.

But then there was also some funny business, one part in gdb, and one part in SpiderMonkey.

GDB is probably more complicated than you realize. In this case, the complexity is that, in gdb, each stack frame can have its own architecture. This seemingly weird functionality is actually used; I think it was invented for the SPU, but some other chips have multiple modes as well. But what this means is that the question “what architecture is this program?” is not well-defined, and anyway gdb’s Python layer doesn’t provide you a way to find whatever approximation it is that would make sense in your specific case. However, when writing the SpiderMonkey unwinder, it kind of actually is well-defined and we’d like to know the answer to know which unwinder to choose.

For this problem I settled on the probably terrible idea of checking whether a given register is available. That is, if you see “$rip“, you can guess it’s x86-64.

The other problem here is that gdb thinks that, since you wrote an unwinder, it should get the first stab at unwinding. That’s very polite! But for SpiderMonkey, deciding “hey, is this PC in some code the JIT emitted?” is actually a real pain, or at least outside the random bits of it I learned in order to make all this work.

Aha! I know, there’s probably a Python API to say “is this address associated with some shared library?” I remembered reading and/or reviewing a patch… but no, gdb.solib_name is close but doesn’t do the right thing for addresses in the main executable. WAT.

I tried several tricks without success, and in the end I went with parsing /proc/maps to get the mappings to decide whether a given frame should be handled by this unwinder or by gdb. Horrible. And fails with remote debugging.

Luckily, nobody does remote debugging.

Remote Debugging

Oh, wait, people do remote debugging at Mozilla all the time. They don’t call it “remote debugging” though — they call it “using RR“, which while it runs locally, appears to be remote to gdb; and, importantly, during replay mode fakes the PID, and does other deep magic, though not deep enough to extend to making a fake map file that could be read via gdb’s remote get command.

By the way, you should be using RR. It’s the best advance in debugging since, well, gdb. It’s a process record-and-replay program, but unlike gdb’s built-in reverse debugging, it handles threads properly and has decent performance.

Oh Well

Oh well. It just won’t work remotely. Or at least not until fellow Mozillian (this always seems like it should be “Mozillan” to me, but it’s not, there really is that extra “i”) and all-star Nicolas Pierron wrote some additional Python to read some SpiderMonkey tables to make the decision in a more principled way. Now it will all work!

Though looking now I wonder if I dreamed this, because the code isn’t checked in. I know he had a patch but my memory is a bit fuzzy — maybe in the end it didn’t work, because RR didn’t implement the qGetTLSAddr packet, which gdb uses to read thread-local storage. Did I mention the thread-locals?

The Real Start of the Story

So, way back at the beginning, during my initial foray into this code, I found that a crucial bit of information — the appropriately-named TlsPerThreadData — was stashed away in a thread-local variable. Information stored here is needed by the unwinder in order to unwind from a C++ frame into a JIT frame.

Only, Firefox didn’t use “real” thread-local variables, the things that so many glibc and gcc hackers put so much effort into micro-optimizing. No, it just used a template class that wrapped pthread_setspecific and friends in a relatively ergonomic way.

Naturally, for an unwinder this is a disaster. Why? Unwinding is basically the dissection of the stack; but in order to compute the value of one of these thread-local-storage objects, the unwinder would have to make some function calls in the inferior (in fact this prevents it from working on OSX). But these would affect the stack, and also potentially let other inferior code (in other threads — remember, gdb is complicated and you can exert various unusual kinds of control like this) run as well.

So I neglected to mention the very first step: changing Firefox to use __thread. (Ok, I didn’t really neglect to mention it, I was just being lazy and anyway it’s a shaggy dog story.)

Do Not Use libthread_db

RR did not implement qGetTLSAddr, which we needed, because lots of people at Mozilla use RR. So I set out to implement that. This meant a foray into the dangerous world of libthread_db.

For reasons I do not know, and suspect that I do not want to know, glibc has historically followed many Solaris conventions. One such Solaris innovation was libthread_db — a library that debuggers use to find certain information from libc, information like the address of a thread-local variable

On the surface this seems like a great idea: don’t bake the implementation details of the C library into the debugger. Instead, let the debugger use a debugging library that comes with the C library. And, if you designed it that way, it would be a good idea.

Sadly, though, libthread_db was not designed that way. Oh no.

For example, libthread_db has a callback interface. The calling program — gdb or rr — must provide some functions that libthread_db can call, to do some simple things like “read some memory”; or some very complicated things like “find the address of a symbol given its name”. Normal C programmers might implement these callbacks using a structure containing function pointers. But not libthread_db! Instead it uses fixed symbol names that must be provided by the calling application. Not all of these are required for it to work (you get to figure out which, yay!), but some definitely are. And, you have to dlopen a libthread_db that matches the libc of the inferior that you’re debugging (or link against it, but that’s also obviously bad).

Wait, you say. Doesn’t that mess up cross-debugging? Why yes! Yes it does! Which is why qGetTLSAddr has to be in the gdb remote serial protocol to start with.

Hey, maybe the Linux vendors should fix this. They are — see Gary Benson’s Infinity project — but unfortunately that’s still in development and I wanted RR to work sooner.

Ok, so whew. I wrote qGetTLSAddr support for RR. This was a small patch in the end, but an unusual pain in an already painful series. Hopefully this won’t spill out into other programs.

glibc

Hahaha, you are so funny. Of course it spills out: remember how you have to define a bunch of functions with specific names in your program in order to use libthread_db? Well, how do you know you got the types correct?

Yeah, you include <proc_service.h> (a name deliberately chosen to confuse, I suppose, why not, it doesn’t bear any obvious relationship to the library). Only, that was never installed by glibc. Instead, gdb just copied it into the source tree.

So naturally I went and fixed this in glibc. And, even more naturally, this broke the gdb build, which was autoconf’d to check for a file that never existed in the past. LOL.

Thank You Cthulhu

At this point I figured it was only a matter of time until I had to patch the kernel. Thankfully this hasn’t been necessary yet.

It Says What

In gdb the actual unwinding and the display of frames are separate concerns.

And let me digress here to say that gdb’s unwinder design is excellent. I believe it was redone by Andrew Cagney (this was well before my active time in gdb, so apologies if you’re reading this and you did it and I’ve misattributed it). Like much of gdb, many of the details are bizarre and take one back to the byte-counting days of 1987; but the high level design is very solid and has endured with, I think, just one significant change (to support inline functions) in the intervening 15 or so years. I’ve long thought that this is a remarkable accomplishment in the programming world.

So, yes. It’s not enough to just unwind. Simply having an unwinder yields backtraces with lines like:

#5 0xfeefee ???

Better than nothing! But not yet great.

The second part of the SpiderMonkey unwinder is, therefore, a gdb “frame filter”. This is an object that takes raw frames and decorates them with information like a function name, or a file name, or arguments.

Work to add this information is ongoing — I landed one patch just yesterday, and another one, to add more information about interpreted frames, is still in the works. And there are two more bugs filed… maybe this project, like this blog post, will never conclude. It will just scroll endlessly.

But now, with all the code in place, bt can show something like:

#6 0x00007ffff7ff20f3 in <<JitFrame_BaselineJS "f1">> (this=JSVAL_VOID, arg1=$jsval(4700))

This is the call f1(4700).

Let’s Just Have One More

Of course we still couldn’t enable this unwinder by default. You have to enable it by hand.

And by the way, in the first release of gdb’s Python unwinder feature, enabling or disabling an unwinder didn’t flush the frame cache, so it wouldn’t actually take effect until some invisible-to-the-user state change took place. I fixed this bug, but here Pedro Alves also taught me the secret gdb command flushregs, which in fact just flushes the frame cache. (I’m going to go out on a limb and guess that this command predates the already ancient maint prefix command, hence its weird name.)

Anyway, you have to enable it by hand because the unwinder itself doesn’t work properly if the outermost frame is in JIT code. The JIT, in the interest of performance, doesn’t maintain a frame pointer. This means that in the outermost frame, there’s no reliable way to find the object that describes this frame and links to the previous frame.

Now, normally in this case gdb would either resort to debug info (not available here), or in extremis its encyclopedic suite of prologue analyzers (yes, gdb can analyze common function prologues for all architectures developed in the last 25 years to figure out stuff) — but naturally JIT compilers go their own way here as well.

Humans, like Shu back at the start of this story, can do this by dumping parts of the stack and guessing which bytes represent the frame header.

But, I’ve been reluctant and a bit afraid to hack a heuristic into the unwinder.

To sum up — in case you missed it — this means that all the code written during this entire saga would still not have helped with my original bug.

The End

GDB Preattach

In firefox development, it’s normal to do most development tasks via the mach command. Build? Use mach. Update UUIDs? Use mach. Run tests? Use mach. Debug tests? Yes, mach mochitest --debugger gdb.

Now, normally I run gdb inside emacs, of course. But this is hard to do when I’m also using mach to set up the environment and invoke gdb.

This is really an Emacs bug. GUD, the Emacs interface to all kinds of debuggers, is written as its own mode, but there’s no really great reason for this. It would be way cooler to have an adaptive shell mode, where running the debugger in the shell would magically change the shell-ish buffer into a gud-ish buffer. And somebody — probably you! — should work on this.

But anyway this is hard and I am lazy. Well, sort of lazy and when I’m not lazy, also unfocused, since I came up with three other approaches to the basic problem. Trying stuff out and all. And these are even the principled ways, not crazy stuff like screenify.

Oh right, the basic problem. The basic problem with running gdb from mach is that then you’re just stuck in the terminal. And unless you dig the TUI, which I don’t, terminal gdb is not that great to use.

One of the ideas, in fact the one this post is about, since this post isn’t about the one that I couldn’t get to work, or the one that is also pretty cool but that I’m not ready to talk about, was: hey, can’t I just attach gdb to the test firefox? Well, no, of course not, the test program runs too fast (sometimes) and racing to attach is no fun. What would be great is to be able to pre-attach — tell gdb to attach to the next instance of a given program.

This requires kernel support. Once upon a time there were some gdb and kernel patches (search for “global breakpoints”) to do this, but they were never merged. Though hmm! I can do some fun kernel stuff with SystemTap…

Specifically what I did was write a small SystemTap script to look for a specific exec, then deliver a SIGSTOP to the process. Then the script prints the PID of the process. On the gdb side, there’s a new command written in Python that invokes the SystemTap script, reads the PID, and invokes attach. It’s a bit hacky and a bit weird to use (the SIGSTOP appears in gdb to have been delivered multiple times or something like that). But it works!

It would be better to have this functionality directly in the kernel. Somebody — probably you! — should write this. But meanwhile my hack is available, along with a few other gdb scxripts, in my gdb helpers github repository.

import gdb

Occasionally I see questions about how to import gdb from the ordinary Python interpreter. This turns out to be surprisingly easy to implement.

First, a detour into PIE and symbol visibility.

“PIE” stands for “Position Independent Executable”. It uses essentially the same approach as a shared library, except it can be applied to the executable. You can easily build a PIE by compiling the objects with the -fPIE flag, and then linking the resulting executable with -pie. Normally PIEs are used as a security feature, but in our case we’re going to compile gdb this way so we can have Python dlopen it, following the usual Python approach: we install it as _gdb.so and add a a module initialization function, init_gdb. (We actually name the module “_gdb“, because that is what the gdb C code creates; the “gdb” module itself is already plain Python that happens to “import _gdb“.)

Why install the PIE rather than make a true shared library? It is just more convenient — it doesn’t require a lot of configure and Makefile hacking, and it doesn’t slow down the build by forcing us to link gdb against a new library.

Next, what about all those functions in gdb? There are thousands of them… won’t they possibly cause conflicts at dlopen time? Why yes… but that’s why we have symbol visibility. Symbol visibility is an ELF feature that lets us hide all of gdb’s symbols from any dlopen caller. In fact, I found out during this process that you can even hide main, as ld.so seems to ignore visibility bits for this function.

Making this work is as simple as adding -fvisibility=hidden to our CFLAGS, and then marking our Python module initialization function with __attribute__((visibility("default"))). Two notes here. First, it’s odd that “default” means “public”; just one of those mysterious details. Second, Python’s PyMODINIT_FUNC macro ought to do this already, but it doesn’t; there’s a Python bug.

Those are the low-level mechanics. At this point gdb is a library, albeit an unusual one that has a single entry point. After this I needed a few tweaks to gdb’s startup process in order to make it work smoothly. This too was no big deal. Now I can write scripts from Python to do gdb things:

#!/usr/bin/python
import gdb
gdb.execute('file ./install/bin/gdb')
print 'sizeof = %d' % gdb.lookup_type('struct minimal_symbol').sizeof

Then:

$ python zz.py
72

Soon I’ll polish all the patches and submit this upstream.

Quick Multi-process Debugging Update

In my last post I mentioned that setting breakpoints is a pain when debugging multiple processes in GDB. While there are some bugs here (we’re actively working on them), it isn’t hard to make the basic case work. In fact, there’s nothing to it. Some background…

Starting with GDB 7.4, we changed how basic breakpoint specifiers (called “linespecs”) work. Previously, a linespec applied somewhat randomly to the first matching symbol found in your code. This behavior probably made sense in 1989, when all you had were statically linked executables; but nowadays it is much more common to have dozens of shared libraries, with the attendant name clashes.

So, instead of having GDB guess which symbol you meant, now a breakpoint just applies to all of them. Our idea is that we’ll start supplying ways to narrow down exactly which spots you meant to name, say by adding syntax like “break libwhatever.so:function“, or whatever.

Anyway, this new work also applies across inferiors. Here’s an example of debugging “make“, then setting a breakpoint on a function in libcpp (which itself is linked into a sub-process of gcc):

(gdb) b _cpp_lex_direct
Function "_cpp_lex_direct" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_cpp_lex_direct) pending.
(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 8761]
[New process 8761]
process 8761 is executing new program: /usr/bin/gcc
[New inferior 8762]
[New process 8762]
process 8762 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1

Breakpoint 1, 0x0000000000b156a0 in _cpp_lex_direct ()

The remaining issues have to do with breakpoint re-setting not doing the right thing with running inferiors. This causes some scary warnings when running, but I think for the time being you can just ignore those.

Well, I should say those are the known issues. This feature hasn’t had as much use as I would like (judging from the low bug rate — I can’t tell if that is a good insight or a horrible realization). So, try it out and report problems to GDB Bugzilla. We’ll be making it work for you.

Debugging multiple programs at once

Consider this Makefile:

all: runit

runit: crasher
	./crasher

crasher: crasher.c
	gcc -g -o crasher crasher.c

And, here is the program it is building:

int *x = 0;

int main ()
{
  *x = 52;
}

Now, if you run “make“, eventually you will see a crash. But how to debug the crash?

Well, obviously, this is a trivial example so you’d just debug the program. But what if you had a complex script involving extensive and obscure initialization? Say, in your test suite? The traditional answer is logging plus cut and paste into gdb; or perhaps hacking an invocation of gdb --args into your script. Nowadays you can do better, though.

Let’s start by debugging make:

$ gdb -quiet make
Reading symbols from /usr/bin/make...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install make-3.82-8.fc16.x86_64

Now set things up for multi-inferior debugging:

(gdb) set detach-on-fork off
(gdb) set target-async on
(gdb) set non-stop on
(gdb) set pagination off

(Yes, it is silly how many settings you have to tweak; and yes, we’re going to fix this.)

Now do it:

(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 9694]
[New process 9694]
process 9694 is executing new program: /usr/bin/gcc
[New inferior 9695]
[New process 9695]
process 9695 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1
Missing separate debuginfos, use: debuginfo-install gcc-4.6.2-1.fc16.x86_64
[Inferior 3 (process 9695) exited normally]
[Inferior 9695 exited]
Missing separate debuginfos, use: debuginfo-install cpp-4.6.2-1.fc16.x86_64
(gdb) [New inferior 9696]
[New process 9696]
process 9696 is executing new program: /usr/bin/as
[Inferior 4 (process 9696) exited normally]
[Inferior 9696 exited]
[New inferior 9697]
[New process 9697]
process 9697 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/collect2
Missing separate debuginfos, use: debuginfo-install binutils-2.21.53.0.1-6.fc16.x86_64
[New inferior 9698]
[New process 9698]
process 9698 is executing new program: /usr/bin/ld.bfd
Missing separate debuginfos, use: debuginfo-install gcc-4.6.2-1.fc16.x86_64
[Inferior 6 (process 9698) exited normally]
[Inferior 9698 exited]
[Inferior 5 (process 9697) exited normally]
[Inferior 9697 exited]
[Inferior 2 (process 9694) exited normally]
[Inferior 9694 exited]
./crasher
[New inferior 9699]
[New process 9699]
process 9699 is executing new program: /tmp/crasher
Missing separate debuginfos, use: debuginfo-install binutils-2.21.53.0.1-6.fc16.x86_64

Program received signal SIGSEGV, Segmentation fault.
0x000000000040047f in main () at crasher.c:5
5      *x = 52;

Cool stuff. Now you can inspect the crashed program:

(gdb) info inferior
Num  Description       Executable
  7    process 9699      /tmp/crasher
* 1    process 9691      /usr/bin/make
(gdb) inferior 7
[Switching to inferior 7 [process 9699] (/tmp/crasher)]
[Switching to thread 7 (process 9699)]
#0  0x000000000040047f in main () at crasher.c:5
5      *x = 52;

There is still a lot of work to do here — it is still a bit too slow, setting breakpoints is still a pain, etc. These are all things we’re going to be cleaning up in the coming year.

Older Entries