Faster Faster GDB Startup

A while ago, I wrote about my work to speed up GDB’s DWARF reader. I thought I’d write again with a few updates.

Sharding

Back then, I wrote: “maybe GDB could trade memory for performance and shard the resulting index and do separate canonicalizations in each worker thread”.

I did end up doing this. Recall that the canonicalization step goes through all the discovered DWARF entries of interest — basically, every object in the program that has a name and is not in a function scope (except in some languages; with DWARF there is always an exception) — and ensures the names are in a normal form. For Ada, this step also synthesizes the package hierarchy (something that should probably be done for Go as well, except that nobody really works on the Go support in GDB).
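
To make "synthesizing the package hierarchy" a bit more concrete: GNAT encodes a fully qualified Ada name roughly as "pkg__child__sym", so the reader can recover the enclosing packages by splitting on "__". Here is a minimal sketch of that split (the function name is invented; this is an illustration, not gdb's actual Ada reader):

    // Illustration only, not gdb's Ada code: split a GNAT-encoded name
    // like "pkg__child__sym" into its package components.
    #include <string>
    #include <vector>

    std::vector<std::string>
    split_gnat_encoded_name (const std::string &encoded)
    {
      std::vector<std::string> components;
      size_t start = 0, pos;
      while ((pos = encoded.find ("__", start)) != std::string::npos)
        {
          components.push_back (encoded.substr (start, pos - start));
          start = pos + 2;
        }
      components.push_back (encoded.substr (start));
      // split_gnat_encoded_name ("pkg__child__sym") => {"pkg", "child", "sym"}
      return components;
    }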

As an aside, sometime in the last few years we realized that this canonicalization has to be done for C as well, because in C there are multiple spellings of types like “short”. This is also implemented.
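
For instance, "short", "short int", "signed short", and "signed short int" all name the same type, and compilers are free to emit any of these spellings into the DWARF. A minimal sketch of folding them to one canonical form (the table and function name here are invented; gdb's real canonicalizer is more general than this):

    // Minimal sketch, not gdb's actual canonicalizer: map equivalent C
    // spellings of the short types to a single canonical form.
    #include <string>
    #include <unordered_map>

    static const std::unordered_map<std::string, std::string> c_spellings = {
      { "short int", "short" },
      { "signed short", "short" },
      { "signed short int", "short" },
      { "short unsigned int", "unsigned short" },
      { "unsigned short int", "unsigned short" },
    };

    std::string
    canonicalize_c_type (const std::string &name)
    {
      auto it = c_spellings.find (name);
      return it == c_spellings.end () ? name : it->second;
    }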

Because GDB already reads DWARF CUs in chunks in separate worker threads, the sharding idea is that canonicalization can be sped up a bit by doing it per thread, with each thread handling only its own results. Previously, GDB combined all the results into a single index before processing them. Sharding means that lookups are a little more complicated, but this turns out not to be too hard, because the number of shards is typically low (for reasons I haven't yet investigated, the reader doesn't scale past 8 threads or so).
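
To give a sense of what a sharded lookup looks like (with invented types, not gdb's actual index classes): each worker thread builds its own shard, and a name lookup simply asks every shard in turn. With only a handful of shards, the extra loop costs almost nothing.

    // Invented types for illustration; not gdb's real index classes.
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Each worker thread builds one shard from the CUs it processed.
    struct index_shard
    {
      std::unordered_map<std::string, int> entries;

      std::optional<int> find (const std::string &name) const
      {
        auto it = entries.find (name);
        if (it == entries.end ())
          return std::nullopt;
        return it->second;
      }
    };

    // A lookup scans every shard; with ~8 shards this stays cheap.
    std::optional<int>
    find_in_shards (const std::vector<index_shard> &shards,
                    const std::string &name)
    {
      for (const auto &shard : shards)
        if (auto hit = shard.find (name))
          return hit;
      return std::nullopt;
    }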

Background Reading

The other major change I made is to do all the DWARF reading in the background. This is a trick to make gdb feel faster to users. The basic idea here is that in many cases, gdb does not immediately need the DWARF from the various files. So, if we push the reading into worker threads, maybe it will be completely read in by the time gdb does need it.
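
The general shape of the idea, in a simplified sketch that uses std::async rather than gdb's own thread pool (dwarf_index, read_dwarf_index, and lazy_dwarf are all stand-ins, not gdb's real names):

    // Simplified sketch only; gdb uses its own thread pool and its own
    // data structures.  The names here are invented.
    #include <future>

    struct dwarf_index { /* ... */ };

    // Placeholder for the expensive reading work.
    static dwarf_index read_dwarf_index () { return {}; }

    class lazy_dwarf
    {
    public:
      // At startup: kick off the read without blocking the user.
      void start ()
      { m_future = std::async (std::launch::async, read_dwarf_index); }

      // On first real use (say, setting a breakpoint): wait if the
      // background read has not finished yet.
      const dwarf_index &get ()
      {
        if (m_future.valid ())
          m_index = m_future.get ();
        return m_index;
      }

    private:
      std::future<dwarf_index> m_future;
      dwarf_index m_index;
    };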

This also somewhat benefits the situation where several shared libraries are loaded at once into the inferior. In this case, gdb already defers breakpoint re-setting until all the DWARF has been read — and with this change, all that work will be done in parallel.

Making this work wasn’t entirely straightforward. The main issue here is that gdb determines the initial language and location for “list” (et al) based on the debug info. The patches arrange to set these things lazily as well. I also had to add some rudimentary thread-safety to BFD.
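
The thread-safety part amounts, roughly, to serializing the non-reentrant paths. Something along these lines conveys the general shape (a sketch only, not the actual BFD patches; with_bfd_lock is an invented name):

    // Sketch of the idea only: serialize calls into a library that was
    // not written with concurrent callers in mind.
    #include <mutex>

    static std::mutex bfd_lock;

    template<typename Func>
    auto
    with_bfd_lock (Func &&func)
    {
      std::lock_guard<std::mutex> guard (bfd_lock);
      return func ();  // e.g. a lambda that calls into BFD
    }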

Now, this can be defeated in a few ways. If you have a .gdbinit that sets a breakpoint, then that will cause the familiar pause, because setting a breakpoint will wait for the workers to complete. Or, if you debug a large executable and type very quickly, you may have to wait for the parsing to finish.

However, when it does work, it feels like gdb starts instantly.

DWARF Abbrevs Use Too Much Space

I was curious about DWARF abbrev table efficiency the other day, so I instrumented gdb to record some simple stats about abbrevs: how many are seen, how many duplicates are seen, and how many bytes are used.
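
The instrumentation is nothing fancy: roughly something like this (a sketch with invented names, not the patch itself), keyed on each abbrev table's raw bytes.

    // Sketch with invented names: hash each abbrev table's raw bytes to
    // count duplicates and measure the space the unique tables need.
    #include <cstddef>
    #include <string>
    #include <unordered_set>

    struct abbrev_stats
    {
      std::size_t total = 0;        // abbrev tables seen
      std::size_t duplicates = 0;   // tables identical to one already seen
      std::size_t total_bytes = 0;  // bytes as they appear in .debug_abbrev
      std::size_t unique_bytes = 0; // bytes after de-duplication

      std::unordered_set<std::string> seen;

      void record (const char *contents, std::size_t size)
      {
        ++total;
        total_bytes += size;
        if (seen.emplace (contents, size).second)
          unique_bytes += size;
        else
          ++duplicates;
      }
    };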

Running gdb on itself, I discovered that abbrevs are largely redundant. In particular, de-duplicating them would remove about 95% of the abbrevs (10238 unique out of 230714 total). The size of the abbrev tables shrinks by a comparable amount: the de-duplicated abbrevs need 230714 bytes, compared to the 3848152 bytes seen in the executable.

Something to think about when you consider the effort DWARF puts in to save a single byte in .debug_info, say by using a 1-byte form rather than a uleb.

I was considering having gdb intern abbrevs and pre-read them all, so that later steps wouldn't have to re-read them; but interning turns out to be too slow, so re-reading on demand seems like the way to go.