After literally years of false starts and failed attempts, last week I finally checked in a series of patches that speed up GDB’s DWARF reader. The speedup for ordinary C++ code is dramatic — I regularly see a 7x performance improvement. For example, on this machine, startup on gdb itself drops from 2.2 seconds to 0.3 seconds. This seems representative, and I’ve seen even better increases on my work machine, which has more cores. Startup on Ada programs is perhaps the worst case for the current code, due to some oddities in Ada debuginfo, but even there it’s a respectable improvement.
GDB Startup
GDB, essentially, had two DWARF readers. They actually shared a surprisingly small amount of code (which was an occasional source of bugs). For example, while abbrev lookup and name generation (more on that later) was shared, the actual DIE data structures were not.
The first DWARF reader created “partial symbols”, which held a name and some associated, easy-to-compute data, like the kind of symbol (variable, function, struct tag, etc). The second DWARF reader (which is still there now) is called when more information was needed about a particular symbol — say, its type. This reader reads all the DIEs in a DWARF compilation unit and expands them into gdb’s symbol table, block, and type data structures.
Both of these scans were slow, but for the time being I’ve only rewritten the first scan, as it was the one that was first encountered and most obviously painful. (I’ve got a plan to fix up the CU expansion as well, but that’s a lengthy project of its own.)
What Was Slow
The partial symbol reader had several slow points. None of them seemed obviously slow if you looked with a profiler, but each one performed unnecessary work, and they combined in an unfortunate way.
- The partial DIE cache. GDB did a scan and saved certain DIEs in a cache. There were some helpful comments that I believe were true at one point that explained why this was useful. However, I instrumented GDB and found that less than 10% of the cached DIEs were ever re-used. Computing and allocating them was largely a waste, just to support a few lookups. And, nearly every DIE that was ever looked up was done so on behalf of a single call — so the cache was nearly useless.
- Name canonicalization. DWARF says that C++ names should follow the system demangler. The idea here is to provide some kind of normal form without having to really specify it — this matters because there are multiple valid ways to spell certain C++ names. Unfortunately, GCC has never followed this part of DWARF. And, because GDB wants to normalize user input, so that any spelling will work, the partial reader normalized C++ names coming from the DWARF as well. This area has a whole horrible history (for example, the demangler is crash-prone so GDB installs a SEGV handler when invoking it), but the short form here is that the partial symtab reader first constructed a fully-qualified name, and only then normalized it. This meant that any class or namespace prefix (and there are a lot of them) was re-normalized over and over while constructing names.
- The bcache. The partial symbol reader made heavy use of a data structure in GDB called a bcache. This is like a string interner, but it works on arbitrary memory chunks. The bcache was used to intern both the names coming from canonicalization, as well as the partial symbols themselves. This in itself isn’t a problem, except that it requires a lock if you want to use it from multiple threads.
The New Reader
The new reader fixes all the above problems, and implements some other optimizations besides.
There is no more partial DIE cache. Instead, GDB simply scans the DWARF and immediately processes what it finds. While working on this, I realized that whether a given DIE is interesting or not is, largely, a static property of its abbrev. For example, if a DIE does not have a name and does not refer back to another DIE (either via “specification” or “origin” — DWARF is weird), then it can simply be skipped without trying to understand it at all. So, in the new reader, this property is computed once per abbrev and then simply consulted in the scanner, avoiding a lot of repeated checks.
The entire scanner is based on the idea of not trying to form the fully qualified name of a symbol. Now, while the rest of GDB wants the fully-qualified name, there’s no need to store it. Instead, the conversion is handled by the name-lookup code, which splits the searched-for name into components. The scanner creates an index data structures that’s similar to what is described by DWARF 5 (modulo bugs in the standard).
As part of this non-qualifying approach, only the “local” name is stored in each entry. Name canonicalization must still be done for C++ (and a more complicated process for Ada), but this is done on much shorter strings. A form of string interning is still used, but it takes advantage of the fact that the original string comes from the DWARF string table, and so simple pointer comparisons can be done (normally the linker combines identical strings, and if not, this just wastes a little memory). Furthermore, the interning is all done in a worker thread, so in most cases the GDB prompt will return before the work is fully complete — this makes an illusion of speed, and a nicer experience as a user.
Speaking of threads, GDB also now scans all DWARF compilation units in parallel. Specifically, GDB has a parameter that sets the number of worker threads, and it uses a parallel for-each to split the list of compilation units into N groups, and each thread works on a group. I experimented a bit and found that setting N to the number of CPUs on the system works well, at least on the machines I have available.
There’s probably still some room to speed things up some more. Maybe there are some micro-optimizations to be done. Maybe GCC could canonicalize C++ names, and we could eliminate an entire step; or maybe GDB could trade memory for performance and shard the resulting index and do separate canonicalizations in each worker thread.
There’s still an unfortunate amount of hair in there to deal with all the peculiarities of DWARF. DWARF is nicely flexible, but sometimes much too flexible, and actively difficult to read. Also, each version of DWARF yields new modes, which complicate the design. In addition to ordinary DWARF, GDB also deals with split DWARF (two or maybe three kinds), dwz-compressed DWARF (which is standard but has very many inter-CU references, where ordinary compiler-generated DWARF has none), the multi-file dwz extension, and the old debug_types section. Each of these needed special code in the new reader.
Future Work
Full CU expansion is still slow. You don’t see this (much) during GDB startup, but if you’ve ever done a ‘next’ or ‘print’ and then waited interminably — congratulations, you’ve found a bad CU expansion case. Normally these occur when GDB encounters some truly enormous CU… in my experience, most CUs are small, but there are some bogglingly huge outliers.
This is probably the next thing to fix.
The current code still shares less code with the second DWARF reader than you may think. For example, the full symbol reader constructs fully-qualified names according to its own, different algorithm.
My current plan here is to reuse the existing index to construct a sort of skeleton symbol table. Then, we’d further change GDB to fill in the bodies of individual symbols on demand — eliminating the need to ever do a full expansion. (Perhaps this could be extended to types as well, but internally in GDB that may be trickier.) As part of this, the fully-qualified names would be constructed from the index itself, which is also much cheaper than re-computing and re-canonicalizing them.
Summary
GDB is a lot faster to start now. This was done through a combination of removing useless work, smarter data structures, and exploiting the wide availability of multi-core machines.
11 Comments
Nice to see this continue to improve over time! It all seemed like a good idea when I wrote the pieces you’ve now removed 🙂
[…] his blog, Tom Tromey writes about speeding up the startup of the GDB debugger. He sees 7x improvements in startup time (e.g. […]
[…] his blog, Tom Tromey writes about speeding up the startup of the GDB debugger. He sees 7x improvements in startup time (e.g. […]
[…] Read More […]
I click on the “movies” tag and see the last entry was in 2009. Allow me to recommend “Everything Everywhere All At Once”, though I may have enjoyed the same directors’ “Swiss Army Man” even more. Pretty weird that you didn’t see “Tampopo” until 2009; I hope you have since seen Itami’s “A Taxing Woman”, with largely the same actors. (Or Kurosawa’s “High and Low” from 1961, if you want to see a young Gun.)
[…] submitted by /u/ouyawei [link] […]
Is this improvement going to be part of the gdb 12 release?
No, it was a bit late for 12 and so it landed shortly after the 12 branch was made. It will be in 13.
Does this make any difference for people who already use gdb index?
No. However, for me, running gdb on itself now starts as fast as if I had made an index. So, you can try just not making an index, which will save some time somewhere.
[…] parsing approach in drgn, Meta’s open-source debugger, to GDB. Parallel parsing had already been implemented for the index loader, leaving only the full loader and the index parser as potential next candidates in line for […]