Archive for March, 2008

Using Python in Gdb

Today I used the Python support in gdb to do real work. Until now I’ve just been playing around, adding functionality that looked fun.

My application was something I’ve wanted to be able to do in gdb for a long, long time: set a breakpoint in one function, but have the breakpoint be conditional on the caller. In my case, I’m debugging the compile server, and I want to examine calls to c_parser_lookup_callback which pass a complicated test (that is, the breakpoint is in the true branch of an if statement); but only calls originating in declspecs_add_type are interesting.

Roland taught me that you can fake this in plain gdb by using convenience variables and breakpoint commands. However, it is rather painful to get this right, as you have to remember to reset the convenience variable in some cases (in my case, if we reach the outer breakpoint but not the inner one).
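For the record, the plain-gdb version looks roughly like this (a sketch; gdb’s echoed prompts are trimmed, and the flag name is mine):

(gdb) set $from_caller = 0
(gdb) break declspecs_add_type
(gdb) commands
>silent
>set $from_caller = 1
>continue
>end
(gdb) break c_parser_lookup_callback if $from_caller

The fiddly part is clearing $from_caller again: you would normally reset it from the inner breakpoint’s command list, but if the outer breakpoint fires and the inner one never does, the stale flag produces a bogus stop later.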

With the Python support, this kind of thing is easy. First I define a Python function to do a little work:

(gdb) python
Type python script
End with a line saying just "end".
>def check():
>  # The previous (older) frame is our caller.
>  frame = gdb.current_frame().get_prev()
>  return frame.get_name() == "declspecs_add_type"
>end

As you can see, this just checks the parent frame’s function name.

Now, I make the breakpoint conditional on this function’s value:

(gdb) cond 1 $(check())

That’s all there is to it!

I suppose I could have written this as a one-liner, like:

(gdb) cond 1 $(gdb.current_frame().get_prev().get_name() == "declspecs_add_type")

Both of the above formulations seem a little clunky to me. But it is still early days for the Python integration; I think we’ll find some nice ways to simplify common tasks.

What’s your missing feature?

Emacs and Threading

While once again waiting for Gnus to contact a news server, I thought: I’ll never be able to move my RSS reading into Gnus, because the delays will skyrocket. Sure, there’s nntp//rss, but that means configuring a separate program and keeping it running — and I’ve heard that this program can have a memory footprint as big as Emacs itself. (As an aside: remember the old days when Emacs was routinely the program with the largest footprint on your desktop? For me it is never number one, and sometimes even slips to 3 or 4.)

Maybe some savior will come along and make Gnus fetch RSS feeds in the background, using a process filter. I assume, without looking, that retro-fitting this into Gnus would be very hard. For new Emacs code, though, this is the way to go; you can set things up so that most mode-specific operations report “working…” back to the user when background operations are happening — while still letting the user switch buffers and work on other things. For instance, nowadays vc-annotate works this way, which is very nice, since annotate is fairly slow in most version control systems.

Even better would be to make Emacs capable of multi-threading. Most people arrive at this idea eventually. Unfortunately, I think it is just not possible. That is partly due to bad language choices (dynamic scope is very handy, but having only dynamic scope is terrible), and partly due to consequent design choices in the rest of Emacs: buffers are big global objects, maintaining compatibility with the enormous body of existing lisp is crucial, and auditing even the built-in body of elisp is, in difficulty, somewhere between daunting and impossible.

A few weeks ago I heard a funny idea in this area. Instead of trying to handle multi-threading, how about old-fashioned multi-process support, with some kind of message passing? Emacs could fork(), the child could wander off with its own copy of everything, and the subprocess could send messages and data back up to be integrated into the Emacs event loop. This is basically the same idea as process filters, only with the benefit that the process could be expressed in the same lisp form as the handler, and the subprocess would have access to all the relevant lisp state.

Naturally, most of these messages would just be elisp; but perhaps it would be worthwhile to add a way to transfer the contents of a buffer wholesale.
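None of that elisp exists, of course, but the general fork-and-report pattern is easy to sketch. Here is a toy version in Python (the names are mine, and the blocking read in the parent stands in for what would really be event-loop integration):

import json
import os

def in_background(work):
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: wanders off with its own copy of everything.
        os.close(read_fd)
        result = work()
        with os.fdopen(write_fd, "w") as pipe:
            json.dump(result, pipe)
        os._exit(0)
    # Parent: a real implementation would hand read_fd to the event
    # loop (a process filter, in Emacs terms) instead of blocking here.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        result = json.load(pipe)
    os.waitpid(pid, 0)
    return result

print(in_background(lambda: sum(range(1000))))

The appeal is that the child pays nothing up front: it inherits a copy of the whole session, so it can consult any state it likes without marshalling anything beforehand.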

Vantage Point

This was a decent action movie, though not one I would see a second time. I suppose that puts it in the second or maybe third tier. It is a film with a gimmick — it rewinds and shows the action again, from someone else’s viewpoint. Sometimes these gimmicks are irritating, though this one didn’t bother me so much. There are a number of twists, which is fun. Plus, Forest Whitaker is in it, and I really like him.

Dark Tort versus The Little Sister

A while ago I read The Little Sister, by Raymond Chandler, and also Dark Tort, by Diane Mott Davidson.

I had sworn not to read any more of Davidson’s books, but, like Marlowe, boredom and angst got the better of me. It was just sitting there, on Elyn’s side of the bed, promising relief. “Look at me”, said the cover. “I am not the terrible books you have already read. I am different.”

Naturally it lied.

I tried to picture Marlowe living in Aspen Meadow, working for a caterer. Anything to make it through the book, the reading of which, for some reason, had become like a duel. I could best Davidson: her bad writing, her undistinguished observations of Colorado life.

She tried to wear me down. First she had all the characters phrase statements as questions? Over and over? As if she had learned a new writing trick? I persevered.

Next she enumerated the many ultimate comfort foods — a specialized torture which had successfully broken George Will. I was stronger than that, more flexible. I can accept that Apple Betty is the ultimate one day, but Mac and Cheese the next. I have three ultimate comfort foods before breakfast.

Wily, evil Davidson tried repetition as well. Perhaps she could lull me into complacency with warm, fresh bread. Never just bread, only warm, fresh bread, a mantra to destroy my reading skills.

But what drove me to picturing Marlowe was a vignette about Boulderites. It’s as if she were writing for me, trying to probe my pet peeves. We Boulderites are flighty. We’re paranoid. We think that garbage trucks are evil. We’re little old ladies. Sure, Boulder has its whatevers and et ceteras; but wasn’t Traven rumored to live on Spruce Street? That should be dark enough for anybody.

Someday, I hope, Davidson will lose it a little and write her own anti-novel, something that will annihilate her previous work. We’ll see Aspen Meadow as it truly is; perhaps a corrupt small town with a Machiavellian caterer pulling the strings. Marlowe will move there from Los Angeles to cure his vapors, and proceed to confront the yokel sociopaths and fight and shoot his way through the cafes and dog-washing businesses. Someday.

Miss Pettigrew Lives for a Day

We had read a lukewarm review of this on IMDb, but we went anyway — and loved it. The plot is a bit thin, perhaps, but the movie has a sweet heart, the cast is good, and the sets and costumes are fantastic.

Gold is released

Ian Taylor checked in the long-awaited “gold”. Gold is a new ELF-only linker written in C++. It is designed for performance and is much faster than the current binutils ld.

I’m very happy about this for a few reasons. First, we’ve needed a new linker for a long, long time. Second, this will help the incremental compiler.

I looked through the gold sources a bit. I wish everything in the GNU toolchain were written this way. It is very clean code, nicely commented, and easy to follow. It shows pretty clearly, I think, the ways in which C++ can be better than C when it is used well.

Congratulations, Ian!

Compile Server Scalability

There are a few aspects to compile server scalability that are important to address.

First, and most obviously, memory use. Because we want to be able to send real programs through the compile server, and because we want it to remain live for relatively long periods of time, it is important that memory use be “acceptably bounded”. Naturally, the server process will grow with each additional compilation unit. At least in the straightforward implementation, there’s no way around that (but see below). However, it is important that the server not leak memory, and that recompilations generally not increase memory use. Also, ideally, all that work on decl sharing will keep memory use in check.

For the most part, this did not take any effort to achieve. GCC has a built-in garbage collector, and most nontrivial data structures are allocated using the GC. This is not a silver bullet, of course, but it has yielded good results with little effort in practice.

In the case of recompilation, we employ a simple heuristic — we store all parsed hunks keyed off the name of the requested object file (note: not the input file; it is common for a project to compile a given source file multiple times, but it is rare to see the same object file name more than once). When recompiling an object, we assume that there will be a lot of reuse against the object’s previous version, so we store those hunks temporarily, but then discard the old ones at the end of compilation. This way, we reuse, but we can also free hunks which are no longer in use.
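The real bookkeeping lives inside GCC, but the heuristic itself is simple enough to model. A toy sketch in Python, with entirely made-up names:

class HunkCache:
    def __init__(self):
        # Requested object file name -> {hunk key: parsed hunk}.
        self.by_object = {}

    def compile(self, object_name, wanted_keys, parse):
        old = self.by_object.get(object_name, {})
        new = {}
        for key in wanted_keys:
            # Reuse against the object's previous version when we can.
            new[key] = old[key] if key in old else parse(key)
        # End of compilation: keep only what this compilation used,
        # so hunks that are no longer referenced can be freed.
        self.by_object[object_name] = new
        return new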

Results from a few tests are very encouraging here. I compiled gdb with the compile server, then deleted the object files and re-compiled. Memory use (as reported by -fmem-report) stayed flat at around 51M — meaning that recompilation doesn’t grow the image, and the collection approach is working as desired.

I also built gdb using the compiler in “normal” mode, and looked at the -fmem-report totals. If you sum them up, which I naively expect gives a rough idea of how much memory --combine would use, you get 1.2G. Or, in other words, decl sharing appears to make a huge difference (I’m not completely confident in this particular number).

If memory use does become a problem for very large compiles, we could look at scaling another way: writing out hunks and reading them back in. Maybe we could use machinery from the LTO project to do this. This would only be useful if it is cheaper to read decls via LTO than it is to parse the source; if this is not cheaper then we could instead try to flush out (and force re-parsing of) objects which are rarely re-used. One special case of this is getting rid of non-inlineable function bodies — when we have incremental code-generation, we’ll never compile a function like that more than once anyway.

Another scalability question is how to exploit multiple processors, either multi-core machines, or compile farms. In an earlier post, I discussed making the compile server multi-threaded. However, that interacts poorly with our code generation approach (fork and do the work in the child), so I am probably not going to pursue it. Instead, for the multi-core case, it looks straightforward to simply run multiple servers — in other words, you would just invoke “gcc --server -j5”. Something similar can be done for compile farms.

An ideal result for this project would be for small changes to result in compilation times beneath your perceptual threshold. I doubt that is likely to happen, but the point is, the absolute turnaround time is important. (This is not really a question of scalability, but I felt like talking about it anyway.)

In the current code, though, we always run the preprocessor for any change. So, even once incremental code generation is implemented, the turnaround time will be bound by the time it takes to preprocess the source. This might turn out to be a problem.

In an earlier design (and in some other designs I have heard of), this is handled by making a model of compilation that includes preprocessing. That seems too complicated to me, though, and instead I think that it should be possible to also make an incremental preprocessor (say, one that uses inotify to decide what work must be re-done), and then use it without excessive cooperation from the parser.
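To make that concrete, here is a hedged sketch in Python; real inotify needs a third-party binding, so this toy version checks mtimes instead, and every name in it is hypothetical:

import os

class IncrementalPreprocessor:
    def __init__(self, preprocess):
        # preprocess(path) must return (output, files it depended on).
        self.preprocess = preprocess
        self.cache = {}  # path -> (output, {dependency: mtime we saw})

    def run(self, path):
        cached = self.cache.get(path)
        if cached is not None:
            output, stamps = cached
            if all(os.path.getmtime(dep) == stamp
                   for dep, stamp in stamps.items()):
                # No dependency changed; skip re-preprocessing entirely.
                return output
        output, deps = self.preprocess(path)
        self.cache[path] = (output,
                            {dep: os.path.getmtime(dep) for dep in deps})
        return output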

Python and Gdb

Recently I’ve been hacking on integrating Python scripting support into gdb. For years now I’ve been wanting better scripting in gdb, but until I saw Volodya’s patch I never did anything about it. So, the other night I made a git repository (thanks gitorious!) and started hacking away. Thiago Bauermann did some nice updates on Volodya’s value-inspecting code, too.

A decent number of things are working. See the wiki page for details on cloning the repository.

Since I basically live in Emacs nowadays, I wanted to install the Python documentation in info form. Am I the only person who still loves info? It isn’t beautiful, to be sure, but it is amazingly convenient inside Emacs — simple to navigate, call up, and dismiss; with info-lookup it can function as low-rent context-sensitive help; no messy fussing with the mouse.

Anyway, I couldn’t find this readily available anywhere, so in the end I checked out Python myself and built the docs. That was sort of a pain… I’m half considering making an ELPA package out of the info pages. Come to think of it, there are probably a number of potential info-only packages out there.