Shaggy Dogs and SpiderMonkey Unwinders

A year or so ago I was asked to debug a crash in the Firefox devtools.  Crashes are easy!  I fired up gdb and reproduced the crash… which turned out to be in some code JITted by SpiderMonkey.  I was immediately lost; even a simple bt did not work.  Someone more familiar with the JIT — hi Shu — had to dig out the answer :-(.

I did take the opportunity to get some information from him about how he found the result, though.  He pointed me to the code responsible for laying out JIT stack frames.  It turned out that gdb could not unwind through JIT frames, but it could be done by hand — so I resolved then to eventually fix this.

Phase One

I knew from my gdb hacking that gdb has a JIT unwinding API.  Actually — and isn’t this the way most programs end up working? — it has two.

The first JIT API requires some extra work on the part of the JIT: it constructs an object file, typically ELF and DWARF, in memory, then calls a hook.  GDB sets a breakpoint on this hook and, when hit, it reads the data from the inferior.  This lets the JIT provide basically any kind of information — but it’s pretty heavy.

So, I focused my attention on the second API.  In this mode, the JIT author would provide a shared library that used some callbacks to inform gdb of the details of what was going on.  The set of callbacks was much more limited, but could at least describe how to unwind the registers.  So, I figured that this is what I would do.

But… I didn’t really want to write this in C.  That would be a real pain!  C is fiddly and hard to deal with, and it would mean constant rebuilding of the shared library while debugging, and SpiderMonkey already had a reasonable number of gdb-python scripts — surely this could be done in Python.

So I took the quixotic approach, namely writing a shared library that used the second gdb JIT API but only to expose this API to Python.

Of course, this turned out to be Rube Goldbergian.  Various parts of the gdb Python API could not be called from the JIT shared library, because those bits depended on other state in gdb, which wasn’t set properly when the JIT library was being called.  So, I had gdb calling into my shared library, which called my Python code, which then invoked a new gdb command (written in Python and supplied by my package) — that existed solely for the purpose of setting this internal state properly — and that in turn invoked the code I wanted to run, say to fetch memory or a register or something.

Computer Science!

Well, that took a while.  But it sort of worked!  And maybe I could just keep it in github and not put it in Mozilla Central and avoid learning about the Firefox build system and copying in some gdb header file and license review and whatnot.

So I started writing the actual Python code… OMG.  And see below since you will totally want to know about this.  But meanwhile…

… while I was hacking away on this crazy idea, someone implemented the much more sane idea of just exposing gdb’s unwinder API to gdb’s Python layer.

Hmm… why didn’t I do that?  Well, I left gdb under a bit of a cloud, and didn’t really want to be that involved at the time.  Plus, you know, gdb is a high quality project; which means that if you write a giant patch to expose the unwinding API, you have to be prepared for 17 rounds of patch review (this really happened once), plus writing documentation and tests.  Sometimes it’s just easier to channel one’s inner Rube.

Phase Two

The integrated Python API was a great development.  Now I could delete my shared library and my insane trampoline hacks, and focus on my insane unwinding code.

A lot of this work was straightforward, in the sense that the general outline was clear and just the details remained.  The details amount to things like understanding the SpiderMonkey frame descriptor (which partly describes the previous frame and partly the new frame; there’s one comment explaining this that somehow eluded me for quite a while); duplicating the SpiderMonkey JIT unwinding code in Python; and of course carefully reading the SpiderMonkey code that JITs the “entry frame” code to understand how registers are spilled.

Naturally, while doing this it turned out that I was maybe the first person to use these gdb APIs in anger.  I found some gdb crashes, oops!  The docs would have been impenetrable, except I already knew the underlying C APIs on which they were based… whew!  The Python API was unexpectedly picky in other areas, too.

But then there was also some funny business, one part in gdb, and one part in SpiderMonkey.

GDB is probably more complicated than you realize.  In this case, the complexity is that, in gdb, each stack frame can have its own architecture.  This seemingly weird functionality is actually used; I think it was invented for the SPU, but some other chips have multiple modes as well.  But what this means is that the question “what architecture is this program?” is not well-defined, and anyway gdb’s Python layer doesn’t provide you a way to find whatever approximation it is that would make sense in your specific case.  However, when writing the SpiderMonkey unwinder, it kind of actually is well-defined and we’d like to know the answer to know which unwinder to choose.

For this problem I settled on the probably terrible idea of checking whether a given register is available.  That is, if you see “$rip“, you can guess it’s x86-64.

The other problem here is that gdb thinks that, since you wrote an unwinder, it should get the first stab at unwinding.  That’s very polite!  But for SpiderMonkey, deciding “hey, is this PC in some code the JIT emitted?” is actually a real pain, or at least outside the random bits of it I learned in order to make all this work.

Aha!  I know, there’s probably a Python API to say “is this address associated with some shared library?”  I remembered reading and/or reviewing a patch… but no, gdb.solib_name is close but doesn’t do the right thing for addresses in the main executable.  WAT.

I tried several tricks without success, and in the end I went with parsing /proc/maps to get the mappings to decide whether a given frame should be handled by this unwinder or by gdb.  Horrible.  And fails with remote debugging.

Luckily, nobody does remote debugging.

Remote Debugging

Oh, wait, people do remote debugging at Mozilla all the time.  They don’t call it “remote debugging” though — they call it “using RR“, which while it runs locally, appears to be remote to gdb; and, importantly, during replay mode fakes the PID, and does other deep magic, though not deep enough to extend to making a fake map file that could be read via gdb’s remote get command.

By the way, you should be using RR.  It’s the best advance in debugging since, well, gdb.  It’s a process record-and-replay program, but unlike gdb’s built-in reverse debugging, it handles threads properly and has decent performance.

Oh Well

Oh well.  It just won’t work remotely.  Or at least not until fellow Mozillian (this always seems like it should be “Mozillan” to me, but it’s not, there really is that extra “i”) and all-star Nicolas Pierron wrote some additional Python to read some SpiderMonkey tables to make the decision in a more principled way.  Now it will all work!

Though looking now I wonder if I dreamed this, because the code isn’t checked in.  I know he had a patch but my memory is a bit fuzzy — maybe in the end it didn’t work, because RR didn’t implement the qGetTLSAddr packet, which gdb uses to read thread-local storage.  Did I mention the thread-locals?

The Real Start of the Story

So, way back at the beginning, during my initial foray into this code, I found that a crucial bit of information — the appropriately-named TlsPerThreadData — was stashed away in a thread-local variable.  Information stored here is needed by the unwinder in order to unwind from a C++ frame into a JIT frame.

Only, Firefox didn’t use “real” thread-local variables, the things that so many glibc and gcc hackers put so much effort into micro-optimizing.  No, it just used a template class that wrapped pthread_setspecific and friends in a relatively ergonomic way.

Naturally, for an unwinder this is a disaster.  Why?  Unwinding is basically the dissection of the stack; but in order to compute the value of one of these thread-local-storage objects, the unwinder would have to make some function calls in the inferior (in fact this prevents it from working on OSX).  But these would affect the stack, and also potentially let other inferior code (in other threads — remember, gdb is complicated and you can exert various unusual kinds of control like this) run as well.

So I neglected to mention the very first step: changing Firefox to use __thread.  (Ok, I didn’t really neglect to mention it, I was just being lazy and anyway it’s a shaggy dog story.)

Do Not Use libthread_db

RR did not implement qGetTLSAddr, which we needed, because  lots of people at Mozilla use RR.  So I set out to implement that.  This meant a foray into the dangerous world of libthread_db.

For reasons I do not know, and suspect that I do not want to know, glibc has historically followed many Solaris conventions.  One such Solaris innovation was libthread_db — a library that debuggers use to find certain information from libc, information like the address of a thread-local variable

On the surface this seems like a great idea: don’t bake the implementation details of the C library into the debugger.  Instead, let the debugger use a debugging library that comes with the C library.  And, if you designed it that way, it would be a good idea.

Sadly, though, libthread_db was not designed that way.  Oh no.

For example, libthread_db has a callback interface.  The calling program — gdb or rr — must provide some functions that libthread_db can call, to do some simple things like “read some memory”; or some very complicated things like “find the address of a symbol given its name”.  Normal C programmers might implement these callbacks using a structure containing function pointers.  But not libthread_db!  Instead it uses fixed symbol names that must be provided by the calling application.  Not all of these are required for it to work (you get to figure out which, yay!), but some definitely are.  And, you have to dlopen a libthread_db that matches the libc of the inferior that you’re debugging (or link against it, but that’s also obviously bad).

Wait, you say.  Doesn’t that mess up cross-debugging?  Why yes!  Yes it does!  Which is why qGetTLSAddr has to be in the gdb remote serial protocol to start with.

Hey, maybe the Linux vendors should fix this.  They are — see Gary Benson’s Infinity project — but unfortunately that’s still in development and I wanted RR to work sooner.

Ok, so whew.  I wrote qGetTLSAddr support for RR.  This was a small patch in the end, but an unusual pain in an already painful series.  Hopefully this won’t spill out into other programs.

glibc

Hahaha, you are so funny.  Of course it spills out: remember how you have to define a bunch of functions with specific names in your program in order to use libthread_db?  Well, how do you know you got the types correct?

Yeah, you include <proc_service.h> (a name deliberately chosen to confuse, I suppose, why not, it doesn’t bear any obvious relationship to the library).  Only, that was never installed by glibc.  Instead, gdb just copied it into the source tree.

So naturally I went and fixed this in glibc.  And, even more naturally, this broke the gdb build, which was autoconf’d to check for a file that never existed in the past.  LOL.

Thank You Cthulhu

At this point I figured it was only a matter of time until I had to patch the kernel.  Thankfully this hasn’t been necessary yet.

It Says What

In gdb the actual unwinding and the display of frames are separate concerns.

And let me digress here to say that gdb’s unwinder design is excellent.  I believe it was redone by Andrew Cagney (this was well before my active time in gdb, so apologies if you’re reading this and you did it and I’ve misattributed it).  Like much of gdb, many of the details are bizarre and take one back to the byte-counting days of 1987; but the high level design is very solid and has endured with, I think, just one significant change (to support inline functions) in the intervening 15 or so years.  I’ve long thought that this is a remarkable accomplishment in the programming world.

So, yes.  It’s not enough to just unwind.  Simply having an unwinder yields backtraces with lines like:

#5 0xfeefee ???

Better than nothing!  But not yet great.

The second part of the SpiderMonkey unwinder is, therefore, a gdb “frame filter”.  This is an object that takes raw frames and decorates them with information like a function name, or a file name, or arguments.

Work to add this information is ongoing — I landed one patch just yesterday, and another one, to add more information about interpreted frames, is still in the works.  And there are two more bugs filed… maybe this project, like this blog post, will never conclude.  It will just scroll endlessly.

But now, with all the code in place, bt can show something like:

#6 0x00007ffff7ff20f3 in <<JitFrame_BaselineJS "f1">> (this=JSVAL_VOID, arg1=$jsval(4700))

This is the call f1(4700).

Let’s Just Have One More

Of course we still couldn’t enable this unwinder by default.  You have to enable it by hand.

And by the way, in the first release of gdb’s Python unwinder feature, enabling or disabling an unwinder didn’t flush the frame cache, so it wouldn’t actually take effect until some invisible-to-the-user state change took place.  I fixed this bug, but here Pedro Alves also taught me the secret gdb command flushregs, which in fact just flushes the frame cache. (I’m going to go out on a limb and guess that this command predates the already ancient maint prefix command, hence its weird name.)

Anyway, you have to enable it by hand because the unwinder itself doesn’t work properly if the outermost frame is in JIT code.  The JIT, in the interest of performance, doesn’t maintain a frame pointer.  This means that in the outermost frame, there’s no reliable way to find the object that describes this frame and links to the previous frame.

Now, normally in this case gdb would either resort to debug info (not available here), or in extremis its encyclopedic suite of prologue analyzers (yes, gdb can analyze common function prologues for all architectures developed in the last 25 years to figure out stuff) — but naturally JIT compilers go their own way here as well.

Humans, like Shu back at the start of this story, can do this by dumping parts of the stack and guessing which bytes represent the frame header.

But, I’ve been reluctant and a bit afraid to hack a heuristic into the unwinder.

To sum up — in case you missed it — this means that all the code written during this entire saga would still not have helped with my original bug.

The End

The Deletion of gcj

I originally posted this on G+ but I thought maybe I should expand it a little and archive it here.

The patch to delete gcj went in recently.

When I was put on the gcj project at Cygnus, I remember thinking that Java was just a fad and that this was just a temporary thing for me. I wasn’t that interested in it. Then I ended up working on it for 10 years.

In some ways it was the high point of my career.

Socially it was fantastic, especially once we merged with the Classpath community — I’ve always considered Mark Wielaard’s leadership in that community as the thing that made it so great.  I worked with and met many great people while working on gcj and Classpath, but I especially wanted to mention Andrew Haley, who is the single best debugger I’ve ever met, and who stayed in the Java world, now working on OpenJDK.

We also did some cool technical things in gcj. The binary compatibility ABI was great, and the split verifier was very fun to come up with.  Per Bothner’s early vision for gcj drove us for quite a while, long after he left Cygnus and stopped working on it.

On the downside, gcj was never quite up to spec with Java. I’ve met Java developers even as recently as last year who harbor a grudge against gcj.

I don’t apologize for that, though. We were trying something difficult: to make a free Java with a relatively small team.

When OpenJDK came out, the Sun folks at FOSDEM were very nice to say that gcj had influenced the opening of the JDK. Now, I never truly believed this — I’m doubtful that Sun ever felt any heat from our ragtag operation — but it was very gracious of them to say so.

Since the gcj days I’ve been searching for basically the same combination that kept me hacking on gcj all those years: cool technology, great social environment, and a worthwhile mission.

This turned out to be harder than I expected. I’m still searching. I never thought it was possible to go back, though, and with this deletion, this is clearer than ever.

There’s a joy in deleting code (though in this case I didn’t get to do the deletion… grrr); but mainly this weekend I’m feeling sad about the final close of this chapter of my life.

GDB Preattach

In firefox development, it’s normal to do most development tasks via the mach command. Build? Use mach. Update UUIDs? Use mach. Run tests? Use mach. Debug tests? Yes, mach mochitest --debugger gdb.

Now, normally I run gdb inside emacs, of course. But this is hard to do when I’m also using mach to set up the environment and invoke gdb.

This is really an Emacs bug. GUD, the Emacs interface to all kinds of debuggers, is written as its own mode, but there’s no really great reason for this. It would be way cooler to have an adaptive shell mode, where running the debugger in the shell would magically change the shell-ish buffer into a gud-ish buffer. And somebody — probably you! — should work on this.

But anyway this is hard and I am lazy. Well, sort of lazy and when I’m not lazy, also unfocused, since I came up with three other approaches to the basic problem. Trying stuff out and all. And these are even the principled ways, not crazy stuff like screenify.

Oh right, the basic problem.  The basic problem with running gdb from mach is that then you’re just stuck in the terminal. And unless you dig the TUI, which I don’t, terminal gdb is not that great to use.

One of the ideas, in fact the one this post is about, since this post isn’t about the one that I couldn’t get to work, or the one that is also pretty cool but that I’m not ready to talk about, was: hey, can’t I just attach gdb to the test firefox? Well, no, of course not, the test program runs too fast (sometimes) and racing to attach is no fun. What would be great is to be able to pre-attach — tell gdb to attach to the next instance of a given program.

This requires kernel support. Once upon a time there were some gdb and kernel patches (search for “global breakpoints”) to do this, but they were never merged. Though hmm! I can do some fun kernel stuff with SystemTap…

Specifically what I did was write a small SystemTap script to look for a specific exec, then deliver a SIGSTOP to the process. Then the script prints the PID of the process. On the gdb side, there’s a new command written in Python that invokes the SystemTap script, reads the PID, and invokes attach. It’s a bit hacky and a bit weird to use (the SIGSTOP appears in gdb to have been delivered multiple times or something like that). But it works!

It would be better to have this functionality directly in the kernel. Somebody — probably you! — should write this. But meanwhile my hack is available, along with a few other gdb scxripts, in my gdb helpers github repository.

Emacs hint for Firefox hacking

I started hacking on firefox recently. And, of course, I’ve configured emacs a bit to make hacking on it more pleasant.

The first thing I did was create a .dir-locals.el file with some customizations. Most of the tree has local variable settings in the source files — but some are missing and it is useful to set some globally. (Whether they are universally correct is another matter…)

Also, I like to use bug-reference-url-mode. What this does is automatically highlight references to bugs in the source code. That is, if you see “bug #1050501”, it will be buttonized and you can click (or C-RET) and open the bug in the browser. (The default regexp doesn’t capture quite enough references so my settings hack this too; but I filed an Emacs bug for it.)

I put my .dir-locals.el just above my git checkout, so I don’t end up deleting it by mistake. It should probably just go directly in-tree, but I haven’t tried to do that yet. Here’s that code:

(
 ;; Generic settings.
 (nil .
      ;; See C-h f bug-reference-prog-mode, e.g, for using this.
      ((bug-reference-url-format . "https://bugzilla.mozilla.org/show_bug.cgi?id=%s")
       (bug-reference-bug-regexp . "\\([Bb]ug ?#?\\|[Pp]atch ?#\\|RFE ?#\\|PR [a-z-+]+/\\)\\([0-9]+\\(?:#[0-9]+\\)?\\)")))

 ;; The built-in javascript mode.
 (js-mode .
     ((indent-tabs-mode . nil)
      (js-indent-level . 2)))

 (c++-mode .
	   ((indent-tabs-mode . nil)
	    (c-basic-offset . 2)))

 (idl-mode .
	   ((indent-tabs-mode . nil)
	    (c-basic-offset . 2)))

)

In programming modes I enable bug-reference-prog-mode. This enables highlighting only in comments and strings. This would easily be done from prog-mode-hook, but I made my choice of minor modes depend on the major mode via find-file-hook.

I’ve also found that it is nice to enable this minor mode in diff-mode and log-view-mode. This way you get bug references in diffs and when viewing git logs. The code ends up like:

(defun tromey-maybe-enable-bug-url-mode ()
  (and (boundp 'bug-reference-url-format)
       (stringp bug-reference-url-format)
       (if (or (derived-mode-p 'prog-mode)
	       (eq major-mode 'tcl-mode)	;emacs 23 bug
	       (eq major-mode 'makefile-mode)) ;emacs 23 bug
	   (bug-reference-prog-mode t)
	 (bug-reference-mode t))))

(add-hook 'find-file-hook #'tromey-maybe-enable-bug-url-mode)
(add-hook 'log-view-mode-hook #'tromey-maybe-enable-bug-url-mode)
(add-hook 'diff-mode-hook #'tromey-maybe-enable-bug-url-mode)

Emacs Modules

I’ve been working on an odd Emacs package recently — not ready for release — which has turned into more than the usual morass of prefixed names and double hyphens.

So, I took another look at Nic Ferrier’s namespace proposal.

Suddenly it didn’t seem all that hard to implement something along these lines, and after a bit of poking around I wrote emacs-module.

The basic idea is to continue to follow the Emacs approach of prefixing symbol names — but not to require you to actually write out the full names of everything.  Instead, the module system intercepts load and friends to rewrite symbol names as lisp is loaded.

The symbol renaming is done in a simple way, following existing Emacs conventions.  This gives the nice result that existing code doesn’t need to be updated to use the module system directly.  That is, the module system recognizes name prefixes as “implicit” modules, based purely on the module name.

I’d say this is still a proof-of-concept.  I haven’t tried hairier cases, like defclass, and at least declare-function does not work but should.

Here’s the example from the docs:

(define-module testmodule :export (somevar))
(defvar somevar nil)
(defvar private nil)
(provide 'testmodule)

This defines the public variable testmodule-somevar and the “private” function testmodule--private.

Emacs verus notification area, again

Ages and ages I wrote about letting Emacs code access the notification area.  I have more to say about it now, but first I want to bore you with some rambling thoughts and some history.

The “notification area” is also called the “status icon area” or the “systray” — it is a spot that holds some icons that are under control of various applications.

I was a fan of the notification area since it first showed up in Gnome.  I recognized it instantly as the thing I wanted that I hadn’t realized I wanted.

Now, as you know, the notification area has fallen on hard times.  It’s been removed in Gnome 3… I searched a bit for the rationale for this deletion, which as far as I can tell is just that some applications abused it, whatever that means; or that it was used inconsistently, which I think the web has conclusively proven is fine by users.  Coming from the Emacs perspective, where one can customize the somewhat-equivalent of the status area (see those recent posts on diminishing minor-mode lighters in the mode line…), and where a certain amount of per-mode idiosyncrasy is the norm, these seem like an inadequate reasons.

However, the reason doesn’t really matter.  I love the notification area!  When I moved more of my daily desktop use back into Emacs (the tides are strong but slow, and take years to come in or go out), I hooked Emacs up to it, and made it a part of my basic configuration.

It’s indispensable now.  What I particularly like about it is that it is both noticeable and unobtrusive — the former because I can have the icons blink on important events, and the latter because the icons don’t move around or obscure other windows.

Ok!  You should use it!  And I totally plan to tell you how, but first some boring history.

My original post relied on a hacked version of the Gnome zenity utility.  This turned out to be a real pain over time.  I had to rebuild it periodically, adding hacks (once removing chunks), etc.  Sharing it with others was hard.  And, for whatever reason, the patches in Gnome bugzilla were completely ignored.  Bah.

A bit later I wrote a big patch to Emacs to put all this into the core.  That patch was rejected, more or less.  Bah two.

Then even later I flirted with KDE for a bit.  Yes.  KDE had the nice idea to expose the notification area via dbus, and Emacs could talk dbus… so I did the obvious thing in elisp.  However the KDE notification area was pretty buggy and in the end I had to abandon it as well.

So, it was back to zenity… until this week, during my funemployment.  I rewrote my hacks in Python.  This was so easy I wish I’d done it years and years ago.

I’m not sure what the moral of this story is.  Maybe that my obsession is your gain.  Or maybe that I have trouble letting go.

Anyway, the result is here, on github, or in marmalade.  You’ll need Python and the new (introspection-based) Python Gtk interfaces.  This of course is no trouble to install.  The package includes the base status icon API, plus basic UIs for ERC and EMMS.  Try it out and let me know what you think.

Another Mode Line Hack

While streamlining my mode line, I wrote another little mode-line feature that I thought of ages ago — using the background of the mode-line to indicate the current position in the buffer. I didn’t like this enough to use it, but I thought I’d post it since it was a fun hack.

First, make sure the current mode line is kept:

(defvar tromey-real-mode-line-format mode-line-format)

Now, make a little function that format the mode line using the standard rules and then applies a property depending on the current position in the buffer:

(defun tromey-compute-mode-line ()
  (let* ((width (frame-width))
     (line (substring 
        (concat (format-mode-line tromey-real-mode-line-format)
            (make-string width ? ))
        0 width)))
    ;; Quote "%"s.
    (setq line
      (mapconcat (lambda (c)
               (if (eq c ?%)
               "%%"
             ;; It's absurd that we must wrap this.
             (make-string 1 c)))
             line ""))

    (let ((start (window-start))
      (end (or (window-end) (point))))
      (add-face-text-property (round (* (/ (float start)
                       (point-max))
                    (length line)))
                  (round (* (/ (float end)
                       (point-max))
                    (length line)))
                  'region nil line))
    line))

We have to do this funny wrapping and “%”-quoting business here because the :eval form returns a mode line format — not just text — and because the otherwise appealing :propertize form doesn’t allow computations.

Also, I’ve never understood why mapconcat can’t handle a character result from the map function.  Anybody?

Now set this to be the mode line:

(setq-default mode-line-format '((:eval (tromey-compute-mode-line))))

The function above changes the background of the mode line corresponding to the current window’s start and end positions.  So, for example, here we are in the middle of a buffer that is bigger than the window:

Screenshot - 08222014 - 12:52:19 PM

I left this on for a bit but found it too distracting.  If you like it, use it. You might like to remove the mode-line-position stuff from the mode line, as it seems redundant with the visual display.

Streamlined Mode Line

The default mode line looks like this:

Screenshot - 08112014 - 01:57:07 PM

At least, it looks sort of like this if you ignore the lamenesses in the screenshot. If you’re like me you probably don’t remember what all these things mean, at least not without looking them up.  What’s that “U”?  Why the “:” or why three hyphens?

At a local Emacs meetup with Damon Haley and Greg Pfeil, Greg mentioned that he’d done some experiments on using unicode characters in his mode-line.  I decided to give it a try.

I took a good look at the above.  I rarely use any of it — I normally don’t care about the coding system or the line ending style.  I can’t remember the last time I had a buffer that was both read-only and modified.  The VC information, when it appears, is generally too verbose and doesn’t show me the one thing I need to know (see below).  And, though I do like to see the name of the major mode, I don’t really need to see most minor mode names; furthermore I like to have a bit of extra space so that I can use other modes that display information that I do want to see in the mode line.

What’s that VC thing?  Well, ordinarily you may see something like Git-master in the mode line. But, usually I already know the version control system being used — or even if I don’t know, I probably don’t care if I am using VC. By default the branch name is in there too. This can be quite long and seems to get stale when I switch branches; and anyway because I do a lot of work via vc-dir, I don’t really need this in every buffer anyway.

However, what is missing is that the mode-line won’t tell me if a buffer should be registered with version control but is not.  This is a pretty common source of errors.

So, first the code to deal with the VC state.  We need a bit more code than you might think, because the information we need isn’t already computed, and my tries to compute it while updating the mode line caused weird behavior.  Our rule for “should be registered” is “a VC back end claims this file, but the file isn’t actually registered”:

(defvar tromey-vc-mode nil)
(make-variable-buffer-local 'tromey-vc-mode)

(require 'vc)
(defun tromey-vc-command-hook (&rest args)
  (let ((file-name (buffer-file-name)))
    (setq tromey-vc-mode (and file-name
                  (not (vc-registered file-name))
                  (ignore-errors
                (vc-responsible-backend file-name))))))

(add-hook 'vc-post-command-functions #'tromey-vc-command-hook)
(add-hook 'find-file-hook #'tromey-vc-command-hook)

(defun tromey-vc-info ()
  (if tromey-vc-mode
      (propertize (string #x26c3 32) 'face 'error)
    " "))

We’ll use that final function in the mode line. Note the odd character in there — my choice was U+26C3 (BLACK DRAUGHTS KING), since I thought it looked disk-drive-like — but you can easily replace it with something else. (Also note the weirdness of using string rather than a string constant. This is just for WordPress’ benefit as its editor kept mangling the actual character.)

To deal with minor modes, I used diminish. This made it easy to remove any display of some modes that I don’t care to know about, and replace the name of some others with a single character:

(require 'diminish)
(diminish 'abbrev-mode)
(diminish 'projectile-mode)
(diminish 'eldoc-mode)
(diminish 'flyspell-mode (string 32 #x2708))
(diminish 'auto-fill-function (string 32 #xa7))
(diminish 'isearch-mode (string 32 #x279c))

Here flyspell is U+2708 (AIRPLANE), auto-fill is U+00A7 (SECTION SIGN), and isearch is U+279C (HEAVY ROUND-TIPPED RIGHTWARDS ARROW).  Haha, Unicode names.

I wanted to try out which-func-mode, now that I had extra space on the mode line, so:

(setq which-func-unknown "")
(which-function-mode)

Finally, we can use all the above and remove some other things from the mode line at the same time:

(setq-default mode-line-format
	      '("%e"
		(:eval (if (buffer-modified-p)
			   (propertize (string #x21a7) 'face 'error)
			 " "))
		(:eval (tromey-vc-info))
		" " mode-line-buffer-identification
		"   " mode-line-position
		"  " mode-line-modes
		mode-line-misc-info))

The “modified” character in there is U+21A7 (DOWNWARDS ARROW FROM BAR).

Here’s how it looks normally (another badly cropped screenshot):

Screenshot - 08112014 - 08:42:33 PM

Here’s how it looks when the file is not registered with the version control system:

Screenshot - 08112014 - 08:43:04 PM

And here’s how it looks when the file is also modified:

Screenshot - 08112014 - 08:43:39 PM

Occasionally I run into some other minor mode I want to diminish, but this is easily done by editing it into my .emacs and evaluating it for immediate effect.

import gdb

Occasionally I see questions about how to import gdb from the ordinary Python interpreter.  This turns out to be surprisingly easy to implement.

First, a detour into PIE and symbol visibility.

“PIE” stands for “Position Independent Executable”.  It uses essentially the same approach as a shared library, except it can be applied to the executable.  You can easily build a PIE by compiling the objects with the -fPIE flag, and then linking the resulting executable with -pie.  Normally PIEs are used as a security feature, but in our case we’re going to compile gdb this way so we can have Python dlopen it, following the usual Python approach: we install it as _gdb.so and add a a module initialization function, init_gdb. (We actually name the module “_gdb“, because that is what the gdb C code creates; the “gdb” module itself is already plain Python that happens to “import _gdb“.)

Why install the PIE rather than make a true shared library?  It is just more convenient — it doesn’t require a lot of configure and Makefile hacking, and it doesn’t slow down the build by forcing us to link gdb against a new library.

Next, what about all those functions in gdb?  There are thousands of them… won’t they possibly cause conflicts at dlopen time?  Why yes… but that’s why we have symbol visibility.  Symbol visibility is an ELF feature that lets us hide all of gdb’s symbols from any dlopen caller.  In fact, I found out during this process that you can even hide main, as ld.so seems to ignore visibility bits for this function.

Making this work is as simple as adding -fvisibility=hidden to our CFLAGS, and then marking our Python module initialization function with __attribute__((visibility("default"))).  Two notes here.  First, it’s odd that “default” means “public”; just one of those mysterious details.  Second, Python’s PyMODINIT_FUNC macro ought to do this already, but it doesn’t; there’s a Python bug.

Those are the low-level mechanics.  At this point gdb is a library, albeit an unusual one that has a single entry point.  After this I needed a few tweaks to gdb’s startup process in order to make it work smoothly.  This too was no big deal.  Now I can write scripts from Python to do gdb things:

#!/usr/bin/python
import gdb
gdb.execute('file ./install/bin/gdb')
print 'sizeof = %d' % gdb.lookup_type('struct minimal_symbol').sizeof

Then:

$ python zz.py
72

Soon I’ll polish all the patches and submit this upstream.

Quick Multi-process Debugging Update

In my last post I mentioned that setting breakpoints is a pain when debugging multiple processes in GDB. While there are some bugs here (we’re actively working on them), it isn’t hard to make the basic case work.  In fact, there’s nothing to it.  Some background…

Starting with GDB 7.4, we changed how basic breakpoint specifiers (called “linespecs”) work.  Previously, a linespec applied somewhat randomly to the first matching symbol found in your code.  This behavior probably made sense in 1989, when all you had were statically linked executables; but nowadays it is much more common to have dozens of shared libraries, with the attendant name clashes.

So, instead of having GDB guess which symbol you meant, now a breakpoint just applies to all of them.  Our idea is that we’ll start supplying ways to narrow down exactly which spots you meant to name, say by adding syntax like “break libwhatever.so:function“, or whatever.

Anyway, this new work also applies across inferiors.  Here’s an example of debugging “make“, then setting a breakpoint on a function in libcpp (which itself is linked into a sub-process of gcc):

(gdb) b _cpp_lex_direct
Function "_cpp_lex_direct" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_cpp_lex_direct) pending.
(gdb) run
Starting program: /usr/bin/make
gcc -g -o crasher crasher.c
[New inferior 8761]
[New process 8761]
process 8761 is executing new program: /usr/bin/gcc
[New inferior 8762]
[New process 8762]
process 8762 is executing new program: /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1

Breakpoint 1, 0x0000000000b156a0 in _cpp_lex_direct ()

The remaining issues have to do with breakpoint re-setting not doing the right thing with running inferiors. This causes some scary warnings when running, but I think for the time being you can just ignore those.

Well, I should say those are the known issues.  This feature hasn’t had as much use as I would like (judging from the low bug rate — I can’t tell if that is a good insight or a horrible realization).  So, try it out and report problems to GDB Bugzilla.  We’ll be making it work for you.