
Emacs and Threading, Take 2

I’ve recanted. Contrary to my earlier post on this topic, I now think implementing threading in Emacs is possible. A patch from Giuseppe Scrivano inspired me, and I started my own patch to do it.

This was sort of fun. I wrote a batch script in elisp to rewrite some of the Emacs sources — yay semantic patching!

Thanks to Giuseppe, this is now hosted on Gitorious. We’re both working there, on different branches, merging code back and forth. I’ve mostly been working on variable bindings, and he’s very active, both with low-level changes and with cool things like getting Gnus to work in a separate thread.

If you’re interested in helping out, we discuss it on emacs-devel, but really we’d welcome any sort of contact.

Emacs 23

Much to my surprise, the Fedora Emacs maintainers pushed Emacs 23 into the (ostensibly stable) Fedora 11 repository.  I was a bit afraid to upgrade, since Emacs really is the cornerstone of my entire workflow.  My desire for new features quickly overcame my fear, though.

The first thing you will notice is that Emacs is much prettier.  It now uses XFT to render, so you get antialiasing.  For normal work, I don’t really care much, but this is why I used CVS Emacs last year for presentations: it makes a huge difference in situations where prettiness matters.  Unfortunately this seems to have negatively affected redisplay performance.

Another major feature I have been loving is support for multiple terminals.  I use this in two ways.

I run my Emacs on my main machine, of course.  This is the centerpiece of my desktop: I use it for hacking, for mail and news, and for irc.  Previously, if I used my laptop, I couldn’t easily access all this state; but now I can ssh to my main machine, run emacsclient -t, and have access to everything.

I’ve also set EDITOR to emacsclient -t.  This means that when I run git commit in a shell, the commit message shows up in a new emacs frame on that terminal.  This is very convenient for “quickie” edits, because it means not having to switch my focus. (If I had to pick a single reason that Emacs improves my productivity, this would be it: it makes it very easy to keep one’s focus.)

Funnily, though, I don’t actually run git commit in a shell very often any more, because the new vc-dir mode is good enough that I can do some common git operations without leaving Emacs.  If you tried VC in earlier versions of Emacs, then you probably remember it as a horrible joke — it worked fine for RCS, but was miserable at anything else.  vc-dir is something like a generalized pcl-cvs, so you can work on a whole directory tree at once (and do so efficiently, unlike the old vc-dired).  vc-dir is still pretty new, and there are some necessary operations that aren’t exposed (git push), but it is still a very nice step forward.

This release is definitely worth upgrading to.

Wish List Item

I’ve been trying for a while to figure out how best to read blogs.

Right now I use three different methods — I use iGoogle for some things, plain old web browsing for some, and then gnus for one feed.  What a pain!  I’ve also tried other readers in the past — a couple of web-based ones, Azureus, maybe something else.

None of these are ideal for me.  I think what I would really like is to use Gnus for everything, except Gnus blocks annoyingly while fetching the feeds.  So, I could use nntp//rss.  But then I am setting up and configuring yet another program, setting it up to run when I log in, forgetting to copy its configuration to my laptop, etc.

I wish there were “gmane for rss” — a site that ran nntp//rss for me and let me subscribe to any old feed using my news reader.  Anybody know of one?

Wait!  I have other complaints too!  I’ll save those for later… I’m turning into the sort of person who wishes RSS were NNTP and that Common Lisp were popular again.  What is happening to me?!?

Gold

I finally got around to trying gold (the new linker) for real.  Today I built it and tried linking gdb.  Sure enough, it is a lot faster — it was more than twice as fast at linking gdb as the F9 system linker.  The link went from 48 seconds to 22 seconds.

It is very simple to build and use.  Just check out binutils and configure with --enable-gold.  Then build and install it, put it in your path, and you’re done.

11. The End

We’ve covered many of the features of python-gdb:

  • Writing new commands
  • Convenience functions
  • Pretty-printing
  • Auto-loading of Python code
  • Scripting gdb from Python
  • Bringing up a GUI

In fact, that is probably all of the user-visible things right now.  There are classes and methods in the Python API to gdb that we have not covered, but you can read about those when you need to use them.

What next?  There are a few things to do.  There are probably bugs.  As we saw in some earlier sections, support for I/O redirection is not there.  We need better code for tracking the inferior’s state.  Barring the unexpected, all this will be done in the coming months.

Now is an exciting time to be working on gdb.  There are a number of very interesting projects underway:

  • Reversible debugging is being developed.  The idea here is that gdb can record what your program does, and then you can step backward in time to find the bug.
  • Sérgio Durigan Júnior, at IBM, has been working on syscall tracing support.  This will let us do strace-like tracing in gdb.  What’s nice about this is that all the usual gdb facilities will also be available: think of it as a Python-enabled strace, with stack dump capability.
  • The excellent folks at Code Sourcery (I would name names, but I’m afraid of leaving someone out) are working on multi-process support for gdb.  This is the feature I am most looking forward to.  In the foreseeable future, gdb will be able to trace both the parent and the child of a fork.  The particular “wow” use-case is something I read on the frysk web site: run “make check” in gdb, and have the CLI fire up whenever any program SEGVs.  No more futzing with setting up the debug environment!  In fact, no more figuring out how to get past libtool wrapper scripts — we could add a little hack so that you can just run them in gdb and the right thing will happen.

Naturally, we’ll be wiring all this up to Python, one way or another.

I’ve also got some longer-term plans for the Python support.  I’m very interested in extending gdb to debug interpreted languages.  As with most computer problems, this means inserting a layer of indirection in a number of places: into expression parsing, into symbol lookup, into breakpoints, into watchpoints, etc.  The goal here is to be able to write support for, say, debugging Python scripts, as a Python extension to gdb.  Then, users could switch back and forth between “raw” (debugging the C implementation) and “cooked” (debugging their script) views easily.

I have two basic models I use when thinking about python-gdb: valgrind and emacs.

Emacs is a great example of managing the split between the core implementation and scripts.  Emacs developers prefer to write in elisp when possible; the core exists, more or less, to make this possible for a wide range of uses.  I’m trying to steer gdb in this direction.  That is, push Python hooks into all the interesting places in gdb, and then start preferring Python over C.  (Mozilla might have been another good example here — but I am more familiar with Emacs.)

Naturally, we’ll pursue this with extraordinary wisdom and care.  Cough cough.  Seriously, there are many areas of gdb which are not especially performance sensitive.  For example, consider the new commands we wrote during this series.  Even support for a new language would not require anything that could not be comfortably — and excellently — done in Python.

Valgrind taught me the Field of Dreams model: even a fairly esoteric area of programming can attract a development community, provided that you build the needed infrastructure.  In other words, just look at all those cool valgrind skins.  This library orientation, by the way, is something I would like to see GCC pursue more vigorously.

I’m very interested to hear your feedback.  Feel free to post comments here, or drop us a line on the Archer list.

We’ve come to the end of this series of posts.  I’m sad to see it end, but now it is time to stop writing about python-gdb features, and to go back to writing the features themselves.  I’ll write more when there is more to be said.

10. Wacky stuff

Last time I promised something flashy in this post.  What could be flashier than a GUI?

Here’s some code to get you started:

from threading import Thread
import gdb
import gtk

def printit ():
    print "Hello hacker"

class TestGtkThread (Thread):
    def destroy (self, *args):
        self.window.hide()

    def hello (self, *args):
        gdb.post_event (printit)

    def run (self):
        gtk.gdk.threads_init()

        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        self.window.connect("destroy", self.destroy)
        self.window.set_border_width(10)

        button = gtk.Button("Hello World")
        # connects the 'hello' function to the clicked signal from the button
        button.connect("clicked", self.hello)
        self.window.add(button)
        button.show()

        self.window.show_all()
        gtk.main()

class TestGtk (gdb.Command):
    def __init__ (self):
        super (TestGtk, self).__init__ ("testgtk", gdb.COMMAND_NONE,
                                         gdb.COMPLETE_NONE)
        self.init = False

    def invoke (self, arg, from_tty):
        self.dont_repeat()
        if not self.init:
            self.init = True
            v = TestGtkThread()
            v.setDaemon (True)
            v.start ()

TestGtk()

Note that we finesse the problem of main loop integration by simply starting a separate thread.  My thinking here is to just use message passing: keep gdb operations in the gdb thread, and gtk operations in the GUI thread, and send active objects back and forth as needed to do work.  The function gdb.post_event (git pull to get this) arranges to run a function during the gdb event loop; I haven’t really investigated sending events the other direction.
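To make the message-passing idea concrete, here is a minimal sketch in plain Python 3 that runs without gdb or gtk. The names post_event, events and main_loop are stand-ins invented for this illustration; the queue plays the role of the gdb event loop, and nothing here claims to be how gdb.post_event is actually implemented.

```python
import queue
import threading

# Stand-in for gdb's event loop: a queue of callables drained by one thread.
events = queue.Queue()
results = []

def post_event(func):
    """Hypothetical analogue of gdb.post_event: schedule func for the main loop."""
    events.put(func)

def gui_thread():
    # The "GUI" thread never touches debugger state directly; it only
    # posts callables for the main thread to run.
    for i in range(3):
        post_event(lambda i=i: results.append("clicked %d" % i))
    post_event(None)  # sentinel: tell the main loop to stop

def main_loop():
    # Runs in the "gdb" thread and executes the posted callables in order.
    while True:
        func = events.get()
        if func is None:
            break
        func()

worker = threading.Thread(target=gui_thread)
worker.daemon = True
worker.start()
main_loop()
worker.join()
print(results)
```

The useful property is that every callable runs on the thread that owns the debugger state, so the GUI side needs no locking of its own.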

The above isn’t actually useful — in fact it is just a simple transcription of a python-gtk demo I found somewhere in /usr/share.  However, the point is that the addition of Python cracks gdb open: now you can combine gdb’s inferior-inspection capabilities with Python’s vast suite of libraries.  You aren’t tied to the capabilities of a given gdb GUI; you can write custom visualizers, auto-load them or load them on demand, and use them in parallel with the CLI.  If your GUI provides a CLI, you can do this without any hacks there at all; for example, this kind of thing works great from inside Emacs.

The next post is the final one in this series, I’m sorry to say.

9. Scripting gdb

So far we’ve concentrated on ways to use Python to extend gdb: writing new commands, writing new functions, and customized pretty-printing.  In this post I want to look at gdb from a different angle: as a library.  I’ve long thought it would be pretty useful to be able to use gdb as a kind of scriptable tool for messing around with running programs, or even just symbol tables and debug info; the Python work enables this.

One word of warning before we begin: we’re starting to get into the work-in-progress parts of python-gdb.  If you play around here, don’t be surprised if it is not very polished.  And, as always, we’re interested in your feedback; drop us a line on the Archer list.

For historical and technical reasons, it is pretty hard to turn gdb into an actual loadable Python library.  This might be nice to do someday; meanwhile we’ve made it possible to invoke gdb as an interpreter: add the “-P” (or “--python”) option.  Anything after this option will be passed to Python as sys.argv.  For example, try this script:

#!/home/YOURNAME/archer/install/bin/gdb -P
print "hello from python"

Ok… so far so good.  Now what?  How about a little app to print the size of a type?

#!/home/YOURNAME/archer/install/bin/gdb -P
import sys
import gdb
gdb.execute("file " + sys.argv[1])
# Usage: SCRIPT PROGRAM TYPENAME
type = gdb.Type (sys.argv[2])
print "sizeof %s = %d" % (sys.argv[2], type.sizeof ())

You can script that with gdb today, though the invocation is uglier unless you write a wrapper script.  More complicated examples are undeniably better.  For instance, you can write a “pahole” clone in Python without much effort.

That invocation of gdb.execute is a bit ugly.  In the near future (I was going to do it last week, but I got sick) we are going to add a new class to represent the process (and eventually processes) being debugged.  This class will also expose some events related to the state of the process — e.g., an event will be sent when the process stops due to a signal.

The other unfinished piece in this area is nicer I/O control.  The idea here is to defer gdb acquiring the tty until it is really needed.  With these two pieces, you could run gdb invisibly in a pipeline and have it bring up the CLI only if something goes wrong.

It will look something like:

#!/home/YOURNAME/archer/install/bin/gdb -P
import sys
import gdb

def on_stop(p):
  (status, value) = p.status
  if status != gdb.EXIT:
    gdb.cli ()
  else:
    sys.exit (value)

process = gdb.Inferior(sys.argv)
process.connect ("stop", on_stop)
process.run ()

I’ll probably use python-gobject-like connect calls, unless Python experts speak up and say I should do something different.
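For readers who have not met gobject-style connect calls, here is a minimal sketch of the pattern in plain Python 3. Emitter and FakeProcess are hypothetical names made up for this example, and the status tuple is only a placeholder; the planned gdb process class may look quite different.

```python
class Emitter(object):
    """Minimal signal source with python-gobject-style connect calls."""

    def __init__(self):
        self._handlers = {}

    def connect(self, signal, handler):
        # Several handlers may be attached to the same signal name.
        self._handlers.setdefault(signal, []).append(handler)

    def emit(self, signal, *args):
        # Call handlers in the order they were connected.
        for handler in self._handlers.get(signal, []):
            handler(self, *args)


class FakeProcess(Emitter):
    """Hypothetical stand-in for the planned process class."""

    def __init__(self, exit_code):
        Emitter.__init__(self)
        self.status = ("exit", exit_code)

    def run(self):
        # A real implementation would run the inferior; here we just
        # report that it stopped.
        self.emit("stop")


seen = []

def on_stop(p):
    seen.append(p.status)

process = FakeProcess(0)
process.connect("stop", on_stop)
process.run()
print(seen)
```

Passing the emitting object as the first handler argument mirrors the on_stop(p) shape used in the script above.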

The next post will cover a flashier use of Python in gdb.  Stay tuned.

8. Pretty printing, Part 2

In the previous entry we covered the basics of pretty-printing: how printers are found, the use of the to_string method to customize display of a value, and the usefulness of autoloading.  This is sufficient for simple objects, but there are a few additions which are helpful with more complex data types.  This post will explain the other printer methods used by gdb, and will explain how pretty-printing interacts with MI, the gdb machine interface.

Python-gdb’s internal model is that a value can be printed in two parts: its immediate value, and its children.  The immediate value is whatever is returned by the to_string method.  Children are any sub-objects associated with the current object; for instance, a structure’s children would be its fields, while an array’s children would be its elements.

When pretty-printing from the CLI, gdb will call a printer’s “children” method to fetch a list of children, which it will then print.  This method can return any iterable object which, when iterated over, returns pairs. The first item in the pair is the “name” of the child, which gdb might print to give the user some help, and the second item in the pair is a value. This value can be a string, or a Python value, or an instance of gdb.Value.

Notice how “pretty-printers” don’t actually print anything?  Funny.  The reason for this is to separate the printing logic from the data-structure-dissection logic.  This way, we can easily implement support for gdb options like “set print pretty” (which itself has nothing to do with this style of pretty-printing — sigh. Maybe we need a new name) or “set print elements”, or even add new print-style options, without having to modify every printer object in existence.

Gdb tries to be smart about how it iterates over the children returned by the children method.  If your data structure potentially has many children, you should write an iterator which computes them lazily.  This way, only the children which will actually be printed will be computed.
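A small sketch of why laziness pays off, written in plain Python 3 so it runs without gdb. LazyPrinter and show are hypothetical names for this example; show stands in for the printing side and simply stops after a fixed number of children.

```python
import itertools

computed = []

class LazyPrinter(object):
    """Hypothetical printer whose children are produced on demand."""

    def __init__(self, values):
        self.values = values

    def to_string(self):
        return "container with %d elements" % len(self.values)

    def children(self):
        # A generator: each (name, value) pair is built only when the
        # consumer asks for it.
        for i, v in enumerate(self.values):
            computed.append(i)
            yield ("[%d]" % i, v)

def show(printer, limit):
    # Stand-in for the printing side: stop after `limit` children, much
    # as an element limit would.
    shown = list(itertools.islice(printer.children(), limit))
    return printer.to_string(), shown

header, shown = show(LazyPrinter(list(range(1000))), 3)
print(header, shown, computed)
```

Only three of the thousand children are ever computed, because the generator is never advanced past what the consumer requested.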

There’s one more method that a pretty-printer can provide: display_hint.  This method can return a string that gives gdb (or the MI user, see below) a hint as to how to display this object.  Right now the only recognized hint is “map”, which means that the children represent a map-like data structure.  In this case, gdb will assume that the elements of children alternate between keys and values, and will print appropriately.
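Here is one way a consumer could act on the “map” hint, sketched in plain Python 3. The function format_children and its output format are assumptions for illustration only, not gdb’s actual printing code.

```python
def format_children(children, hint=None):
    """Render (name, value) pairs; pair up keys and values for 'map'."""
    lines = []
    if hint == "map":
        # Children alternate key, value, key, value, ...; the child
        # names are ignored and the key is used as the label instead.
        it = iter(children)
        for (_, key), (_, value) in zip(it, it):
            lines.append("[%s] = %r" % (key, value))
    else:
        for name, value in children:
            lines.append("%s = %r" % (name, value))
    return lines

kids = [("[0]", 23), ("[1]", "maude"), ("[2]", 5), ("[3]", "liver")]
print(format_children(kids, "map"))
print(format_children(kids))
```

Passing the same iterator twice to zip is what consumes the children two at a time.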

We’ll probably define a couple more hint types.  I’ve been thinking about “array” and maybe “string”; I assume we’ll find we want more in the future.

Here’s a real-life printer showing the new features.  It prints a C++ map, specifically a std::tr1::unordered_map.  Please excuse the length — it is real code, printing a complex data structure, so there’s a bit to it.  Note that we define a generic iterator for the libstdc++ hash table implementation — this is for reuse in other printers.

import gdb
import itertools

class Tr1HashtableIterator:
    def __init__ (self, hash):
        self.count = 0
        self.n_buckets = hash['_M_bucket_count']
        if self.n_buckets == 0:
            self.node = False
        else:
            self.bucket = hash['_M_buckets']
            self.node = self.bucket[0]
            self.update ()

    def __iter__ (self):
        return self

    def update (self):
        # If we advanced off the end of the chain, move to the next
        # bucket.
        while self.node == 0:
            self.count = self.count + 1
            # If we advanced off the end of the bucket array, then
            # we're done.  Break out explicitly: False == 0 is true
            # in Python, so the loop test alone would not stop us.
            if self.count == self.n_buckets:
                self.node = False
                break
            self.bucket = self.bucket + 1
            self.node = self.bucket[0]

    def next (self):
        if not self.node:
            raise StopIteration
        result = self.node.dereference()['_M_v']
        self.node = self.node.dereference()['_M_next']
        self.update ()
        return result

class Tr1UnorderedMapPrinter:
    "Print a tr1::unordered_map"

    def __init__ (self, typename, val):
        self.typename = typename
        self.val = val

    def to_string (self):
        return '%s with %d elements' % (self.typename, self.val['_M_element_count'])

    @staticmethod
    def flatten (list):
        for elt in list:
            for i in elt:
                yield i

    @staticmethod
    def format_one (elt):
        return (elt['first'], elt['second'])

    @staticmethod
    def format_count (i):
        return '[%d]' % i

    def children (self):
        counter = itertools.imap (self.format_count, itertools.count())
        # Map over the hash table and flatten the result.
        data = self.flatten (itertools.imap (self.format_one, Tr1HashtableIterator (self.val)))
        # Zip the two iterators together.
        return itertools.izip (counter, data)

    def display_hint (self):
        return 'map'

If you plan to write lazy children methods like this, I recommend reading up on the itertools package.
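The printer above is Python 2 code. On Python 3, itertools.imap and itertools.izip no longer exist, because the built-in map and zip are already lazy. As a rough sketch, the same children pipeline can be tried out on plain tuples; entries here is a made-up stand-in for the hash table iterator.

```python
import itertools

def flatten(pairs):
    # Turn (key, value) pairs into key, value, key, value, ...
    for pair in pairs:
        for item in pair:
            yield item

def children(entries):
    # entries stands in for the hash table iterator; any iterable of
    # (key, value) pairs will do.
    names = map("[{}]".format, itertools.count())
    data = flatten(entries)
    # zip stops when the shorter (finite) iterator is exhausted.
    return zip(names, data)

result = list(children([(23, "maude"), (5, "liver")]))
print(result)
```

Because itertools.count() is infinite, it is the data iterator that decides when the zipped sequence ends.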

Here’s how a map looks when printed.  Notice the effect of the “map” hint:

(gdb) print uomap
$1 = std::tr1::unordered_map with 2 elements = {
  [23] = 0x804f766 "maude",
  [5] = 0x804f777 "liver"
}

The pretty-printer API was designed so that it could be used from MI.  This means that the same pretty-printer code that works for the CLI will also work in IDEs and other gdb GUIs — sometimes the GUI needs a few changes to make this work properly, but not many.  If you are an MI user, just note that the to_string and children methods are wired directly to varobjs; the change you may have to make is that a varobj’s children can change dynamically.  We’ve also added new varobj methods to request raw printing (bypassing pretty-printers), to allow efficient selection of a sub-range of children, and to expose the display_hint method so that a GUI may take advantage of customized display types.  (This stuff is all documented in the manual.)

Next we’ll learn a bit about scripting gdb.  That is, instead of using Python to extend gdb from the inside, we’ll see how to use Python to drive gdb.

7. Pretty printing, part 1

Consider this simple C++ program:

#include <string>
std::string str = "hello world";
int main ()
{
  return 0;
}

Compile it and start it under gdb.  Look what happens when you print the string:

(gdb) print str
$1 = {static npos = 4294967295,
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x804a014 "hello world"}}

Crazy!  And worse, if you’ve done any debugging of a program using libstdc++, you’ll know this is one of the better cases — various clever implementation techniques in the library will send you scrambling to the gcc source tree, just to figure out how to print the contents of some container.  At least with string, you eventually got to see the contents.

Here’s how that looks in python-gdb:

(gdb) print str
$1 = hello world

Aside from the missing quotes (oops on me), you can see this is much nicer.  And, if you really want to see the raw bits, you can use “print /r”.

So, how do we do this?  Python, of course!  More concretely, you can register a pretty-printer class under a regular expression that matches the name of a type; any time gdb tries to print a value whose type name matches that regular expression, your printer will be used instead.

Here’s a quick implementation of the std::string printer (the real implementation is more complicated because it handles wide strings, and encodings — but those details would obscure more than they reveal):

class StdStringPrinter:
    def __init__(self, val):
        self.val = val

    def to_string(self):
        return self.val['_M_dataplus']['_M_p'].string()
gdb.pretty_printers['^std::basic_string<char,.*>$'] = StdStringPrinter

The printer itself is easy to follow — an initializer that takes a value as an argument, and stores it for later; and a to_string method that returns the appropriate bit of the object.

This example also shows registration.  We associate a regular expression, matching the full type name, with the constructor.
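To show what such a registry implies on the lookup side, here is a minimal sketch in plain Python 3. The function find_printer and the StringPrinter class are hypothetical; this is only a guess at the general shape of the lookup, not gdb’s own code.

```python
import re

class StringPrinter(object):
    def __init__(self, val):
        self.val = val

    def to_string(self):
        return self.val

# Regular expression (matching the full type name) -> constructor.
pretty_printers = {
    '^std::basic_string<char,.*>$': StringPrinter,
}

def find_printer(type_name, val):
    """Return a printer instance for val, or None if nothing matches."""
    for pattern, constructor in pretty_printers.items():
        if re.match(pattern, type_name):
            return constructor(val)
    return None

name = 'std::basic_string<char,std::char_traits<char>,std::allocator<char> >'
printer = find_printer(name, "hello world")
print(printer.to_string())
print(find_printer('int', 42))
```

Anchoring the pattern with ^ and $ keeps it from matching types that merely contain the string type as a template argument.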

One thing to note here is that the pretty-printer knows the details of the implementation of the class.  This means that, in the long term, printers must be maintained alongside the applications and libraries they work with.  (Right now, the libstdc++ printers are in archer.  But, that will change.)

Also, you can see how useful this will be with the auto-loading feature.  If your program uses libstdc++ — or uses a library that uses libstdc++ — the helpful pretty-printers will automatically be loaded, and by default you will see the contents of containers, not their implementation details.

See how we registered the printer in gdb.pretty_printers?  It turns out that this is second-best — it is nice for a demo or a quick hack, but in production code we want something more robust.

Why?  In the near future, gdb will be able to debug multiple processes at once.  In that case, you might have different processes using different versions of the same library.  But, since printers are registered by type name, and since different versions of the same library probably use the same type names, you need another way to differentiate printers.

Naturally, we’ve implemented this.  Each gdb.Objfile — the Python wrapper class for gdb’s internal objfile structure (which we briefly discussed in an earlier post) — has its own pretty_printers dictionary.  When the “-gdb.py” file is auto-loaded, gdb makes sure to set the “current objfile”, which you can retrieve with “gdb.get_current_objfile”.  Pulling it all together, your auto-loaded code could look something like:

import gdb.libstdcxx.v6.printers
gdb.libstdcxx.v6.printers.register_libstdcxx_printers(gdb.get_current_objfile())

Where the latter is defined as:

def register_libstdcxx_printers(objfile):
   objfile.pretty_printers['^std::basic_string<char,.*>$'] = StdStringPrinter

When printing a value, gdb first searches the pretty_printers dictionaries associated with the program’s objfiles — and when gdb has multiple inferiors, it will restrict its search to the current one, which is exactly what you want.  A program using libstdc++.so.6 will print using the v6 printers, and (presumably) a program using libstdc++.so.7 will use the v7 printers.
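The search order can be sketched in plain Python 3 as follows. FakeObjfile, global_printers and lookup are invented for this example, and the fall-back to a global dictionary after the per-objfile ones is an assumption read off the word “first” above rather than a description of gdb’s code.

```python
import re

class FakeObjfile(object):
    """Hypothetical stand-in for gdb.Objfile: a name plus its own registry."""
    def __init__(self, filename):
        self.filename = filename
        self.pretty_printers = {}

global_printers = {}   # plays the role of gdb.pretty_printers

def lookup(type_name, objfiles):
    # Per-objfile dictionaries are searched first, then the global one.
    for registry in [o.pretty_printers for o in objfiles] + [global_printers]:
        for pattern, printer in registry.items():
            if re.match(pattern, type_name):
                return printer
    return None

v6 = FakeObjfile("libstdc++.so.6")
v7 = FakeObjfile("libstdc++.so.7")
v6.pretty_printers['^std::string$'] = "v6 printer"
v7.pretty_printers['^std::string$'] = "v7 printer"

# Each "inferior" only sees the objfiles it has loaded.
print(lookup("std::string", [v6]))
print(lookup("std::string", [v7]))
```

The same type name resolves to different printers depending on which objfiles are in scope, which is the point of keeping one dictionary per objfile.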

As I mentioned in the previous post, we don’t currently have a good solution for statically-linked executables.  That is, we don’t have an automatic way to pick up the correct printers.  You can always write a custom auto-load file that imports the right library printers.  I think at the very least we’ll publish some guidelines for naming printer packages and registration functions, so that this could be automated by an IDE.

The above is just the simplest form of a pretty-printer.  We also have special support for pretty-printing containers.  We’ll learn about that, and about using pretty-printers with the MI interface, next time.

6. Auto-loading Python code

I think the idea of backtrace filters (the topic of the previous post) is a pretty cool one.  And, as I mentioned before, extending gdb with application-specific behavior is a compelling use for the Python scripting capability.

Remembering to source these snippets is a bit of a pain.  You could, of course, stick a command into your ~/.gdbinit — that is pretty easy.  I like things to be more automatic, though.  Suppose someone writes a new filter — it would be nice to get it without having to edit anything.

Naturally, we provide an automatic mechanism for loading code — or I wouldn’t be writing this, would I?

Internally, gdb has a structure called an “objfile”.  There is one of these for the inferior’s executable, and another one for each shared library that the inferior has loaded.  A new one is also created when gdb loads separate debug info (typical for distros — not so typical for your own builds).

When gdb creates a new objfile, it takes the objfile’s file name, appends “-gdb.py”, and looks for that file.  If it exists, it is evaluated as Python code.
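The rule is simple enough to mimic in a few lines of plain Python 3. The function auto_load below is a hypothetical illustration of the naming scheme, run against a temporary directory; it is not gdb’s implementation.

```python
import os
import tempfile

def auto_load(objfile_name, namespace):
    """Run <objfile_name>-gdb.py if it exists; return True if it ran."""
    script = objfile_name + "-gdb.py"
    if not os.path.exists(script):
        return False
    with open(script) as f:
        # Evaluate the file's contents as Python code.
        exec(compile(f.read(), script, "exec"), namespace)
    return True

# Demonstration with a throwaway directory instead of a real install tree.
tmpdir = tempfile.mkdtemp()
objfile = os.path.join(tmpdir, "myprog")
with open(objfile + "-gdb.py", "w") as f:
    f.write("loaded = True\n")

ns = {}
ran = auto_load(objfile, ns)
missing = auto_load(os.path.join(tmpdir, "other"), {})
print(ran, ns.get("loaded"), missing)
```

An objfile with no matching script is simply skipped, so most programs pay nothing for the feature.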

Here’s a simple way to see this in action.  Assuming you’ve been using the directory names I’ve used throughout this series, put the following into ~/archer/install/bin/gdb-gdb.py:

import gdb
print "hi from %s" % gdb.get_current_objfile().get_filename()

Now run gdb on itself (remember — you should still have the archer install directory in your PATH):

$ gdb gdb

I get:

[...]
hi from /home/tromey/archer/install/bin/gdb
(gdb)

This naming scheme is ok-ish for stuff you just built, but not so for distros.  We’ll be augmenting the search capability a bit so that we can hide the Python files away in a subdirectory of /usr/lib.  I’m not sure exactly what we’ll do here, but it shouldn’t be hard to come up with something reasonable.

Another wrinkle is that this scheme does not work transparently for statically-linked executables.  Ideally, we would have a way to automatically find these snippets even in this case.  One idea that has been mentioned a few times is to put the Python code directly into the executable.  Or, we could put the code next to the source.  Both of these ideas have some drawbacks, though.

Note that one of these files might be loaded multiple times in a given gdb session — gdb does not track which ones it has loaded.  So, I recommend that for “real” projects (something you ship, not just a local hack) you only put import commands (and a couple other idempotent operations, one of which we’ll discuss soon) into the auto-load file, and install the bulk of the Python code somewhere on sys.path.
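A quick way to see what “idempotent” buys you, in plain Python 3. The names registry, hooks and auto_load_body are made up for this sketch; registry stands in for a pretty_printers dictionary.

```python
# What the auto-load file does, written as a function so it can be
# called more than once.
registry = {}        # stands in for an objfile's pretty_printers dictionary
hooks = []           # a non-idempotent alternative, for contrast

def printer_for_string(val):
    return "string printer"

def auto_load_body():
    # Assigning to a dictionary key is idempotent: a second load just
    # overwrites the entry with the same value.
    registry['^std::basic_string<char,.*>$'] = printer_for_string
    # Appending to a list is not: every load adds another copy.
    hooks.append(printer_for_string)

auto_load_body()
auto_load_body()
print(len(registry), len(hooks))
```

After two loads the dictionary still holds one entry while the list has grown to two, which is why import statements and dictionary-style registration are the safe things to put in an auto-load file.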

Our next topic is something that many people have asked for over the years: application-specific pretty-printing.  And, as we’ll see, this provides another use for auto-loading of Python code.