Frysk in the Present and Future

I read Federico’s post about finding poll() wakeups and got really excited about the idea of using frysk to do this.

Unfortunately, when I looked, I found that frysk doesn’t have symbol table lookups yet, so there’s really no good way to set a breakpoint or to fetch the local variables holding the callbacks.

What a bummer. frysk offers a couple of nice things for this problem. It is very easy to trace many programs at once (including through forks and in multi-threaded programs); this is one of its biggest advantages over gdb here. It is also very easy to write a little custom code to pick out exactly what you’re interested in (e.g., any program that calls poll more than 5 times per second), as if you had a scriptable strace.
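To make the “more than 5 times per second” idea a bit more concrete, here is a rough sketch of the filtering half in plain Python. It deliberately uses no frysk API: it assumes a hypothetical hook, here called on_syscall(pid, name, timestamp), that your tracer calls for every system call it sees, and it keeps a one-second sliding window of poll timestamps per process.

```python
from collections import defaultdict, deque


class PollRateWatcher:
    """Flag processes that call poll() more than `limit` times within
    any `window`-second span.

    on_syscall() is a hypothetical hook, not a frysk API: feed it one
    event per system call from whatever tracer you are using.
    """

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self.times = defaultdict(deque)  # pid -> timestamps of recent polls
        self.flagged = set()             # pids that exceeded the limit

    def on_syscall(self, pid, name, timestamp):
        """Record one event; return True if this poll pushes pid over the limit."""
        if name != "poll":
            return False
        recent = self.times[pid]
        recent.append(timestamp)
        # Discard polls that have slid out of the time window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.limit:
            self.flagged.add(pid)
            return True
        return False
```

Each timestamp is appended and popped at most once, so the per-event cost is amortized constant time and the filter shouldn't add much on top of the tracing itself.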

I didn’t want to leave empty-handed, though. So I refactored ftrace a little and wrote my first actually functioning frysk-using Jython script:

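import sys
import frysk
import java

# Handler that ftrace calls for each traced system call event.
class fdtracer(frysk.ftrace.SyscallHandler):
    def handle(self, task, syscall, what):
        # Print the traced task's current stack to standard output.
        frysk.ftrace.Util.printStackTrace(java.lang.System.out, task)

tracer = frysk.ftrace.Ftrace()
# (Registering an fdtracer instance with the tracer and attaching it
# to the target pid are not shown here.)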

This will trace a process (identified by pid) and print a stack trace every time a system call in the process exits. (Bah, can’t figure out how to make this blog software not delete leading spaces. So much for posting python code, dammit.)

This is silly, of course. Still, writing something like the bad close catcher is an obvious extension of this. And Federico’s case is not really all that far off: I want that to be a nice little script that will automatically find gnome-session, trace all its descendants, notice over-eager polling, and finally print the callback functions that are triggered.
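The “find gnome-session and trace all its descendants” step could start from /proc. The sketch below is Linux-specific and assumes the usual /proc/&lt;pid&gt;/stat layout (pid, then the command name in parentheses, then the state letter, then the parent pid). It only finds the pids; handing them to frysk is not shown. The kernel truncates the command name to 15 characters, which is fine for gnome-session.

```python
import os


def read_process_table(proc="/proc"):
    """Return {pid: (ppid, comm)} for every process listed in /proc."""
    table = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                data = f.read()
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        # comm sits in parentheses and may itself contain spaces or ')',
        # so split around the first '(' and the last ')'.
        lpar = data.index("(")
        rpar = data.rindex(")")
        comm = data[lpar + 1:rpar]
        fields = data[rpar + 2:].split()
        ppid = int(fields[1])  # fields[0] is the state letter
        table[int(entry)] = (ppid, comm)
    return table


def find_by_name(table, name):
    """Pids whose command name is exactly `name`."""
    return sorted(pid for pid, (_, comm) in table.items() if comm == name)


def descendants(table, root):
    """All pids whose chain of parents leads back to `root` (root excluded)."""
    children = {}
    for pid, (ppid, _) in table.items():
        children.setdefault(ppid, []).append(pid)
    found, stack = [], [root]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)
```

For example, `for session in find_by_name(read_process_table(), "gnome-session"): print(descendants(read_process_table(), session))` would list the current session's process tree.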


  • Silly? Not at all… hard drive seeks are one of the most expensive operations there are these days, but there are no decent profiling tools for them. We’ve been experimenting with dtrace for this: you can stick a probe on “a syscall blocks on IO” (which there is a dtrace hook for), and whenever that happens, dump out the time spent blocked and the stack trace, post-process a bit, feed it into kcachegrind, and get real graphical profiles of time spent on IO.

    If frysk is good enough to do this, that would sure save a lot of hassle. I don’t really _want_ to touch Solaris!

    It sounds like frysk can probably do even better: with dtrace, AFAICT, you have to dump out all the stack traces and aggregate them into call-pair costs when post-processing, which adds a lot of overhead (not that I’ve measured it). Frysk can probably aggregate on the fly?

  • Hmm, but apparently not only is frysk unavailable in Debian, but a bunch of complicated-sounding Java libraries it depends on (parts of java-gnome) appear to be missing too.

    Maybe I’ll try again in a few months…

  • I was unclear: by “silly” I was just referring to that particular Python script. I think frysk in general has a lot of great potential, and automating Federico’s test looks both useful and fun.
    Frysk is not really like DTrace. It is a user-space thing only; as far as I know, there’s no plan to do kernel-level tracing. For that, you should check out the SystemTap project.
    The user-space-ness of it does have benefits. For instance, yes, with frysk you could aggregate results at any time and in any way you want: you can write relatively simple Java programs (or simple Python programs :-) to do all kinds of things. Tracking time spent in system calls of various flavors would be quite easy, for instance; in the Python script posted, you could add a syscall-enter handler that recorded the start time, and so on.
    As for building on Debian… I looked at the frysk web page and it does look pretty painful. Hopefully someone will package it up. I’m building on FC5 myself, but on FC5 and FC6 you can also just “yum install” it.

  • Yes, I meant that that particular Python script is about 2 lines away from being the best IO profiler available on Linux :-). (As you say, you have to add time tracking.)

    I’m aware of systemtap. AFAIK dtrace is essentially systemtap + frysk in one package, which makes it hard to compare: systemtap might be able to probe on “blocked on IO”, which frysk can’t, but frysk can give user backtraces, which systemtap can’t. For this application, the backtraces are critical, and the more precise probing is merely nice. Unfortunately, I’m not aware of any prospects for getting both together on Linux anytime soon.

    I just filed a request-for-package for frysk; maybe that will do something eventually…
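The comments above suggest pairing a syscall-enter handler with the exit handler to track time spent in system calls. Here is a tracer-agnostic sketch of that bookkeeping in plain Python. on_enter() and on_exit() are hypothetical hooks, not frysk's actual handler interface; you would wire them to whatever enter/exit callbacks the tracer offers, passing a thread id, the syscall name, and a timestamp in seconds.

```python
from collections import defaultdict


class SyscallTimer:
    """Accumulate wall-clock time spent inside system calls, per syscall name.

    on_enter()/on_exit() are hypothetical hooks to be called from a
    tracer's syscall enter and exit callbacks.
    """

    def __init__(self):
        self.started = {}                # tid -> (syscall name, start time)
        self.total = defaultdict(float)  # name -> seconds spent inside
        self.calls = defaultdict(int)    # name -> number of completed calls

    def on_enter(self, tid, name, now):
        self.started[tid] = (name, now)

    def on_exit(self, tid, name, now):
        """Return the elapsed time for this call, or None if unmatched."""
        entry = self.started.pop(tid, None)
        if entry is None or entry[0] != name:
            return None  # no matching enter, e.g. we attached mid-call
        elapsed = now - entry[1]
        self.total[name] += elapsed
        self.calls[name] += 1
        return elapsed

    def report(self):
        """(name, calls, seconds) tuples, most time-consuming first."""
        rows = [(n, self.calls[n], self.total[n]) for n in self.total]
        return sorted(rows, key=lambda row: row[2], reverse=True)
```

The start time is keyed by thread rather than by process: each thread can be inside at most one system call at a time, while a multi-threaded process can have several in flight.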
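On the question of aggregating on the fly: folding each (stack, cost) sample into running totals as it arrives avoids dumping every stack trace for later post-processing. This plain-Python sketch keeps roughly the costs kcachegrind displays (self, inclusive, and per caller/callee pair); for the IO profiler discussed in the comments, the cost of a sample would be the time spent blocked. Writing the totals out in kcachegrind's file format is not shown, and the once-per-sample treatment of recursion is a simple choice of this sketch, not how callgrind itself handles cycles.

```python
from collections import defaultdict


class CallPairProfile:
    """Fold (stack, cost) samples into running cost totals.

    A stack is a sequence of function names, outermost caller first.
    self_cost:  cost charged to the innermost frame only
    inclusive:  cost charged once to every distinct function on the stack
    call_pairs: cost charged once to every distinct (caller, callee) edge
    """

    def __init__(self):
        self.self_cost = defaultdict(float)
        self.inclusive = defaultdict(float)
        self.call_pairs = defaultdict(float)

    def add_sample(self, stack, cost):
        if not stack:
            return
        self.self_cost[stack[-1]] += cost
        # Charge recursive functions and repeated edges only once per
        # sample, so no total can exceed the sample's own cost.
        for func in set(stack):
            self.inclusive[func] += cost
        for edge in set(zip(stack, stack[1:])):
            self.call_pairs[edge] += cost
```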
