Archive for May, 2006

A Couple libgcj Updates

I’ve been working steadily on replacing gcj’s front end with the
Java compiler from Eclipse. This is largely working now. I have a
new main program for the Eclipse compiler, and I have gcc set up to
invoke this (via the magic of gcc specs — an evil little ad hoc
scripting language that you should hope you never have to learn).

The new driver is a little funny. When ecj compiles a file, it
writes the classes to a jar which gcj compiles. This way we don’t
have to have an arbitrary number of temporary files for communication,
e.g. for all the inner classes. This takes advantage of java’s
built-in ability to make jars, and gcj’s existing ability to compile a
jar file all at once. I thought this was amusing, anyway… maybe I’ve
been working too much.
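
The jar approach matters because a single source file can yield an
arbitrary number of class files. A hypothetical sketch (the class
names here are mine, not from the gcj work):

```java
// One source file, several class files: a Java compiler emits
// Outer.class, Outer$Inner.class, and Outer$1.class for this.
// Collecting them into one jar avoids juggling temporary files.
class Outer {
    static class Inner {                 // becomes Outer$Inner.class
        int twice(int x) { return 2 * x; }
    }

    static Runnable anon() {
        return new Runnable() {          // becomes Outer$1.class
            public void run() { }
        };
    }
}
```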

I thought this would be very simple, but I should have realized
that this would reveal every bizarre class file compilation bug in
gcj, some of which can only be seen if you are compiling the core
class library this way. For example, the bytecode verifier needed a
special case to handle the constructor in Object.

In any case, I can now build all of libgcj this way. I’m
debugging a few runtime failures now.

LLVM-based JIT

Aside from the whole exception handling mess — which experts more
expert than I are, hopefully, busily hacking on — the JIT seems to be
working reasonably well. I’m just about ready to clean up the API and
check it in (as an experimental preview); I don’t think any more
changes in that area will be needed in the near future.

There are still a few lurking code generation bugs. Nothing too
hard, just mishandling jsr a little.

I recently added the first bits of recompilation to the JIT. It
now notices when it resolves a constant pool entry and marks the
method as ready for re-jitting. The idea here is that before linking,
a constant pool reference requires a method call to incrementally link
the class, whereas after linking, a constant pool reference is simply
a constant. A similar optimization is that when we initialize a
class, we mark the method as ready for re-jitting — the code to lower
from bytecode to LLVM will check a class’ state and avoid emitting
initialization calls as needed.
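
The class-initialization case is visible from the Java side: a class
is initialized on its first active use, so compiled code has to guard
potentially-first uses with a check until the runtime knows
initialization has happened, at which point a recompile can drop the
checks. A small illustration (class names are mine):

```java
// A class is initialized on first active use (JLS 12.4), so naive
// compiled code must ask "is Init initialized yet?" before each
// static access -- even inside a hot loop. Once initialization has
// run, re-jitting the method lets those checks disappear.
class Init {
    static boolean ran;
    static { ran = true; }       // runs exactly once, on first use
    static int value = 7;
}

class Demo {
    static int sumLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += Init.value;   // naive code re-checks Init's state here
        }
        return sum;
    }
}
```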

This still hasn’t seen much testing. And, to really do a good job
here we have to add profiling code of some kind. I really need to
read the literature here. If only there were time.

Eclipse and Teams

In Classpath we’ve narrowly avoided most of the team-related
Eclipse problems Daniel Berrange mentions encountering.

Partly this has been because we’re all mad free software hackers,
and so we don’t even try to ensure we all have the same tools.
Instead folks doing development in Eclipse have a choice of operating
systems and of virtual machines (both jamvm and cacao work nicely in
this environment).

Over time Eclipse has gotten better at separating the personal
from the project — Eclipse 3 is much better than Eclipse 2 was here
(e.g. now you can store the coding style for the formatter in the
project). And, we do make an effort to set things up so that typical
uses won’t require any changes to the project metadata; that both
avoids unpleasant accidents and lets people in different environments
continue to get sane results. Partly we’re able to do this because we
have an unusual hybrid build which includes autoconf (this isn’t
without its bad qualities).

We put a fair amount of effort into making our setup turnkey.
There’s a
nice web page on the wiki
which explains, step-by-step, how to get
set up for working on Classpath. This is still a lot harder than we’d
like, but it can be done in thirty minutes or so if you’re already
familiar with Eclipse. I demoed this at FOSDEM this year; most of the
time during the demo was spent waiting for the build (Classpath is
over a million lines of code, so that can be forgiven). In real life
you also spend some time waiting for the network — my demo was with a
local cvs server.

Also, supposedly team sets help a bit here. As I understand it a
team set is a way to specify a group of projects to check out. My
brief experiment here was a little unpromising; I think it picked the
wrong user for CVS repositories by default (I was hoping it would know
that when I used “anonymous”, I wanted the end user not to have to
specify a username). Someone in the Classpath community (I forgot
who, sorry) did make a team set but then it never ended up on the
wiki… (ping).

As a user you do have to watch out a bit. I once installed the
FindBugs plugin, and then to my dismay found out that it modified the
project builders. This is unacceptable; I had to remove the plugin to
avoid accidentally contaminating the shared build. (Any plugin
architecture suffers from problems like this, though. Allowing
plugins means giving up centralized quality control.)

For what it’s worth, our “initial setup” problems with Eclipse
haven’t been much better or worse than the problems we’ve had with
shell-based builds. Eclipse could definitely improve here. (Once
you’re set up, of course, the difference is startling.)

Ideally Speaking

My ideal setup for these things would involve more integration
between Eclipse, the OS, and the organization.

It would be nice if a project could specify required plugins; when
checking out the project Eclipse would first launch the update manager
to install them. In this ideal world, the update manager would work
with RPM rather than stand perpendicular to it.

It would also be very nice to have rendezvous support or the like;
the idea being that in a typical organization you would simply launch
Eclipse and not have to figure out where the version control server
lies. In the free world an analog would be the ability to click a
link in Mozilla and have the repository automatically show up in
Eclipse. Every project’s home page would have a big “Hack Me” button
which you would click to get a working development tree. Or, you
could have Eclipse interface to irc and pick up on repositories and
update sites that way… join #classpath, get a dialog asking if you
want to access the classpath team set.

This latter idea gets to something that bothers me about
programming. Computers are powerful communication devices, and a huge
part of our jobs entails using them to communicate. However, much of
this communication happens at two extremes — there is the extremely
unstructured form like email or irc; and there is the overly
structured form, like a cvs commit. I’m hoping for a generation of
tools that is a bit more loose; tools that notice interesting things
without too much interaction or attention on my part; and also tools
that let me more easily wire up communication the way I need it (my
perennial example here is the amount of time I would’ve saved over the
years if only we could simply drive the debugger over irc).

There’s some work in this area. Eclipse has ECF, NetBeans has its
thing, whatever it is called. There’s also the Jazz project for
Eclipse, from IBM. This isn’t open source (it may be someday), but I
hear the demo is quite cool. It sounds a little too
command-and-control for my taste, perhaps, but I’d imagine it can be
made a little more peer-oriented.

Naturally, none of this is available on the timescale I want,
namely last week.

NCLUG

Last week I drove up to Fort Collins to give a talk about gcj at
NCLUG. I thought it went pretty
well… I gave an updated version of my old talk from FOSDEM 2004, but
then deleted the slides by mistake when I was trying to upload them.
The problem with my computer (and me!) assuming that I’m a power user
is that, occasionally and unpredictably, I am not.

Afterward a bunch of us went next door for Chinese food. I talked
to Evelyn from tummy.com a bit.
Apparently Fedora has let them retire KRUD, a local RH-based
distro. From the KRUD page it isn’t clear if this is a plus or a
minus, but in my mind it is a plus — it means Fedora is successfully
addressing needs that were not addressed by the old Red Hat
Linux.

Evelyn also had an experience similar to mine — and everybody’s,
I suppose — when installing linux for desktop use. I can’t just
install Fedora, I must also download flash (mozilla makes this easy,
but of course yum would be nicer), java (I didn’t on my FC5 box, but
partly because I’m keeping up appearances), and various sound and
video things. Evelyn also needed acroread, to my surprise; but
apparently only acroread can handle editing PDF forms.

Add to this the messy situation with proprietary drivers (my
laptop came with the atheros wifi stuff, which I still can’t get to
work on FC5) and the lack of ipod support, and you’d think that Linux
sucked.

I’m still hopeful though. We’ll outgrow this annoying phase.

I also learned about Night Vision for Java, a planetarium
program written by Brian Simpson (he was
sitting across from me at dinner). Apparently this runs ok if you
enable the java2d stuff in Classpath; he tried it without success
during my talk but I’m told that things are all fixed in cvs (which, I
hope, we’ll be shipping in FC6).

Finally, I got to meet Bob Proulx. Bob does a lot of stuff in
GNU-land and I had seen his
name before on the automake list, but I embarrassingly failed to
connect all the dots until after I had left. I hate those awkward
social moments. They seem to occur more often to me than to other
people.

I’ll be back in Fort Collins in a couple months to talk about
autoconf and automake. A little weird, since I haven’t worked on
these for so long.

Eclipse Plugins

I was also in Raleigh last week for a speaker training class, and
I caught up with Andrew Overholt there. We talked a bit about Eclipse
packaging, a hell we’ve both had to live in.

Whenever I think about what it was like to try to build that
thing, or its various plugins, I start thinking: why bother with this
at all? It’s just a huge mess!

But then I remember more. Of course we have to build it. We’re
building the OS, which changes. We need a reliable process from start
to finish so we can make and ship bug fixes. These are, btw, the same
reasons that open source java is needed — compatibility is desirable,
even necessary; but it is meaningless if you have no power to fix the
bugs preventing it.

As a user, it is convenient to just use the eclipse update manager
to download things. (Well, sort of convenient. The update manager UI
sucks and it has zero integration with mozilla or anything else.)
And I do use it for a number of plugins. But installing an OS
reminded me why this approach sucks — it is a lot friendlier to have
a single way to install everything. The Eclipse approach means yet
another step in setting up a machine.

I suppose one answer here is to set up a site that provides a
bridge.

I’ve often thought about making an Eclipse meta-update site, which
would mirror every plugin available. The idea here is, why bother
copying those URLs to the update manager, navigating its brainless UI
once again? Instead, let one person do this and let Eclipse users
just point at this site. (The only problem with actually doing this
is that I couldn’t think of a way to make money off it. No ad revenue
via the update manager. 🙂)

Anyway, in conjunction with that I suppose you could auto-generate
RPMs from binary plugins, and from there a convenient yum repository.
This would solve the problem on the user end. Distros would still be
screwed, of course. Annoying binary distributions are the Java
standard, and Eclipse would just keep on contributing to the problem.

Happenings

So the buzz is that Sun will really actually truly free
Java sometime. Details, timeline, license, etc.: TBD.

This makes me feel very weird. I assume for a moment that it
is true and that it happens under acceptable conditions: it comes
pretty soon, it is complete, it is under a non-crazy license. On the
one hand, hallelujah! This is what we’ve wanted these 10 years.

On the other hand… I wonder what I’ll do with myself. I suppose
there are plenty of interesting things to work on. Even the Sun JDK I
suppose.

But the dislocation goes far beyond my future to-do list. What
does this mean about all the work I’ve done? Is it a waste?

I probably should’ve come up with answers to that back when we
merged libgcj into Classpath and nuked a lot of code. Sometimes I
feel bad about that process.

I do have my own answers for those questions. Everything is born,
lives for a while, and dies; our programs are no different. That they
die early or late doesn’t render them meaningless — only dead. And
meaning itself is something we bring, in interpretation; it isn’t an
intrinsic quality. Of course it is one thing to think that and
another to know.

Whew. Back to reality, we’re still hacking away on gcj. It
makes no sense to change course based on a maybe as big as this one.

Miguel’s blog pointed to a
nice entry
on this topic.

Danese Cooper says we’re too poorly organized, or at least
thought of that way. I think
she is using “organized” to mean “backed by IBM” or something like
that. Anyway there’s not much correspondence between that idea and
what we’ve actually done.

It is true that Harmony has been a notable winner in lining up IBM
and Intel behind it. I often think of Harmony as a consortium in the
guise of an ASF project. I suspect our failure here was our license;
but it is difficult to say whether this was really a mistake per se.

She also wonders: “I’m wondering how long it will take the
various Linux distros to figure out that they can ship Harmony”.
We already know about Harmony. When shipping it isn’t a
big regression from shipping gcj, we’ll probably ship it. What does
that mean? It means that platform coverage and library coverage
matter. Meanwhile gcj remains the best free VM on my list of metrics:
platforms, performance, debuggability, and community.

gcj details

I’ve got the eclipse front end plugged into gcj here. It consists
of a new driver for ecj and a patch to the gcj specs to invoke it.
I’m debugging some .class compilation bugs that this
found, but I should be able to build everything soon. (I’ve already
built 1.5 code with it.) Next step: a branch in the gcc repository.

gnash

Last night when I couldn’t sleep I became bizarrely interested in
gnash and flash
software. First, I found the gnash source code kind of unreadable —
pretty messy. I read a bit about SWF; what a weird setup this thing
has.

A flash plugin is a classic example of what not to write
in C or C++. You end up reimplementing the world. Instead, start
with a library-rich language like java and it looks much simpler. I
found JSwiff for SWF
reading. Am I deluded when I think that this plus Java2d (and sound
and I guess JMF — yuck) plus a bit of glue would make it all happen?

Another One

Today I wrote another optimization pass for gcj. This one
collapses equivalent vtable references and array length references.
You’d think that GCC itself would do this, but there’s no way to tell
the optimizers that a given field is write-once.
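
In Java terms the pass is exploiting the fact that an array’s length
(like an object’s vtable pointer) is set at allocation and never
written again, so repeated loads of it are all equivalent. A sketch
of the kind of code that benefits (my own example, not from gcj):

```java
// arr.length is write-once: fixed at allocation, never modified.
// The pass lets later loads of it reuse the first one, just as a
// programmer might hoist the length into a local by hand.
class Lengths {
    static int sum(int[] arr) {
        int total = 0;
        for (int i = 0; i < arr.length; i++) {   // length loaded for each test...
            total += arr[arr.length - 1 - i];    // ...and redundantly again here
        }
        return total;
    }
}
```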

Really I should fix GCC to do this… but writing a new pass is
easy to do, and fixing the generic code looks daunting.

The other day I also rewrote my devirtualization pass to use the
SSA propagation engine. Again, simple to do, and it improved the
results a bit.
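
For the unfamiliar: devirtualization turns a virtual call into a
direct one when the analysis can prove the receiver’s concrete type.
A minimal sketch of a candidate (the classes here are hypothetical):

```java
// If propagation proves `s` can only be a Circle, the virtual call
// to area() can be replaced with a direct -- and then inlinable --
// call to Circle.area().
abstract class Shape {
    abstract int area();
}

final class Circle extends Shape {    // final: no overrides possible
    final int r;
    Circle(int r) { this.r = r; }
    int area() { return 3 * r * r; }  // integer approximation for the sketch
}

class Devirt {
    static int compute() {
        Shape s = new Circle(2);      // exact type known here
        return s.area();              // devirtualization candidate
    }
}
```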

Hacking GCC these days, while still tricky in some details, is
just enormously simpler than it was 5 years ago. Kudos to all the
tree-ssa folks who made this happen.

ecj

I spent some time this week hooking ecj up to gcj, as threatened.
I’ve got a new driver for the eclipse compiler that eases the argument
processing a bit. This is working well enough now that I was able to
successfully compile some source code using generics by running the
gcj driver.

If only I had a decent place to check this in. I wonder if the SC
would let me make a branch for this, even though it is in political
limbo.

JIT etc.

I started writing my GCC optimizer passes because I was curious
about writing a devirtualization pass for LLVM. I wrote about half of
it and then thought that surely this would be just as simple for
tree-ssa.

I’ve been thinking a bit about heuristics for when the libgcj JIT
should recompile. The easy ones are things like: recompile when
classes are initialized, so we can remove initialization calls from
inside loops; and recompile when constant pool references are
resolved, so we can replace expensive indirect accesses with cheap
direct ones.

There’s probably a lot of literature out there that I should be
reading on other times this is worthwhile — detecting when partial
specialization is worthwhile, profile-directed runtime optimization,
etc. Maybe HLVM will help.

Actually doing the recompilation is simple; LLVM provides the
needed hooks. For things like constant pool references, I think I
will take the simple approach of re-lowering from bytecode to
LLVM. If this proves to be too expensive, it can always be changed, I
think. But I suspect it won’t be. And, anyway, it will be fun
finding out.