Archive for the ‘Uncategorized’ Category

Why LLVM Matters

One of my wish-list items for libgcj hacking is to port the whole
mess to LLVM. LLVM is a low-level virtual machine; I think of it as a
rough equivalent of the GCC middle and back ends, only written in C++
and with a more flexible design. For instance, LLVM can be used as a
JIT as well as an ahead-of-time compiler, and it can also do things
like whole-program optimization.

Hooking gcj and libgcj to LLVM doesn’t look particularly hard,
though it would require a larger block of free time than I seem to be
able to dig up. And, LLVM may not be quite ready for the
adventure… its exception handling is different (which is ok in a
closed world, but may matter more if you want real interoperability
with gcc-compiled C++ code), plus LLVM has notably fewer back ends,
and is missing some that matter. These are just minor bumps
though.

Today I read an interesting PowerPoint presentation about the future
of language design. It is full of
nice observations, for instance the idea that the implementation of
the next big programming language will probably be slower than what
we’re using now — since what we’re using now will have been heavily
optimized over the years.

This is where LLVM comes in. I think, these days, aspiring
language designers don’t really need to skimp on performance in order
to get their tools up and running. In the old days you would write an
interpreter or generate C code; but with LLVM it looks just as easy to
simply write a JIT. (It is also surprisingly easy to write a GCC
front end these days, so that is another viable approach to
implementing your language.)

The point of all of this is that free software lowers
institutional barriers, making division of labor more possible. In
other words, you write your language front end, and somebody else does
most of the worrying about turning it into efficient code.

New Laptop

Red Hat sent me a new laptop, replacing my ancient powerbook. It
is an excellent machine, notably more powerful than its predecessor.
For instance, I can actually build gcjx on it in a reasonable amount
of time. Finally a machine I can give Eclipse demos on 🙂

The FC3 install went very smoothly, though I haven’t yet worked
out every detail (getting wireless working looks fiddly).
Unfortunately, installing the OS is only the first step in really
configuring a new machine. Copying over my customizations is kind of
painful, especially random things I use but can’t be bothered to
properly package. Then there is also the task of configuring yum,
installing apt, and installing all that extra software I use that
isn’t in the OS itself.

This process is way simpler than it was back in the bad old days.
Configuring yum and apt is ridiculously easy. The significant barrier
right now seems to be simply remembering everything that I know I want
installed. Still, there’s room for growth in the “making it easy”
department here.

gcc.gnu.org

gcc.gnu.org is taking a little vacation, after having unexpected
and serious problems a couple of days ago. To make it possible to
get work done in the meantime, I’ve imported my working tree into monotone. This will make
it easy for me to keep my various patches separate for later copying
to the gcc repository. This is working out quite well… maybe we
should have just immediately set up a more public server for people to
use in the meantime. If gcc used monotone, I wouldn’t care nearly as
much when the machine crashed.

distcc and java

Some folks are working on modifying javac to work with distcc. I
really should reply to this on-list… what they are doing sort of
overlaps with Anthony’s earlier efforts in this area. Plus, modifying
a proprietary compiler is ugly.

I’m not so sure that having distcc work for ordinary Java
compilation makes that much sense anyway. The current compilers are
all plenty fast on current hardware. It is hard to believe you could
get any substantial speedup by shipping compilations around the local
network.

On the other hand, there is a nice distcc improvement that would
help gcj. With the new binary compatibility ABI, when you compile
from .class to object, gcj no longer needs to read any dependencies
— it compiles the class file in isolation. This means it is
feasible to distribute these compilations.

So why won’t the ordinary distcc work? The most common way to
build using the new ABI is to compile entire jar files at once. So,
to be most useful, distcc would have to unpack the jar, distribute the
jobs itself, and then link the results. This is slightly trickier
than it sounds because non-.class files have to be treated as
resources, requiring a different invocation of gcj.

Binary Compatibility How-To

Inspired by the GCC wiki appearing on gcc.gnu.org, I wrote a short
how-to about using the new binary compatibility mode and the
libgcj database (aka caching jit) feature.

This is by far the simplest way to deploy existing applications
using gcj. It works for Eclipse, and this is how I tried out Derby
the other day. This is also the approach we’re taking to get jonas to
work.

Derby and gcj

Today I gave Derby a quick try using gcj. I followed the same basic
approach as I used when compiling Eclipse.

That is, I compiled all the jar files with
-findirect-dispatch, stuck them in a libgcj class mapping
database, and ran gij using the compiled shared
libraries. This all compiled without trouble and the command-line
interpreter started without problems.

I haven’t done much testing, mostly since I don’t know much about
Derby and only had a short amount of time to play with it. Still,
looks like another easy success for gcj.

Wish lists

I updated my Eclipse wish list a little.

God of Cookery

Kurth told me about this movie about ten years ago, but I forgot
all about it until I happened to see the box on the counter at Video
Station a couple of weeks ago. Even then I had to wait since,
apparently, it is checked out frequently.

This movie — a kind of riches-to-rags-to-riches story about a
food critic slash chef — is every bit as great as K said it was all
those years ago. It is stunningly random, as if, at every juncture
while filming, the director considered the strangest next possible
direction to take. It really hit my funny bone; I recommend it.

Garbage Collection

Casey recently wrote about the woes of garbage collection. Here’s my
unsolicited take on the subject.

The big plus for GC is that it enables better software
engineering. A bit of global information (whether or not an object
is potentially in use) is handled globally, and no particular user
module is responsible for its deallocation. This makes it much
simpler to write APIs; simply pass around objects as you like and the
system handles it.

Nothing is free, of course. You can usually expect to pay a speed
penalty with GC (though finding out how large it is may be complicated).
The presence of GC changes the programming system in other ways as
well; for instance, it ordinarily necessitates the presence of weak
references.

And, no discussion of GC would be complete without mentioning
that certain kinds of memory leaks will still persist. If you
continue to have a live reference to an object which will not be used
in the future, it won’t be collected. Explicit deallocation proponents
often erroneously point to this as a kind of GC failure, either
explicitly, or secondarily, as in “if you must explicitly null a
pointer, you might as well introduce a free() in the same
spot”. There are (at least) two points here: first, that in a GC
environment this is a local problem which can be fixed locally, and
second, the important point of GC is not that it reclaims memory, but
that it does not reclaim live memory.

That said, it is pretty easy to write C++ classes that basically
automate memory management. And, with a little planning in one’s
program, it is easy to avoid memory leaks altogether.

For gcjx, I wrote a simple “owning pointer” class that does
reference counting (you can find better ones in Boost). I’ve run into one or two
memory management bugs, mostly due to little design flaws in my API.
So, the situation in C++ really needn’t be that bad.
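
A minimal sketch of what a reference-counting “owning pointer” might
look like. This is not the class from gcjx; the names owner, probe,
and demo_shared_lifetime are made up for illustration, and the sketch
ignores thread safety and cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical reference-counting pointer; not the gcjx class.
template <typename T>
class owner {
  T *ptr_ = nullptr;
  std::size_t *count_ = nullptr;  // shared counter, null when empty

  // Drop our share; the last owner deletes the object and the counter.
  void release() {
    if (count_ && --*count_ == 0) {
      delete ptr_;
      delete count_;
    }
    ptr_ = nullptr;
    count_ = nullptr;
  }

public:
  owner() = default;
  explicit owner(T *p) : ptr_(p), count_(p ? new std::size_t(1) : nullptr) {}
  owner(const owner &o) : ptr_(o.ptr_), count_(o.count_) {
    if (count_) ++*count_;
  }
  owner &operator=(owner o) {  // copy-and-swap; o releases our old share
    std::swap(ptr_, o.ptr_);
    std::swap(count_, o.count_);
    return *this;
  }
  ~owner() { release(); }

  T *operator->() const { assert(ptr_); return ptr_; }
  T &operator*() const { assert(ptr_); return *ptr_; }
  std::size_t use_count() const { return count_ ? *count_ : 0; }
};

// Test helper that records its own destruction.
struct probe {
  int *destroyed;
  ~probe() { ++*destroyed; }
};

// Returns how many times the probe was destroyed.
int demo_shared_lifetime() {
  int destroyed = 0;
  {
    owner<probe> a(new probe{&destroyed});
    owner<probe> b = a;       // two owners, one object
    assert(a.use_count() == 2);
    a = owner<probe>();       // dropping one owner does not free the object
    assert(destroyed == 0);
    assert(b.use_count() == 1);
  }                           // the last owner goes away here
  return destroyed;
}
```

The counter lives in a separate allocation so that copies can share
it; std::shared_ptr and the Boost smart pointers do the same job more
carefully.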

But then, gcjx is a fairly self-contained program, and the data
structures it builds are largely trees (with sole ownership). I
think the situation gets worse for explicit allocation when you start
looking at very large programs with modules over which you have
little control.

The ease of writing C++ wrapper classes is a minus, though, not a
plus, when it comes to this topic. Suppose you use a collection of
several libraries. Either you will end up using plain pointers and
missing out on the benefits of C++, or you’ll have to find ways to mix
and match various ownership approaches, potentially a fragile affair.
This is one of the big benefits of GC as I see it: not the technology
per se, but the API unification it implies across libraries.

Of course, the real reason I like GC is that I’m just lazy and it
makes the hacking go quicker.

gcjx now in gcc

A few days ago I finally moved gcjx development from sourceforge
to gcc.gnu.org. The branch is named gcjx-branch. It
isn’t fully hooked up to the build system yet, but you can build the
gcjx directory standalone and have a bytecode compiler.

I also recently ran jacks tests of both gcj and gcjx. The
results are overwhelmingly in gcjx’s favor:

        Total   Passed  Skipped  Failed
gcjx:   4928    4711    45       172
gcj:    4928    4166    44       718

What’s funny is that their failures don’t overlap very much, and yet
they both manage to compile all of Classpath. Partly this can be
explained by the fact that compilers tend to do better on correct
code than incorrect code, but partly I just observe that even a
fairly buggy java compiler is still useful.

Andrew points out that, of course, gcjx will come with its own new
undiscovered bugs as well — and he said that without even looking at
the incomplete tree-generating back end. Still, at this point we seem
to have a lot of interesting code out there to use as test cases; I’m
sure at merge time (I think optimistically it will be sometime this
year) we’ll have confidence in the result.

Eclipse in Fedora

A gcj-compiled Eclipse RPM is now in Fedora Core Rawhide; for
instance look for “eclipse” here.
This hasn’t shown up in the FC info feed yet, but it
should soon. Thanks to Andrew Overholt, Tom Fitzsimmons, Bryce
McKinlay (and probably others) for getting this all running.

Visitors and Multimethods

gcjx uses a simple version of the Visitor pattern for
code generation. I’ve been thinking about this a bit lately, as
experience with gcjx and random discussions with Graydon have been
piquing my interest in language design.

For those who don’t know, visitors are basically a way to achieve
dispatch on the dynamic type of an argument to a method. This is
very handy for doing things like walking the model of a program that
is built up inside a compiler.

In gcjx this takes a very simple form. There is an abstract
visitor base class which has one abstract method for each object in
the model, like:

class visitor {
public:
  virtual void visit_block (model_block *,
			    const std::list<ref_stmt> &) = 0;
  ...
};

The arguments here are ad hoc, according to the particular object
being visited (it need not be done this way, but it was convenient
for gcjx).

Then each class in the model has its own visit method:

class model_block {
public:
  void visit (visitor *v) {
    v->visit_block (this, statements);
  }
};

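As you can see this results in a straightforward way to achieve
double dispatch: the code that runs depends on both the dynamic type
of the model element and the dynamic type of the visitor. You simply
call the visit method on any element of the model, and the
appropriate method in your visitor will be called.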
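
To make the pieces above concrete, here is a small self-contained
sketch that compiles on its own. It is not the gcjx source: the
model_element base class, model_return, the dumper visitor, and
demo_dump are invented for illustration, and ref_stmt is assumed to
behave like a std::shared_ptr.

```cpp
#include <list>
#include <memory>
#include <string>

class model_element;
class model_block;
class model_return;

// Stand-in for gcjx's reference type (an assumption).
typedef std::shared_ptr<model_element> ref_stmt;

// One abstract method per class in the model.
class visitor {
public:
  virtual ~visitor() = default;
  virtual void visit_block(model_block *, const std::list<ref_stmt> &) = 0;
  virtual void visit_return(model_return *) = 0;
};

// Hypothetical common base so elements can be visited uniformly.
class model_element {
public:
  virtual ~model_element() = default;
  virtual void visit(visitor *v) = 0;
};

class model_return : public model_element {
public:
  void visit(visitor *v) override { v->visit_return(this); }
};

class model_block : public model_element {
public:
  std::list<ref_stmt> statements;
  void visit(visitor *v) override { v->visit_block(this, statements); }
};

// A toy visitor that builds a crude textual dump of the model.
class dumper : public visitor {
public:
  std::string out;
  void visit_block(model_block *, const std::list<ref_stmt> &stmts) override {
    out += "{";
    for (const ref_stmt &s : stmts)
      s->visit(this);  // recurse through the element's own visit method
    out += "}";
  }
  void visit_return(model_return *) override { out += "return;"; }
};

// Builds { { return; } return; } and dumps it.
std::string demo_dump() {
  model_block outer;
  auto inner = std::make_shared<model_block>();
  inner->statements.push_back(std::make_shared<model_return>());
  outer.statements.push_back(inner);
  outer.statements.push_back(std::make_shared<model_return>());

  dumper d;
  model_element *e = &outer;  // the static type hides the dynamic type
  e->visit(&d);
  return d.out;
}
```

Note that the caller only ever sees a model_element pointer and a
visitor pointer; the two virtual calls pick the right method.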

One nice thing about this approach is that the compiler will tell
you if your visitor is incomplete, since that can only happen if you
didn’t implement some abstract method. This also means it is easy to
add a new class to the model — all existing visitors will break,
making it simple to figure out where to add new methods.

The downside of this approach is that it is inflexible in a few
ways. For instance, consider the tree-generating back end in gcjx.
When compiling to trees, we want to build a new GCC tree object
representing each object in the model of the program. So, the obvious
way to do that would be to have the visit method
return a tree.

This is unsatisfactory, though, because it means you have to
modify every class in the model to allow this. This in turn means
that the declaration of tree must be visible globally —
it can no longer be segregated to a single back end. Of course this
could be worked around; e.g., visit could return
void*… but then you lose type safety and have to add
casts all over.

Another approach to this problem is multi-methods, which means
doing dispatch on the runtime type of the arguments. This way you can
use generic functions instead of visitors, and then easily add new
kinds of visitors without modifying the classes in the model.

C++ doesn’t directly support this, though apparently it can be done.
One drawback I do see here is that it doesn’t seem possible to
determine when you haven’t written a method. The compiler, seemingly,
can’t tell you… a classic sort of static/dynamic tradeoff. I’m not
really all that familiar with existing multimethod implementations;
maybe there is some nice way to inform compilers of one’s intent
here.
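
One way multimethods can be emulated in C++ is a table keyed on the
runtime types of both arguments. The sketch below is only an
illustration of that idea, not necessarily the technique referred to
above; every name in it (node, backend, generate_fn, and so on) is
hypothetical. It also shows the drawback: a missing method is only
noticed when the call happens.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical model and back-end hierarchies.
struct node { virtual ~node() = default; };
struct block_node : node {};
struct return_node : node {};

struct backend { virtual ~backend() = default; };
struct bytecode_backend : backend {};
struct tree_backend : backend {};

// A generic function dispatching on the dynamic types of both arguments.
class generate_fn {
  typedef std::pair<std::type_index, std::type_index> key;
  typedef std::function<std::string(node &, backend &)> method;
  std::map<key, method> table_;

public:
  // Register a method for the exact pair of types (N, B).
  template <typename N, typename B>
  void add(std::string (*fn)(N &, B &)) {
    table_[key(std::type_index(typeid(N)), std::type_index(typeid(B)))] =
        [fn](node &n, backend &b) {
          return fn(static_cast<N &>(n), static_cast<B &>(b));
        };
  }

  // Look up the method for the runtime types; fail at run time if absent.
  std::string operator()(node &n, backend &b) const {
    auto it = table_.find(
        key(std::type_index(typeid(n)), std::type_index(typeid(b))));
    if (it == table_.end())
      throw std::logic_error("no method for this pair of types");
    return it->second(n, b);
  }
};

std::string block_to_bytecode(block_node &, bytecode_backend &) {
  return "bytecode for block";
}
std::string block_to_tree(block_node &, tree_backend &) {
  return "tree for block";
}
std::string return_to_bytecode(return_node &, bytecode_backend &) {
  return "bytecode for return";
}
// Deliberately no method for (return_node, tree_backend).

generate_fn make_generate() {
  generate_fn g;
  g.add(block_to_bytecode);
  g.add(block_to_tree);
  g.add(return_to_bytecode);
  return g;
}

std::string demo_block_tree() {
  block_node n;
  tree_backend b;
  return make_generate()(n, b);
}

// True if the missing method is only detected when the call is made.
bool demo_missing_is_runtime_error() {
  return_node n;
  tree_backend b;
  try {
    make_generate()(n, b);
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}
```

New back ends can be added without touching the node classes, but the
program compiles happily even though one combination is missing. This
simple table also matches exact types only; it does not fall back to
a method registered for a base class.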

A third approach, taken in GCC, is to simply switch
on the type of the object. One advantage of this approach is that it
is often simpler to keep track of local state — you can write
iterative code instead of recursive code in some places, you don’t
have to invert a lot of logic to put things in separate functions,
etc. This approach also suffers when you add a new class: nothing
forces you to revisit every existing switch that needs a new case.
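
A small sketch of the switch style, using a hypothetical node
structure with a kind tag (this is not GCC’s actual tree
representation). It counts return statements iteratively with an
explicit worklist, keeping the running count in a plain local
variable.

```cpp
#include <string>
#include <vector>

// Hypothetical tag for each kind of object in the model.
enum node_kind { BLOCK, RETURN_STMT, EMPTY_STMT };

struct node {
  node_kind kind;
  std::vector<const node *> children;  // only used by BLOCK
};

// Count return statements without recursion or a visitor class.
int count_returns(const node &root) {
  int count = 0;  // local state stays local
  std::vector<const node *> work{&root};
  while (!work.empty()) {
    const node *n = work.back();
    work.pop_back();
    switch (n->kind) {
    case BLOCK:
      for (const node *c : n->children)
        work.push_back(c);
      break;
    case RETURN_STMT:
      ++count;
      break;
    case EMPTY_STMT:
      break;
    // With no default label, GCC's -Wswitch can warn when a new
    // node_kind is added but not handled here. Adding a default
    // label silences that warning.
    }
  }
  return count;
}

// Builds { { return; ; } return; } and counts the returns.
int demo_count() {
  node r1{RETURN_STMT, {}};
  node r2{RETURN_STMT, {}};
  node e{EMPTY_STMT, {}};
  node inner{BLOCK, {&r1, &e}};
  node outer{BLOCK, {&inner, &r2}};
  return count_returns(outer);
}
```

The warning is only a partial safety net: it is a warning rather than
an error, and it disappears as soon as a switch has a default case.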

Coding styles that substitute programmer discipline for compiler
errors don’t seem to work that well for me. The ideal approach would
look somewhat like multimethods, but would let me have the compiler
check self-imposed constraints about which methods must exist.