Recently Classpath’s generics branch was merged to be the main line. This means all future Classpath releases will use generics, and now we’re free to use other Java 5 features in the Classpath source.
When we started the generics branch we made a conscious decision to do a “shallow” translation to generics — we rewrote method signatures and visible field signatures, but not the bodies of methods. This was done so that we could more easily merge changes on the trunk to the generics branch, a smart decision considering that the generics branch lived for two years.
This weekend I spent some time adding generics in a deep way, that is, modifying the bodies of methods to properly use generics, and attempting to remove all the warnings related to raw types, unchecked casts, etc. Aside from random warning removal, I had a specific question in mind: how much reliability do generics add?
I completely converted about 5 components (meaning some core package plus all its support code). In all of this I found 2 actual bugs. (Also I once found a bug in imageio on the generics branch during a shallow conversion, bringing the known total of bugs found by generics to 3.)
In one case we were making an invalid assumption about the actual return type of Collection.toArray() (and, btw, this particular API can only be truly fixed by reified generics; in a sense we were lucky to catch this bug).
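To make the toArray() problem concrete, here is a minimal sketch of this class of mistake. It is not the actual Classpath code; the class and method names are made up for illustration. The no-argument toArray() is declared to return Object[], so casting the result to a more specific array type compiles but can fail at runtime. Because of erasure, the collection cannot create a String[] by itself, which is why the one-argument overload needs an array passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical demo class, not taken from Classpath.
class ToArrayDemo {
    // The mistaken assumption: treating the result of the no-argument
    // toArray() as a String[]. For an ArrayList the runtime array is an
    // Object[], so the cast throws ClassCastException.
    static boolean castOfPlainToArrayFails(Collection<String> names) {
        try {
            String[] bad = (String[]) names.toArray();
            return bad.length < 0; // always false; only reached if the cast succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    // The safe form: pass an array so the runtime component type is known.
    static String[] safeCopy(Collection<String> names) {
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Collection<String> names = new ArrayList<String>(Arrays.asList("a", "b"));
        System.out.println(castOfPlainToArrayFails(names));
        System.out.println(Arrays.toString(safeCopy(names)));
    }
}
```

The sketch deliberately uses an ArrayList, since other Collection implementations are free to return a more specific array from toArray(), which is exactly what makes the assumption easy to get away with for a while.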
In another case a protocol implementation had made an incorrect assumption about the types of the contents of a collection it returned.
I have to say I was very surprised by this result. Generics are somewhat tricky to use, add a lot of verbiage to the source (especially since Java doesn’t yet have C#’s var, aka C++’s auto, type inference feature), and now, apparently, don’t really catch very many bugs.
There are a couple other theories to consider other than “generics aren’t worth it”.
One is that Classpath, being a core library, is unusually resistant to bugs of this kind compared with other Java programs. I don’t consider this very likely, but more experience with other programs would be useful.
Another is that the Classpath development process is unusually good and catches more bugs than normal. This would be nice if it were true, but I doubt this is very likely either.
Finally, one could argue that catching even a small number of bugs in legacy code is good, and that the real worth of generics comes when writing new code.
I’ll collect more data as we convert more of Classpath to use generics deeply. I’m curious to know whether other folks have had more positive experiences during conversion, or whether there’s something I’m missing about all this. At the moment generics appear to be “nice to have” but hardly worth the substantial upgrade effort across the toolchain and the large body of existing Java code…
12 Comments
There is another possibility: Generics are useful, but in a different context.
Case in point: generics in .NET are primarily for *performance*, not correctness.
Performance improvement #1: faster use of value types. In .NET 1.1 all value types (char, int, double, custom C# `struct’ types) need to be “boxed” in order to be added to a collection type, such as System.Collections.ArrayList (similar to java.util.ArrayList). This is a performance penalty, as boxing involves allocating a new object on the heap and copying the stack-located value into the heap-allocated object. (Java 1.5’s autoboxing is conceptually identical.) Furthermore, to read a boxed object, it must first be “unboxed.”
.NET 2.0 generics allow the removal of this boxing penalty: a List<int> will contain an array of `int’s, with no boxing overhead or performance penalty when compared to int[]. (This does imply increased memory requirements, as the JITed code for List can’t be shared with List, but the JITed code for all reference types *can* be shared, so List and List can share *some* of the JITed code.)
Performance improvement #2: Since the compiler and execution system ensure type safety, there is no need for generated code to perform type checking at runtime. If you have a List<string>, it can only contain strings, so there is no need for List::Add(string) to check that its argument is actually a string, nor is there a need for List::get_Item() to cast a value held within its internal array back into a string.
Compare this to Java generics, in which case the generic syntax is merely a way to remove the casting from the source code; all casting is still performed at runtime.
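A small sketch of what that means in practice on the Java side (the class and method names here are invented for the example). Both lists share one runtime class, and the cast the compiler inserts at the point of use is what actually catches a wrongly typed element that got in through a raw reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class illustrating erasure.
class ErasureDemo {
    // A raw view bypasses the compile-time check; javac only warns here.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static void addThroughRawView(List<String> strings, Object value) {
        List raw = strings;
        raw.add(value);
    }

    // The compiler inserts a cast to String where the element is used,
    // so a wrongly typed element is detected here, at runtime.
    static boolean readFailsAtRuntime(List<String> strings, int index) {
        try {
            int n = strings.get(index).length();
            return n < 0; // always false; only reached if the element really is a String
        } catch (ClassCastException e) {
            return true;
        }
    }

    // After erasure both instantiations are plain ArrayList.
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        addThroughRawView(strings, Integer.valueOf(42));
        System.out.println(readFailsAtRuntime(strings, 0));
        System.out.println(sameRuntimeClass());
    }
}
```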
I think of it this way. Typed lists are a must! And maybe typed collections are ok too. And if you’ve introduced the syntax, then you might as well make it generally available.
I think the usefulness of generics is limited beyond lists and collections.
Hello,
Performance certainly plays an important role in C# generics, but I for one love the extra type safety introduced by them.
C# was strongly typed, except when it came to the general-purpose collections which relied on keeping the consumer and producer in sync regarding the datatypes stored on a collection (there were alternatives; manually writing wrappers that were strongly typed, but that is just too annoying and cumbersome).
Java and C# generics are sufficiently different that it would be worth publishing an update to the “C# and Java” comparison paper that was written by Dare a few years ago to cover the new features.
In general, people coming from a C# background cannot stand the artificial limitations that Java generics have. And this might explain the extra overhead that Tom is referring to.
Miguel.
There is one major thing that makes it look like you aren’t getting all the bang for your generic buck. The GNU Classpath core libraries have been in use for several years in production environments. I know I have had a couple of runtime class cast exceptions that I wish we had found during compile/build time. In particular, some of our Free Swing code made a couple of assumptions about what was put in its collections which didn’t hold true for all user code. That is something that would be easily found at write time, but that only came to light at runtime in the wild.
Oh, and you also seem to forget that generics were designed especially for new code and to make it easy to interoperate with legacy non-generic code. So your “goal” of finding lots of bugs in existing code (which, as said above, has already proven itself) was not one of the design goals.
The reason you could genericize your code shallowly without having to adapt any of the existing code that used the libraries (or even the implementation of most of the existing collection classes!) is because generics actually work as designed! 🙂
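As a rough illustration of that interoperability (a sketch with made-up names, not anything from Classpath): a method with a generified signature can still be called from code written with raw types. The compiler reports an unchecked warning, not an error, so the old caller keeps building unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing raw-type callers of a generified method.
class InteropDemo {
    // "Shallow" generification: only the signature mentions List<String>.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // Legacy-style caller using raw types, as pre-Java 5 code would.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int legacyCaller() {
        List words = new ArrayList();
        words.add("shallow");
        words.add("deep");
        return totalLength(words); // unchecked conversion: a warning only
    }

    public static void main(String[] args) {
        System.out.println(legacyCaller());
    }
}
```

The flip side is that nothing here checks what the legacy caller put in the list; a non-String element would only surface as a ClassCastException inside totalLength().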
And in case you hadn’t seen it. There is already a patch for the GPLed javac to add the two proposed (var aka auto) type inference ideas (using either := or final).
http://weblogs.java.net/blog/forax/archive/2006/12/call_me_santa.html
I do feel a bit foolish for not considering that Classpath’s long life might have resulted in selection bias. Sigh. To make up for this oversight I did a bit of checking. There do seem to be some patches arising from erroneous ClassCastExceptions — but not really very many. Classpath (and libgcj) bugzilla don’t show much either (but Classpath hasn’t used bugzilla all that long).
After last time I don’t want to draw any conclusions from this however 🙂
As far as C# goes — I’m aware of the differences here. I’m not very interested in the performance question at the moment; choosing generics for performance reasons sounds like there’s been a mistake made much earlier somewhere. I suppose people do love their micro-optimizations though, I see this all the time. Even from me 🙁
Anyway, I think this question of quantifying the cost/benefit of language changes is still worth answering, even if I happen to be incompetent at doing so. That’s especially true given some of the more unusual changes being considered — closures and the special XML syntax come to mind.
To put it another way — a lot of effort went into rewriting the JLS, fixing the various compilers, adding new APIs, updating the class file format, etc. Perhaps it would have been much cheaper, and just as effective, to make FindBugs smarter.
That’s the devil’s advocate position anyhow. Naturally I use generics and wouldn’t consider doing otherwise.
I think using Classpath as a ‘base’ standard to check the usefulness is the wrong approach.
Most hackers on Classpath have been doing Java (or programming) for a long time, while the average programmer tends to have a lot less experience, and less experience thinking abstractly in general.
Moving the datastructure from being implied by the docs to being checked by the compiler helps people that don’t want to keep all those pesky details in their heads. And that’s most people.
There are two things that are hard to measure. First, if things work correctly, some bugs will be caught while writing the tests: “oops, I put the wrong object type in the Map”, fix first, then offer patch (plus test) for review. Such things happen when writing new code and don’t show up in bugzilla because they (luckily) get caught before going into production (this is partly the static typing vs dynamic typing argument: do you trust your compiler’s type system or your unit tests?). Secondly, generics add more documentation about the purpose of the various collections used in the code. That is also hard to measure because it too manifests itself only at the writing stage. It should be easier to understand what a collection really contains. But maybe the Classpath code was already really well documented. And maybe a human comment is actually better than the generic annotation on the variables?
Java generics aren’t perfect (they could have made better decisions around dealing with subclasses that could have removed a lot of code and confusion, for instance), but they are still valuable.
The value proposition for generics can’t really be measured in the Classpath code. One of the points that was made often at various JavaOne presentations when this came out was that generics are hard to get right for library implementors, but the benefits accrue to the *clients*. As Mark pointed out, type safety in general saves effort during the process of writing new code, where you don’t have to write as many unit tests (what was tested before becomes a compilation error instead), you don’t have to take the time to think about the casting, and you can benefit from code terseness and autocompletion in IDEs to write code faster. Every syntax addition that adds to type safety gives more weight to those benefits.
Classpath (for these and a number of reasons already mentioned) is therefore the perfect vehicle to measure the cost of generics, but it is the worst of vehicles for measuring the benefit.
One other really useful aspect of generics is that they allow reverse engineering tools (e.g., code -> UML class diagrams) to do a much better job. Without generics, every class that uses a HashMap ends up with a dependency on the HashMap class. This creates a bogus impression that the classes are sharing a HashMap instance… it’s not true, and UML doesn’t imply it, but it reads that way (which perhaps says something about UML). Anyway, there should be much less of a problem with generics that way… it doesn’t go away (e.g., HashMap) but it’s better.
Also, I find code with generics easier to write and to read. Explicit casts make me nervous 🙂
I mostly do PHP now so when I read this, a small tear rolls down my face…I miss generics 🙁
Great read anyways
Take care
Jamie