Web Pages Related to Compiling the Java Programming Language

Foreword

This web page was originally written in February of 2000. Many things have changed since then, in some cases dramatically. These changes are reflected among the groups that were working on compiling Java into native code (see They're All Dead below). As some of these companies went out of business and others changed their strategies, many of the HTML links referenced here became invalid, and the material that was published on the Web has been lost as a result. As more and more of humanity's information is published on the Internet, the transient nature of this information becomes an issue. To some degree this problem is addressed by the Internet Archive, but that archive is incomplete.
January 2004

Introduction

This is a growing list of links to groups and companies doing work on Java compilation. I've also added a list of references. True confession time: I hate trying to remember where I saw something, so if I see a Web link on compiling Java that might be useful in the future, I include it here so I can find it later. Like a squirrel burying nuts. Also, like a squirrel that forgets where the treasure is buried, I occasionally forget what is on my web pages, which I have been working on since 1995.

I don't claim that this web page is a complete listing. For example, initially I missed the Jove optimizing byte code compiler. So if you know of something that is not included here that fits the grab bag of topics covered, please send me e-mail (iank@bearcave.com).

I have links to several commercial products on this page. Unless noted, I have only looked at the literature published on-line by these companies. In most cases I have not used the products. It is hard enough to keep up with all that is going on with Java, Jini, and JavaSpaces without trying to be a product reviewer. Sometimes a product that looks really good "on paper" is not as impressive once it is used extensively. So I don't "endorse" any product (not that companies are actively seeking my endorsement).

It's also possible (probable?) that despite my best efforts I have misunderstood something about these products. So take what I write with a grain of salt. This is a roundabout way of saying that I don't want to get outraged e-mail (or letters) from marketeers and lawyers. This is just my opinion. However, if you are a compiler developer or software engineer and you have comments on the material here, I would definitely like to hear from you.

Compiling Java Byte Code into Native Code

Most Java to native code compilers currently read Java class files, not Java source. This avoids having to implement a Java language front end, since the compiler can read class files generated by Sun's javac, Microsoft's J++ or any other byte code compiler.

The Java class file contains a lot of the original Java source information (see my notes on the Java class file disassembler I wrote). A compiler can read the Java class file symbol information and the Java byte codes and build an internal flow graph that can be used for optimization and code generation.
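As a minimal sketch of where such a reader starts (the class name is invented, and a real reader continues on through the constant pool, fields, methods and attributes, including the Code attribute that holds the byte codes), here is Java code that reads just the fixed header of a class file:

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Read the fixed header of a Java class file: a u4 magic
    // number followed by u2 minor and u2 major version numbers.
    public class ClassFileHeader {
        public static void main(String[] args) throws IOException {
            DataInputStream in =
                new DataInputStream(new FileInputStream(args[0]));
            int magic = in.readInt();           // always 0xCAFEBABE
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            in.close();
            if (magic != 0xCAFEBABE) {
                System.out.println("not a class file");
            } else {
                System.out.println("class file version " + major + "." + minor);
            }
        }
    }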

Java Source to Native Compilers

A byte code to native compiler builds the internal flow graph used by the optimization and code generation phases from the Java byte codes. A Java source to native compiler instead has a front end that parses Java source and does syntactic and semantic analysis. Some tasks, like method in-lining, may be easier when there is a Java source front end. A compiler that reads Java source must also be able to read Java class files to properly handle import statements and classes defined in other files.
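As an illustration (the class and methods are invented), here is a source-level view of what method in-lining does:

    class Point {
        private int x, y;
        int getX() { return x; }
        int getY() { return y; }

        // Before in-lining, the method body is:
        //     return Math.abs(x - other.getX()) + Math.abs(y - other.getY());
        // After the accessors are in-lined, the call overhead is gone
        // and the optimizer can see the field reads directly:
        int taxiDistance(Point other) {
            return Math.abs(x - other.x) + Math.abs(y - other.y);
        }
    }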

Java is a complex language. A Java grammar is large and the checking that must be done to catch semantic errors is complicated. As a result, building a Java front end which parses Java source and class files and builds abstract syntax trees (AST), symbol and type information is a significant effort. Since such a front end must also be able to read class files, many compiler vendors skip the complexity of syntactic and semantic analysis by implementing only the class file reader. A byte code compiler like Sun Microsystems' javac serves as a "first pass". Since javac can be downloaded and used without a fee, this does not impose too much on the user. So the list of compilers that read Java source is smaller (tiny, in fact) than the list of compilers that take Java class files as input. (We take a break now for a message from our sponsor, Bear Products International: Bear Products International is developing a Java front end, which is available for license. We now return you to your Web page.)
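To make the front end work concrete, here is a minimal, invented sketch of the kind of data structures it builds. A real front end has dozens of AST node kinds, with symbol tables and type information attached during semantic analysis:

    // Invented sketch: AST nodes with type and symbol information.
    class Type   { String name; }
    class Symbol { String name; Type type; }

    abstract class Expr {
        Type type;                 // filled in by semantic analysis
    }
    class NameExpr extends Expr {
        Symbol symbol;             // resolved against the symbol table
    }
    class BinaryExpr extends Expr {
        char op;                   // '+', '-', '*', '/'
        Expr left, right;
    }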

Academic Projects

There are a number of academic Java to native compiler projects. In fact, it has become difficult to keep these web pages up-to-date because work is proceeding so rapidly.

Tower Technology's towerJ byte code to C compiler

Tower Technology sells a byte code to C Java compiler named towerJ. Much of the material on the Tower Technology web pages discusses how the towerJ compiler can help Java users get better performance; there is not much mention of optimization. However, from talking to Tower Technology it appears that they do many of the standard optimizations in preparation for generating C code. Generating C makes the product quickly portable in a way that direct native code generation is not. While compiling via C will never be quite as efficient as generating native code directly, it may be that in corporate server side applications the performance advantage of direct native code generation is not as high. Compared to a language like Fortran, Java is new, and the computer science community does not have much experience compiling it yet.

Lies, Damn Lies and Benchmarks

Performance analysis and characterization is difficult, whether it involves software, hardware or some combination of the two. I've spent a lot of my professional life working on high performance computers and parallel processors. Prospective customers always want to know whether the high performance system the vendor is selling is faster than the system they already own. Sales and marketing will say "Yes, it's much faster", but an engineer will usually answer "it depends". The speed-up on the high performance system depends on the characteristics of the application. This is never a very satisfying answer for anyone, but it is inescapable. The same problem exists when trying to understand whether the code generated by one compiler is faster than the code generated by another.

Performance analysis of compiler generated code has a long history. Originally there were no benchmarks; users would do their own benchmarking on in-house applications. This was fine as long as the users were willing to dedicate time to benchmarking, but since in most cases they were not willing to publish the source for their applications, there was no way for compiler vendors to publish performance numbers comparing their compiler against other compilers. So synthetic benchmarks like Whetstone, Dhrystone and SPEC were developed. History starts getting ugly here, since some compilers have specific hacks that recognize portions of these benchmarks in order to produce better performance numbers. Because this special recognition is specific to the benchmark and does not generalize to a larger class of applications, good benchmark numbers from these compilers did not necessarily translate into better application performance.

Compared to Fortran, C and C++, Java is a new language, and many of the benchmark suites have not been translated into Java. Performance analysis of Java compiler generated code has not been a priority, since analysis has centered on the speed of the interpreter and the runtime system (e.g., the garbage collector and the class library). See, for example, the Volano report, which looks at JVM and Java performance in a networked environment. Now that there are several Java compilers that generate native code, performance comparison is more important.

There has been a lot of discussion about the speed of Just-In-Time (JIT) compilers and hot spot optimizers that work with the interpreter. Some early Java to native compilers were only marginally faster than interpreted code. As a result, JIT and hot spot techniques produced Java performance that matched the code generated by the native compilers. This led to claims that a JVM could be just as fast as native code (underlying this was the idea that compiling Java to native code was heresy, since it violated Java's "write once, run anywhere" theme). Most native compiler vendors have now published benchmarks showing that the code generated by their compilers is faster than the JIT and/or hot spot interpreters. There has been less work on classic benchmarks, although these could be used for comparison against JIT or hot spot as well.

The Java world moves fast and more classic benchmarks are starting to appear. Microsoft Research did an excellent job benchmarking their Marmot Java compiler (see the reference below). Benchmark and/or performance analysis data has been published by all the vendors above (except Cygnus, where the compiler is less mature).

The white papers published by Tower Technology provide less benchmark data, simply showing (in marketing terms) that the towerJ compiler is faster than Sun's HotSpot optimizer.

Natural Bridge notes, as I have above, that you have to be careful interpreting benchmarks (see also my discussion of benchmarks on the related web page Why Compile Java Into Native Code?). Most compiler groups claim to perform the full suite of modern optimizations. However, the implementations vary depending on the experience and talents of those groups. In practice this means that compilers that claim to implement the same optimization can differ widely in the performance delivered on real applications.

Java is more difficult to optimize than C++. If a compiler group is good, the performance of optimized code should improve over time. In several cases the Excelsior Java compiler produces code that is faster than code generated by Microsoft's Visual C++ (MSVC). The Marmot paper shows MSVC generally doing a better job at optimization than Marmot (conspiracy theorists will say "of course", but I think such suspicion is misplaced). So the fact that Excelsior generated code beats MSVC generated code suggests that either Excelsior does a better job than Marmot or the benchmark is poorly chosen. I'm not sure which is the case.

Java performance remains a controversial issue. Sun continues to claim that the Hot Spot optimizer solves all problems. In fact, Sun's Hot Spot does show impressive performance numbers for long running servers. Hot Spot uses execution tracing for optimization. This can beat statically compiled and optimized code (for example, the Hot Spot optimizer can know that a branch is usually taken). Despite the usual Sun hype, Sun did not invent execution profile driven optimization. This technique is almost twenty years old and has its roots in the Bulldog compiler implemented by John R. Ellis at Yale. The late mini-supercomputer company Multiflow used a trace scheduling compiler, as did Cydrome (another failed mini-supercomputer company). Profile driven optimization is currently used by some IA-64 compilers.
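To see why a profile helps, consider this toy example (invented code): a static compiler has no idea which way the branch usually goes, while a profile-driven optimizer does:

    class BranchBias {
        // A static compiler cannot know which way this branch usually
        // goes. An execution profile may show that negative values
        // almost never occur, letting a trace-driven optimizer lay out
        // the hot path as straight-line code and move the rare case
        // out of line.
        static int checksum(int[] data) {
            int sum = 0;
            for (int i = 0; i < data.length; i++) {
                if (data[i] < 0) {
                    sum -= data[i];   // cold path at run time
                } else {
                    sum += data[i];   // hot path at run time
                }
            }
            return sum;
        }
    }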

Long running servers represent only a fraction of the Java application space. All my Java code consists of either short running client/server applications or standard "run once" applications. For these, interpreted Java performance is slow and Hot Spot is of no help.

More benchmarks are starting to appear for Java to native compilers. For example, see The Java Performance Report - Part IV: Static Compilers, and More, August, 2001 by Osvaldo Pinali Doederlein (thanks are due to Dmitry Leskov at Excelsior for sending me this link). This report provides some interesting comparisons. Benchmarking a suite of compilers is hard work and it is difficult to synthesize meaningful conclusions from the results (as noted above). As Doederlein notes, I/O and other system factors can be a significant issue for the Java benchmarks that contain networking code. Since such code is likely to be I/O bound (or dependent on the networking library), those benchmarks are not a good measure of the performance of compiled code.

Optimization and Software Quality

Software quality and testing are critical in a compiler, since it is the foundation used to create all other software. Quality is achieved in two ways:

  1. Good design. A well designed compiler will have fewer bugs and will consistently produce better code (not just good benchmark code). By "well designed", I mean that the compiler phases are clearly structured and the components appear to be correct "by inspection".
  2. Testing the compiler with a large and growing test suite.

Optimization phases can be scary to work on. An optimizer takes correct code and rearranges it into another sequence of, it is hoped, correct code which is more efficient in terms of either memory or runtime. The optimization phase of a compiler consists of thousands of lines of code and the algorithms are complex. As any experienced programmer discovers, this means that a compiler is more likely to generate incorrect code when optimization is turned on.
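A classic example is loop-invariant code motion. This sketch (invented code) shows the transformation at the source level, along with the proof obligation that makes it risky:

    class Hoist {
        // Before the transformation, the loop body is:
        //     a[i] = i * (limit * scale);
        // The optimizer must prove that limit * scale cannot change
        // inside the loop before hoisting it. Getting such proofs
        // wrong is exactly how an optimizer turns correct code into
        // incorrect code.
        static void fill(int[] a, int limit, int scale) {
            int t = limit * scale;   // hoisted loop-invariant expression
            for (int i = 0; i < a.length; i++) {
                a[i] = i * t;
            }
        }
    }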

A compiler vendor has a strong incentive to develop compilers that do well on the standard benchmarks. This does not mean that the compiler is a quality product. Unfortunately there is no benchmark that will report the relative quality of a compiler.

Once upon a time I worked for a compiler vendor, which I will not name, that sold compilers that implemented all the standard optimizations. As with most compiler vendors, they put a lot of effort into making sure that they did well on the benchmark suites. However, the quality of the compilers was terrible. The compiler source was full of ifdefs so the logic flow was difficult to understand without using a debugger. Rather than implementing software that at least looked correct by inspection, people would hack in changes and then test the compiler against the test suite. Few people at this company believed that it was possible to implement machine independent optimizations and all optimization was target dependent.

The compilers generated incorrect code much more frequently than they should have and they did not deliver performance on real applications that was proportional to the performance reported on benchmarks. Unfortunately there is no way to tell in advance whether a compiler is a reliable piece of software. Only use will determine this. So when you see a page of benchmark results, remember that quality is also an important metric.

Java Source to Byte Code Compilers

Java Class File Optimizers (byte code to byte code compilation)

A class file optimizer reads a Java class file and generates another class file that is optimized for size and byte code efficiency. The Jopt tool is a Java class file optimizer written by Markus Jansen of the University of Technology, Aachen, Germany. The Jopt Web page describes the optimizations that the tool performs.
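I do not know Jopt's exact rewrite rules, but an invented peephole example gives the flavor of byte code to byte code optimization. For the Java expression x + 0, where x is the int in local variable slot 1:

    before:   iload_1      // push local variable 1 (x)
              iconst_0     // push the constant 0
              iadd         // pop both, push x + 0

    after:    iload_1      // push x; adding zero was a no-op, so
                           // two instructions disappear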

Jopt has been tested against a range of Java class files, and the Jopt web pages publish the results of the optimization research that has been done with the tool. The Jopt project is collecting Java class files to use as test cases for their optimizer, so if you have some around, send them in.

For those of you who are worried about people decompiling your Java classes, it looks like Jopt will optimize them into unreadability. So it is not only an optimizer but also an obfuscator. Personally, I think that if you really want to protect your code against being decompiled, you should just compile it into optimized native code.

They're All Dead

Lister: Where is everybody, Hol?
Holly: They're dead, Dave!
Lister: Who is?
Holly: Everybody, Dave!
Lister: What, Tower Technology?
Holly: Everybody, Dave!
Lister: What, Instantiations' Jove compiler?
Holly: Everybody, Dave!
Lister: What, the NaturalBridge BulletTrain compiler?
Holly: They're all dead. Everybody's dead, Dave!
Lister: What about Diab Data's fastJ? It's not dead.
Holly: Everybody's dead, Dave!

With apologies to the British television program Red Dwarf

June, 2003

Except for the Excelsior Java to native compiler, the commercial Java to native compilers are dead. Tower Technology is no longer in business. Instantiations no longer sells their Jove Java compiler and NaturalBridge is concentrating on a high performance Java interpreter. Like most compiler companies, Diab Data was purchased by another company, in this case WindRiver; I did not find any mention of the fastJ Java to native compiler on their web pages. The GNU gcj compiler, from Red Hat, still exists, but from talking to people who have used it, this compiler is not of the same quality as the GNU C++ compiler.

When there are multiple products that do similar things, it is not surprising to find that some products dominate while others fail. This is not what seems to have happened in the case of Java to native compilers. Of the commercial products, only the Excelsior compiler survives. One interpretation might be that Excelsior dominated the market and killed off all the competing products. Another explanation is that the market for Java to native compilers is small, at best. Apparently the latter explanation is the one closest to the truth.

There may be a number of reasons for the Java to native compiler die-off.

The reality of the market (they're all dead, Dave) is the ultimate argument. Still, I'm surprised. As processor performance increases, so do the demands placed on applications; there never seems to be enough speed. Many Java applications, whether enterprise systems or mathematical models, have terrible performance. To some extent one can throw hardware at the problem. However, a Java to native compiler is considerably cheaper than a high performance multiprocessor.

I also expected that a Java to native compiler would be attractive because an increasing number of younger software engineers are not fluent in C++, since they learn Java in school and use it in industry. Switching to C++ is only an option if you have spent the years needed to develop C++ expertise.

References

Fortran is one of the best programming languages for compiler optimization. Since optimization has been important to the Fortran community, over the years the language semantics have been cleaned up and clearly specified to aid optimization. Sadly Fortran lacks the abstraction of object oriented languages like C++ and Java. But optimization in object oriented languages, especially in the presence of exceptions, is difficult. The core data structure for optimization is the control flow graph (data flow is the secondary structure, embedded in the control flow graph). Building a correct flow graph when there are exceptions is something that has not been discussed much in the compiler literature. I am still trying to understand how to build a flow graph that properly supports exceptions without totally destroying any chance for classic optimization.
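A small, invented Java example shows the problem:

    class RatioExample {
        // Every instruction in the try block that can throw (the two
        // array loads and the divide) is an implicit edge to the
        // handler. Those extra edges block code motion across them,
        // and any value the handler reads must be correct at every
        // throwing point, which constrains register allocation.
        static int ratio(int[] a, int i, int j) {
            int r = -1;
            try {
                r = a[i] / a[j];   // may throw ArrayIndexOutOfBoundsException
                                   // or ArithmeticException
            } catch (RuntimeException e) {
                r = 0;
            }
            return r;
        }
    }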

The literature on compiler optimization spans at least thirty years. This list does not claim to cover even a fraction of that literature. It lists some of the material I have been reading recently and centers on Java compilation issues. I would be grateful for any Web accessible references or books on Java optimization not listed here. If you send me e-mail I will add them to this list.

For some reason, 1997 was a good year for advanced compiler books. As I mentioned, the literature on compiler design, development and optimization is massive. The first two references below summarize much of this literature and are a great resource for anyone who is serious about compiler design.

Links to other Java Compiler related link collections

Links to compiler related resources on the Web

As noted above, the literature on compiler design and implementation is huge. We can only hope that Knuth will live long enough to actually write a book or books summarizing some of this material. The list below is not meant to be comprehensive. When I see something that might be of interest in the future I try to squirrel it away. If you have references or links, please send me email. With all of those caveats, you can check my links page here.

Ian Kaplan, February 12, 2000
Revised most recently on: January 9, 2004


back to Notes on Software and Software Engineering