Architecture of a Java Compiler

Architectural Overview

A modern optimizing compiler can be logically divided into four parts:

The compiler front end

The front end includes the scanner and parser, which read the Java source and build an abstract syntax tree (AST) representation of the source code. The front end must also be able to read the symbol information in the Java ".class" files for the classes the source refers to, such as those named in import statements. After converting the source into an AST, the front end resolves symbol declarations, performs semantic analysis, and builds the symbol table and other supporting data structures. The output of the front end is an AST in which each node is annotated with either type or symbol information.
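The Java sketch below shows the shape of these steps on a deliberately tiny, hypothetical expression language (integer and double literals, +, *, and parentheses) rather than on Java itself. A scanner splits the text into tokens, a recursive-descent parser builds the AST, and each node is annotated with a type as it is built. ToyFrontEnd, Literal and Binary are illustrative names, and the typing rule is only a simplified form of Java's binary numeric promotion.

```java
import java.util.ArrayList;
import java.util.List;

// Toy front end: scanner -> recursive-descent parser -> AST annotated with types.
// The grammar is a hypothetical stand-in for the full Java grammar.
class ToyFrontEnd {
    // Every AST node carries the type computed by semantic analysis.
    interface Node { String type(); }
    record Literal(String text, String type) implements Node {}
    record Binary(char op, Node left, Node right, String type) implements Node {}

    // Scanner: split the source into number, operator and parenthesis tokens.
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && (Character.isDigit(src.charAt(i)) || src.charAt(i) == '.')) i++;
                tokens.add(src.substring(start, i));
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    private final List<String> tokens;
    private int pos = 0;

    private ToyFrontEnd(List<String> tokens) { this.tokens = tokens; }

    static Node parse(String src) {
        ToyFrontEnd p = new ToyFrontEnd(scan(src));
        Node n = p.expr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input");
        return n;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    // expr := term ('+' term)*
    private Node expr() {
        Node left = term();
        while (peek().equals("+")) { pos++; left = binary('+', left, term()); }
        return left;
    }

    // term := factor ('*' factor)*
    private Node term() {
        Node left = factor();
        while (peek().equals("*")) { pos++; left = binary('*', left, factor()); }
        return left;
    }

    // factor := NUMBER | '(' expr ')'
    private Node factor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            Node inner = expr();
            if (!peek().equals(")")) throw new IllegalArgumentException("expected )");
            pos++;
            return inner;
        }
        if (t.isEmpty() || !Character.isDigit(t.charAt(0))) throw new IllegalArgumentException("expected number");
        pos++;
        return new Literal(t, t.contains(".") ? "double" : "int");
    }

    // Semantic analysis folded into tree construction: int op int is int,
    // anything involving a double is double (simplified numeric promotion).
    private static Node binary(char op, Node l, Node r) {
        String type = l.type().equals("int") && r.type().equals("int") ? "int" : "double";
        return new Binary(op, l, r, type);
    }

    public static void main(String[] args) {
        Node ast = parse("2 * (3 + 4.5)");
        System.out.println(ast.type()); // double
    }
}
```

A real Java front end would also load symbols from class files and resolve names against the symbol table before assigning types; this sketch omits both.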
Java symbol table design issues

The symbol table is one of the core data structures in a compiler. Unlike the AST, which can be deleted after the flow graph is built, the symbol table "lives" as long as the Java source is being compiled. Java's nested scopes, and the fact that names need not be unique within a scope (a field and a method in the same class, for example, may have the same name), complicate symbol table construction.
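One common design can be sketched as a stack of scopes in which each scope keeps a separate map per kind of name (variables, methods, types). Lookup walks from the innermost scope outward within a single namespace, so a field and a method named count can coexist and a local variable can shadow a field. ScopedSymbolTable, Kind and Symbol are hypothetical names, and a real Java compiler must also handle packages, inheritance and imports.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Scoped symbol table sketch with one namespace per kind of name in each scope.
class ScopedSymbolTable {
    enum Kind { VARIABLE, METHOD, TYPE }

    record Symbol(String name, Kind kind, String declaredType) {}

    // Innermost scope is at the head of the deque; each scope maps kind -> (name -> symbol).
    private final Deque<Map<Kind, Map<String, Symbol>>> scopes = new ArrayDeque<>();

    void enterScope() {
        Map<Kind, Map<String, Symbol>> scope = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) scope.put(k, new HashMap<>());
        scopes.push(scope);
    }

    void exitScope() { scopes.pop(); }

    // Declare in the innermost scope (enterScope must have been called first).
    // A duplicate name of the same kind in the same scope is an error.
    void declare(Symbol s) {
        Map<String, Symbol> names = scopes.peek().get(s.kind());
        if (names.putIfAbsent(s.name(), s) != null)
            throw new IllegalStateException("duplicate " + s.kind() + " " + s.name());
    }

    // Search from the innermost scope outward, only in the requested namespace.
    Symbol lookup(String name, Kind kind) {
        for (Map<Kind, Map<String, Symbol>> scope : scopes) {
            Symbol s = scope.get(kind).get(name);
            if (s != null) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.enterScope();                                            // class body
        table.declare(new Symbol("count", Kind.VARIABLE, "int"));     // field
        table.declare(new Symbol("count", Kind.METHOD, "int()"));     // same name, other namespace
        table.enterScope();                                            // method body
        table.declare(new Symbol("count", Kind.VARIABLE, "long"));    // local shadows the field
        System.out.println(table.lookup("count", Kind.VARIABLE).declaredType()); // long
        table.exitScope();
        System.out.println(table.lookup("count", Kind.VARIABLE).declaredType()); // int
    }
}
```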
The middle pass

The middle pass performs tree-to-tree transformations and builds the control flow graph of basic blocks that the optimizer works on. Method inlining is an example of a tree-to-tree transformation.
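The basic-block part of this work can be sketched with the classic "leaders" method on a hypothetical three-address instruction list (the Instr record and its Kind values are assumptions made for illustration). The first instruction, every branch target, and every instruction that follows a branch or return start a new block. A block's successors are its branch target, its fall-through block, or both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Partition a linear instruction list into basic blocks and build the CFG edges.
class BasicBlocks {
    enum Kind { PLAIN, IF_GOTO, GOTO, RETURN }

    // label may be null; target is used only by IF_GOTO and GOTO.
    record Instr(String label, Kind kind, String text, String target) {}

    static Map<String, Integer> labels(List<Instr> code) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < code.size(); i++)
            if (code.get(i).label() != null) index.put(code.get(i).label(), i);
        return index;
    }

    static int targetIndex(Map<String, Integer> labels, Instr in) {
        Integer t = labels.get(in.target());
        if (t == null) throw new IllegalArgumentException("undefined label " + in.target());
        return t;
    }

    // Leaders: first instruction, every branch target, every instruction after a branch or return.
    static SortedSet<Integer> leaders(List<Instr> code) {
        Map<String, Integer> labels = labels(code);
        SortedSet<Integer> leaders = new TreeSet<>();
        if (code.isEmpty()) return leaders;
        leaders.add(0);
        for (int i = 0; i < code.size(); i++) {
            Kind k = code.get(i).kind();
            if (k == Kind.IF_GOTO || k == Kind.GOTO) leaders.add(targetIndex(labels, code.get(i)));
            if (k != Kind.PLAIN && i + 1 < code.size()) leaders.add(i + 1);
        }
        return leaders;
    }

    // Successor lists, keyed by the index of each block's leader.
    static Map<Integer, List<Integer>> cfg(List<Instr> code) {
        Map<Integer, List<Integer>> succ = new TreeMap<>();
        if (code.isEmpty()) return succ;
        Map<String, Integer> labels = labels(code);
        List<Integer> starts = new ArrayList<>(leaders(code));
        for (int b = 0; b < starts.size(); b++) {
            int end = b + 1 < starts.size() ? starts.get(b + 1) : code.size(); // exclusive
            Instr last = code.get(end - 1);
            List<Integer> out = new ArrayList<>();
            if (last.kind() == Kind.IF_GOTO || last.kind() == Kind.GOTO) out.add(targetIndex(labels, last));
            boolean fallsThrough = last.kind() == Kind.PLAIN || last.kind() == Kind.IF_GOTO;
            if (fallsThrough && end < code.size()) out.add(end);
            succ.put(starts.get(b), out);
        }
        return succ;
    }

    public static void main(String[] args) {
        List<Instr> code = List.of(
            new Instr(null, Kind.PLAIN, "i = 0", null),
            new Instr("L1", Kind.IF_GOTO, "if i >= n goto L2", "L2"),
            new Instr(null, Kind.PLAIN, "s = s + i", null),
            new Instr(null, Kind.PLAIN, "i = i + 1", null),
            new Instr(null, Kind.GOTO, "goto L1", "L1"),
            new Instr("L2", Kind.RETURN, "return s", null));
        System.out.println(cfg(code)); // {0=[1], 1=[5, 2], 2=[1], 5=[]}
    }
}
```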
The optimizer

The optimizer builds data structures that describe variable usage throughout the control flow graph of a method (this is usually called global data flow analysis). This information is used to optimize data references globally within the method.
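As an illustration of one global data flow problem, here is a sketch of iterative liveness analysis. A variable is live at a point if its current value may be read along some path from that point before it is overwritten. The Block record and its hand-written use/def sets are hypothetical inputs describing a small loop (i = 0; while i < n, s = s + i and i = i + 1; return s). The data flow equations are applied repeatedly until no set changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Iterative (fixed-point) liveness analysis over a control flow graph.
class Liveness {
    // use: variables read before any write in the block; def: variables written in it.
    record Block(Set<String> use, Set<String> def, List<Integer> succ) {}

    // in[B]  = use[B] union (out[B] minus def[B])
    // out[B] = union of in[S] over all successors S of B
    static List<Set<String>> liveIn(List<Block> blocks) {
        List<Set<String>> in = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) in.add(new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            // Liveness flows backward, so visiting blocks in reverse usually converges sooner.
            for (int b = blocks.size() - 1; b >= 0; b--) {
                Block blk = blocks.get(b);
                Set<String> out = new TreeSet<>();
                for (int s : blk.succ()) out.addAll(in.get(s));
                Set<String> newIn = new TreeSet<>(out);
                newIn.removeAll(blk.def());
                newIn.addAll(blk.use());
                if (!newIn.equals(in.get(b))) { in.set(b, newIn); changed = true; }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // s and n are assumed to be defined before this fragment (e.g. parameters).
        List<Block> blocks = List.of(
            new Block(Set.of(), Set.of("i"), List.of(1)),              // B0: i = 0
            new Block(Set.of("i", "n"), Set.of(), List.of(3, 2)),      // B1: if i >= n goto B3
            new Block(Set.of("s", "i"), Set.of("s", "i"), List.of(1)), // B2: s = s + i; i = i + 1
            new Block(Set.of("s"), Set.of(), List.of()));              // B3: return s
        System.out.println(liveIn(blocks)); // [[n, s], [i, n, s], [i, n, s], [s]]
    }
}
```

Live-in sets like these are the kind of information a register allocator uses: a variable that is not live at a point does not need to be kept in a register there.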
The code generator

The code generator generates instructions for the target processor. The code generation phase also performs machine-dependent optimizations, including peephole optimization and load/store scheduling.
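A peephole optimizer slides a small window over the generated instructions and replaces recognizable patterns with cheaper ones. The sketch below uses a hypothetical load/store instruction set (the Insn record and the load, store and addi opcodes are invented for illustration) and a two-instruction window with two rules: drop a load that re-reads a value just stored from the same register, and drop an add of zero. The first rule assumes x is ordinary, non-volatile memory; the second assumes the instruction set has no condition flags that the add would set.

```java
import java.util.ArrayList;
import java.util.List;

// Peephole pass over a hypothetical ISA where "addi r, k" means r = r + k.
class Peephole {
    record Insn(String op, String reg, String operand) {
        @Override public String toString() { return op + " " + reg + ", " + operand; }
    }

    static List<Insn> optimize(List<Insn> code) {
        List<Insn> out = new ArrayList<>();
        for (Insn cur : code) {
            // Rule 1: adding zero has no effect (assumes no condition flags).
            if (cur.op().equals("addi") && cur.operand().equals("0")) continue;
            // Rule 2: "store r, x" followed by "load r, x": r already holds x.
            if (!out.isEmpty()) {
                Insn prev = out.get(out.size() - 1);
                boolean reloadOfStored = prev.op().equals("store") && cur.op().equals("load")
                        && prev.reg().equals(cur.reg()) && prev.operand().equals(cur.operand());
                if (reloadOfStored) continue;
            }
            out.add(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> code = List.of(
            new Insn("load", "r1", "a"),
            new Insn("addi", "r1", "0"),
            new Insn("store", "r1", "x"),
            new Insn("load", "r1", "x"),
            new Insn("addi", "r1", "1"));
        // prints: load r1, a / store r1, x / addi r1, 1 (one per line)
        optimize(code).forEach(System.out::println);
    }
}
```

Because each new instruction is compared with the last one kept rather than the last one read, a deletion can expose another pattern, which is why peephole passes are often run more than once.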

Ian Kaplan, March 5, 2000
