Structure of Java Virtual Machine (JVM)

Java based applications run in Java Runtime Environment (JRE) which consists of set of Java APIs and Java Virtual Machine (JVM). The JVM loads an application via class loaders and run it by execution engine.

JVM runs on all kind of hardware where executes a java bytecode without changing java execution code. VM implements a Write Once Run Everywhere principle or so-called platform independence principle. Just to sum up JVM key design principles:
  • Platform independence
    • Clearly defined primitive data types – Languages like C or C++ have size of primitive data types dependent on the platform. Java is unified in that matter.
    • Fixed byte order to Big Endian (network byte order) – Intel x86 uses little endian while RISC processors uses big endian. Java uses big endian.
  • Automatic memory management – class instances are created by user and automatically removed by Garbage Collection.
  • Stack based VM – typical computer architectures such as Intel x86 are based on registers however JVM is based on a stack.
  • Symbolic reference – all types except primitive one are referred via symbolic name instead direct memory addresses.

Java uses bytecode as an intermediate representation between source code and machine code which runs on hardware. Bytecode instruction are represented as 1 byte numbers e.g. getfield 0xb4, invokevirtual 0xb6 hence there is maximum of 256 instructions. If the instruction doesn’t need operand so next instruction immediately follows otherwise operands follows instruction according to instruction set specification. Those instructions are contained in class files produced by java compilation. Exact structure of class file is defined in “Java Virtual Machine specification” section 4 – class file format. After some version information there are sections like: constant pools, access flags, fields info, this and super info, methods info etc. See the spec for the details.

A class loader loads compiled java bytecode to the Runtime Data Areas and the execution engine executes the java bytecode. Class is loaded when is used for the first time in the JVM. Class loading works in dynamic fashion on parent child (hierarchical) delegation principle. Class unloading in not allowed. Some time ago I wrote article about class loading on application server. Detail mechanics of class loading is out of scope for this article.

Runtime data areas are used during execution of the program. Some of these areas are created when JVM starts and destroyed when the JVM exits. Other data areas are per-thread – created on thread creation and destroyed on thread exit. Following picture is based mainly on JVM 8 internals (doesn’t include segmented code cache and dynamic linking of language introduced by JVM 9).

25AC9487-1AB9-48DE-936C-B6A9AC781637

  • Program counter – exist for one thread and has address of current executed instruction unless it is native then the PC is undefined. PC is in fact pointing to at a memory address in the Method Area.
  • Stack – JVM stack exists for one thread and holds one Frame for each method executing on that thread. It is LIFO data structure. Each stack frame has reference to for local variable array, operand stack, runtime constant pool of a class where the code being executed.
  • Native Stack –  not supported by all JVMs. If JVM is implemented using C-linkage model for JNI than stack will be C stack(order of arguments and return will be identical in the native stack typical to C program). Native methods can call back into the JVM and invoke java methods.
  • Stack Frame – stores references that points to the objects or arrays on the heap.
    • Local variable array – all the variables used during execution of the method, all method parameters and locally defined variables.
    • Operand stack – used during execution of the bytecode instruction. Most of the bytecode is manipulating operand stack moving from local variables array.
  • Heap – area shared by all threads used to allocate class instances and arrays in runtime. Heap is the subject of Garbage Collection as a way of automatic memory management used by java. This space is most often mentioned in JVM performance tuning.
  • Non-Heap memory areas
    • Method area – shared by all threads. It stores runtime constant pool information, field and method information, static variable, method bytecode for each classes loaded by the JVM. Details of this area depends on JVM implementation.
      • Runtime constant pool – this area corresponds constant pool table in the class file format. It contains all references for methods and fields. When a method or field is referred  to JVM searches the actual address of the method or field in the memory by using constant pool
      • Method and constructor code
    • Code cache – used for compilation and storage of methods compiled to native code by JIT compilation
Bytecode assigned to runtime data areas in the JVM via class loader is executed by execution engine. Engine reads bytecode in the unit of instruction. Execution engine must change the bytecode to the language that can be executed by the machine. This can happen in one of two ways:
  1. Interpreter – read, interpret and execute bytecode instructions one by one.
  2. JIT (Just In Time) compiler – compensate disadvantage of interpretation. Start executing the code in interpreted mode and JIT compiler compile the entire bytecode to native code. Execution is than switched from interpretation to execution of native code which is much faster. Native code is stored in the cache. Compilation to native code takes time so JVM uses various metrics to decide whether to JIT compile the bytecode.

How the JVM execution engine runs is not defined by JVM specification so venders are free to improve their JVM engines by various techniques.

More details can be found in The Java® Virtual Machine Specification – Java SE 9 Edition

 

 

Advertisements

Java class version

Time to time it might happen that you need to know which version the class files were compiled for. Or to be more specific what target were specified while running javac compiler. As target specifies VM version the classes were generated for. This can be specified in maven as follows:

               <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <configuration>
                         <target>1.6</target>
                    </configuration>
               </plugin>

It is not a rocket science, right. To find out the version the code were generated for we use javap (java class file disassembler). The following line do the trick:

javap -verbose -classpath versiontest-1.0.jar cz.test.string.StringPlaying

Compiled from "StringPlaying.java"
public class cz.test.string.StringPlaying extends java.lang.Object
  SourceFile: "StringPlaying.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = Method       #12.#28;        //  java/lang/Object."<init>":()V
const #2 = String       #29;            //  beekeeper
const #3 = Method       #30.#31;        //  java/lang/String.substring:(II)Ljava/lang/String;

Major version matches java version based on following table


Table taken from Oracle blog