IJVM Tools Manual

SCO, chapter 4, section 4.2, An example ISA: IJVM, defines a simple stack machine, the Integer Java Virtual Machine. It is a simplified version of the Java Virtual Machine (JVM) equipped with a subset of its instruction set (figure 4-11); only simple data types such as 32-bit integers can be used; memory consists of the Constant Pool, Method Area, Local Variable Frame and Operand Stack, which are addressed relative to registers (CPP, PC, LV and SP). A program written in symbolic machine language for IJVM is shown in figure 4-14. Figure 4-15 describes in detail what happens when the program is executed on IJVM. If you want to try running other programs on IJVM and see a detailed description of the execution, you can use the tools described in this manual: an assembler, ijvm-asm, and an IJVM bytecode interpreter, ijvm. How to use the IJVM tools and how to install them on other machines is described in the following manual.

Overview

The IJVM tools consist of an assembler and an interpreter for the subset of Java bytecode introduced in Structured Computer Organization (Tanenbaum, 2005), chapter 4. The assembler, ijvm-asm, translates symbolic IJVM instructions into IJVM bytecode. The bytecode produced, in turn, serves as input to the interpreter, ijvm, which executes the bytecode and gives a detailed execution trace.

An Example

Consider the following assembly language IJVM program:

.method main               // int main
.args 3                    // ( int a, int b )
.define a = 1
.define b = 2
                           // {
        bipush 88          // Push object reference.
        iload a
        iload b
        invokevirtual min
        ireturn            // return min ( a, b );
                           // }

.method min                // int min
.args 3                    // ( int a, int b ){
.define a = 1
.define b = 2
.locals 1                  // int r;
.define r = 3

        iload a            // if ( a ≥ b )
        iload b
        isub

// stack = a - b, ... ; a - b < 0 → a < b

        iflt else

        iload b            //   r = b;
        istore r
        goto end_if

else:
        iload a            //   r = a;
        istore r

end_if:
        iload r            // return r;
        ireturn
                           // }

It consists of two methods, main and min. Both take two arguments of type integer. This is specified by the directive .args 3: two integers plus the implicit object reference (SCO, p. 251). The method min has a single local variable, specified by the directive .locals 1. Symbolic constants are introduced in both methods to make the index-addressed accesses to the arguments (a and b) and the local variable (r) easier to read.

Suppose the file test.j contains this program. To translate test.j into bytecode and save the bytecode in the file test.bc, use the assembler ijvm-asm as follows:

ijvm-asm test.j test.bc

The result is a file containing the bytecode represented as:

main index: 0
method area: 40 bytes
00 03 00 00 10 58 15 01 15 02 b6 00 01 ac 00 03
00 01 15 01 15 02 64 9b 00 0a 15 02 36 03 a7 00
07 15 01 36 03 15 03 ac 
constant pool: 2 words
00000000
0000000e

The bytecode contains three regions: the main index, the method area and the constant pool. The main index specifies the index in the constant pool of the address of the main method initially invoked by the interpreter. The method area holds the bytecode generated for the methods in the program. The constant pool contains the constants used in the program and, for each method, an entry with the start address of the method in the method area.
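
The three regions can be picked apart with a few lines of Python. This is only a sketch that assumes the textual layout is exactly as shown above (one header line per region, hex bytes for the method area, one hex word per constant pool line); the function name parse_bytecode is hypothetical and not part of the IJVM tools.

```python
def parse_bytecode(text):
    """Split the textual bytecode layout into (main index, method area, constant pool)."""
    main_index = None
    method_area = bytearray()
    constant_pool = []
    region = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("main index:"):
            main_index = int(line.split(":", 1)[1])
        elif line.startswith("method area:"):
            region = "method"
        elif line.startswith("constant pool:"):
            region = "pool"
        elif region == "method":
            method_area += bytes.fromhex(line)      # e.g. "00 03 00 00 10 58"
        elif region == "pool":
            constant_pool.append(int(line, 16))     # one 32-bit word per line
    return main_index, bytes(method_area), constant_pool


EXAMPLE = """\
main index: 0
method area: 40 bytes
00 03 00 00 10 58 15 01 15 02 b6 00 01 ac 00 03
00 01 15 01 15 02 64 9b 00 0a 15 02 36 03 a7 00
07 15 01 36 03 15 03 ac
constant pool: 2 words
00000000
0000000e
"""

main_index, method_area, pool = parse_bytecode(EXAMPLE)
main_start = pool[main_index]   # start address of main within the method area
```

For the example, the second constant pool entry (0x0e, i.e. 14) is the start address of min, which follows directly after the 14 bytes occupied by main.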

The bytecode program can be executed by the IJVM interpreter ijvm e.g. as follows:

ijvm test.bc 77 43

In this case the two arguments 77 and 43 are passed as actual parameters to the initial invocation of main. The result is a detailed execution trace on standard output:

IJVM Trace of foo

                                stack = 0, 1, 43, 77, 15
bipush 88           [10 58]     stack = 88, 0, 1, 43, 77, 15
iload 1             [15 01]     stack = 77, 88, 0, 1, 43, 77, 15
iload 2             [15 02]     stack = 43, 77, 88, 0, 1, 43, 77, 15
invokevirtual 1     [b6 00 01]  stack = 12, 13, 0, 43, 77, 21, 0, 1
iload 1             [15 01]     stack = 77, 12, 13, 0, 43, 77, 21, 0
iload 2             [15 02]     stack = 43, 77, 12, 13, 0, 43, 77, 21
isub                [64]        stack = 34, 12, 13, 0, 43, 77, 21, 0
iflt 10             [9b 00 0a]  stack = 12, 13, 0, 43, 77, 21, 0, 1
iload 2             [15 02]     stack = 43, 12, 13, 0, 43, 77, 21, 0
istore 3            [36 03]     stack = 12, 13, 43, 43, 77, 21, 0, 1
goto 7              [a7 00 07]  stack = 12, 13, 43, 43, 77, 21, 0, 1
iload 3             [15 03]     stack = 43, 12, 13, 43, 43, 77, 21, 0
ireturn             [ac]        stack = 43, 0, 1, 43, 77, 15
ireturn             [ac]        stack = 43
return value: 43

The execution trace shows the disassembled bytecode instructions in the left column and the raw bytecodes in the middle column. The right column displays the top words of the stack, topmost word first (at most the eight top words are displayed). The first line shows the initial stack content after main has been invoked by the interpreter with the actual arguments taken from the command line, i.e. 77 and 43. When the execution terminates, the value returned from main is printed.
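
The first two trace columns can be reproduced from the raw bytes alone. The following Python sketch covers only the opcodes that occur in the example and takes their values and operand sizes from the instruction table in this manual; the names OPCODES, OPERANDS and disassemble are hypothetical. Unlike the interpreter, it walks the bytes linearly instead of following the flow of execution.

```python
# opcode -> (mnemonic, operand kind), restricted to the instructions of the example
OPCODES = {
    0x10: ("bipush", "byte"),
    0x15: ("iload", "varnum"),
    0x36: ("istore", "varnum"),
    0x64: ("isub", None),
    0x9B: ("iflt", "label"),
    0xA7: ("goto", "label"),
    0xAC: ("ireturn", None),
    0xB6: ("invokevirtual", "method"),
}

# operand kind -> (width in bytes, signed?)
OPERANDS = {
    "byte": (1, True),
    "varnum": (1, False),
    "label": (2, True),
    "method": (2, False),
}


def disassemble(code):
    """Return a list of (instruction text, raw hex bytes) pairs."""
    rows, pc = [], 0
    while pc < len(code):
        mnemonic, kind = OPCODES[code[pc]]
        width, signed = OPERANDS.get(kind, (0, False))
        raw = code[pc:pc + 1 + width]
        text = mnemonic
        if width:
            text += " %d" % int.from_bytes(raw[1:], "big", signed=signed)
        rows.append((text, " ".join("%02x" % b for b in raw)))
        pc += 1 + width
    return rows


# The instruction bytes of main from the example above.
for text, raw in disassemble(bytes.fromhex("10 58 15 01 15 02 b6 00 01 ac")):
    print("%-20s[%s]" % (text, raw))
```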

The IJVM Assembly Language Syntax

This section describes the syntax of the IJVM assembly language in a modified Backus-Naur form. In particular, the notation method+ means one or more occurrences of method, and directive* means zero or more occurrences of directive. Certain restrictions and features that are not directly visible from the syntax are summarised after the grammar.

In the following grammar, literals (directive names, instruction mnemonics and punctuation such as =, +, -, the comma, the colon and parentheses) are written verbatim; symbol and integer denote identifiers and integer constants:

program : method+

method : .method symbol directive* insn+

directive : .args expr
          | .locals expr
          | .define symbol = expr

insn : bipush expr
     | dup
     | goto symbol
     | iadd
     | iand
     | ifeq symbol
     | iflt symbol
     | if_icmpeq symbol
     | iinc expr , expr
     | iload expr
     | invokevirtual symbol
     | ior
     | ireturn
     | istore expr
     | isub
     | ldc_w expr
     | nop
     | pop
     | swap
     | symbol :

expr : integer
     | symbol
     | expr + expr
     | expr - expr
     | ( expr )

An IJVM assembly language program is a set of methods, each declared using the .method directive. For each method, the number of arguments it takes (.args) and the number of local variables to be allocated upon invocation (.locals) can be specified. If nothing is specified, the number of arguments defaults to 1, since an object reference must always be passed. The default number of local variables is 0.
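
In the bytecode dumps shown in this manual, each method appears to begin with four bytes that mirror these two counts: 00 03 00 00 for main (.args 3, no .locals), 00 03 00 01 for min (.args 3, .locals 1), and 00 01 00 00 for a method with no directives at all. The Python sketch below decodes such a header under the assumption, inferred from those dumps rather than stated by the tools, that it consists of two big-endian unsigned 16-bit integers; the name method_header is hypothetical.

```python
import struct


def method_header(method_area, start):
    """Return (number of arguments, number of locals) for the method at `start`.

    Assumes two big-endian unsigned 16-bit counts precede the first instruction.
    """
    return struct.unpack_from(">HH", method_area, start)


# Header and first instruction of min from the example program.
print(method_header(bytes.fromhex("00 03 00 01 15 01"), 0))
```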

Using the .define directive, symbolic constants can be introduced. The scope of these definitions is limited to the current method. A label is declared by writing its name followed by a colon in front of the instruction it refers to, e.g.:

while:
        iload i
        bipush 1
        ...

As with symbolic constants, the scope of labels is limited to the current method. That is, only goto, ifeq, iflt and if_icmpeq instructions within the same method can use the label as jump target.
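
How an expr from the grammar reduces to a number, given the symbolic constants of the current method, can be illustrated with a small recursive-descent evaluator in Python. This is a sketch, not the assembler's implementation: it assumes decimal integers, identifier-like symbols and left-to-right evaluation of + and -, handles neither negative literals nor malformed input, and the name evaluate is hypothetical.

```python
import re

# integer | symbol | one of + - ( )
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_][A-Za-z_0-9]*|[-+()])")


def evaluate(text, symbols):
    """Evaluate an expr using `symbols`, a dict of .define'd constants."""
    tokens = TOKEN.findall(text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expr()
            pos += 1                      # skip the closing ")"
            return value
        if tok.isdigit():
            return int(tok)
        return symbols[tok]               # KeyError for an undefined symbol

    def expr():
        nonlocal pos
        value = primary()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = primary()
            value = value + rhs if op == "+" else value - rhs
        return value

    return expr()


# With ".define a = 1" in scope, "(a + 2) - 1" denotes local variable index 2.
print(evaluate("(a + 2) - 1", {"a": 1}))
```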

Input/Output

In the JVM, input/output is performed by calls to special methods (SCO, p. 254). Input/output is also available in IJVM through two special methods, getchar and putchar. These resemble the functions of the same names in the stdio library of the C programming language: getchar returns the ASCII value of the next character available on standard input as an unsigned byte, or -1 on end of file or error; putchar takes an ASCII value as its argument and writes the corresponding character to standard output, returning the ASCII value, or -1 if an error occurred.
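
The return-value conventions described above can be modelled in Python as follows. The function names ijvm_getchar and ijvm_putchar are hypothetical and the streams are ordinary binary file objects; the sketch only mirrors the described behaviour and says nothing about how the interpreter implements it.

```python
import io


def ijvm_getchar(stream):
    """Return the next byte as an unsigned value (0..255), or -1 at end of file."""
    data = stream.read(1)
    return data[0] if data else -1


def ijvm_putchar(stream, value):
    """Write one character; return its value, or -1 if the write fails."""
    try:
        stream.write(bytes([value & 0xFF]))
    except (OSError, ValueError):
        return -1
    return value & 0xFF


source = io.BytesIO(b"gf")
sink = io.BytesIO()
first = ijvm_getchar(source)       # ASCII value of 'g'
ijvm_putchar(sink, first)
```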

The following simple program, copy.j, copies an input stream of characters to an output stream:

// An input stream of characters is copied from standard input
// to a stream of characters on standard output until the character 
// 'f' is encountered.

.method newline                 // int newline()
.define nl = 10                 // {
.define OBJREF = 44
        bipush OBJREF
        bipush nl
        invokevirtual putchar   // putchar(nl); 
        ireturn                 // return nl;
                                // }

.method main                    // int main(){                  
.locals 1                       // int c;
.define c = 1
.define asciif = 102
.define OBJREF = 44
  
        bipush OBJREF
        invokevirtual getchar
        istore c                // c = getchar();
while:  
        iload  c                // while ( c!='f') {
        bipush asciif
        isub
        ifeq end_while
        bipush OBJREF
        iload  c
        invokevirtual putchar   //    putchar(c);
        pop                     // discard return value
        bipush OBJREF
        invokevirtual getchar
        istore c                //    c = getchar();
        goto while              // }
end_while:
        bipush OBJREF
        invokevirtual newline   // newline();
        pop
        iload c
        ireturn                 // return c;
                                // }

When an IJVM program produces output on the screen through the method putchar, this output is interleaved with the execution trace. To suppress the execution trace, the interpreter can be run in silent mode as follows:

ijvm -s copy.bc
ggtsj678
return value: 102

The output of the program copy.bc, i.e. ggtsj678, is then the only output that appears on the screen, together with the returned value 102.

The 16-bit unsigned integer indices used in invokevirtual for the predefined methods getchar and putchar are 32768 (0x8000) and 32769 (0x8001), respectively.
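
As an illustration, the bytes of an invokevirtual instruction can be built in Python from such an index. The sketch assumes the index is stored as a big-endian 16-bit value directly after the opcode, which matches the b6 00 01 encoding of invokevirtual 1 in the example trace; the constant and function names are hypothetical.

```python
INVOKEVIRTUAL = 0xB6
GETCHAR_INDEX = 0x8000   # 32768
PUTCHAR_INDEX = 0x8001   # 32769


def encode_invokevirtual(index):
    """Return the opcode followed by the 2-byte unsigned method index."""
    return bytes([INVOKEVIRTUAL]) + index.to_bytes(2, "big")


print(encode_invokevirtual(GETCHAR_INDEX).hex())
```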

IJVM Instructions

Opcode  Mnemonic                   Description
0x10    BIPUSH byte_exp            Push a byte onto stack
0x59    DUP                        Copy top word on stack and push onto stack
0xA7    GOTO label                 Unconditional jump
0x60    IADD                       Pop two words from stack; push their sum
0x7E    IAND                       Pop two words from stack; push Boolean AND
0x99    IFEQ label                 Pop word from stack and branch if it is zero
0x9B    IFLT label                 Pop word from stack and branch if it is less than zero
0x9F    IF_ICMPEQ label            Pop two words from stack and branch if they are equal
0x84    IINC varnum_exp, byte_exp  Add a constant value to a local variable
0x15    ILOAD varnum_exp           Push local variable onto stack
0xB6    INVOKEVIRTUAL method       Invoke a method
0x80    IOR                        Pop two words from stack; push Boolean OR
0xAC    IRETURN                    Return from method with integer value
0x36    ISTORE varnum_exp          Pop word from stack and store in local variable
0x64    ISUB                       Pop two words from stack; push their difference
0x13    LDC_W constant_exp         Push constant from constant pool onto stack
0x00    NOP                        Do nothing
0x57    POP                        Delete word from top of stack
0x5F    SWAP                       Swap the two top words on the stack
0xC4    WIDE                       Prefix instruction; next instruction has a 16-bit index

byte_exp: an expression that evaluates to an integer in the range [-128,127]. Assembled to a 1-byte two's-complement signed integer.

label: a symbol defined as a label. Assembled to a 2-byte two's-complement signed integer offset from the opcode of the branch or goto instruction.

varnum_exp: an expression that evaluates to an integer in the range [0,255] (or [0,65535]). Assembled to a 1-byte (or 2-byte) unsigned integer index in the local variable frame. A wide prefix is used if a 2-byte index is needed. In the iinc instruction a 1-byte index is always used, even if a wide prefix is present. This is not the case for the JVM.

method: a symbol defined as the name of a method. Assembled to a 2-byte unsigned integer index of an entry in the constant pool that contains the start address of the method within the method area.

constant_exp: an expression that evaluates to an integer in the range [-2147483648,2147483647]. Assembled to a 4-byte two's-complement signed integer and placed in the constant pool. In the bytecode, the ldc_w opcode is followed by a 2-byte unsigned integer index of the entry in the constant pool that contains the constant value.
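
The operand encodings above map directly onto Python's struct module. The helper names below are hypothetical, and the sketch assumes big-endian byte order, which agrees with the bytes shown in the example dump (for instance 9b 00 0a for an iflt whose target lies 10 bytes after its opcode).

```python
import struct

WIDE = 0xC4


def encode_byte(opcode, value):
    """Opcode plus a 1-byte signed operand; struct.error if outside [-128,127]."""
    return struct.pack(">Bb", opcode, value)


def encode_branch(opcode, opcode_addr, target_addr):
    """Opcode plus a 2-byte signed offset measured from the opcode's own address."""
    return struct.pack(">Bh", opcode, target_addr - opcode_addr)


def encode_varnum(opcode, index):
    """1-byte index if it fits, otherwise a wide prefix and a 2-byte index."""
    if index <= 0xFF:
        return struct.pack(">BB", opcode, index)
    return struct.pack(">BBH", WIDE, opcode, index)


def encode_constant(value):
    """A constant pool word: 4-byte signed integer."""
    return struct.pack(">i", value)


# In the example, iflt sits at method area address 23 and the label else at 33.
print(encode_branch(0x9B, 23, 33).hex())
```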

Installation

The IJVM tools are supported on the Irix, Solaris and Linux platforms, but should also work on most other UNIX derivatives. To install the tools, a working C compiler must be available on the system.

Installation from source code

First, download the distribution file: ijvm-tools-0.9.tar.gz. Extract the files contained in the compressed archive, using the command:

gzip -cd ijvm-tools-0.9.tar.gz | tar x

This creates a directory called ijvm-tools-0.9 where all the source files reside.

Next, configure the package for the system in question. This is done by changing to the ijvm-tools-0.9 directory and running the configure script, by issuing the command:

./configure

The script will check that a C compiler and other tools are available and then create makefiles. It will also determine installation directories. Binary files are installed in /usr/local/bin as a default, but this can be changed using the --prefix option; see the file INSTALL in the distribution for details.

If the configure script ran successfully, then build the tools, using the command:

make

If the build succeeded, the final installation of the tools is accomplished by:

make install

This will copy the binaries, ijvm-asm and ijvm, to /usr/local/bin (unless something else was specified when configuring the tools).

At this point the tools should be installed and ready to use.

[19-11-2015] Kasper Sacharias Roos Eenberg has made an installation video.

[19-11-2015] Tobias Røikjer has made an interactive IJVM stack visualizer (tested on Linux)

Precompiled versions

Below is a list of contributed precompiled versions of ijvm-tools for various platforms.

[07-02-2006] A precompiled Windows version of the IJVM tools (version 0.8), ijvm-tool-0.8win.zip, has been created by Steffen Mikkelsen.

[02-11-2009] A precompiled Linux version of the IJVM tools (version 0.9) available as a Debian/Ubuntu package has been created by Dan Søndergaard. 32 bit version: ijvm-tools_0.9-1_i386.deb. 64 bit version: packages/ijvm-tools_0.9-1_amd64.deb.

[12-11-2015] A precompiled Mac OS X version of the IJVM tools (version 0.9.1) ijvm-tools-0.9.1-mac.zip has been created by Asger Hautop Drewsen. See README.txt for details.

[19-11-2015] A package for Arch Linux by Kasper Sacharias Roos Eenberg.

[19-11-2015] Adding /users/kursus/dArk/linux/bin to your path will give access to the tools.

IJVM Specification File

The IJVM tools are instantiated by means of a specification obtained from a file, ijvm.spec. This file defines all the instructions that the IJVM tools implement: the assembler uses the specification to get a definition of the instructions, i.e. their names, opcodes and operand types; the simulator uses the specification to be able to disassemble the instructions.

The specification file defines the instructions using a fairly simple line-based format. Each line contains an instruction definition made up of three elements:

<opcode> <mnemonic> <comma separated operand list>

where opcode is the opcode in hexadecimal notation, mnemonic is the mnemonic code of the instruction (i.e. the name of the instruction, e.g. iadd) and the operand list specifies what types of operands the instruction expects. The following table explains the different types available and the resulting bytecode values that the assembler generates:

Type         Description
byte         An operand that evaluates to an 8-bit signed integer value; the 8-bit signed value is the bytecode value generated.
label        A label defined within the current method; the 16-bit signed offset from the current opcode to the address of the label is the bytecode value generated.
method       The name of a method; the 16-bit unsigned index of the method in the constant pool is generated.
varnum       An 8-bit unsigned index, handled as the operand type byte.
varnum-wide  An 8- or 16-bit unsigned index; depending on the size of the index, either an 8- or 16-bit index is generated. Furthermore, if the index is greater than 255, a wide prefix opcode is generated before the current opcode.
constant     A 32-bit signed constant; the value is added to the constant pool, and the 16-bit unsigned index of the value in the constant pool is generated.

Thus, when bytecode for an instruction is generated, the assembler first inspects the actual operands to see if a wide prefix is needed. Then it generates the instruction opcode and then for each operand type it generates bytecode values as described above.

As an example the following is used as the default specification by the tools:

0x10 bipush byte
0x59 dup
0xA7 goto label
0x60 iadd
0x7E iand
0x99 ifeq label
0x9B iflt label
0x9F if_icmpeq label
0x84 iinc varnum, byte
0x15 iload varnum-wide
0xB6 invokevirtual method
0x80 ior
0xAC ireturn
0x36 istore varnum-wide
0x64 isub
0x13 ldc_w constant
0x00 nop
0x57 pop
0x5F swap
0xC4 wide
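
Because the format is line-based, a specification is easy to read programmatically. The Python sketch below turns specification text into a lookup table; the name parse_spec is hypothetical, and the sketch assumes every non-empty line follows the three-element layout described above (it knows nothing about comments or other syntax the real tools may accept).

```python
def parse_spec(text):
    """Map each mnemonic to (opcode, list of operand types)."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 2)          # opcode, mnemonic, rest of line
        if len(parts) < 2:
            continue                         # skip blank lines
        opcode = int(parts[0], 16)           # accepts the "0x" prefix
        operands = []
        if len(parts) > 2:
            operands = [t.strip() for t in parts[2].split(",")]
        table[parts[1]] = (opcode, operands)
    return table


SPEC = """\
0x10 bipush byte
0x59 dup
0x84 iinc varnum, byte
0x15 iload varnum-wide
"""

table = parse_spec(SPEC)
print(table["iinc"])
```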

By default the tools will read a specification from the file ijvm.spec installed in /usr/local/share (this depends of course on the installation). Alternatively, by setting the environment variable IJVM_SPEC_FILE it is possible to make the tools read a different specification. If the shell is tcsh this is accomplished by:

setenv IJVM_SPEC_FILE my-ijvm.spec

If the shell is bash the following will do:

export IJVM_SPEC_FILE=my-ijvm.spec

Alternatively, you can use the command line option -f to supply another specification file, e.g.:

ijvm-asm -f my-ijvm.spec test-program.j

and likewise for the simulator.

Extending IJVM

The specification file mechanism described above makes it easy to extend the IJVM instruction set. First, the specification is extended with definitions of the new instructions. Then, the simulator is extended to be able to execute the new instructions. This is done by adding a case branch for each new instruction to the switch statement in the file ijvm.c and then rebuilding the simulator. To keep the new case branch readable, define a name for each opcode (by means of #define in the file ijvm-util.h) so that the opcode can be referred to by a symbolic name instead of a hex code.

To rebuild the simulator you will need either the full distribution or mini-ijvm.tar.gz, which is the minimum set of files required. To unpack the archive do:

tar xfz mini-ijvm.tar.gz

This will create a directory called mini-ijvm which contains the files:

Makefile
ijvm-spec.c
ijvm-spec.h
ijvm-util.c
ijvm-util.h
ijvm.c
ijvm.spec
types.h

The file ijvm.c implements the fetch-decode-execute cycle of the simulator; ijvm-util.c, ijvm-util.h, ijvm-spec.c and ijvm-spec.h are auxiliary files that implement disassembling, output and specification file handling. The file Makefile controls how the simulator is built (see e.g. the man page for make) and ijvm.spec is a copy of the default specification file, provided for convenience.

When the specification file, ijvm.c and ijvm-util.h have been changed, type make to rebuild the simulator. As a result, the new simulator is available as ijvm. To try the simulator it may be necessary to invoke it as ./ijvm, otherwise the shell will run the version installed on the system.

Here is a small example. Suppose we want to add the instruction iconst_0, which pushes the constant 0 onto the stack, i.e. the equivalent of bipush 0. As opcode, 0x03 would be a reasonable choice, since this is the opcode in the JVM (see the JVM Reference).

First, we add the line:

0x03 iconst_0

to the file ijvm.spec so that ijvm-asm will be able to handle iconst_0. Then we define a symbolic name for the opcode by adding the following line to ijvm-util.h:

#define IJVM_OPCODE_ICONST_0 0x03

Finally, the actual implementation of iconst_0 is done by adding the following case-branch to ijvm.c:

case IJVM_OPCODE_ICONST_0:
  ijvm_push (i, 0);
  break;

To test our implementation, we use this test program:

.method main

        iconst_0
        bipush 5
        iadd
        ireturn

Having set the environment variable IJVM_SPEC_FILE to ./ijvm.spec as described above, we assemble the test program and get the following bytecode:

main index: 0
method area: 9 bytes
00 01 00 00 03 10 05 60 ac
constant pool: 1 words
00000000

When we run this with the new simulator we get the result expected:

IJVM Trace of - 

                                stack = 0, 0, 1
iconst_0            [03]        stack = 0, 0, 0, 1
bipush 5            [10 05]     stack = 5, 0, 0, 0, 1
iadd                [60]        stack = 5, 0, 0, 1
ireturn             [ac]        stack = 5
return value: 5
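
The returned value can be cross-checked with a toy evaluator in Python that mirrors the case-per-opcode structure of the simulator. It is only a sketch: it models a bare operand stack for the four instructions of the test program, ignores method frames and 32-bit wrap-around, and the name run is hypothetical.

```python
def run(code):
    """Execute a straight-line instruction sequence and return the ireturn value."""
    stack, pc = [], 0
    while True:
        opcode = code[pc]
        if opcode == 0x03:                   # iconst_0, the new instruction
            stack.append(0)
            pc += 1
        elif opcode == 0x10:                 # bipush byte
            stack.append(int.from_bytes(code[pc + 1:pc + 2], "big", signed=True))
            pc += 2
        elif opcode == 0x60:                 # iadd
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif opcode == 0xAC:                 # ireturn
            return stack.pop()
        else:
            raise ValueError("unsupported opcode 0x%02x" % opcode)


# Instruction bytes of the test program, without the four leading method bytes.
print(run(bytes.fromhex("03 10 05 60 ac")))
```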