IJVM Tools Manual
SCO, chapter 4, section 4.2, An example ISA: IJVM, defines a simple stack machine, the Integer Java Virtual Machine. It is a simplified version of the Java Virtual Machine (JVM) equipped with a subset of the JVM instruction set (figure 4-11). Only simple data types such as 32-bit integers can be used. Memory consists of the Constant Pool, Method Area, Local Variable Frame and Operand Stack, which are addressed relative to registers (CPP, PC, LV and SP). A program written in symbolic machine language for IJVM is shown in figure 4-14, and figure 4-15 describes in detail what happens when the program is executed on IJVM. If you want to try running other programs on IJVM and see a detailed description of their execution, you can use the tools described in this manual: an assembler, ijvm-asm, and an IJVM bytecode interpreter, ijvm. The manual below describes how to use the IJVM tools and how to install them on other machines.
Overview
The IJVM tools consist of an assembler and an interpreter for the subset of Java bytecode introduced in Structured Computer Organization (Tanenbaum, 2005), chapter 4. The assembler, ijvm-asm, translates symbolic IJVM instructions into IJVM bytecode. The bytecode produced, in turn, serves as input to the interpreter, ijvm, which executes the bytecode and gives a detailed execution trace.
An Example
Consider the following assembly language IJVM program:
.method main // int main
.args 3 // ( int a, int b )
.define a = 1
.define b = 2
// {
bipush 88 // Push object reference.
iload a
iload b
invokevirtual min
ireturn // return min ( a, b );
// }
.method min // int min
.args 3 // ( int a, int b ){
.define a = 1
.define b = 2
.locals 1 // int r;
.define r = 3
iload a // if ( a ≥ b )
iload b
isub
// stack = a - b, ... ; a - b < 0 → a < b
iflt else
iload b // r = b;
istore r
goto end_if
else:
iload a // r = a;
istore r
end_if:
iload r // return r;
ireturn
// }
It consists of two methods, main and min. Both take two arguments of type integer. This is specified by the directive .args 3: two integers plus the implicit object reference (SCO, p. 251). The method min has a single local variable, specified by the directive .locals 1. Symbolic constants are introduced in both methods to make the index-addressed accesses to the arguments (a and b) and the local variable (r) easier to read.
Suppose the file test.j contains this program. To translate test.j into bytecode and save the bytecode in the file test.bc, use the assembler ijvm-asm as follows:
ijvm-asm test.j test.bc
The result is a file containing the bytecode represented as:
main index: 0
method area: 40 bytes
00 03 00 00 10 58 15 01 15 02 b6 00 01 ac 00 03
00 01 15 01 15 02 64 9b 00 0a 15 02 36 03 a7 00
07 15 01 36 03 15 03 ac
constant pool: 2 words
00000000
0000000e
The bytecode contains three regions: the main index, the method area and the constant pool. The main index specifies the index in the constant pool of the address of the main method, which is the method initially invoked by the interpreter. The method area holds the bytecode generated for the methods in the program. The constant pool contains the constants used in the program and, for each method, an entry with the start address of the method in the method area.
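The relationship between the three regions can be checked with a short Python sketch (Python is used here purely for illustration; it is not part of the IJVM tools). The bytes are copied from the listing above. The 4-byte method header (a 2-byte argument count followed by a 2-byte local variable count, both big-endian) is an assumption inferred from this example, and the name method_header is hypothetical.

```python
# Method area bytes copied from the test.bc listing above.
METHOD_AREA = bytes.fromhex(
    "00 03 00 00 10 58 15 01 15 02 b6 00 01 ac 00 03 "
    "00 01 15 01 15 02 64 9b 00 0a 15 02 36 03 a7 00 "
    "07 15 01 36 03 15 03 ac"
)
# Constant pool words copied from the same listing.
CONSTANT_POOL = [0x00000000, 0x0000000E]
MAIN_INDEX = 0


def method_header(method_area, start):
    """Decode the assumed 4-byte header found at a method's start address.

    Returns (argument count, local variable count, address of first opcode).
    """
    nargs = int.from_bytes(method_area[start:start + 2], "big")
    nlocals = int.from_bytes(method_area[start + 2:start + 4], "big")
    return nargs, nlocals, start + 4


# The main index selects the constant pool entry holding main's start address.
main_start = CONSTANT_POOL[MAIN_INDEX]
# The second entry is the start address of min.
min_start = CONSTANT_POOL[1]

print(method_header(METHOD_AREA, main_start))
print(method_header(METHOD_AREA, min_start))
```

Under these assumptions the decoded headers agree with the .args and .locals directives of main and min in test.j.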
The bytecode program can be executed by the IJVM interpreter ijvm, e.g. as follows:
ijvm test.bc 77 43
In this case the two arguments 77 and 43 are passed as actual parameters to the initial invocation of main. The result is a detailed execution trace on standard output:
IJVM Trace of foo
stack = 0, 1, 43, 77, 15
bipush 88 [10 58] stack = 88, 0, 1, 43, 77, 15
iload 1 [15 01] stack = 77, 88, 0, 1, 43, 77, 15
iload 2 [15 02] stack = 43, 77, 88, 0, 1, 43, 77, 15
invokevirtual 1 [b6 00 01] stack = 12, 13, 0, 43, 77, 21, 0, 1
iload 1 [15 01] stack = 77, 12, 13, 0, 43, 77, 21, 0
iload 2 [15 02] stack = 43, 77, 12, 13, 0, 43, 77, 21
isub [64] stack = 34, 12, 13, 0, 43, 77, 21, 0
iflt 10 [9b 00 0a] stack = 12, 13, 0, 43, 77, 21, 0, 1
iload 2 [15 02] stack = 43, 12, 13, 0, 43, 77, 21, 0
istore 3 [36 03] stack = 12, 13, 43, 43, 77, 21, 0, 1
goto 7 [a7 00 07] stack = 12, 13, 43, 43, 77, 21, 0, 1
iload 3 [15 03] stack = 43, 12, 13, 43, 43, 77, 21, 0
ireturn [ac] stack = 43, 0, 1, 43, 77, 15
ireturn [ac] stack = 43
return value: 43
The execution trace shows the disassembled bytecode instructions in the left column and the raw bytecodes in the middle column. The right column displays the top words of the stack, topmost word first (at most the eight top words are displayed). The first line shows the initial stack content after main has been invoked by the interpreter with the actual arguments taken from the command line, i.e. 77 and 43. When the execution terminates, the value returned from main is printed.
The IJVM Assembly Language
Syntax
This section describes the syntax for the IJVM assembly language in a modified Backus-Naur form. In particular, the notation method+ means one or more occurrences of method, and directive* means zero or more occurrences of directive. Certain restrictions and features that are not directly visible from the syntax are summarised here:
- All literals are case-sensitive.
- An arbitrary number of "white-space" characters (i.e. space, tab and newline) are allowed between terminals and non-terminals on the right side of productions.
- A symbol can be arbitrarily long, but must start with a letter, possibly followed by more letters, digits or "_". These are examples of valid symbols: fibonacci, then23 and MyMult_32.
- An integer is either specified in decimal notation (e.g. 143, 45 or -31) or hexadecimal notation (e.g. 0xf000, 0xbeef or -0x34).
- Comments start with "//" and extend to the end of the line.
- A method with the name main has to be one of the methods in the program.
- Arguments to invokevirtual must be names of methods declared using the .method directive.
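The lexical rules in the list above can be sketched in Python as follows. This is an illustration only, not the assembler's actual implementation. The helper names are hypothetical, and the sketch assumes that "letter" means an ASCII letter and that hexadecimal digits may be written in either case.

```python
import re

# A symbol starts with a letter, followed by letters, digits or "_".
SYMBOL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")
# An integer is decimal or hexadecimal (0x prefix), optionally negative.
INTEGER_RE = re.compile(r"-?(0x[0-9A-Fa-f]+|[0-9]+)\Z")


def is_symbol(text):
    """Return True if text is a valid symbol under the rules above."""
    return SYMBOL_RE.match(text) is not None


def parse_integer(text):
    """Convert a decimal or hexadecimal literal to an int."""
    if INTEGER_RE.match(text) is None:
        raise ValueError("not an integer literal: " + repr(text))
    base = 16 if "0x" in text else 10
    return int(text, base)


def strip_comment(line):
    """Drop everything from "//" to the end of the line."""
    return line.split("//", 1)[0]


print(is_symbol("MyMult_32"), parse_integer("-0x34"))
```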
In the following literals are written in boldface:
program : method+
method : .method symbol directive* insn+
directive : .args expr
| .locals expr
| .define symbol = expr
insn : bipush expr
| dup
| goto symbol
| iadd
| iand
| ifeq symbol
| iflt symbol
| if_icmpeq symbol
| iinc expr , expr
| iload expr
| invokevirtual symbol
| ior
| ireturn
| istore expr
| isub
| ldc_w expr
| nop
| pop
| swap
| symbol :
expr : integer
| symbol
| expr + expr
| expr - expr
| ( expr )
An IJVM assembly language program consists of a set of methods, each declared using the .method directive. For each method, the number of arguments the method takes can be specified (.args), as can the number of local variables to be allocated upon invocation (.locals). If nothing is specified, the number of arguments defaults to 1, since an object reference must always be passed. The default number of local variables is 0.
Symbolic constants can be introduced using the .define directive. The scope of these definitions is limited to the current method. A label is declared by writing its name followed by a colon in front of the instruction it refers to, i.e.:
while:
iload i
bipush 1
...
As with symbolic constants, the scope of labels is limited to the current method. That is, only goto, ifeq, iflt and if_icmpeq instructions within the same method can use the label as jump target.
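To illustrate how the expr production and the per-method .define constants fit together, here is a minimal recursive-descent evaluator in Python. It is a sketch, not the assembler's code. The function names are hypothetical, and because the grammar does not state associativity, the sketch assumes that + and - associate to the left.

```python
import re

# One token: hex integer, decimal integer, symbol, or one of + - ( ).
TOKEN_RE = re.compile(r"\s*(0x[0-9A-Fa-f]+|[0-9]+|[A-Za-z][A-Za-z0-9_]*|[-+()])")


def tokenize(text):
    """Split an expression into tokens, rejecting unknown characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("bad character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, symbols):
    """Evaluate expr; symbols maps .define names of the current method to ints."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def primary():
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token == "-":  # negative integer literal such as -31
            return -primary()
        if token[0].isdigit():
            return int(token, 16) if token.startswith("0x") else int(token, 10)
        if token not in symbols:
            raise ValueError("undefined symbol or unexpected token: " + token)
        return symbols[token]

    def expr():
        value = primary()
        while peek() in ("+", "-"):
            operator = take()
            right = primary()
            value = value + right if operator == "+" else value - right
        return value

    result = expr()
    if peek() is not None:
        raise ValueError("unexpected token: " + peek())
    return result


print(evaluate("r + 1", {"r": 3}))
```

Passing a separate symbols dictionary for each method mirrors the rule that .define constants are only visible inside the method that declares them.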
Input/Output
In the JVM, input/output is performed by calls to special methods (SCO, p. 254). Input/output is also available in IJVM, through two special methods, getchar and putchar. These are similar to the input/output functions of the stdio library of the programming language C. getchar returns the ASCII value of the next character available on standard input as an unsigned byte, or -1 if end of file is reached or an error occurs. putchar takes an ASCII value as argument and writes the corresponding character to standard output; the ASCII value is also returned, unless an error occurs, in which case -1 is returned.
The following simple program, copy.j, copies an input stream of characters to an output stream:
// An input stream of characters is copied from standard input
// to a stream of characters on standard output until the character
// 'f' is encountered.
.method newline // int newline()
.define nl = 10 // {
.define OBJREF = 44
bipush OBJREF
bipush nl
invokevirtual putchar // putchar(nl);
ireturn // return nl;
// }
.method main // int main(){
.locals 1 // int c;
.define c = 1
.define asciif = 102
.define OBJREF = 44
bipush OBJREF
invokevirtual getchar
istore c // c = getchar();
while:
iload c // while ( c!='f') {
bipush asciif
isub
ifeq end_while
bipush OBJREF
iload c
invokevirtual putchar // putchar(c);
pop // discard return value
bipush OBJREF
invokevirtual getchar
istore c // c = getchar();
goto while // }
end_while:
bipush OBJREF
invokevirtual newline // newline();
pop
iload c
ireturn // return c;
// }
When an IJVM program produces output on the screen through the method putchar, this output is interleaved with the execution trace. To suppress the execution trace, the interpreter can be run in silent mode as follows:
ijvm -s copy.bc
ggtsj678
return value: 102
The output of the program copy.bc, i.e. ggtsj678, is then the only output that appears on the screen, together with the returned value 102.
The 16-bit unsigned integer indices used for the predefined methods getchar and putchar in invokevirtual are 32768 (0x8000) and 32769 (0x8001).
IJVM Instructions
Opcode | Mnemonic | Description
0x10 | BIPUSH byte_exp | Push a byte onto stack
0x59 | DUP | Copy top word on stack and push onto stack
0xA7 | GOTO label | Unconditional jump
0x60 | IADD | Pop two words from stack; push their sum
0x7E | IAND | Pop two words from stack; push Boolean AND
0x99 | IFEQ label | Pop word from stack and branch if it is zero
0x9B | IFLT label | Pop word from stack and branch if it is less than zero
0x9F | IF_ICMPEQ label | Pop two words from stack and branch if they are equal
0x84 | IINC varnum_exp, byte_exp | Add a constant value to a local variable
0x15 | ILOAD varnum_exp | Push local variable onto stack
0xB6 | INVOKEVIRTUAL method | Invoke a method
0x80 | IOR | Pop two words from stack; push Boolean OR
0xAC | IRETURN | Return from method with integer value
0x36 | ISTORE varnum_exp | Pop word from stack and store in local variable
0x64 | ISUB | Pop two words from stack; push their difference
0x13 | LDC_W constant_exp | Push constant from constant pool onto stack
0x00 | NOP | Do nothing
0x57 | POP | Delete word from top of stack
0x5F | SWAP | Swap the two top words on the stack
0xC4 | WIDE | Prefix instruction; next instruction has a 16-bit index
byte_exp: an expression that evaluates to an integer in the range [-128,127]. Assembled to a 1-byte two's-complement signed integer.
label: a symbol defined as a label. Assembled to a 2-byte two's-complement signed integer offset from the opcode of the branch or goto instruction.
varnum_exp: an expression that evaluates to an integer in the range [0,255] (or [0,65535]). Assembled to a 1-byte (or 2-byte) unsigned integer index in the local variable frame. A wide prefix is used if a 2-byte index is needed. In the iinc instruction a 1-byte index is always used, even if a wide prefix is present. This is not the case for JVM.
method: a symbol defined as the name of a method. Assembled to a 2-byte unsigned integer index of an entry in the constant pool that contains the start address within the method area of the method.
constant_exp: an expression that evaluates to an integer in the range [-2147483648,2147483647]. Assembled to a 4-byte two's-complement signed integer and placed in the constant pool. In the bytecode, the ldc_w opcode is followed by a 2-byte unsigned integer index of the entry in the constant pool that contains the constant value.
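The operand encodings above can be sketched in Python as follows. This is an illustration, not the assembler's code, and the function names are hypothetical. Big-endian byte order is an assumption that matches the test.bc listing earlier in this manual. The addresses used in the print statement (iflt at byte 23, its target label at byte 33) were obtained by counting bytes in that listing.

```python
WIDE = 0xC4   # wide prefix opcode from the instruction table
ILOAD = 0x15  # iload opcode from the instruction table


def encode_byte(value):
    """byte_exp: 1-byte two's-complement signed integer."""
    if not -128 <= value <= 127:
        raise ValueError("byte operand out of range")
    return value.to_bytes(1, "big", signed=True)


def encode_label(opcode_address, label_address):
    """label: 2-byte signed offset measured from the branch opcode."""
    offset = label_address - opcode_address
    return offset.to_bytes(2, "big", signed=True)


def encode_varnum_insn(opcode, index):
    """iload/istore: 1-byte index, or wide prefix plus 2-byte index."""
    if 0 <= index <= 255:
        return bytes([opcode, index])
    if 256 <= index <= 65535:
        return bytes([WIDE, opcode]) + index.to_bytes(2, "big")
    raise ValueError("variable index out of range")


print(encode_label(23, 33).hex(), encode_varnum_insn(ILOAD, 300).hex())
```

With these inputs, encode_label reproduces the operand bytes 00 0a that follow the iflt opcode 9b in the listing.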
Installation
The IJVM tools are supported on the Irix, Solaris and Linux platforms but should work on most other UNIX derivatives also. To install the tools a working C compiler must be available on the system.
Installation from source code
First, download the distribution file, ijvm-tools-0.9.tar.gz. Extract the files contained in the compressed archive, using the command:
gzip -cd ijvm-tools-0.9.tar.gz | tar x
This creates a directory called ijvm-tools-0.9 where all the source files reside.
Next, configure the package for the system in question. This is done by changing to the ijvm-tools-0.9 directory and running the configure script, by issuing the command:
./configure
The script will check that a C compiler and other tools are available and then create makefiles. It will also determine installation directories. Binary files are installed in /usr/local/bin by default, but this can be changed using the --prefix option; see the file INSTALL in the distribution for details.
If the configure script ran successfully, then build the tools, using the command:
make
If the build succeeded, the final installation of the tools is accomplished by:
make install
This will copy the binaries, ijvm-asm and ijvm, to /usr/local/bin (unless something else is specified when configuring the tools).
At this point the tools should be installed and ready to use.
[19-11-2015] Kasper Sacharias Roos Eenberg has made an installation video.
[19-11-2015] Tobias Røikjer has made an interactive IJVM stack visualizer (tested on Linux)
Precompiled versions
Below is a list of contributed precompiled versions of ijvm-tools for various platforms.
[07-02-2006] A precompiled Windows version of the IJVM tools (version 0.8), ijvm-tool-0.8win.zip, has been created by Steffen Mikkelsen.
[02-11-2009] A precompiled Linux version of the IJVM tools (version 0.9), available as a Debian/Ubuntu package, has been created by Dan Søndergaard. 32 bit version: ijvm-tools_0.9-1_i386.deb. 64 bit version: packages/ijvm-tools_0.9-1_amd64.deb.
[12-11-2015] A precompiled Mac OS X version of the IJVM tools (version 0.9.1), ijvm-tools-0.9.1-mac.zip, has been created by Asger Hautop Drewsen. See README.txt for details.
[19-11-2015] A package for Arch Linux by Kasper Sacharias Roos Eenberg.
[19-11-2015] Adding /users/kursus/dArk/linux/bin to your path will give access to the tools.
IJVM Specification File
The IJVM tools are instantiated by means of a specification obtained from a file ijvm.spec. This file defines all the instructions that the IJVM tools implement: the assembler uses the specification to get a definition of the instructions, i.e. their names, opcodes and operand types; the simulator uses the specification to be able to disassemble the instructions.
The specification file defines the instructions using a fairly simple line-based format. Each line contains an instruction definition made up of three elements:
<opcode> <mnemonic> <comma separated operand list>
where opcode is the opcode in hexadecimal notation; mnemonic is the mnemonic code of the instruction (i.e. the name of the instruction, e.g. iadd); and the operand list specifies what types of operands the instruction expects. The following table explains the different types available and the resulting bytecode values that the assembler generates:
byte | An operand that evaluates to an 8 bit signed integer value; the 8 bit signed value is the bytecode value generated.
label | A label defined within the current method; the 16 bit signed offset from the current opcode to the address of the label is the bytecode value generated.
method | The name of a method; the 16 bit unsigned index of the method in the constant pool is generated.
varnum | An 8 bit unsigned index, handled as the operand type byte.
varnum-wide | An 8 or 16 bit unsigned index; depending on the size of the index, either an 8 or 16 bit index is generated. Furthermore, if the index is greater than 255 a wide prefix opcode is generated before the current opcode.
constant | A 32 bit signed constant; the value is added to the constant pool, and the 16 bit unsigned index of the value in the constant pool is generated.
Thus, when bytecode for an instruction is generated, the assembler first inspects the actual operands to see if a wide prefix is needed. Then it generates the instruction opcode, and then for each operand type it generates bytecode values as described above.
As an example the following is used as the default specification by the tools:
0x10 bipush byte
0x59 dup
0xA7 goto label
0x60 iadd
0x7E iand
0x99 ifeq label
0x9B iflt label
0x9F if_icmpeq label
0x84 iinc varnum, byte
0x15 iload varnum-wide
0xB6 invokevirtual method
0x80 ior
0xAC ireturn
0x36 istore varnum-wide
0x64 isub
0x13 ldc_w constant
0x00 nop
0x57 pop
0x5F swap
0xC4 wide
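The line format of the specification file can be parsed with a few lines of Python. The sketch below is an illustration only; it is not the parser in ijvm-spec.c, and the function name is hypothetical. It handles only well-formed instruction lines, since the manual does not say whether blank lines or comments are allowed in the file.

```python
def parse_spec_line(line):
    """Parse '<opcode> <mnemonic> <comma separated operand list>'.

    Returns (opcode as int, mnemonic, list of operand type names).
    """
    fields = line.split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected at least an opcode and a mnemonic")
    opcode = int(fields[0], 16)  # accepts the 0x prefix
    mnemonic = fields[1]
    operands = []
    if len(fields) == 3:
        operands = [part.strip() for part in fields[2].split(",")]
    return opcode, mnemonic, operands


# A few lines copied from the default specification above.
SAMPLE = """0x10 bipush byte
0x84 iinc varnum, byte
0x15 iload varnum-wide
0xC4 wide"""

for spec_line in SAMPLE.splitlines():
    print(parse_spec_line(spec_line))
```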
By default the tools will read a specification from the file ijvm.spec installed in /usr/local/share (this depends of course on the installation). Alternatively, by setting the environment variable IJVM_SPEC_FILE it is possible to make the tools read a different specification. If the shell is tcsh this is accomplished by:
setenv IJVM_SPEC_FILE my-ijvm.spec
If the shell is bash the following will do:
export IJVM_SPEC_FILE=my-ijvm.spec
Alternatively, you can use the command line option -f to supply another specification file, e.g.:
ijvm-asm -f my-ijvm.spec test-program.j
and likewise for the simulator.
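The three ways of selecting a specification file can be summarised in a small Python sketch. The function name is hypothetical and the code is not taken from the tools. In particular, the manual does not say which source wins when both -f and IJVM_SPEC_FILE are given; the sketch assumes that the command line option takes precedence. The default path is the one mentioned above and may differ between installations.

```python
import os

# Default location named in this manual; installation dependent.
DEFAULT_SPEC_PATH = "/usr/local/share/ijvm.spec"


def resolve_spec_path(option_f=None, environ=None):
    """Pick the specification file: -f option, then env variable, then default.

    The precedence of -f over IJVM_SPEC_FILE is an assumption.
    """
    if environ is None:
        environ = os.environ
    if option_f:
        return option_f
    return environ.get("IJVM_SPEC_FILE", DEFAULT_SPEC_PATH)


print(resolve_spec_path(environ={"IJVM_SPEC_FILE": "my-ijvm.spec"}))
```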
Extending IJVM
The specification file mechanism described above makes it easy to extend the IJVM instruction set. First, the specification is extended with definitions of the new instructions. Then, the simulator is extended to be able to execute the new instructions. This is done by adding a case-branch for each new instruction to the switch statement in the file ijvm.c and then rebuilding the simulator. To make the new case-branch readable, add a definition of a name for each opcode (by means of #define in the file ijvm-util.h), so that the opcode can be referred to by a symbolic name instead of a hex code.
To rebuild the simulator you will need either the full distribution or mini-ijvm.tar.gz, which is the minimum set of files required. To unpack the archive do:
tar xfz mini-ijvm.tar.gz
This will create a directory called mini-ijvm which contains the files:
Makefile
ijvm-spec.c
ijvm-spec.h
ijvm-util.c
ijvm-util.h
ijvm.c
ijvm.spec
types.h
The file ijvm.c implements the fetch-decode-execute cycle of the simulator; ijvm-util.c, ijvm-util.h, ijvm-spec.c and ijvm-spec.h are auxiliary files that implement disassembling, output and specification file handling. The file Makefile controls how the simulator is built (see e.g. the man page for make) and ijvm.spec is a copy of the default specification file, provided for convenience.
When the specification file, ijvm.c and ijvm-util.h have been changed, type make to rebuild the simulator. As a result, the new simulator is available as ijvm. To try the simulator it may be necessary to invoke it as ./ijvm, otherwise the shell will run the version installed on the system.
Here is a small example. Suppose we want to add the instruction iconst_0, which pushes the constant 0 onto the stack, i.e. the equivalent of bipush 0. As opcode, 0x03 would be a reasonable choice, since this is the opcode in the JVM (see the JVM Reference).
First, we add the line:
0x03 iconst_0
to the file ijvm.spec so that ijvm-asm will be able to handle iconst_0. Then we define a symbolic name for the opcode by adding the following line to ijvm-util.h:
#define IJVM_OPCODE_ICONST_0 0x03
Finally, the actual implementation of iconst_0 is done by adding the following case-branch to ijvm.c:
case IJVM_OPCODE_ICONST_0:
ijvm_push (i, 0);
break;
To test our implementation, we use this test program:
.method main
iconst_0
bipush 5
iadd
ireturn
Having set the environment variable IJVM_SPEC_FILE to ./ijvm.spec as described above, we assemble the test program and get the following bytecode:
main index: 0
method area: 9 bytes
00 01 00 00 03 10 05 60 ac
constant pool: 1 words
00000000
When we run this with the new simulator we get the result expected:
IJVM Trace of -
stack = 0, 0, 1
iconst_0 [03] stack = 0, 0, 0, 1
bipush 5 [10 05] stack = 5, 0, 0, 0, 1
iadd [60] stack = 5, 0, 0, 1
ireturn [ac] stack = 5
return value: 5
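The effect of the new instruction can also be cross-checked without the C sources, using a toy fetch-decode-execute loop in Python. This is a sketch only; it is not how ijvm.c is written, and the function name is hypothetical. It supports just the four instructions used by the test program, skips an assumed 4-byte method header, keeps only the operand stack (so the frame words shown in the trace do not appear) and ignores 32-bit wraparound.

```python
# Opcodes from the specification file, plus the new iconst_0.
ICONST_0 = 0x03
BIPUSH = 0x10
IADD = 0x60
IRETURN = 0xAC


def run_method(code, start=0):
    """Execute a single method and return the value passed to ireturn."""
    pc = start + 4  # skip the assumed 4-byte method header
    stack = []
    while True:
        opcode = code[pc]  # fetch
        if opcode == ICONST_0:  # decode and execute
            stack.append(0)
            pc += 1
        elif opcode == BIPUSH:
            operand = int.from_bytes(code[pc + 1:pc + 2], "big", signed=True)
            stack.append(operand)
            pc += 2
        elif opcode == IADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
            pc += 1
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError("unsupported opcode 0x%02x" % opcode)


# Method area bytes copied from the assembled test program above.
TEST_PROGRAM = bytes.fromhex("00 01 00 00 03 10 05 60 ac")
print(run_method(TEST_PROGRAM))
```

The new case for ICONST_0 plays the same role as the case-branch added to ijvm.c: it pushes 0 and advances past the one-byte instruction.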