virtual machine part i: stack arithmetic
TRANSCRIPT
Foundations of Global Networked Computing:
Building a Modern Computer From First Principles
IWKS 3300: NAND to Tetris
Spring 2019
John K. Bennett
This course is based upon the work of Noam Nisan and Shimon Schocken.
More information can be found at (www.nand2tetris.org).
Virtual Machine Part I: Stack Arithmetic
Error in the Book (pg. 138, Figure 7.10)
4315… …
local pointer that
4315…
4317 19…
local pointer that
4315… …
local pointer that
4315… 4317
19…
local pointer that
As printed:
Should be:
X
Also, in text on page 139, and in Fig. 11 on pg. 140:
replace “b.radius = 17” with “b.radius.r” (the value of r is 17)
Where We Are
Assembler
Chapter 6
H.L. Language
&
Operating Sys.
abstract interface
Compiler
Chapters 10 - 11
VM Translator
Chapters 7 - 8
Computer
Architecture
Chapters 4 - 5
Gate Logic
Chapters 1 - 3 Electrical
EngineeringPhysics
Virtual
Machine
abstract interface
Software
Hierarchy
Assembly
Language
abstract interface
Hardware
Hierarchy
Machine
Code
abstract interface
Hardware
Platform
abstract interface
Chips &
Logic Gates
abstract interface
Human
Thought
Abstract design
Chapters 9, 12
You are here
Motivation
class Main {
static int x;
function void main() {
// Inputs and multiplies two numbers
var int a, b, x;
let a = Keyboard.readInt(“Enter a number”);
let b = Keyboard.readInt(“Enter a number”);
let x = mult(a,b);
return;
}
}
// Multiplies two numbers.
function int mult(int x, int y) {
var int result, j;let result = 0; let j = y;
while ~(j = 0) {
let result = result + x;
let j = j – 1;
}
return result;
}
}
Jack code (example)
Our ultimate goal:
Translate high-level
programs into
executable code.
Compiler
00000000000100001110111111001000000000000001000111101010100010000000000000010000111111000001000000000000000000001111010011010000000000000001001011100011000000010000000000010000111111000001000000000000000100010000000000010000111011111100100000000000000100011110101010001000000000000001000011111100000100000000000000000000111101001101000000000000000100101110001100000001000000000001000011111100000100000000000000010001...
Hack code
Different Compilation Models
. . .
requires n m translators
hardware
platform 2
hardware
platform 1
hardware
platform m. . .
language 1 language 2 language n
direct compilation:
.
. . .
hardware
platform 2
hardware
platform 1
hardware
platform m. . .
language 1 language 2 language n
intermediate language
requires n + m translators
2-tier compilation:
Two-tier compilation:
First compilation stage: depends only on the details of the source language
Second compilation stage: depends only on the details of the target language.
Using Intermediate Code
. . .
RISC
machine
Intermediate code
other digital platforms, each equipped
with its own VM implementation
RISC
machine
language
Hackcomputer
Hack
machine
language
CISC
machine
language
CISC
machine
. . .written in
a high-level
language
Any
computer
. . .
VM
implementation
over CISC
platforms
VM imp.
over RISC
platforms
VM imp.over the Hack
platform
VM
emulator
Some Other
language
Jack
language
Somecompiler Some Other
compiler
Jackcompiler
. . .Some
language. . . Intermediate Code:
The interface between
the two compilation
stages
Should be sufficiently
general to support several
<high-level language,
machine-language>
pairs
Can be modeled as the
language of an abstract
virtual machine (VM)
Can typically be
implemented in several
different ways
Focus of Project 7 (yellow):
. . .
RISC
machine
VM language
other digital platforms, each equipped
with its VM implementation
RISC
machine
language
Hackcomputer
Hack
machine
language
CISC
machine
language
CISC
machine
. . .written in
a high-level
language
Any
computer
. . .
VM
implementation
over CISC
platforms
VM imp.
over RISC
platforms
VM imp.
over the Hack
platform
VM
emulator
Some Other
language
Jack
language
Somecompiler Some Other
compiler
Jackcompiler
. . .Some
language. . .
1, 2, 3, 4, 5, 6
7, 8
9, 10, 11, 12
Book chapters and
course projects:
(this, and the
next project)
The Hack VM Model and Language
Other VM models (e.g., Java JVM/JRE, .NET’s IL/CLR and Pascal
PCode) are similar in nature, but differ in scope and details.
Several different ways to think about the notion of a virtual machine:
Abstract software engineering view:
the VM is an interesting and useful abstraction
Practical software engineering view:
the VM code layer enables “managed code” (e.g., enhanced security)
Pragmatic compiler writing view:
a VM architecture makes writing a compiler much easier
(as we will see later in the course)
Opportunistic empire builder view:
a VM architecture allows writing high-level code once and have it run on
many target platforms with little or no modification (think Sun/Oracle vs.
Microsoft)
Our Game Plan
Arithmetic / Boolean commands
add
sub
neg
eq
gt
lt
and
or
not
Memory access commands
pop x (pop into x, a variable)
push y (push y, a variable or constant)
Program flow commands
label (declaration)
goto (label)
if-goto (label)
Function calling commands
function (declaration)
call (a function)
return (from a function)
Today Next Week
Goal: Specify and implement a VM model and language:
Our game plan today: (a) describe the VM abstraction (above)
(b) suggest how to implement it over the Hack platform.
The Hack VM is a Stack-based Machine
All operations are performed on the stack
Data is stored to and from the stack to several separate memory
segments
All memory segments have the same access semantics
One of the memory segments m is called static, and we will use it
(as an arbitrary example) in the following examples:
Hack VM Data Types
The VM has a single 16-bit data type that can be used as:
an integer value (16-bit 2’s complement: -32768, ... , 32767),
a Boolean value (-1 and 0, representing true and false, respectively), or a
a pointer (memory address)
The stack grows “higher” in memory:
VM Stack Memory Access Operations
The stack:
A classic LIFO data structure
Elegant and powerful
Several hardware / software implementation options
0
SP
121
5
17
Stack static
5
1 12
2
3
...
3
-532
...
pop
static 0 0
static
17
1 12
2
3
3
-532
SP
121
5
Stack
... ...
0
SP
121
5
17
Stack static
5
1 12
2
3
...
3
-532
...
(before)
push
static 20
static
5
1 12
2
3
3
-532
SP
121
5
17
3
Stack
... ...
(after)
Example of Stack Evaluation of Arithmetic Expression
Stack
y
SP
SP
2
5
-3
9
5
x 5
...9 SP
2
SP
-3
SP
-3
9
SP
SP
-3
14 SP
-17
static
push x
sub
push y
add
sub
pop z
SP
Stack
push 2
push 5
y
x 5
9
static
z
...-17
// z=(2-x)-(y+5)push 2push xsubpush ypush 5addsubpop z
(suppose that
x refers to static 0,
y refers to static 1, and
z refers to static 2)
Note: as shown here, the stack pointer is incremented
after a push, and is thus “ready” for the next push. Can
you think of another option?
Evaluation of Boolean Expressions
Stack
y
SP
SP
12
7
false
8
8
x 12
...8 SP
12
SP
false
SP
false
8
SP
SP
false
true SP
true
static
push x
lt
push y
eq
or
push 7
push 8
// ((x<7) || (y=8))push xpush 7ltpush ypush 8eqor
(suppose that
x refers to static 0, and
y refers to static 1)
(In the Hack VM, true and
false are represented as -1
and 0, respectively)
Arithmetic and Boolean Operations in the Hack VM Language
In every case, all operands are popped off of the
stack, and any result is pushed onto the stack.
A VM program is designed to provide an intermediate
abstraction of a program written in some high-level language
Modern object-oriented high-level languages normally include a
variety of variable types:
Class level: Static variables (class variables)
Private variables (sometimes called “fields” or “properties”)
Method / function level: Local variables
Argument variables
When translated into the VM language,
The static, private, local and argument variables are mapped
by the compiler on the four memory segments:
static, this, local, argument In addition, there are four other memory segments:
that, constant, pointer, temp
VM Memory Segments
VM Memory Segments and Memory Access Commands
Memory access VM commands:
pop memorySegment index
push memorySegment index
Where memorySegment is static, this, local, argument, that, constant, pointer, or temp
And index is a non-negative integer
In all our code examples thus far, memorySegment has been static.
At the VM abstraction level, all memory segments are treated the same way.
The VM abstraction includes 8 separate memory segments named:
static, this, local, argument,
that, constant, pointer, temp
All memory segments are accessed in the same way using the same generic syntax:
Hack VM Memory Segments
f3f1 f2 f4 f5
static static
temp
constant
Hack Assembly Language Code
argument argument argument
local local local
this this this
that that that
pointer pointer pointer
Foo.vm Bar.vmVM files
(f = VM function)
(one set of these VM
segments for each
instance of a running
function)
VM
Translator
one per .vm file
shared by all functions
shared by all functions
argument
local
this
that
pointer
argument
local
this
that
pointer
Hack Memory Segments
Segment Purpose Comments
argument function arguments Dynamically allocated in
ARG segment
local function local variables Dynamically allocated in
LCL; initialized to 0
static static variables Allocated for each .vm
file in static; shared by
all functions of VM file
constant pseudo segment of constants
0…32767
Emulated by VM; seen
by all functions of
program
this,
that
segments somewhere in heap Used to access selected
areas of heap
pointer two-entry segment that stores
the base address of this & that
Set by functions to point
to different areas of
heap; RAM[3..4]
temp fixed 8-entry segment for GP use RAM[5…12]
VM Programming
VM programs are normally written by compilers, not by humans
However, compilers are written by humans ...
In order to write or optimize a compiler, it helps to first understand the
spirit of the compiler’s target language – the VM language
So let’s look at a VM program example
The example includes three new VM commands: function functionSymbol // function declaration
label labelSymbol // label declaration
if-goto labelSymbol // pop x
// if x=true, jump to execute the command after labelSymbol// else proceed to execute the next command in the program
For example, to effect if (x > n) goto loop, we can use the following VM commands:
push x
push n
gt
if-goto loop // Note that x, n, and the truth value are removed from the stack
VM Programming Example
function mult (x,y) {
int result, j;
result = 0;
j = y;
while ~(j = 0) {
result = result + x;
j = j - 1;
}
return result;
}
High-level code
function mult(x,y)
push 0
pop result
push y
pop j
label loop
push j
push 0
eq
if-goto end
push result
push x
add
pop result
push j
push 1
sub
pop j
goto loop
label end
push result
return
VM code (first approx.)
function mult 2
push constant 0
pop local 0
push argument 1
pop local 1
label loop
push local 1
push constant 0
eq
if-goto end
push local 0
push argument 0
add
pop local 0
push local 1
push constant 1
sub
pop local 1
goto loop
label end
push local 0
return
Actual VM code
Just after mult(7,3) returns:
x
y
Just after mult(7,3) is entered:
SP
SP
21
7
3
0
argument
1
...
sum
j
0
0
0
local
1
Stack
Stack
...
Result
j
VM Programming: Handling Multiple Functions
Compilation:
A Jack application is a set of 1 or more class files (like .java or .cs files), containing one
or more VM functions.
When we apply the Jack compiler to these files, the compiler creates a corresponding set
of 1 or more .vm files. Each method/function in the Jack application is translated into a
VM function written in the VM language
Execution:
At any given point of time, only one VM function is executing (the “current function”), while
0 or more functions are waiting for the current function to terminate (the functions up the
“calling hierarchy”)
For example, a main function starts running; at some point it call another function, say
factorial, at which point the factorial function starts running; factorial may in turn call
mult, at which point the mult function starts running, while both main and factorial are
waiting for mult to terminate
The stack: a global data structure, used to save and restore the resources (memory
segments) of all the VM functions up the calling hierarchy (e.g. main and factorial).
The top of this stack is the working stack (called the “stack frame”) of the current function.
Our Game Plan
Arithmetic / Boolean commands
add
sub
neg
eq
gt
lt
and
or
not
Memory access commands
pop x (pop into x, which is a variable)
push y (y being a variable or a constant)
Program flow commands
label (declaration)
goto (label)
if-goto (label)
Function calling commands
function (declaration)
call (a function)
return (from a function)
Project 7 Project 8
Goal: Specify and implement a VM model and language:
Method: (a) specify the VM abstraction (stack, memory segments, commands)
(b) propose how to implement this abstraction over the Hack platform.
VM Implementation
VM implementation options:
Software-based (e.g. emulate the VM model using C#)
Translator-based (e. g. translate VM programs into the Hack assembly language, or directly into Hack machine code)
Hardware-based (realize the VM model using dedicated hardware resources)
Two well-known translator-based implementations:
JVM: Java compiler translates Java programs into bytecodes;
The JVM translates the bytecode into
the machine language of the host computer
CLR: C# compiler translates C# programs into IL code;
The CLR translated the IL code into
the machine language of the host computer.
Software Implementation: The Book’s VM Emulator
VM Implementation on the Hack platform
The stack: a global data structure, used to
save and restore the resources of all the
VM functions in the calling hierarchy.
The top part of this stack is the working stack
(stack frame) of the current function
static, constant, temp, pointer:
Global memory segments, all functions
see the same four segments
local, argument, this, that:
These segments are local to each function;
each function sees its own, private copy of
each one of these four segments
The challenge:
represent all these logical constructs in the
same single physical address space -- the
host RAM.
3
12
. . .
4
5
14
15
0
1
13
2
THIS
THAT
SP
LCL
ARG
16
. . .
17
19
20
18
21
Host
RAM
VM implementation on the Hack platform
Basic idea: the mapping of the stack and global segments onto
the RAM is fixed by the VM implementation (you); the mapping
of the function-level segments is dynamic, using pointers
stack: mapped on RAM[256 ... 2047];
The stack pointer is kept in RAM address SP
static: mapped on RAM[16 ... 255];
each static variable foo appearing in a VM file named
fname.vm is compiled to the assembly language symbol
“fname.foo” (recall that the assembler further maps such
symbols to RAM, from address 16 onward)
local, argument: these function-level segments are mapped into
the stack. The base addresses of these segments are kept in
RAM addresses LCL and ARG. Access to the i-th entry of any
of these segments is implemented by accessing
RAM[LCL/ARG +i]
this, that: these method-level segments are mapped into the
heap, from addresses 2048-3FFF. The base addresses of
these segments are kept in RAM addresses pointer 0 (for
the this segment) and pointer 1 for the that segment).
Although this and that are segments, we also refer to their
base addresses as THIS and THAT. Thus, access to the i-th
entry of these segments is implemented by accessing
RAM[THIS/THAT + i]
constant: a “virtual” segment, i.e., the VM implements “access” to
constant i by supplying the constant i.
Statics
3
12
. . .
4
5
14
15
0
1
13
2
THIS
THAT
SP
LCL
ARG
TEMP
255
. . .
16
General
purpose
2047
. . .256
2048
Stack
Heap. . .
Host
RAM
“Standard” Hack Memory Map
R0R1R2R3R4
SP
LCL Base
ARG Base
THIS Base
THAT Base
R5R6R7R8R9
TEMP
R10R11R12R13R14
N/A
N/A
R15
N/AN/AN/A
N/A
Stack
Heap
Scrn. & Kbd.
0123456789
101112131415
256-2047 *
2048-16383 *
16384-24575
Stack PointerPtr. To LCL Vars (in Stack Frame)
Ptr. To Args (in Stack Frame)Pointer 0 (ptr to This segment)Pointer 1 (ptr to That segment)
Temp 0Temp 1Temp 2Temp 3Temp 4Temp 5Temp 6Temp 7
GP registerGP registerGP register
Stack (including LCL and ARG)Heap (including THIS and THAT)
Memory-Mapped I/O
N/A Static16-255 * static vars.
RAM[n] Register n Seg. Name Use
*The memory addresses in purple can be changed if there is a reason to do so.
VM Implementation on the Hack Platform
Statics
3
12
. . .
4
5
14
15
0
1
13
2
THIS
THAT
SP
LCL
ARG
TEMP
255
. . .
16
General
purpose
2047
. . .256
2048
Stack
Heap. . .
Host
RAM
Practice Exercises
Now that we know how the memory segments are
mapped on the host RAM, let’s write some Hack
assembly language that realizes a few VM commands:
1. push constant 1
2. pop static 7 (assume this line appears in practice.vm)
3. push constant 5
4. add
5. pop local 2
6. Eq (i.e., if top two items on stack are = replace with
true (-1), otherwise, replace with false (0))
Hints:
1.The implementation of any one of these VM
commands requires several Hack assembly
commands involving pointer arithmetic
(using commands like A=M)
2. If you run out of registers (Hack only has A & D...),
you may use R13, R14, and R15.
3. For temp labels, start with “_1” then increment
Practice Exercises// push constant 1
// pop static 7
// push constant 5
// add
// pop local 2
// eq: [if top two items on stack are = replace with
// true (0), otherwise, replace with false (-1)]
Sample Answers to Practice Exercises// push constant 1
@1
D=A // D = 1
@SP
A=M // push D onto stack
M=D
@SP
M=M+1 // “preincrement” SP
// add
@SP
AM=M-1 // pop stack top into D
D=M
A=A-1 // SP = SP-1
M=M+D // RAM[SP-1] = RAM[SP-1] + D
Answers to Practice Exercises// push constant 1
@1
D=A // D = 1
@SP
A=M // push D onto stack
M=D
@SP
M=M+1 // “preincrement” SP
// pop static 7
@SP
AM=M-1 // pop stack top into D
D=M
@practice.7 // one static segment per file
M=D
// push constant 5
@5
D=A
@SP
A=M
M=D
@SP
M=M+1
// add
@SP
AM=M-1 // pop stack top into D
D=M
A=A-1 // SP = SP-1
M=M+D // RAM[SP-1] = RAM[SP-1] + D
// pop local 2
@LCL
D=M // D = LCL
@2
D=D+A // D = LCL+2
@R15 // using R15 as a temp to hold local 2 value
M=D // R15 = LCL + 2
@SP
AM=M-1 // pop stack top into D
D=M
@R15
A=M // A = LCL + 2
M=D // RAM[LCL + 2] = D
// eq: [if top two items on stack are = replace with
// true (0), otherwise, replace with false (-1)]
@SP
A=M-1 // A = SP - 1
A=A-1 // A = SP - 2
D=M // D = RAM[SP - 2]
A=A+1 // A = SP – 1
D=D-M // D = RAM[SP - 2] – RAM[SP - 1]
@_1 // A = ROM address at label: “_1”
D;JEQ // if result == 0, goto “_1”; else fall thru
@_2 // A = ROM address at label: “_2”
D=0;JMP // D = “false” (0); goto “_2”
(_1)
D=-1 // D = “true” (-1)
(_2) // D is T or F, depending on how we got here
@SP
AM=M-1 // discard 2 ops., leaving SP post push
A=A-1 // Set A to new SP - 1
M=D // push D (T or F) onto stack
Book’s Proposed VM Translator Implementation: Parser
Parser: Handles the parsing of a single .vm file, and encapsulates access to the input code. It reads VM commands, parses
them, and provides convenient access to their components. In addition, it removes all white space and comments.
Routine Arguments Returns Function
ConstructorInput file /
stream--
Opens the input file/stream and gets ready to parse
it.
hasMoreCommands -- boolean Are there more commands in the input?
advance -- --
Reads the next command from the input and
makes it the current command. Should be called
only if hasMoreCommands is true.
Initially there is no current command.
commandType --
C_ARITHMETIC, C_PUSH,
C_POP, C_LABEL, C_GOTO,
C_IF, C_FUNCTION,
C_RETURN, C_CALL
Returns the type of the current VM command.
C_ARITHMETIC is returned for all the arithmetic
commands.
arg1 -- string
Returns the first arg. of the current command.
In the case of C_ARITHMETIC, the command itself
(add, sub, etc.) is returned. Should not be called if
the current command is C_RETURN.
arg2 -- int
Returns the second argument of the current
command. Should be called only if the current
command is C_PUSH, C_POP, C_FUNCTION, or
C_CALL.
Book’s Proposed VM Translator Implementation: CodeWriter
CodeWriter: Translates VM commands into Hack assembly code.
Routine Arguments Returns Function
Constructor Output file / stream -- Opens the output file/stream and gets
ready to write into it.
setFileName fileName (string) -- Informs the code writer that the translation
of a new VM file is started.
writeArithmetic command (string) -- Writes the assembly code that is the
translation of the given arithmetic
command.
WritePushPop command (C_PUSH or
C_POP),
segment (string),
index (int)
-- Writes the assembly code that is the
translation of the given command, where
command is either C_PUSH or C_POP.
Close -- -- Closes the output file.
Comment: More routines will be added to CodeWriter in Project 8.
enum VM_CommandType { VM_NO_COMMAND = 0, C_ARITHMETIC, C_PUSH, C_POP, C_LABEL, C_GOTO, C_IF, C_FUNCTION, C_RETURN, C_CALL };
switch (commandType) {case VM_CommandType.C_ARITHMETIC:
writeArithmetic();break;
case VM_CommandType.C_PUSH:WritePushPop(shortFileName);break;
case VM_CommandType.C_POP:WritePushPop(shortFileName);break;
case VM_CommandType.C_LABEL:WriteLabel(shortFileName);break;
case VM_CommandType.C_GOTO:WriteGoto(shortFileName);break;
case VM_CommandType.C_IF:WriteIf(shortFileName);break;
case VM_CommandType.C_FUNCTION:functionName = null; // make sure that we are starting freshWriteFunction();break;
case VM_CommandType.C_RETURN:WriteReturn();break;
case VM_CommandType.C_CALL:WriteCall();break;
case VM_CommandType.VM_NO_COMMAND:// Do nothing.break;
default:errorFile.WriteLine("Line No: {0}, illegal command: {1}", lineNo, command);break;
... parseLine(); // sets commandType, vm language command, arg1 and arg2 if valid ...
Basic VM Translation Flow
Next week
private void writeArithmetic(){
// Write Hack code for arithmetic commandswitch (command){
case "add":WritePopD(); // Pop top of stack into D; side effect: A = SPoutFile.WriteLine("A=A-1");outFile.WriteLine("M=M+D");break;
case "sub":WritePopD(); // side effect: A = SPoutFile.WriteLine("A=A-1");outFile.WriteLine("M=M-D");break;
…
Example Translation
private void WritePopD(){
// POP stack top into D RegisteroutFile.WriteLine("@SP");outFile.WriteLine("AM=M-1");outFile.WriteLine("D=M");
}
using System.IO;…
if (File.Exists(inFileNameOrDir) {// This path is a file, strip off the ".vm" for the output nameshortFileName = inFileNameOrDir.Substring(0, inFileNameOrDir.Length - 3);progOutFile = new StreamWriter(shortFileName + ".asm");vm = new VirtualMachine();…ProcessFile(inFileNameOrDir);progOutFile.Close();
}else if (Directory.Exists(inFileNameOrDir)) {
// It’s a directory; process all (and only) '.vm' files in the directory; ignore subdirectoriesint lastSlash = inFileNameOrDir.LastIndexOf("\\", 0);string dirName = inFileNameOrDir.Substring(lastSlash + 1);progOutFile = new StreamWriter(dirName + ".asm");// Write the initialization codevm = new VirtualMachine();…string[] fileEntries = Directory.GetFiles(inFileNameOrDir, "*.vm", SearchOption.TopDirectoryOnly);foreach (string fileName in fileEntries) {
// Get rid of the ".vm"; we don't look for "vm, we just delete the last two charactersint dirSlash = fileName.IndexOf("\\", 0);if ((dirSlash < 0)) { // no leading dir name
shortFileName = fileName.Substring(0, fileName.Length - 2);}else {
shortFileName = fileName.Substring((dirSlash + 1), fileName.Length - dirSlash - 4);} ProcessFile(fileName);
}progOutFile.Close();
}else {
Console.WriteLine("{0} is not a valid file or directory.", inFileNameOrDir);Environment.Exit(-1);
}
File or Directory? (in C#)
Perspective
In this project we began the process of
building a compiler
Modern compiler architecture:
Front-end (translates from a high-level language to a VM language)
Back-end (translates from the VM language to the machine
language of some target hardware platform)
Brief history of virtual machines:
1970’s: p-Code
1990’s: Java’s JVM
2000’s: Microsoft .NET
A full blown VM implementation typically also includes a common
software runtime library (sort-of mini OS).
We will build such a mini OS later in the course.
. . .
VM language
RISC
machine
languageHack
CISC
machine
language. . .
written in
a high-level
language
. . .
VM
implementation
over CISC
platforms
VM imp.
over RISC
platformsTranslatorVM
emulator
Some Other
language Jack
Somecompiler Some Other
compilercompiler
. . .Some
language. . .
The Big Picture
JVM
Java
Java compiler
JRE
CLR
C#
C# compiler
.NET base
class library
VM
Jack
Jack compiler
Jack OS (really
just a runtime
system)
7, 8
9
10, 11
12
(Book chapters and
course projects)