codesigned virtual machines

Codesigned Virtual Machines Shin Gyu, Kim 2006. 10. 16

Page 1: Codesigned  Virtual Machines

Codesigned Virtual Machines

Shin Gyu, Kim

2006. 10. 16

Page 2: Codesigned  Virtual Machines

Codesigned VM

[Figure: three system organizations, each listed from the application down to the hardware]

• Conventional HW/SW interface: Application Binary, Native ISA, Hardware
• Conventional virtual machine interface: Application Binary, Source ISA, Virtual Machine, Target ISA, Hardware
• HW/SW codesigned virtual machine: Application Binary, Source ISA, VM Software, Target ISA, VM Hardware

• SW becomes part of the HW.
• The implementation is divided between HW and SW in an optimal way.

Page 3: Codesigned  Virtual Machines

Codesigned VM & System VM (1/2)

• Codesigned VM & System VM
o Both support an entire system (OS + applications).
  → A codesigned VM has the form of a system VM.
o But codesigned VMs are
  • not intended to virtualize HW resources
  • not intended to support multiple VM environments.
o The goals include performance, power efficiency, and design simplicity.

Page 4: Codesigned  Virtual Machines

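Codesigned VM & System VM (2/2)

[Figure: layered system. The application and OS run on the Source ISA (IA32) and reside in visible memory. The VMM, with its translator and code cache, resides in concealed memory and runs on the Target ISA (Crusoe) of the hardware.]

o We refer to the VM SW as a VM Monitor (VMM).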

Page 5: Codesigned  Virtual Machines

Codesigned VM & Process VM

• Codesigned VM vs. Process VM
o Similarity: both emulate the source ISA, using dynamic translation and a code cache.
o But in codesigned VMs,
  1. Intrinsic compatibility is at the ISA level (not the ABI level).
     → Both the user-level and the system-level ISA must be emulated.
  2. The goals are improved performance, power efficiency, and design simplicity.
     → Compatibility is just a requirement, not a motivation.

Page 6: Codesigned  Virtual Machines

Codesigned VM & Superscalar processor

• Codesigned VM vs. Superscalar processor
o Similarity: both perform translation
  • source ISA → target ISA
o But in codesigned VMs,
  • The translation is done in SW.
    → lower cost, smaller size, design simplicity, many more optimization opportunities, low power consumption
  • Inter-instruction optimization is possible.

Page 7: Codesigned  Virtual Machines

Code Translation Methods

[Figure: two translation schemes, with source code on the left and target code on the right]

• Code translation by HW (context-free): source instructions (instr. 1 ... instr. n) are translated into micro-ops (micro-op a ... micro-op r).
• Code translation by SW (context-sensitive): source instructions (instr. 1 ... instr. n) are translated into target instructions (instr. A ... instr. M).
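The contrast between the two schemes can be illustrated with a small sketch. The toy instruction tuples, the `push`/`pop` expansions, and the peephole rule are assumptions made for illustration only; they do not come from the slides or from any real ISA. The context-sensitive version also assumes that the stack slot written by `push` is dead after the matching `pop`.

```python
def expand(instr):
    """Context-free expansion of one toy source instruction."""
    op = instr[0]
    if op == "push":
        return [("sub", "sp", 4), ("store", instr[1], "sp")]
    if op == "pop":
        return [("load", instr[1], "sp"), ("add", "sp", 4)]
    return [instr]


def translate_context_free(source):
    """Translate each instruction on its own, as a hardware decoder would."""
    target = []
    for instr in source:
        target.extend(expand(instr))
    return target


def translate_context_sensitive(source):
    """Translate with a view of the neighbouring instruction (software translator)."""
    target = []
    i = 0
    while i < len(source):
        cur = source[i]
        nxt = source[i + 1] if i + 1 < len(source) else None
        if cur[0] == "push" and nxt is not None and nxt[0] == "pop":
            # Inter-instruction optimization: push rA; pop rB becomes mov rB, rA.
            target.append(("mov", nxt[1], cur[1]))
            i += 2
        else:
            target.extend(expand(cur))
            i += 1
    return target
```

For the source sequence `push r1; pop r2; add r3, r2`, the context-free translator emits five target operations, while the context-sensitive one emits two.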

Page 8: Codesigned  Virtual Machines

Contents

• Memory & Register State Mapping
• Self-Modifying Code & Self-Referencing Code
• Support for Code Caching
• Implementing Precise Traps
• Input/Output

Page 9: Codesigned  Virtual Machines

Register State Mapping

• Register state mapping is easier.
o The host register file can be made large enough to accommodate the guest's.

[Figure: PowerPC to Daisy host register mapping. PowerPC side: r0-r31, counter, link reg, MQ, const. 0. Daisy host side: R0-R31, R32, R33, R34, R35, and R36-R63 (scratch, speculative results, constants, pointers).]

Page 10: Codesigned  Virtual Machines

Memory State Mapping

• Concealed Memory
o A reserved region for the VMM, the code cache, and other data used by the VMM.
o Never visible to guest SW.
  • This is possible because the VMM takes control from the boot process.
o Fixed size, normally diskless (to simplify the system design).
o The VMM may be stored in ROM.

Page 11: Codesigned  Virtual Machines

Concealed Memory (1)

[Figure: the processor core with its I-cache and D-cache, backed by concealed memory (code cache, VMM code, VMM data) and conventional memory (source ISA code, source ISA data).]

• Memory system in a codesigned VM
o The I-cache only holds target ISA instructions.

Page 12: Codesigned  Virtual Machines

Concealed Memory (2)

• Memory mapping for concealed memory
1. Concealed logical memory shares an address space with the guest.
  • The host address space must be enlarged.

[Figure: concealed logical addresses go through the concealed memory mapping to concealed real memory; conventional logical addresses go through the guest memory mapping to conventional real memory.]

Page 13: Codesigned  Virtual Machines

Concealed Memory (3)

• Memory mapping for concealed memory
2. Two separate logical address spaces
  • Load/store instructions must select the mapping.
  • This can be controlled by the VMM.

[Figure: concealed logical addresses go through the concealed memory mapping to concealed real memory; conventional logical addresses go through the guest memory mapping to conventional real memory.]

Page 14: Codesigned  Virtual Machines

Concealed Memory (4)

• Memory mapping for concealed memory
3. Use real addressing for concealed memory.
  • Special case of option 2.
  • Requires a separate set of load/store instructions, or a mode bit.

[Figure: the concealed real address space accesses concealed real memory directly; conventional logical addresses go through the guest memory mapping to conventional real memory.]
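As a rough illustration of option 3, the sketch below models loads and stores that take a mode bit: with the bit set, the address is used as a real address inside the concealed region; otherwise it goes through the guest page mapping. The class name `HostMemory`, the 4 KB page size, the `concealed` flag, and the dictionary-based page table are all hypothetical choices for this sketch, not details from the slides.

```python
PAGE_SIZE = 4096  # assumed page size


class HostMemory:
    def __init__(self, guest_page_table, concealed_base, concealed_size):
        self.guest_page_table = guest_page_table  # guest virtual page -> real page
        self.concealed_base = concealed_base      # start of concealed real memory
        self.concealed_size = concealed_size
        self.real = {}                            # sparse real memory: address -> value

    def translate(self, addr, concealed=False):
        """Return the real address for a load/store, selected by the mode bit."""
        if concealed:
            # Real addressing: no page mapping, just an offset into the region.
            if not 0 <= addr < self.concealed_size:
                raise IndexError("outside concealed memory")
            return self.concealed_base + addr
        page, offset = divmod(addr, PAGE_SIZE)
        real = self.guest_page_table[page] * PAGE_SIZE + offset
        if self.concealed_base <= real < self.concealed_base + self.concealed_size:
            # Guest mappings must never expose concealed memory.
            raise PermissionError("guest mapping reaches concealed memory")
        return real

    def store(self, addr, value, concealed=False):
        self.real[self.translate(addr, concealed)] = value

    def load(self, addr, concealed=False):
        return self.real.get(self.translate(addr, concealed), 0)
```

Guest code would only ever issue accesses with `concealed=False`, while VMM code could use either form.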

Page 15: Codesigned  Virtual Machines

Self-Modifying Code (1)

• Basically, use the same technique as in a process VM (Ch. 3).
o It is easiest to keep the guest OS's virtual-to-real page mapping intact.
o Write-protect the guest code region → any attempt to write into that region causes a trap → the VM can then handle it.

• But in codesigned VMs,
o A system call cannot be used to write-protect, because it is the guest OS that manages the page tables.

Page 16: Codesigned  Virtual Machines

Self-Modifying Code (2)

• TLB
o The TLB is managed by the VMM.
o Each entry has an additional bit indicating "write-protect".
o The VMM sets the write-protect bit whenever an entry for a code page is loaded into the TLB.
o The VMM should maintain a table of all the guest virtual pages that hold translated code.

Page 17: Codesigned  Virtual Machines

Self-Modifying Code (3)

• Special hardware support
o In the Transmeta Crusoe, a special hardware structure is added to speed up fine-grained write-protection checking.
  • Goal: find out whether a store is really a write to a translated code region.
  • Virtual address → (TLB) → real address → (filtered by write-protect table) → write fault or not

Page 18: Codesigned  Virtual Machines

Self-Modifying Code (4)

[Figure: fine-grained write-protection check. Each TLB entry holds a virtual page number, a physical page number, and WP bits. The source address is looked up in the TLB, which can raise a page-level write-protect fault. The write-protect table holds a bit mask per entry and reports hit/miss; on a hit, comparison logic checks the WP bit mask against the page offset bits and raises a source code write fault.]
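A minimal software model of this two-level check is sketched below. It is an interpretation of the figure labels, not a description of the actual Crusoe hardware: the 128-byte region size, the 32-bit mask per page, the class and method names, and the choice to report a plain page-level fault on a write-protect table miss are all assumptions.

```python
PAGE_SIZE = 4096
REGIONS_PER_PAGE = 32                         # assumed: one mask bit per region
REGION_SIZE = PAGE_SIZE // REGIONS_PER_PAGE   # 128 bytes


class FineGrainedWriteProtect:
    def __init__(self):
        self.tlb = {}       # virtual page -> (physical page, page-level WP bit)
        self.wp_table = {}  # physical page -> bit mask of protected regions

    def map_page(self, vpage, ppage):
        """Install a TLB entry for a page that holds no translated code."""
        self.tlb[vpage] = (ppage, False)

    def protect(self, vpage, ppage, offset, length):
        """Mark the regions covering [offset, offset + length) as translated code."""
        self.tlb[vpage] = (ppage, True)
        mask = self.wp_table.get(ppage, 0)
        first = offset // REGION_SIZE
        last = (offset + length - 1) // REGION_SIZE
        for region in range(first, last + 1):
            mask |= 1 << region
        self.wp_table[ppage] = mask

    def check_store(self, vaddr):
        """Classify a store to a virtual address."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        ppage, wp = self.tlb[vpage]
        if not wp:
            return "ok"
        mask = self.wp_table.get(ppage)
        if mask is None:
            # Table miss: fall back to the coarse page-level fault.
            return "page-level write-protect fault"
        if mask & (1 << (offset // REGION_SIZE)):
            return "source code write fault"
        return "ok"  # data that merely shares a page with translated code
```

The point of the second level is the last return: a store to data that shares a page with translated code does not have to trap into the VMM.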

Page 19: Codesigned  Virtual Machines

Self-Modifying Code (5)

• I/O writes to guest code memory must be caught.
o For translated code in the code cache, keep track of all the real guest pages it was translated from.
o Maintain a hardware table for I/O writes, with entries for all the real pages that hold guest code.
o A store to any of these pages causes an interrupt to the VMM. The VMM then flushes the translated code.

Page 20: Codesigned  Virtual Machines

Support for Code Caching (1)

• Code cache performance is the most important.
o SPC → (hash) → TPC → (if hit) → access code cache
o This involves multiple memory accesses plus an indirect jump.
o For direct jumps and branches
  • Superblock chaining eliminates the table lookup.
o But how about indirect jumps?
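The lookup path and the effect of chaining can be sketched as follows. This is a simplified model under stated assumptions: each block has a single direct successor, the map table is a plain dictionary, and the names `CodeCache`, `add_block`, and `next_tpc` are hypothetical.

```python
class CodeCache:
    def __init__(self):
        self.map_table = {}  # SPC -> TPC of the translated block
        self.blocks = {}     # TPC -> block record
        self.lookups = 0     # number of map table lookups performed

    def add_block(self, spc, tpc, next_spc):
        """Register a translated block that ends in a direct jump to next_spc."""
        self.map_table[spc] = tpc
        self.blocks[tpc] = {"next_spc": next_spc, "chained_tpc": None}

    def map_lookup(self, spc):
        """Slow path: translate a source PC into a target PC via the map table."""
        self.lookups += 1
        return self.map_table.get(spc)

    def next_tpc(self, tpc):
        """Find where to continue after the block at tpc finishes."""
        block = self.blocks[tpc]
        if block["chained_tpc"] is not None:
            return block["chained_tpc"]      # chained: no table lookup
        target = self.map_lookup(block["next_spc"])
        if target is not None:
            block["chained_tpc"] = target    # chain for subsequent executions
        return target
```

Chaining works here because the successor of a direct jump is fixed. For an indirect jump the successor SPC is only known at run time, so the lookup cannot be patched away in the same manner.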

Page 21: Codesigned  Virtual Machines

Support for Code Caching (2)

• To reduce table lookup overhead, use SW-based jump target prediction:

if (Rx == #addr_1) goto #target_1
else if (Rx == #addr_2) goto #target_2
else map_lookup(Rx)

• But,
o If the SW prediction is incorrect, time is wasted.
o Many indirect jumps are difficult to predict (e.g., returns).

• Hardware support for code caching
o JTLB (Jump Translation Lookaside Buffer)
o D-RAS (Dual-address Return Address Stack)

Page 22: Codesigned  Virtual Machines

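JTLB (1)

• "A specially designed HW cache of map table entries"

[Figure: JTLB lookup. The SPC is hashed to select (tag, TPC) entries; tag compare logic matches the SPC tag against the stored tags and reports hit or miss, and a MUX selects the TPC of the matching entry.]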

Page 23: Codesigned  Virtual Machines

JTLB (2)

• JTLB_Lookup instruction (operand labels in the figure: SPC, hit/miss, TPC)

JTLB_Lookup Ri, Rj, Rk
Jump Ri, Rj == 0
Jump map_lookup

• Lookup_Jump instruction and prediction
o Predict using the BTB (branch target buffer).
  • JTLB hit and prediction correct → OK
  • JTLB hit but misprediction → redirect fetch to the jump target TPC from the JTLB
  • JTLB miss → redirect fetch to the fall-through address
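The JTLB can be modeled in software as a small cache in front of the map table. The sketch below is only an illustration: the set count, associativity, modulo indexing, FIFO replacement, and the helper `indirect_jump` are assumptions, and BTB prediction is not modeled.

```python
class JTLB:
    """Small set-associative cache of (SPC -> TPC) map table entries."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # per set: oldest entry first

    def lookup(self, spc):
        """Return (hit, tpc); tpc is None on a miss."""
        index, tag = spc % self.num_sets, spc // self.num_sets
        for entry_tag, tpc in self.sets[index]:
            if entry_tag == tag:
                return True, tpc
        return False, None

    def insert(self, spc, tpc):
        index, tag = spc % self.num_sets, spc // self.num_sets
        entries = [e for e in self.sets[index] if e[0] != tag]
        entries.append((tag, tpc))
        self.sets[index] = entries[-self.ways:]  # evict the oldest entry if full


def indirect_jump(jtlb, map_table, spc):
    """Resolve an indirect jump: JTLB first, software map lookup on a miss."""
    hit, tpc = jtlb.lookup(spc)
    if hit:
        return tpc, "JTLB hit"
    tpc = map_table[spc]   # stands in for the slow map_lookup routine
    jtlb.insert(spc, tpc)
    return tpc, "map_lookup"
```

The first jump to a given SPC takes the slow path and fills the JTLB; later jumps to the same SPC hit until the entry is evicted.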

Page 24: Codesigned  Virtual Machines

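JTLB (3)

[Figure: Lookup_Jump instruction in the pipeline. The PC of the Lookup_Jump instruction indexes the BTB (tag, TPC), which supplies the predicted jump target TPC as the next predicted fetch TPC. The register identifier in the instruction selects the jump destination SPC from the register file, and that SPC is looked up in the JTLB (tag, TPC) to obtain the jump destination TPC.]

• JTLB hit and the TPCs match: the BTB prediction is correct.
• JTLB hit but no match (BTB misprediction): redirect fetch to the jump target TPC from the JTLB.
• JTLB miss: redirect fetch to the fall-through address.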

Page 25: Codesigned  Virtual Machines

D-RAS (1)

• The RAS (return address stack) helps solve the return-jump problem.
o Push the fall-through PC onto a stack.

• But in a codesigned VM,
o We need the TPC (not the SPC).
o If the procedure call is at the end of a translated superblock, the return address may not be correct.

[Figure: translation block A calls translation block X; the destination of the return from X is marked "???".]

Page 26: Codesigned  Virtual Machines

D-RAS (2)

• A specialized dual-address RAS is used.

[Figure: dual-address return address stack. The Push_DRAS instruction (opcode, SPC, TPC) pushes an (SPC, TPC) pair. The return instruction (opcode, SPC) pops the top pair, which supplies the predicted SPC and the predicted TPC.]
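A small model of the dual-address stack is sketched below. The stack depth, the overflow policy, and the rule that the predicted TPC is accepted only when the predicted SPC equals the actual return SPC are assumptions read from the figure labels rather than confirmed hardware behavior.

```python
class DualAddressRAS:
    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []  # (return SPC, return TPC) pairs, top of stack last

    def push(self, return_spc, return_tpc):
        """Models Push_DRAS: executed at a translated call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: drop the oldest pair
        self.stack.append((return_spc, return_tpc))

    def predict_return(self, actual_spc):
        """Models the return: pop a pair and validate it against the real SPC.

        Returns the predicted TPC when the prediction is valid, or None when
        the caller must fall back to a map table lookup.
        """
        if not self.stack:
            return None
        predicted_spc, predicted_tpc = self.stack.pop()
        if predicted_spc == actual_spc:
            return predicted_tpc
        return None
```

Keeping the SPC next to the TPC is what lets the prediction be checked: the guest's return address register holds an SPC, so only an SPC can be compared against it.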

Page 27: Codesigned  Virtual Machines

Implementing Precise Traps

• Similar techniques to those in Chapters 3 and 4
o Maintain SW checkpoints.
o Allow code motion by extending register live ranges.
o Trap occurs → interpretation begins at the checkpoint to establish the correct state.

• In a codesigned VM,
o There are enough registers → live ranges can be extended with less register pressure.
o Restrictions on code motion are relaxed.

Page 28: Codesigned  Virtual Machines

HW Support for Checkpoints (1)

• Use HW to set a checkpoint when each translation block is entered.

[Figure: translation blocks A, B, C, ..., N, with "set checkpoint" at the entry of each block.]

Page 29: Codesigned  Virtual Machines

HW Support for Checkpoints (2)

• If a trap occurs,
o HW restores the state at the beginning of the block.
o Then interpretation is used to provide the precise exception state.

[Figure: translation blocks A and B and the corresponding source code. On a trap, the checkpoint is restored and the source code is interpreted.]

Page 30: Codesigned  Virtual Machines

HW Support for Checkpoints (3)

• When a new translation block is entered,
o The state from the previous block is "committed".
o A new checkpoint is set.

• Setting a register checkpoint
o When a checkpoint is set, the registers are copied to shadow registers.
o When a trap occurs, the shadow registers are copied back to the working registers.
o These copies are done very fast.

Page 31: Codesigned  Virtual Machines

HW Support for Checkpoints (4)

• Checkpointing memory
o Gated store buffer
  • Store operations are buffered until the current translation block is exited (committed).
o If an exception occurs, the buffered stores are flushed (discarded).
o Restrictions on code motion are relaxed.
  • The code inside a translation block can be reordered by software in any fashion.
o The fixed size of the store buffer constrains the translation block size.

Page 32: Codesigned  Virtual Machines

HW Support for Checkpoints (5)

[Figure: the register file, shown twice, holds guest registers with their shadow copies, scratch registers, speculative results, and constants. One copy is labeled "when checkpoint is committed" and the other "when trap is detected".]
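The register and memory checkpointing described on these slides can be mimicked in a few lines. This is a behavioral sketch only: the class name, the register count, the buffer size, and the dictionary used for memory are hypothetical, and real hardware performs the copies in parallel rather than in a loop.

```python
class CheckpointedMachine:
    def __init__(self, num_regs=4, store_buffer_size=8):
        self.regs = [0] * num_regs       # working registers
        self.shadow = list(self.regs)    # shadow copy taken at the checkpoint
        self.memory = {}                 # committed memory state
        self.store_buffer = []           # gated stores: (address, value)
        self.store_buffer_size = store_buffer_size

    def commit(self):
        """Block boundary: commit the previous block and set a new checkpoint."""
        for addr, value in self.store_buffer:
            self.memory[addr] = value    # release the gated stores
        self.store_buffer.clear()
        self.shadow = list(self.regs)    # working registers -> shadow registers

    def store(self, addr, value):
        if len(self.store_buffer) >= self.store_buffer_size:
            # The buffer size bounds how many stores one block may contain.
            raise OverflowError("too many stores in one translation block")
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # The newest buffered store to this address wins over committed memory.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def rollback(self):
        """Trap: discard gated stores and restore the registers."""
        self.store_buffer.clear()
        self.regs = list(self.shadow)    # shadow registers -> working registers
```

After `rollback()`, the machine is back in the state it had at the last `commit()`, which is where interpretation of the source code would start.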

Page 33: Codesigned  Virtual Machines

Page Fault Compatibility (1)

• The guest OS must observe exactly the same page faults as on a native platform.

• If the guest OS manages conventional memory,
o Page faults for data regions will be detected naturally.
o During interpretation, page faults for code regions will also be detected.
o But executing translated code does not fetch any code from guest memory.

Page 34: Codesigned  Virtual Machines

Page Fault Compatibility (2)

• When a translated instruction is fetched from the code cache, we trigger a page fault if
o the corresponding guest instruction would have caused a page fault on a native platform.

• Two approaches
o Active approach
o Lazy approach

Page 35: Codesigned  Virtual Machines

Active Page Fault Detection (1)

• Monitor potential page replacement by the guest OS.
o Assuming an architected page table, the VMM can identify the memory region that holds the page table.
o The VMM monitors the guest OS's modifications to the architected page table.
o By write-protecting the page table, the VMM can monitor any change to a virtual page mapping.
o The VMM keeps a table recording which virtual page each translated source instruction is contained in.

Page 36: Codesigned  Virtual Machines

Active Page Fault Detection (2)

• If the page table is modified,
o The VMM flushes all the translations in the code cache derived from that (modified) page.
o Table 1: for each source page, all the translation blocks derived from it (the must-be-flushed blocks).
o Table 2: keeps track of any link back-pointers.
  • Links (for removed pages) are changed to point to the VMM emulation manager.
  • The emulation process will then detect the instruction page fault.
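The bookkeeping behind the two tables can be sketched as below. This is one plausible arrangement, not the slides' exact data layout: the class `CodeCacheIndex`, the single outgoing link per block, and the `VMM` sentinel string are assumptions made to keep the example short.

```python
VMM = "VMM emulation manager"  # sentinel target for unlinked exits


class CodeCacheIndex:
    def __init__(self):
        self.blocks_by_page = {}  # Table 1: source page -> translation blocks
        self.backpointers = {}    # Table 2: block -> blocks that link to it
        self.link = {}            # block -> where its exit currently jumps

    def add_block(self, block, source_pages):
        self.link[block] = VMM    # unchained blocks exit to the VMM
        for page in source_pages:
            self.blocks_by_page.setdefault(page, set()).add(block)

    def chain(self, pred, succ):
        self.link[pred] = succ
        self.backpointers.setdefault(succ, set()).add(pred)

    def page_unmapped(self, page):
        """Called when the guest OS changes the mapping of a source page."""
        flushed = self.blocks_by_page.pop(page, set())
        for block in flushed:
            for pred in self.backpointers.pop(block, set()):
                if pred in self.link:
                    self.link[pred] = VMM  # redirect the link to the VMM
            self.link.pop(block, None)     # drop the translation itself
        for blocks in self.blocks_by_page.values():
            blocks -= flushed              # blocks may span several pages
        return flushed
```

Once a link points back at the VMM, the next execution of that exit enters emulation, and emulation is what raises the instruction page fault the guest OS expects.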

Page 37: Codesigned  Virtual Machines

Lazy Page Fault Detection (1)

• Code cache flushing is postponed until actual use of the replaced code.
o Every time the translated code crosses a source page boundary, check the page table.
o At each point where the code crosses the boundary, a Verify_Translation instruction is inserted.
o It checks the page mapping:
  • page mapped correctly → proceed
  • page not mapped → page fault

Page 38: Codesigned  Virtual Machines

Lazy Page Fault Detection (2)

[Figure: guest pages holding source blocks A through L, and the code cache holding their translations. A Verify_Translation instruction in the translated code probes the page table: if the page is correctly mapped, execution continues; if not, control jumps to the VMM.]
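The lazy check can be illustrated with a tiny interpreter over translated code. Everything here is a stand-in: the tuple encoding of translated operations, the `PageFault` exception, and the rule that "correctly mapped" means the virtual page still maps to the real page the translation was made from are assumptions for this sketch.

```python
class PageFault(Exception):
    """Raised where the guest would have seen an instruction page fault."""


def verify_translation(page_table, virtual_page, expected_real_page):
    """Probe the page table as a Verify_Translation instruction would."""
    if page_table.get(virtual_page) != expected_real_page:
        raise PageFault(virtual_page)  # in a real VMM: jump to the VMM


def run_translated(page_table, translated_code):
    """Execute toy translated code, checking mappings at page crossings."""
    executed = []
    for op in translated_code:
        if op[0] == "verify":
            _, virtual_page, real_page = op
            verify_translation(page_table, virtual_page, real_page)
        else:
            executed.append(op[1])  # stand-in for executing a target instruction
    return executed
```

A translation that crosses from one source page into the next would carry a `("verify", virtual_page, real_page)` entry at the crossing, so a stale translation is caught only when execution actually reaches it.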

Page 39: Codesigned  Virtual Machines

Input/Output (1)

• If the VMM does not use any I/O,
o All the guest device drivers can be run as is.
o Any I/O instructions or memory-mapped I/O accesses are simply passed through.

• Volatile memory inhibits optimization, so we need to identify accesses to volatile memory.
o Use an access-protect bit: a load/store to that page → trap → deoptimize to obtain the correct access sequence.
o Special volatile versions of load/store

Page 40: Codesigned  Virtual Machines

Input/Output (2)

• Using a disk in the VMM
o For a disk-based code cache approach: a large, persistent code cache
o Requires relaxed transparency
o "Concealed secondary storage"
o VMM-aware special disk driver

[Figure: guest OS, VMM, special disk driver, and concealed disk region.]