Simulation and Debugging of Full System Binary Translation
Erik R. Altman and Kemal Ebcioglu
IBM T.J. Watson Research Center
Presenter: Kim Jin Chul
Table of Contents
• What is DAISY?
• DAISY Architecture
• Full System Simulation of DAISY
• Problem Debugging DAISY
• Conclusion
DAISY Background
• Problem: Many previous novel ILP machines were quite different from x86, PowerPC, and S/390, which made compatibility difficult.
• Observation: Acceptance of novel ILP Machines would be helped by compatibility with existing architectures.
• Solution: DAISY – Dynamically Architected Instruction Set from Yorktown
DAISY Principles
• The first time a fragment of code is executed, it is rapidly translated to ILP code for a simple underlying ILP machine and saved in main memory.
• Subsequent executions of the same fragment do not require translation.
• All software is translated to ILP code.
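The translate-once, reuse-afterwards principle can be sketched as a small cache keyed by fragment address. This is only an illustration under stated assumptions: `TranslationCache`, `toy_translate`, and the idea of representing translated code as a Python callable are hypothetical and are not taken from DAISY itself.

```python
class TranslationCache:
    """Toy model: translate a fragment on first execution, reuse it afterwards."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical translator callback
        self._cache = {}             # fragment address -> translated code
        self.translations = 0        # how many times the translator ran

    def execute(self, addr):
        code = self._cache.get(addr)
        if code is None:
            # First execution of this fragment: translate and save the result.
            code = self._translate(addr)
            self._cache[addr] = code
            self.translations += 1
        # Subsequent executions skip translation and run the saved code.
        return code()


def toy_translate(addr):
    """Stand-in for a real translator: returns a callable for the fragment."""
    return lambda: f"ran fragment at {addr:#x}"


cache = TranslationCache(toy_translate)
cache.execute(0x1000)  # translated, then executed
cache.execute(0x1000)  # executed from the cache, no second translation
```

After the two calls above, `cache.translations` is 1, since the second execution reuses the saved translation.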
DAISY Features
• Achieves 100% architectural compatibility with complex architectures.
• Dynamic compilation makes an unprecedented amount of runtime information available at compile time.
• Unlike SimOS and SimICS, DAISY emulation is operating system and device independent.
• The simulator models a complete system.
DAISY Architecture: DAISY Schematic
[Figure: two layered stacks, listed from top to bottom.]
(a) DAISY Hardware System: AIX Application, AIX, DAISY Translator, DAISY Machine
(b) DAISY Simulator System: AIX Application, AIX, DAISY Translator, DAISY Machine, PowerPC Machine
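DAISY Architecture: DAISY Simulation System
[Figure: block diagram of the simulated system. Labels recovered from the figure: PowerPC 604e; DAISY VLIW Simulator; DAISY PowerPC; Memory Controller; PowerPC Flash ROM; Memory; 60x Bus; PCI Bus; devices (Disk, Video, Network, Keyboard).]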
DAISY Architecture: DAISY Memory Map
DAISY Memory
• Translator
• Translated Code
• Side Tables
• System Software
• Simulator
• PowerPC Pages 0, 1, 2
PowerPC Memory
• All PowerPC pages except pages 0, 1, 2
DAISY Register Conventions
• DAISY registers r0-r31 always contain the values in PowerPC registers r0-r31.
• DAISY registers r36-r63 are used for renaming speculative results during scheduling, and as scratchpads.
• DAISY register r32 has the PowerPC counter value.
• DAISY register r33 has the PowerPC link register value.
• DAISY register r35 contains the constant 0.
– On PowerPC memory accesses, using r0 as an address means literal 0. Keeping 0 in r35 simplifies renaming of r0 in some cases.
• DAISY register r34 contains the now deleted Power MQ register.
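The register conventions above can be summarized as a lookup, sketched here in Python. The function names are hypothetical, and `base_register_for_address` is only one plausible reading of how keeping 0 in r35 helps with r0 used as an address; the slide does not spell out the mechanism.

```python
# Roles of DAISY registers r32-r35, as listed in the slide.
SPECIAL_ROLES = {
    32: "PowerPC counter value",
    33: "PowerPC link register value",
    34: "Power MQ register",
    35: "constant 0",
}


def daisy_register_role(n):
    """Describe what DAISY register r<n> holds (covers r0-r63 only)."""
    if not 0 <= n <= 63:
        raise ValueError(f"r{n} is not covered by the slide's conventions")
    if n <= 31:
        return f"PowerPC r{n}"        # architected PowerPC register
    if n in SPECIAL_ROLES:
        return SPECIAL_ROLES[n]
    return "renaming / scratchpad"    # r36-r63


def base_register_for_address(ppc_reg):
    """Assumed illustration: r0 as an address base means literal 0,
    so a translator could read DAISY r35 (constant 0) instead."""
    return 35 if ppc_reg == 0 else ppc_reg
```

For example, `daisy_register_role(63)` reports a renaming register, which is the role r63 plays in the scheduling example.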
Bootstrapping the DAISY Simulator
• Loading the translator and simulator software into high real memory.
• This load of the translator and simulator is accomplished via an AIX kernel extension.
• The kernel extension initially runs with address translation on.
• Translation of PowerPC code to DAISY VLIW code.
DAISY Scheduling Example
Original PowerPC Code
1) add r1,r2,r3
2) bc L1
3) sli r12,r1,3
4) xor r4,r5,r6
5) and r8,r4,r7
6) bc L2
7) b OFFPAGE
8) L1: sub r9,r10,r11
9) b OFFPAGE
10) L2: cntlz r11,r4
11) b OFFPAGE
Translated VLIW Code (built up as each PowerPC instruction is scheduled)
After instruction 1:
VLIW1: add r1,r2,r3
After instruction 2:
VLIW1: add r1,r2,r3
       bc L1
After instruction 3:
VLIW1: add r1,r2,r3
       bc L1
       b VLIW2
VLIW2: sli r12,r1,3
After instruction 4:
VLIW1: add r1,r2,r3
       bc L1
       xor r63,r5,r6
       b VLIW2
VLIW2: sli r12,r1,3
       r4 = r63
After instruction 5:
VLIW1: add r1,r2,r3
       bc L1
       xor r63,r5,r6
       b VLIW2
VLIW2: sli r12,r1,3
       r4 = r63
       and r8,r63,r7
After instructions 6-11 (final code; branch paths are indented):
VLIW1: add r1,r2,r3
       bc L1
         taken:     sub r9,r10,r11
                    b OFFPAGE
         not taken: xor r63,r5,r6
                    b VLIW2
VLIW2: sli r12,r1,3
       r4 = r63
       and r8,r63,r7
       bc L2
         taken:     cntlz r11,r63
                    b OFFPAGE
         not taken: b OFFPAGE
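As a sanity check on the final translated code, the sketch below executes both versions on the same register values and compares the architected registers r0-r31. It rests on several assumptions not stated in the slide: the two `bc` conditions are modeled as plain booleans, `sli` is read as a shift left by an immediate, `sub r9,r10,r11` is read as r10 - r11, registers are 32 bits wide, and every operation in a VLIW reads the register values from before that VLIW started (which is why `cntlz` and `and` use r63 rather than r4).

```python
import itertools
import random

MASK = 0xFFFFFFFF  # assumed 32-bit registers


def cntlz32(x):
    """Count leading zero bits of a 32-bit value."""
    return 32 - (x & MASK).bit_length()


def run_powerpc(regs, take_l1, take_l2):
    """Execute the original PowerPC fragment sequentially."""
    r = dict(regs)
    r[1] = (r[2] + r[3]) & MASK            # 1) add r1,r2,r3
    if take_l1:                            # 2) bc L1
        r[9] = (r[10] - r[11]) & MASK      # 8) L1: sub r9,r10,r11
        return r                           # 9) b OFFPAGE
    r[12] = (r[1] << 3) & MASK             # 3) sli r12,r1,3
    r[4] = r[5] ^ r[6]                     # 4) xor r4,r5,r6
    r[8] = r[4] & r[7]                     # 5) and r8,r4,r7
    if take_l2:                            # 6) bc L2
        r[11] = cntlz32(r[4])              # 10) L2: cntlz r11,r4
    return r                               # 7) / 11) b OFFPAGE


def run_vliw(regs, take_l1, take_l2):
    """Execute the two translated VLIWs with read-before-write semantics."""
    r = dict(regs)
    old = dict(r)                          # values on entry to VLIW1
    r[1] = (old[2] + old[3]) & MASK        # add r1,r2,r3
    if take_l1:                            # bc L1, taken path
        r[9] = (old[10] - old[11]) & MASK  # sub r9,r10,r11
        return r                           # b OFFPAGE
    r[63] = old[5] ^ old[6]                # xor r63,r5,r6 (renamed r4)
    old = dict(r)                          # values on entry to VLIW2
    r[12] = (old[1] << 3) & MASK           # sli r12,r1,3
    r[4] = old[63]                         # r4 = r63 (commit renamed result)
    r[8] = old[63] & old[7]                # and r8,r63,r7
    if take_l2:                            # bc L2, taken path
        r[11] = cntlz32(old[63])           # cntlz r11,r63
    return r                               # b OFFPAGE


def architected(r):
    """Keep only the PowerPC-visible registers r0-r31."""
    return {n: r[n] for n in range(32)}


def equivalent(regs):
    """Compare both versions for every combination of branch outcomes."""
    return all(
        architected(run_powerpc(regs, a, b)) == architected(run_vliw(regs, a, b))
        for a, b in itertools.product([False, True], repeat=2)
    )


rng = random.Random(0)
sample_regs = {n: rng.getrandbits(32) for n in range(32)}
result = equivalent(sample_regs)
```

The comparison ignores r63 on purpose: it is a renaming register that the PowerPC program cannot observe.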
Problem Debugging DAISY
• PowerPC → VLIW → PowerPC
– More difficult still is debugging!
Problem Debugging DAISY
• The bugs and difficulties changed over time as DAISY reached ever further through the boot process.
1. For the first 10-20 million instructions of firmware, the serial port on the RS/6000 machine on which we run DAISY is disabled.
2. The parallel port is also disabled.
3. The debug output that can be obtained during this time is via a 3 hex digit LED on the front of the machine.
Problem Debugging DAISY
4. The firmware decompresses part of itself from ROM into system RAM.
5. We often use a binary search technique in looking for bugs.
6. Once the AIX kernel is loaded, a semantic understanding of what the code is doing becomes slightly easier again.
7. But in this threaded multitasking environment, it can be very difficult to isolate where bugs occur, since things do not happen in a deterministic order.
Problem Debugging DAISY
8. Finally, is the bug in:
a. VLIW code
b. Simulation code for the VLIW code
c. The simulator and system software
Debugging Approach
• Debugging is a difficult problem when operating on a bare machine during a boot.
• How to solve it?
– Binary Search Approach
Binary Search Approach
[Figure: 1000 groups are split into 500 groups + 500 groups; one half is then split into 250 groups + 250 groups.]
• We isolate the precise group with the problem.
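The halving shown above can be sketched as a standard bisection over group indices. This is a minimal sketch, not the authors' tooling: `fails_with_prefix` is a hypothetical predicate that reruns the system with only the first k groups enabled and reports whether the bug appears, and the sketch assumes the failure is reproducible and that enabling more groups never makes it disappear.

```python
def find_faulty_group(num_groups, fails_with_prefix):
    """Return the 0-based index of the first group whose inclusion causes failure.

    fails_with_prefix(k) must be False for k = 0 and True for k = num_groups.
    """
    lo, hi = 0, num_groups  # invariant: prefix lo passes, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_with_prefix(mid):
            hi = mid        # bug is already present within the first mid groups
        else:
            lo = mid        # first mid groups are fine; look in the later half
    return hi - 1


# Toy usage: pretend group 613 (0-based) is the broken one.
probes = []


def toy_fails(k):
    probes.append(k)
    return k > 613


faulty = find_faulty_group(1000, toy_fails)
```

With 1000 groups, at most 10 runs are needed, since each run halves the range of suspect groups.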
Conclusion
• DAISY uses dynamic binary translation to make a VLIW architecture appear to be a complete 32-bit PowerPC architecture, running both user and operating system level code.