Background Optimization in Full System Binary Translation
Roman Sokolov, Alexander Ermolovich
MCST, Intel
Presentation for the 5th Spring/Summer Young Researchers' Colloquium on Software Engineering (SYRCoSE), May 12-13, 2011
[Diagram: two software stacks compared]
Application-level binary translation (layers as labeled on the slide):
• IA-32 Applications
• App level BT
• Native Applications
• Native BIOS, OS & Libraries
• HW
Full system binary translation (layers as labeled on the slide):
• IA-32 Applications
• IA-32 BIOS, OS & Libraries
• Full System BT
• HW
Elbrus Binary Translation Technology for IA-32 Compatibility
Adaptive binary translation (1/2)
[Diagram labels]
• IA-32 binaries
• Interpretation and profiling
• Non-optimizing trace translation
• Optimizing region translation
• Translation cache: execution of translated code and profiling
• Adaptive retranslation

Translation level              Translation cost                   Translated code
                               (cycles per source instruction)    performance
Non-optimizing translation     1600                               0.18
O0 optimization                30000                              0.58
O1 optimization                1000000                            1.0
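
The trade-off in the table can be made concrete with a small break-even calculation. The sketch below is only an illustration: it assumes that the "performance" column is a speed relative to O1 code, and that O1 code spends a hypothetical 1 cycle per executed source instruction (`CYCLES_PER_INSN_AT_O1` is not from the slides). Under those assumptions it estimates how many times an instruction must execute before a more expensive translation level pays for itself.

```python
# Translation cost (cycles per source instruction) and relative speed of the
# translated code, copied from the table above.
TIERS = {
    "non-opt": (1_600, 0.18),
    "O0": (30_000, 0.58),
    "O1": (1_000_000, 1.0),
}

# ASSUMPTION (not from the slides): cycles that O1 code spends per executed
# source instruction. Slower tiers are scaled by their relative speed.
CYCLES_PER_INSN_AT_O1 = 1.0


def total_cycles(tier, executions):
    """Translation cost plus execution cost for one source instruction."""
    cost, speed = TIERS[tier]
    return cost + executions * CYCLES_PER_INSN_AT_O1 / speed


def break_even(lower, higher):
    """Executions after which the higher tier becomes cheaper overall."""
    cost_lo, speed_lo = TIERS[lower]
    cost_hi, speed_hi = TIERS[higher]
    saved_per_execution = CYCLES_PER_INSN_AT_O1 * (1 / speed_lo - 1 / speed_hi)
    return (cost_hi - cost_lo) / saved_per_execution


if __name__ == "__main__":
    for lower, higher in [("non-opt", "O0"), ("O0", "O1")]:
        n = break_even(lower, higher)
        print(f"{lower} -> {higher}: pays off after about {n:,.0f} executions")
```

With these assumed numbers only code that runs on the order of a million times justifies O1, which is the motivation for retranslating adaptively rather than optimizing everything up front.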
Adaptive binary translation (2/2)
Profile of dynamic binary translation
Dynamic optimization vs. Latency
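[Timeline diagram. Events as labeled on the slide: Execution; New hot region acquired; Start of optimization; Interrupt; End of optimization; Interrupt delivery; Execution. The span between the interrupt and its delivery is marked "Interrupt delivery delay (latency)".]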
Background optimization
Our Approach
• Optimizing translation is moved into a separate thread (the optimization thread) that can run concurrently with the main execution thread.
• Hot regions are detected by the execution thread and then scheduled for background optimization by the optimization thread.

Dual(many)-core
Optimization is moved onto an underutilized processor core
Benefits
• Improves the application's execution latency
• Removes overhead from the application's execution
• Enables the application of more aggressive optimizations

Single-core
Optimization is interleaved with execution
Benefits
• Improves the application's execution latency
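
The hand-off between the two threads can be sketched as a producer/consumer pair. This is a minimal illustration, not the translator's actual design: `HOT_THRESHOLD`, `optimize`, the dictionary used as a translation cache, and the string stand-in for optimized code are all hypothetical.

```python
import queue
import threading

HOT_THRESHOLD = 3  # hypothetical: executions before a region counts as hot


def optimize(region):
    """Stand-in for the expensive optimizing translation of one region."""
    return f"optimized({region})"


def optimization_thread(work, cache, lock):
    """Consume hot regions and install their optimized translations."""
    while True:
        region = work.get()
        if region is None:  # sentinel: no more work
            break
        code = optimize(region)
        with lock:
            cache[region] = code


def run(trace):
    """Execution thread: run regions, detect hot ones, schedule them."""
    work = queue.Queue()
    cache = {}  # region -> optimized code (toy translation cache)
    lock = threading.Lock()
    counts = {}
    scheduled = set()
    worker = threading.Thread(target=optimization_thread,
                              args=(work, cache, lock))
    worker.start()
    for region in trace:
        with lock:
            code = cache.get(region)
        if code is None:
            # No optimized code yet: keep executing the slow version,
            # profile it, and never wait for the optimizer.
            counts[region] = counts.get(region, 0) + 1
            if counts[region] >= HOT_THRESHOLD and region not in scheduled:
                scheduled.add(region)
                work.put(region)
    work.put(None)
    worker.join()
    return cache


if __name__ == "__main__":
    print(run(["a", "b", "a", "a", "b", "a", "c"]))
```

The point of the sketch is that the execution loop only enqueues work and continues, so the time spent in `optimize` never adds to its latency.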
Single-core background optimization (1/3)
[Timeline diagram. Rows: Execution; Optimization. Events as labeled on the slide: New hot region acquired; Start of optimization; Interrupt; Interrupt delivery (shown twice); End of optimization.]
Single-core background optimization (2/3)
Latency improvement
(CPU frequency = 300 MHz; thread time slice = 50000 cycles)

Metric                                        Consecutive        Interleaved (background)
                                              optimization       optimization
O1 phase mean time                            1.54 s             3 s
O1 phase max time, T01_max                    8.8 s              29.5 s
Interrupt delivery max time
(with O1 phase in progress)                   8.8 s (T01_max)    1.7 ms

Interrupt delivery mean time with no optimization in progress: 54 µs
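
The figures above can be put on a common scale with a few lines of arithmetic. The snippet below only restates the numbers from the table; the variable names are chosen for illustration. It shows that one 50000-cycle time slice at 300 MHz lasts about 167 µs, that the worst-case interrupt delivery time drops by a factor of roughly 5000, and that the O1 phase itself takes about 1.9 times longer on average (about 3.4 times in the worst case) when interleaved.

```python
CPU_HZ = 300e6              # CPU frequency from the slide
TIME_SLICE_CYCLES = 50_000  # thread time slice from the slide


def cycles_to_seconds(cycles, hz=CPU_HZ):
    """Convert a cycle count to wall-clock seconds at the given frequency."""
    return cycles / hz


# Duration of one scheduling time slice.
slice_s = cycles_to_seconds(TIME_SLICE_CYCLES)

# Worst-case interrupt delivery: consecutive (8.8 s) vs interleaved (1.7 ms).
latency_gain = 8.8 / 1.7e-3

# Cost of interleaving: how much longer the O1 phase takes.
o1_mean_slowdown = 3 / 1.54
o1_max_slowdown = 29.5 / 8.8

if __name__ == "__main__":
    print(f"time slice:         {slice_s * 1e6:.1f} us")
    print(f"latency gain:       {latency_gain:.0f}x")
    print(f"O1 mean slowdown:   {o1_mean_slowdown:.2f}x")
    print(f"O1 max slowdown:    {o1_max_slowdown:.2f}x")
```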
Single-core background optimization (3/3)
Performance degradation
Dual-core background optimization (1/2)
Core 1
• Execution
• Run-time support
• Interpreter and non-optimizing translation
Core 2
• Optimizing translation of region
Acquire new hot region
Allocate region translation in translation cache
Dual-core background optimization (2/2)
Performance improvement
Future works
Source architecture multiprocessor system emulation
[Diagram labels: IA-32 Applications; IA-32 BIOS, OS & Libraries; Full System BT; HW; Core0: Exec; Core1: Opt]
Q&A