Software Thread Integration for Concurrency and ILP in Embedded Systems
Alex Dean [email protected]
Center for Embedded Systems Research
Department of Electrical and Computer Engineering
North Carolina State University
www.cesr.ncsu.edu/agdean
STI and ASTI Eliminate Context Switches and Interrupts Where They Limit Performance
• Problem: Embedded systems are inherently multithreaded, but most processors are single-threaded
• Solution: Create efficient implicitly multithreaded (integrated) functions
• Use a compiler at design time to create the functions (compile for low-cost concurrency)
• Build the task/function scheduling decisions into scheduler or ISRs
• Break down the barrier between task scheduling (scheduler and dispatcher) and instruction scheduling (compiler)
• Two efficiencies improved
– Integrated threads more efficient
– Integration process automated
• Simplifies hardware to software migration
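The integration idea in the bullets above can be sketched with a small Python model. It is an illustration under stated assumptions, not the STI compiler itself: it covers straight-line code only, and the `Op` type, the `integrate` function, and the cycle counts are all hypothetical. Primary (real-time) operations must start at fixed cycle times. Guest operations are placed statically into the idle gaps between them, and leftover cycles are padded with nops, so the merged stream needs no context switch or interrupt.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str    # label for a block of instructions (hypothetical)
    cycles: int  # execution time in CPU cycles, e.g. from static timing analysis


def integrate(primary, guest):
    """Merge guest ops into the idle gaps of a time-critical primary thread.

    primary: list of (target_cycle, Op), sorted by target_cycle
    guest:   list of Op, executed in order
    Returns a list of (start_cycle, name) for one merged instruction stream.
    """
    schedule = []
    pending = list(guest)
    now = 0
    for target, op in primary:
        if now > target:
            raise ValueError(f"{op.name} would miss its target cycle {target}")
        # Fill the gap with guest ops that finish before the target cycle.
        while pending and now + pending[0].cycles <= target:
            g = pending.pop(0)
            schedule.append((now, g.name))
            now += g.cycles
        # Pad whatever is left so the primary op starts exactly on time.
        if now < target:
            schedule.append((now, f"nop x{target - now}"))
            now = target
        schedule.append((now, op.name))
        now += op.cycles
    # Guest work that did not fit runs after the last primary op.
    for g in pending:
        schedule.append((now, g.name))
        now += g.cycles
    return schedule


if __name__ == "__main__":
    # Assumed example: three timed primary ops and four guest blocks.
    primary = [(0, Op("sync_low", 2)), (10, Op("sync_high", 2)), (20, Op("pixel_out", 2))]
    guest = [Op("g0", 3), Op("g1", 4), Op("g2", 5), Op("g3", 6)]
    for start, name in integrate(primary, guest):
        print(f"{start:3d}  {name}")
```

In this assumed example, 12 of the 16 idle cycles between primary operations are spent on guest work and 4 on padding. A discrete version would busy-wait through all 16, or pay for context switches to use them.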
Figure: thread integration timeline. Labels: Primary (real-time) Thread, Secondary Thread, Idle Time, Integrated Thread, Guest, Schedule (Execution Time Reqts.), Idle Time Reclaimed, (Asynchronous) Software Thread Integration, Hardware Function Integration.
Figure: integration tool flow. Labels: foo.s, foo.int.s, foo.id, Data-flow Analysis, Control-flow Analysis, Static Timing Analysis, Integration Analysis, GProf.
Demo Systems: NTSC Generator & Hot Soft CAN
Figure: NTSC generator block diagram. Labels: ATmega128 MCU, 64 kByte SRAM, Latch, two 4-bit Shift Registers (Shift, Load), Byte Clock Divider, Pixel Clock Divider, MCU Clock, Sync, Clear, NTSC Video Out, 115 kbps serial port.
Figure: bar chart of Normalized Performance (1/time), Discrete vs. Integrated, for H-Line, V-Line, D-Line, and X-Major-Line. Data labels: 11.8, 13.5, 12.0, 4.0.
Video: Swatch Test.MPG
Figure: two Hot Soft CAN system diagrams labeled Software CAN and Hardware CAN. Each shows a Honeywell HT83C51 Microcontroller, a Honeywell HT6256 32K x 8 SRAM, a Serial Data Link to Controller, and a CAN Bus. Further labels: CAN Controller; Additional 4 kBytes of code.
STIGlitz: Sync. Threads w/STI
HSCAN: Async. Threads w/ASTI
Other Activities
• STI for streaming programs on VLIW processors
– Problem: VLIW processors often have many unused issue slots
– Software pipelining doesn’t always work (resource bound < recurrence bound, control flow, calls, register pressure)
– STI can help in many cases
– Target System: StreamIt + TI C6x DSP
– StreamIt implicitly guarantees data independence, simplifying analysis
– Developing methods to analyze and transform StreamIt program graph to improve performance
• Energy Efficiency for Low-End Embedded Systems
• Portable Benchmarking for Embedded Systems
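The "resource bound < recurrence bound" case in the VLIW bullet can be made concrete with the standard modulo-scheduling lower bounds. The following is a hedged sketch, not the author's analysis. The operation counts, unit counts, latency, and the 8-wide issue width are assumed numbers, and register pressure and control flow are ignored. When a loop-carried dependence fixes the initiation interval (II), software pipelining cannot fill the remaining issue slots, but a second, independent loop integrated into the same body can.

```python
from math import ceil


def res_mii(op_counts, unit_counts):
    """Resource bound: cycles per iteration forced by the busiest unit type."""
    return max(ceil(op_counts[k] / unit_counts[k]) for k in op_counts)


def rec_mii(dep_cycles):
    """Recurrence bound: max over dependence cycles of latency / distance."""
    return max((ceil(lat / dist) for lat, dist in dep_cycles), default=1)


def slot_utilization(n_ops, ii, issue_width):
    """Fraction of issue slots filled at initiation interval ii."""
    return n_ops / (ii * issue_width)


if __name__ == "__main__":
    units = {"alu": 4, "mul": 2, "mem": 2}  # assumed machine, 8 issue slots
    loop = {"alu": 6, "mul": 2, "mem": 4}   # assumed loop body, 12 ops
    deps = [(6, 1)]                         # one recurrence: latency 6, distance 1

    ii = max(res_mii(loop, units), rec_mii(deps))
    print("single loop: II =", ii, "utilization =", slot_utilization(12, ii, 8))

    # Integrate a second, independent loop with the same profile. Resource
    # use doubles, but the two recurrences are independent, so the
    # recurrence bound does not grow.
    both = {k: 2 * v for k, v in loop.items()}
    ii2 = max(res_mii(both, units), rec_mii(deps + deps))
    print("integrated:  II =", ii2, "utilization =", slot_utilization(24, ii2, 8))
```

With these assumed numbers, the recurrence sets the initiation interval in both cases, so the integrated body does twice the work in the same number of cycles per iteration.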
Figure: Speedup by STI. x-axis: Loop ID (38, 10, 61, 58, 12, 62, 64, 59, 60, 63, 27); y-axis: Speedup (0 to 8); series: nosti/noswp (baseline), sti2/noswp, nosti/swp, sti2/swp.
Figure: IPC before and after SWP. x-axis: Loop ID (4, 38, 13, 17, 61, 56, 82, 59, 3, 18, 34, 49, 52, 68, 31, 77, 45, 7, 69, 48, 39); y-axis: IPC (0 to 9); series: noswp, swp.