Stories, Not Words: Abstract Datatype
Instruction Sets
Martha Kim, Columbia University
Workshop on New Directions in Computer Architecture
6/5/2011
Sunday, June 5, 2011
The Utilization Wall
• Exponential decrease in percentage of transistors that can be operated at full frequency.
• In 45nm TSMC process, 7% of a 300mm² die can operate at full frequency
• In 32nm, 3.5%
Moore’s Law (manufacturable transistors)
Power budget (operable transistors)
Venkatesh et al. Conservation cores: Reducing the energy of mature computations. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 205–218, Pittsburgh, Pennsylvania, March 2010.
Specialization Is a Promising Approach
R. Hameed et al., “Understanding sources of inefficiency in general-purpose chips,” ISCA '10
G. Venkatesh et al., “Conservation cores: reducing the energy of mature computations,” ASPLOS '10
J. Kelm, D. Johnson, W. Tuohy, S. Lumetta, and S. Patel, “Cohesion: a hybrid memory model for accelerators,” ISCA '10
H. Franke et al., “Introduction to the wire-speed processor and architecture,” IBM Journal of Research and Development, vol. 54, no. 1, pp. 3:1–3:11, 2010.
V. Govindaraju, C. Ho, and K. Sankaralingam, “Dynamically Specialized Datapaths for energy efficient computing,” HPCA ’11
M. Lyons, M. Hempstead, G. Wei, and D. Brooks, “The Accelerator Store framework for high-performance, low-power accelerator-based systems,” Computer Architecture Letters, vol. 9, no. 2, pp. 53–56, 2010.
C. Cascaval, S. Chatterjee, H. Franke, K. Gildea, and P. Pattnaik, “A taxonomy of accelerator architectures and their programming models,” IBM Journal of Research and Development, vol. 54, no. 5, p. 5, 2010.
R. Hou et al., “Efficient data streaming with on-chip accelerators: Opportunities and challenges,” HPCA ’11
N. Goulding et al., “GreenDroid: A Mobile Application Processor for Silicon’s Dark Future,” Hot Chips '10
An Ideal Accelerator System
High Performance
Low Energy
Easy to Program
Software Portability
Accelerator Design Processes
We need a design flow that facilitates usability
[Figure: three design flows, with labels in slide order. Flow 1: Application, Microarch., Arch. Flow 2: Application, Microarch., Arch. Flow 3: Application, Arch., Microarch.]
Extending Software Abstractions to Hardware
Application
Libraries
Machine Code
Micro-ops
Execution core
Caches
Memory
Raise HW/SW interface
Extend interfaces from libraries to hardware
Exploit interfaces with specialized hardware
Abstract Datatype Processing
SW: class HashTable with put(k,v) and v get(k)
Arch: put $h, $k, $v and get $h, $k, $v
UArch: Hash Table Processor
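The three levels on this slide can be sketched in software. The following is a minimal model, not the talk's implementation: `HashTableUnit`, its `alloc` method, and the integer handle scheme are hypothetical names introduced here for illustration. Only `put`/`get` and the `$h, $k, $v` operand order come from the slide.

```python
class HashTableUnit:
    """Hypothetical stand-in for the hash table processor (UArch level)."""

    def __init__(self):
        self._tables = {}       # handle -> table contents, private to the unit
        self._next_handle = 0

    def alloc(self):
        # Assumption: tables are named by small integer handles ($h).
        h = self._next_handle
        self._next_handle += 1
        self._tables[h] = {}
        return h

    def put(self, h, k, v):
        # Models the instruction: put $h, $k, $v
        self._tables[h][k] = v

    def get(self, h, k):
        # Models the instruction: get $h, $k, $v (returns None if k is absent)
        return self._tables[h].get(k)


class HashTable:
    """SW-level class whose methods map one-to-one onto the instructions."""

    def __init__(self, unit):
        self._unit = unit
        self._h = unit.alloc()

    def put(self, k, v):
        self._unit.put(self._h, k, v)

    def get(self, k):
        return self._unit.get(self._h, k)
```

Application code only ever sees `HashTable.put` and `HashTable.get`, so the backing unit can change without touching callers.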
Compilation & Execution
Sequence Labeling
SparseVec HashTable
SV HT GP
Dispatch
The Software Fallback
SV GP
Dispatch
SV GP
Dispatch
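One way to picture the fallback is a dispatcher that picks the specialized unit when the machine has one and the general-purpose path otherwise. This is a rough sketch under assumptions: the class names, the `available_units` set, and the `run` helper are hypothetical, and the "accelerated" class simply reuses the software logic as a stand-in.

```python
class SoftwareSparseVec:
    """Plain software implementation that runs on the general-purpose (GP) core."""
    name = "GP"

    def __init__(self):
        self.entries = {}

    def set(self, i, x):
        self.entries[i] = x

    def get(self, i):
        return self.entries.get(i, 0.0)


class AcceleratedSparseVec(SoftwareSparseVec):
    """Stand-in for the SV accelerator: same interface, different backend."""
    name = "SV"


def dispatch(available_units):
    """Use the SV backend when present, otherwise fall back to GP."""
    if "SV" in available_units:
        return AcceleratedSparseVec()
    return SoftwareSparseVec()


def run(available_units):
    # The application code is identical on both kinds of machine.
    v = dispatch(available_units)
    v.set(3, 2.5)
    return v.name, v.get(3)
```

Calling `run({"SV", "GP"})` and `run({"GP"})` yields the same value from different backends, which is the portability property the slide is after.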
An Ideal Accelerator System
High Performance
Low Energy
Easy Use - align hardware interfaces with those software is already using
Portability - software fallback plan
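Enforcing Data Encapsulation
CPU
set $v,$i,$x
get $v,$i,$x
dot $v1,$v2,$p
Sparse Vector Accelerator
[Figure: data movement between the CPU and the Sparse Vector Accelerator for operands v, i, x, shown with labeled data blocks.]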
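A small model may help make the encapsulation idea concrete: the vector's storage lives inside the accelerator, and the CPU reaches it only through `set`, `get`, and `dot`. This is a sketch under assumptions. The handle allocation (`new`) and the `export` operation are hypothetical additions used to show a copy-on-read policy; they do not appear on the slide.

```python
class SparseVectorAccelerator:
    """Hypothetical model: vector data is private to the unit."""

    def __init__(self):
        self._store = {}   # handle -> {index: value}
        self._next = 0

    def new(self):
        # Assumption: the CPU holds only an opaque integer handle ($v).
        h = self._next
        self._next += 1
        self._store[h] = {}
        return h

    def set(self, v, i, x):
        # Models: set $v,$i,$x (zeros are not stored)
        if x == 0:
            self._store[v].pop(i, None)
        else:
            self._store[v][i] = x

    def get(self, v, i):
        # Models: get $v,$i,$x (absent indices read as 0.0)
        return self._store[v].get(i, 0.0)

    def dot(self, v1, v2):
        # Models: dot $v1,$v2,$p; iterate over the shorter vector.
        a, b = self._store[v1], self._store[v2]
        if len(a) > len(b):
            a, b = b, a
        return sum(x * b[i] for i, x in a.items() if i in b)

    def export(self, v):
        # Hypothetical copy-on-read: the caller gets a copy, never the
        # unit's internal storage, so outside writes cannot corrupt it.
        return dict(self._store[v])
```

Copying on every read is safe but conservative, since it pays for a copy even when the caller never modifies the data.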
Specialized Caching for Sparse Vectors
[Figure: hit rate (0% to 100%) versus storage capacity (128 to 2048 B) for a standard cache and VecStore.]
Key Reuse in Hash Tables
[Figure: percentage of hash operations (0% to 100%) versus number of keys (0.1 to 100000) for LZW Compress and Parser.]
386-entry table: 26% of table, 99% of dynamic accesses
94K-entry table: 0.1% of table, 75% of dynamic accesses
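Statistics of this kind can be computed from an access trace by counting how many of the hottest keys are needed to cover a given share of accesses. The sketch below is illustrative only: the function name, the integer-percentage parameter, and the trace format are assumptions, and it does not reproduce the measurements on the slide.

```python
from collections import Counter


def keys_needed(trace, coverage_pct):
    """Return (hot_keys, distinct_keys): the smallest number of most-used
    keys whose accesses make up at least coverage_pct percent of the trace."""
    counts = sorted(Counter(trace).values(), reverse=True)
    total = len(trace)
    covered = 0
    for n, c in enumerate(counts, start=1):
        covered += c
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= coverage_pct * total:
            return n, len(counts)
    return len(counts), len(counts)
```

For a trace of 100 accesses where one key receives 90 of them, `keys_needed(trace, 90)` reports that a single key out of the distinct set covers 90% of accesses.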
Exploiting Key Reuse
[Figure: reduction in HTX-M accesses (0% to 100%) versus cache capacity (1 to 1000). Series: Compress HTX-M Accesses, Parser HTX-M Accesses, Compress HTX-M Entrystore Accesses, Parser HTX-M Entrystore Accesses.]
Hash Table Accelerator (HTX)
put $h,$k,$v get $h,$k,$v
HTX-M
HTX-C
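The effect of a small cache in front of the main table can be illustrated with a toy model. Everything about the policy here is an assumption made for the sketch: LRU replacement, write-through `put`, and the class and attribute names are not taken from the talk. The `cache` dictionary stands in for HTX-C and `memory` for HTX-M.

```python
from collections import OrderedDict


class CachedHashTable:
    """Toy model: a small LRU cache (HTX-C) in front of the main table (HTX-M)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()    # stands in for HTX-C
        self.memory = {}              # stands in for HTX-M
        self.memory_accesses = 0      # accesses that reach HTX-M

    def _install(self, k, v):
        # Insert as most recently used, then evict the oldest entry if over capacity.
        self.cache[k] = v
        self.cache.move_to_end(k)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def put(self, k, v):
        # Assumed write-through: every put reaches the main table.
        self.memory[k] = v
        self.memory_accesses += 1
        self._install(k, v)

    def get(self, k):
        if k in self.cache:
            self.cache.move_to_end(k)   # hit: refresh recency, no HTX-M access
            return self.cache[k]
        self.memory_accesses += 1       # miss: go to the main table
        v = self.memory.get(k)
        if v is not None:
            self._install(k, v)
        return v
```

Running the same trace with `capacity=0` and with a small nonzero capacity, then comparing `memory_accesses`, gives the kind of reduction the plot reports.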
Summary
Extend software’s encapsulated datatypes into hardware accelerators
Natural alignment with standard software engineering
Accelerator utility on all applications that use a particular type
A software fallback that ensures portability
Aggressive optimization of computation and data movement
Research Challenges
What are the appropriate types to target? What is the lower bound in complexity? Is there a maximum number of types a hardware system can support?
How do I implement polymorphism efficiently? (e.g., a priority queue with arbitrary types and a user-defined sort function)
How do I optimize enforcement of data encapsulation? (copy-on-read is conservative)
Can the execution model support parallel execution?
What is type-specific coherence like? Simpler? Uglier?
What is the appropriate system-level resource allocation between general and specialized? Between different types?
Thank You