micro-architecture is dead, long live micro-architecture
Posted on 07-Apr-2022
Justin Rattner
Senior Fellow & Vice President, Chief Technology Officer, Intel Corporation
“Micro-architecture Is Dead, Long Live Micro-architecture”
Slowdown in Single-Core Performance Growth
[Figure: performance relative to the VAX-11/780 (log scale, 1 to 10,000), 1978 to 2006, comparing actual performance against trend slopes of 25% per year, 52% per year and 20% per year. Source: Dave Patterson]
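To put the chart's growth rates on a common footing, the sketch below converts each annual rate into a doubling time. It assumes constant compound growth within each era; the function name `doubling_time_years` is illustrative, not from the talk.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for performance to double at a constant compound annual growth rate.

    annual_growth is a fraction (0.52 means 52% per year); solves (1 + g)**t == 2 for t.
    """
    return math.log(2) / math.log(1 + annual_growth)


# Growth rates annotated on the single-core performance chart.
for label, rate in [("25% per year", 0.25), ("52% per year", 0.52), ("20% per year", 0.20)]:
    print(f"{label}: performance doubles every {doubling_time_years(rate):.1f} years")
```

Running it prints doubling times of about 3.1, 1.7 and 3.8 years, which makes the slowdown in the slide title concrete.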
What Really Matters?
Locally Instant Response
People want access to relevant digital “stuff” instantly
Finding the needle in the haystack is what matters
One device is insufficient
Content must be accessible across and among devices, seamlessly
Martin BJ Darcy
End-User Meaningful Platform Performance
[Figure: importance of performance symptoms on a 1 to 7 scale, showing the share of respondents rating each usage 1 or 2, 3 to 5, and 6 or 7. Usages rated: connect, browse and download files; send files to/from peripherals; software updates and virus scans; run multimedia software; transfer to/from CD/DVD; run productivity software; quick to turn “on/off/standby”; file search and scrolling results; sound, image and video quality; voice/video call quality. Mean ratings range from 6.3 down to 3.2.]
• Nearly 90% consider fast Internet usages very important.
• Nearly ¾ believe being quick to “turn on/off” and “run productivity SW” is critical.
• Half of respondents consider multimedia performance very important.
• Very few say high-quality phone calls on a laptop are essential.
Source – Intel PaPR, Ken Anderson
Probability of Interruptions
[Figure: probability of interruptions at home and away, and the cumulative distribution of average active-session duration (percent of sessions vs. active session duration, 0 to 60 minutes). Source: PaPR, Ken Anderson, Tye Rattenbury, Dawn Nafus]
• 80% of active sessions were shorter than 15 minutes.
• 55% of interruptions happen in the first 5 minutes.
Lots of little hits of IT…
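The "80% of sessions under 15 minutes" statistic is a single point read off an empirical cumulative distribution. A minimal sketch of that calculation, using made-up session durations (hypothetical values, not the PaPR study data):

```python
from bisect import bisect_left
from typing import Sequence


def fraction_shorter_than(durations_min: Sequence[float], threshold_min: float) -> float:
    """Empirical CDF value: share of sessions strictly shorter than threshold_min."""
    ordered = sorted(durations_min)
    # bisect_left returns how many sorted values are < threshold_min
    return bisect_left(ordered, threshold_min) / len(ordered)


# Hypothetical active-session durations in minutes (illustrative only, not study data).
sessions = [2, 3, 4, 6, 8, 9, 11, 14, 25, 48]
for minutes in (5, 15, 30, 60):
    print(f"< {minutes} min: {fraction_shorter_than(sessions, minutes):.0%} of sessions")
```

Evaluating the function at every duration on the x-axis traces out the curve the figure plots.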
Response Time Matters
It’s the Hard Disk, Stupid!
Re-boot/startup on home PC: elapsed time 105.213536 s, disk busy time 91.368480 s, average data rate 6.60669 MB/s (86% busy)
Starting Outlook: elapsed time 45.700667 s, disk busy time 41.056997 s, average data rate 1.37389 MB/s (89% busy)
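Dividing disk busy time by elapsed time for each measurement set reproduces the slide's busy percentages. The pairing of measurement sets to scenarios in the sketch is an inference from that match, not something the extracted slide states outright; the slide's 86% and 89% also appear to be truncated rather than rounded (89.8% would round to 90%).

```python
def disk_busy_share(elapsed_s: float, busy_s: float) -> float:
    """Fraction of wall-clock time during which the disk was busy."""
    return busy_s / elapsed_s


# (elapsed time, disk busy time) in seconds, as measured on the slide.
measurements = {
    "Re-boot/startup on home PC": (105.213536, 91.368480),
    "Starting Outlook": (45.700667, 41.056997),
}
for scenario, (elapsed_s, busy_s) in measurements.items():
    print(f"{scenario}: disk busy {disk_busy_share(elapsed_s, busy_s):.1%} of the time")
```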
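Memory Gaps – Old and New
[Figure: latency vs. bytes per dollar across the memory and storage hierarchy, marking an old gap and a new gap, with annotations of 250X and 25,000X; new gap: remove or move closer; old gap: remove or move away]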
Solid State Drive Performance Variability
[Figure: relative IOPS (0 to 6) for random 4KB read, sequential 4KB read, random 4KB write and sequential 4KB write on an Intel SATA SSD prototype; legend: Intel 80GB SSD, S1 32GB SATA SSD, S2 64GB SATA SSD]
• 1/10th the power*
• > 10X performance*
• 1,000X more durable*
* Compared to HDDs
Time to Feature Matters
R&D Pipeline Is Very, Very Long for Mainstream Processors
[Figure: years from new-feature insertion through the exploration, planning, development and production phases, with ticks at years 0, 2, 3, 7 and 8]
Feature Implementation Choices
How do you improve time to market?
[Figure: performance/power (1X to 10X) vs. time to market (log scale, 0 to 4 years) for firmware on CPU, CPU + fixed logic, and CPU + reconfigurable logic]
CPU + reconfigurable logic:
• Faster time to market
• Late binding of new ISA features
• Quickly adapting to a changing world
Reconfigurable Logic (RL) Unit: Fine-Grain vs. Coarse-Grain
• Array of lookup tables (i.e. FPGA): greatest flexibility, lowest area efficiency
• Array of 8-bit ALUs (e.g. Matrix): reduced flexibility, higher area efficiency
• 5X area and frequency loss
A hybrid PE design is most suitable.
• RL can provide a 2-4X benefit over SW
• Multiple instructions can be mapped to one RL
Reconfigurable Array of PEs
[Figure: core coupled to an RL unit built from a reconfigurable array of PEs]
• PE fabric: interconnect between PEs
• Regular PEs: configurable logic
• PE I/O: data input and output
• Control PEs: configurable logic
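To make the fine-grain end of the spectrum concrete, here is a software model of a single lookup-table cell: its "program" is just a truth table, so the same cell can be reloaded to compute a different function. This is an illustrative sketch (the class and method names are hypothetical), not a model of Intel's RL unit or of the coarse-grain 8-bit ALU alternative.

```python
from typing import Sequence


class LookupTable:
    """Software model of a k-input lookup table (LUT), the cell of a fine-grain RL array.

    The configuration is 2**k output bits; the input bits form an index into
    that table, so loading new bits changes which Boolean function it computes.
    """

    def __init__(self, k: int, config_bits: Sequence[int]):
        self.k = k
        self.configure(config_bits)

    def configure(self, config_bits: Sequence[int]) -> None:
        """Reconfigure: load a new truth table (one output bit per input combination)."""
        if len(config_bits) != 2 ** self.k:
            raise ValueError(f"expected {2 ** self.k} configuration bits")
        self.config_bits = [bit & 1 for bit in config_bits]

    def evaluate(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = 0
        for bit in inputs:  # first input becomes the most significant index bit
            index = (index << 1) | (bit & 1)
        return self.config_bits[index]


lut = LookupTable(2, [0, 1, 1, 0])  # truth table for XOR
print([lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
lut.configure([0, 0, 0, 1])  # same cell, now AND
print([lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
```

The flexibility costs area: a k-input LUT stores 2**k bits to implement one small function, which is the area-efficiency penalty the slide attributes to fine-grain arrays.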
Two Specialization Examples (Integer/Text)
String Matching (string processing, XML processing)
• Exhaustive byte-by-byte comparisons are inefficient with the traditional IA ISA
Variable Length Decoding (integer processing, search engines)
• Check the first “continue/stop” bit
• Concatenate the remaining bits
• Decode each integer value
• Sum all integers together
• Byte-wise operation with shifts driven by bit-level control, which is inefficient with the traditional IA ISA
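The decoding steps above can be sketched in Python. The slide does not give the exact encoding, so this assumes a LEB128-style format (the one used by Protocol Buffers varints): the high bit of each byte is the continue/stop flag and the low 7 bits are payload, least-significant group first. The per-byte branch and variable shift are the bit-driven control flow the slide calls inefficient on a conventional ISA.

```python
from typing import List


def decode_varints(data: bytes) -> List[int]:
    """Decode a stream of variable-length integers (assumed LEB128-style format)."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift  # concatenate the remaining 7 payload bits
        if byte & 0x80:                    # continue bit set: the value spans more bytes
            shift += 7
        else:                              # stop bit: this integer is complete
            values.append(current)
            current = 0
            shift = 0
    if shift != 0:
        raise ValueError("truncated varint at end of input")
    return values


data = bytes([0x05, 0x96, 0x01, 0x7F])
values = decode_varints(data)
print(values, sum(values))  # decode each integer, then sum them
```

Here 0x96 0x01 spans two bytes (22 + 1 × 128 = 150), so the stream decodes to 5, 150 and 127.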
[Figure: string matching by exhaustive comparison of each byte of string S1 (BB QQ DD XX YY ZZ YY ZZ) against the bytes of string S2, shown as a grid of matches (✓) and mismatches (✗)]
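Exhaustive matching compares the pattern against the text one element at a time at every alignment; the grid in the figure is the all-pairs version of the same comparisons. A minimal sketch, using the two-letter tokens from the figure as stand-ins for bytes (the function names are illustrative):

```python
from typing import List, Sequence


def naive_match(text: Sequence, pattern: Sequence) -> List[int]:
    """Return every offset where pattern occurs in text, comparing element by element.

    Worst case: len(pattern) comparisons at each of len(text) - len(pattern) + 1 alignments.
    """
    matches = []
    for start in range(len(text) - len(pattern) + 1):
        for i, symbol in enumerate(pattern):
            if text[start + i] != symbol:
                break  # mismatch: slide to the next alignment
        else:
            matches.append(start)  # every element matched
    return matches


def comparison_matrix(s1: Sequence, s2: Sequence) -> List[List[bool]]:
    """All-pairs equality grid: row i, column j is True when s1[i] == s2[j]."""
    return [[a == b for b in s2] for a in s1]


s1 = ["BB", "QQ", "DD", "XX", "YY", "ZZ", "YY", "ZZ"]
print(naive_match(s1, ["YY", "ZZ"]))
print(naive_match(b"abcab", b"ab"))  # works on real byte strings too
```

Each comparison is a separate compare-and-branch, which is why doing this one byte at a time maps poorly onto a general-purpose ISA.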
Reconfigurable HW Comparison
           Software on IA   Reconfigurable Logic in IA      Fixed Logic in IA
Area       None             < 1 sq mm                       < 0.1 sq mm
Perf       1X               ~2-4X                           5X to 10X
TTM        None             < 6 months                      2-4 years
Interface  No change        C-ISA                           Legacy ISA
Cost       No change        Silicon reuse & co-validation   Silicon additions
Energy Efficiency Matters
Energy Efficiency: Average vs. Idle
[Figure: platform power in watts (0 to 12), broken down into CPU, WLAN/LAN, display, HDD, power supply loss and GMCH, for the average case (MobileMark 2005) and at idle; CPU annotated at 8% and 3%]
50% Reduction in Idle Power
[Figure: idle platform power over time under processor-managed and platform-managed power, with a 600 ms interval marked]
Processor managed:
• Periodic, polled activity
• Frequent, asynchronous events
• OS changes power state
Platform managed:
• Tickless, event-driven OS
• Grouped & aligned activity
• HW changes power state
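The "grouped & aligned activity" idea can be illustrated by counting wakeups. This sketch assumes each timer deadline may be deferred to the next slot boundary (a technique usually called timer coalescing); the 600 ms slot is borrowed from the figure's annotation purely as a hypothetical value, and the deadlines are made up. Real operating systems implement coalescing with their own mechanisms and limits.

```python
from typing import Iterable, Optional


def count_wakeups(deadlines_ms: Iterable[int], slot_ms: Optional[int] = None) -> int:
    """Count distinct wakeups needed to service timer deadlines (in ms).

    slot_ms=None: every deadline wakes the processor at its own time.
    slot_ms=N: each deadline is deferred to the next multiple of N ms, so
    nearby work is grouped and aligned into a shared wakeup.
    """
    if slot_ms is None:
        return len(set(deadlines_ms))
    # Round each deadline up to the next slot boundary (integer ceiling division).
    aligned = {(d + slot_ms - 1) // slot_ms * slot_ms for d in deadlines_ms}
    return len(aligned)


deadlines = [100, 250, 430, 590, 610, 900]    # hypothetical timer deadlines
print(count_wakeups(deadlines))               # 6: one wakeup per deadline
print(count_wakeups(deadlines, slot_ms=600))  # 2: grouped into the 600 and 1200 ms slots
```

Deferring work trades a little latency for fewer, longer idle periods, which is what gives the hardware room to stay in a low-power state.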
Holistic Approach to Energy Efficiency: Platform Managed Power
• Operating Systems and VMMs: well-behaved software; OS/VMM-guided PM policies
• Manageability: scale power management from a system to the data center
• Telemetry: enhanced visibility for platform-level policies (e.g. power, temperature)
• Power Delivery and Cooling: maximum efficiency under all loads
• Interconnects and Peripherals: well-behaved, power-efficient devices and interconnects
• Core Logic: platform-level power, performance and thermal management utilizing a rich set of fine-grain techniques
Immersiveness Matters
Hints of a Revolution: Google Earth, iPhone, Second Life, Robot Cars, Nintendo Wii, Medical Imaging
Virtual Worlds Compute Requirements
Client
Application            % CPU Utilization   % GPU Utilization (nVidia G80)
2D Websites            20                  0-1
Google Maps            60                  3-5
Google Earth           50                  10-15
Google 3D Warehouse    50                  15-20
Second Life (SL)       70                  35-75
Server
Type             Software       Maximum Clients/Server
MMOG             Eve Online     34,420
MMOG             WoW            2,500
Virtual Worlds   Second Life    40-60
• SL server side spends 75%+ of its time in compute-intensive components
• SL requires 10-100x more computation per client than MMORPGs
• SL requires 3X more CPU processing and 10-100X more GPU processing than 2D websites
• SL client spends 65%+ of CPU time in compute-intensive components
• SL needs at least 20x the GPU processing of 2D websites
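A quick arithmetic check of the client-side claims against the client table, treating utilization percentage as a rough proxy for processing demand. That is an assumption: the table reports utilization, not work done, and the CPU and GPU columns describe different devices.

```python
# Client-side utilization from the Virtual Worlds table: (CPU %, GPU % low, GPU % high)
client_utilization = {
    "2D Websites": (20, 0, 1),
    "Second Life (SL)": (70, 35, 75),
}

web_cpu, _, web_gpu_high = client_utilization["2D Websites"]
sl_cpu, sl_gpu_low, _ = client_utilization["Second Life (SL)"]

cpu_ratio = sl_cpu / web_cpu               # SL CPU load relative to 2D websites
min_gpu_ratio = sl_gpu_low / web_gpu_high  # most conservative GPU comparison

print(f"CPU: {cpu_ratio:.1f}x, GPU: at least {min_gpu_ratio:.0f}x")
```

The 3.5x CPU ratio is in line with "3X more CPU processing", and the most conservative GPU ratio, 35x, is consistent with "at least 20x".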
When Architectures Collide
[Figure: programmability (fixed function, partially programmable, fully programmable) vs. throughput performance (multi-threading, multi-core, many-core), showing the positions of CPUs and GPUs]
CPU
• Evolving toward throughput computing
• Motivated by energy-efficient performance
GPU
• Evolving toward general-purpose computing
• Motivated by higher-quality graphics and GP-GPU usages
An Architecture for Immersive Computing: Intel’s Forthcoming Larrabee Processor
• Familiar programming model
• Support for irregular, shared data
• Automatic data movement
• Algorithmic vs. instruction efficiency
[Figure: block diagram of multiple vector IA cores and coherent cache blocks connected by an interprocessor network, with fixed function logic and memory and I/O interfaces]
Let’s Take a Moment to Clarify
• LRB is NOT a ray-tracing machine
• LRB is also NOT a raster machine
• LRB does both very well in software
• LRB does other things quite well, too*
* Significant LRB disclosures at SIGGRAPH ’08, Hot Chips ’08, VLDB ’08
PARSEC Benchmark Suite: Released January 29, 2008
Challenges Going Forward
• Optimal Core Size: 13mm, 100W, 48MB cache, 4B transistors, in 22nm; 12 cores, 48 cores, 144 cores
• Sub-Threshold Logic: 320mV, 56μW, 411 GOPS/W ultra-low-voltage motion estimation accelerator
• Programming Languages: all programs become parallel programs
• Memory Bandwidth: enabling large-capacity LLCs
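Read together, and assuming the 320mV, 56μW and 411 GOPS/W figures for the sub-threshold accelerator describe a single operating point (the slide does not say so explicitly), they imply a throughput and an energy per operation. A small worked calculation; the function names are illustrative:

```python
def ops_per_second(gops_per_watt: float, power_w: float) -> float:
    """Throughput implied by an energy efficiency (GOPS/W) at a given power draw."""
    return gops_per_watt * 1e9 * power_w


def picojoules_per_op(gops_per_watt: float) -> float:
    """Energy per operation: the reciprocal of operations per joule, in picojoules."""
    return 1e12 / (gops_per_watt * 1e9)


throughput = ops_per_second(411, 56e-6)  # 411 GOPS/W at 56 microwatts
print(f"{throughput / 1e6:.1f} million operations per second")
print(f"{picojoules_per_op(411):.2f} pJ per operation")
```

That works out to roughly 23 million operations per second at about 2.4 pJ each: very little absolute throughput, but very high efficiency.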
What Really Matters?
Response Time
Time to Feature
Platform Energy Efficiency
Immersive Experience
Thank You