micro-architecture is dead, long live micro-architecture

JUSTIN RATTNER Senior Fellow & Vice President Chief Technology Officer Intel Corporation "Micro-architecture IS DEAD, LONG LIVE Micro-architecture"


Post on 07-Apr-2022


Page 1: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

JUSTIN RATTNER
Senior Fellow & Vice President, Chief Technology Officer, Intel Corporation

"Micro-architecture IS DEAD, LONG LIVE Micro-architecture"

Page 2: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Slowdown in Single Core Performance Growth

[Chart: performance relative to the VAX-11/780 (log scale, 1 to 10,000) by year, 1978 to 2006, showing the actual curve and its slope. Slope labels, in order: 25% per year, 52% per year, 20% per year.]

Source - Dave Patterson
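To make the three growth rates on this slide concrete, the sketch below converts each annual rate into a doubling time and a ten-year growth factor. It is plain compound-growth arithmetic; the function names are illustrative, and the ten-year window is an arbitrary choice rather than a span taken from the chart.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_over(years: float, annual_growth: float) -> float:
    """Total growth factor after `years` of compound growth."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    # Rates as labeled on the slide; the 10-year window is illustrative only.
    for rate in (0.25, 0.52, 0.20):
        print(
            f"{rate:.0%} per year: doubles every "
            f"{doubling_time_years(rate):.1f} years, "
            f"{growth_over(10, rate):.1f}x over 10 years"
        )
```

At 52% per year performance doubles in well under two years; at 20% per year it takes almost four.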

Page 3: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

What Really Matters?

Page 4: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Locally Instant Response

People want access to relevant digital “stuff” instantly

Finding the needle in the haystack is what matters

One device is insufficient

Content must be accessible across and among devices, seamlessly

Martin BJ Darcy

Page 5: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

End-User Meaningful Platform Performance

[Chart: Importance of Performance Symptoms, with responses grouped as "Rating 1 or 2", "Rating 3 - 5" and "Rating 6 or 7".]

Performance symptoms rated:
Connect, browse, download files
Send files to/from peripherals
SW updates/virus scans
Run multimedia SW
Transfer to/from CD/DVD
Run productivity SW
Quick to turn "on/off/standby"
File search, scroll results
Sound, image, video quality
Voice/video call quality

Rating values shown on the chart: 6.3, 5.9, 5.9, 5.8, 5.5, 5.3, 5.1, 5.2, 4.9, 3.2

Nearly 90% consider fast Internet usages very important
Nearly ¾ believe quick to "turn on/off" and "run productivity SW" is critical
Half of respondents consider multimedia performance very important
Very few say high quality phone calls on a laptop is essential

Source – Intel PaPR Ken Anderson

Page 6: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Probability of Interruptions

[Chart: cumulative distribution of average session duration, At Home and Away. X-axis: active session duration (min), 5 to 60. Y-axis: percent of sessions, 0 to 100%.]

80% of active sessions were < 15 mins long
55% of interruptions happen in the first 5 mins
Lots of little hits of IT…

Source - PaPR Ken Anderson, Tye Rattenbury, Dawn Nafus

Page 7: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Response Time Matters

Page 8: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

It’s the Hard Disk, Stupid!

Scenarios measured: Re-Boot/Startup on Home PC; Starting Outlook

Elapsed Time: 45.700667 s
Disk Busy Time: 41.056997 s
Average Data Rate: 1.37389 MB/s

Elapsed Time: 105.213536 s
Disk Busy Time: 91.368480 s
Average Data Rate: 6.60669 MB/s

86% BUSY, 89% BUSY
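The two "BUSY" percentages on this slide follow from dividing disk busy time by elapsed time. The sketch below redoes that arithmetic with the figures quoted on the slide. The transcript does not preserve which measurement belongs to which scenario, so the traces are identified only by their elapsed time; `DiskTrace` is an illustrative name, not something from the talk.

```python
from dataclasses import dataclass


@dataclass
class DiskTrace:
    """One measurement block from the slide (scenario label not preserved)."""
    elapsed_s: float
    disk_busy_s: float

    @property
    def busy_fraction(self) -> float:
        # Share of wall-clock time during which the disk was busy.
        return self.disk_busy_s / self.elapsed_s


# Figures copied from the slide.
traces = [
    DiskTrace(elapsed_s=45.700667, disk_busy_s=41.056997),
    DiskTrace(elapsed_s=105.213536, disk_busy_s=91.368480),
]

if __name__ == "__main__":
    for t in traces:
        print(f"{t.elapsed_s:.1f} s elapsed: disk busy {t.busy_fraction:.1%}")
```

The 45.7 s trace comes out just under 90% busy and the 105.2 s trace just under 87%, which matches the slide's 89% and 86% labels if those were truncated rather than rounded.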

Page 9: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Memory Gaps – Old and New

[Chart: LATENCY versus BYTE / $, marking an OLD GAP and a NEW GAP. Labels on the chart: 250X; 25,000X; "Remove or move closer"; "Remove or move away".]

Page 10: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Solid State Drive Performance Variability

Intel SATA SSD Prototype

[Chart: relative IOPS (0 to 6) for Random 4KB Read, Sequential 4KB Read, Random 4KB Write and Sequential 4KB Write. Series: Intel 80GB SSD, S1 32GB SATA SSD, S2 64GB SATA SSD.]

1/10th The Power*
> 10X Performance*
1,000X More Durable*

* Compared to HDDs

Page 11: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Time to Feature Matters

Page 12: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

R&D Pipeline is Very, Very Long for Mainstream Processors

[Timeline: years 0, 2, 3, 7, 8 across four phases (EXPLORATION, PLANNING, DEVELOPMENT, PRODUCTION), with a "New Feature Insertion" marker.]

Page 13: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Feature Implementation Choices

How do you improve time to market?

[Chart: Performance / Power (1X to 10X) versus Time to Market (log scale, 0yr to 4yr) for three options: Firmware On CPU, CPU + Reconfigurable Logic, CPU + Fixed Logic.]

Faster Time to Market
Late Binding of New ISA Features
Quickly Adapting to Changing World

Page 14: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Reconfigurable Logic (RL) Unit: Fine-Grain vs. Coarse-Grain

Array of Lookup Tables (i.e. FPGA)
Greatest Flexibility
Lowest Area Efficiency

Array of 8-bit ALUs (e.g. Matrix)
Reduced Flexibility
Higher Area Efficiency

5X Area and Freq Loss

A Hybrid PE Design Most Suitable

RL can provide a 2-4X benefit over SW
Multiple instructions can be mapped to one RL

Page 15: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Reconfigurable Array of PEs

PE Fabric (Interconnect Between PEs)
Regular PEs (Configurable Logic)
PE I/O (Data input and output)
Control PEs (Configurable Logic)

[Figure: block diagram of the CORE and the RL unit.]

Page 16: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Two Specialization Examples (Integer/Text)

String Matching
String Processing
XML processing
Exhaustive byte-by-byte comparisons inefficient with traditional IA ISA*
[Figure: strings S1 and S2 compared one byte at a time, with a grid of match and mismatch marks.]

Variable Length Decoding
Integer processing
Search Engines
Check the first "continue/stop" bit
Concatenate remaining bits
Decode each integer value
Sum all integers together
Byte-wise operation
Shifts based on Bit-based Control
Inefficient with traditional IA ISA
[Figure: example bit pattern 0 1 0 1 1.]
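The variable-length decoding steps on this slide (check the continue/stop bit, concatenate the remaining bits, decode each integer, sum the integers) can be sketched in software as follows. The slide does not specify the byte layout, so this sketch assumes a common one: the top bit of each byte is the continue flag (1 = more bytes follow, 0 = stop), the low 7 bits are payload, and groups arrive most significant first. The function names are illustrative.

```python
from typing import Iterator


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer under the assumed layout (7 bits per byte)."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()  # most significant group first
    # Set the continue bit on every byte except the last one.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_varints(data: bytes) -> Iterator[int]:
    """Yield each integer from a stream of concatenated varints."""
    value = 0
    pending = False
    for byte in data:
        # Concatenate the 7 payload bits onto the value built so far.
        value = (value << 7) | (byte & 0x7F)
        pending = True
        if byte & 0x80 == 0:  # stop bit: this integer is complete
            yield value
            value = 0
            pending = False
    if pending:
        raise ValueError("truncated input: last integer has no stop byte")


if __name__ == "__main__":
    stream = b"".join(encode_varint(n) for n in (1, 300, 70000))
    print(sum(decode_varints(stream)))
```

Every byte needs a mask, a data-dependent branch and a shift, which is the byte-wise, bit-controlled pattern the slide describes as inefficient on a traditional ISA.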

Page 17: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Reconfigurable HW Comparison

            Software on IA   Reconfigurable Logic in IA      Fixed Logic in IA
Area        None             <1 sqmm                         <0.1 sqmm
Perf        1X               ~2-4X                           5x to 10x
TTM         None             < 6 mos                         2-4 years
Interface   No change        C-ISA                           Legacy ISA
Cost        No change        Silicon Reuse & Co-Validation   Silicon Additions

Page 18: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Energy Efficiency Matters

Page 19: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Energy Efficiency: Average vs. Idle

[Chart: platform power in watts (0 to 12) for Average* and Idle, broken down by component: CPU, WLAN/LAN, DISPLAY, HDD, Power Supply Loss, GMCH. CPU share labels on the chart: 8% and 3%.]

* Mobile Mark 2005

Page 20: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

50% Reduction in Idle Power

Processor Managed

• Tickless, event-driven OS

• Grouped & aligned activity

• HW changes power state

600 ms

• Periodic, polled activity

• Frequent, asynchronous events

• OS changes power state

Platform Managed

[Charts: idle platform power versus time, one for each approach.]
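The "Grouped & aligned activity" bullet can be illustrated with a small timer-alignment sketch: if timer deadlines are deferred to shared boundaries, several timers fire in one wakeup and the platform can stay idle longer in between. The task periods, the 1000 ms horizon and the alignment window below are hypothetical values chosen for illustration, and this is not a description of any particular OS or hardware mechanism.

```python
from typing import Iterable, List


def periodic_deadlines(periods_ms: Iterable[int], horizon_ms: int) -> List[int]:
    """All firing times up to horizon_ms for a set of periodic timers."""
    return [t for p in periods_ms for t in range(p, horizon_ms + 1, p)]


def wakeups_unaligned(deadlines_ms: Iterable[int]) -> int:
    """Every distinct deadline wakes the platform on its own."""
    return len(set(deadlines_ms))


def wakeups_aligned(deadlines_ms: Iterable[int], window_ms: int) -> int:
    """Defer each deadline to the next multiple of window_ms.

    Timers that land on the same boundary share a single wakeup.
    """
    slots = {-(-d // window_ms) for d in deadlines_ms}  # integer ceiling
    return len(slots)


if __name__ == "__main__":
    # Hypothetical workload: three periodic tasks observed for one second.
    deadlines = periodic_deadlines([100, 150, 250], horizon_ms=1000)
    print("unaligned wakeups:", wakeups_unaligned(deadlines))
    print("aligned to 250 ms:", wakeups_aligned(deadlines, window_ms=250))
```

The trade-off is latency: each timer can fire up to one window later than requested, so alignment suits background activity rather than latency-critical events.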

Page 21: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Holistic Approach to Energy Efficiency: Platform Managed Power

Operating Systems and VMMs: Well-behaved software; OS/VMM-guided PM policies

Manageability: Scale power management from a system to the data center

Telemetry: Enhanced visibility for platform-level policies (e.g. power, temperature)

Power Delivery and Cooling: Maximum efficiency under all loads

Interconnects and Peripherals: Well-behaved, power-efficient devices and interconnects

Core Logic: Platform-level power, performance and thermal management utilizing a rich set of fine-grain techniques

Page 22: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Immersiveness Matters

Page 23: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Hints of a Revolution

Google Earth, iPhone, Second Life, Robot Cars, Nintendo Wii, Medical Imaging

Page 24: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Virtual Worlds Compute Requirements

Client

Application           % CPU Utilization   % GPU Utilization (nVidia G80)
2D Websites           20                  0-1
Google Maps           60                  3-5
Google Earth          50                  10-15
Google 3D Warehouse   50                  15-20
Second Life (SL)      70                  35-75

SL requires 3X more CPU processing and 10-100X more GPU processing than 2D websites
SL client spends 65%+ of CPU time in compute-intensive components
SL needs at least 20x GPU processing compared to 2D

Server

Type             Software      Maximum Client/Server
MMOG             Eve Online    34420
MMOG             WoW           2500
Virtual Worlds   Second Life   40-60

SL server-side spends 75%+ time in compute-intensive components
SL requires 10-100x more computation per client than MMORPGs

Page 25: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

When Architectures Collide

[Chart: Programmability (Fixed Function, Partially Programmable, Fully Programmable) versus Throughput Performance (Multi-threading, Multi-core, Many Core), plotting CPU and GPU.]

CPU
• Evolving toward throughput computing
• Motivated by energy-efficient performance

GPU
• Evolving toward general-purpose computing
• Motivated by higher quality graphics and GP-GPU usages

Page 26: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

An Architecture for Immersive Computing: Intel's Forthcoming Larrabee Processor

Familiar programming model
Support for irregular, shared data
Automatic data movement
Algorithmic vs. instruction efficiency

[Figure: block diagram with eight VECTOR IA CORE blocks, eight COHERENT CACHE blocks, two INTERPROCESSOR NETWORK rings, FIXED FUNCTION LOGIC, and MEMORY and I/O INTERFACES.]

Page 27: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Let’s Take a Moment to Clarify

LRB is NOT a Ray-Tracing Machine

LRB is also NOT a Raster Machine

LRB does both very well in software

LRB does other things quite well, too*

Significant LRB disclosures at

SIGGRAPH ’08

HOTCHIPS ’08

VLDB ’08

* PARSEC Benchmark Suite, released January 29, 2008

[Figure: Larrabee block diagram, repeated from the previous slide.]

Page 28: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Challenges Going Forward

Page 29: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Optimal Core Size: 13mm, 100W, 48MB Cache, 4B Transistors, in 22nm; 12 CORES, 48 CORES, 144 CORES

Sub-Threshold Logic: 320mV 56μW 411GOPS/W Ultra-Low Voltage Motion Estimation Accelerator

Programming Languages: All Programs Become Parallel Programs

Memory Bandwidth: Enabling Large Capacity LLCs
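As a rough check on the sub-threshold figures quoted on this slide, the sketch below converts 411 GOPS/W and 56 μW into an implied throughput and an energy per operation. It assumes both numbers describe the same operating point, which the slide suggests but does not state; the function names are illustrative.

```python
def throughput_ops_per_s(efficiency_gops_per_w: float, power_w: float) -> float:
    """Operations per second implied by an efficiency (GOPS/W) and a power draw (W)."""
    return efficiency_gops_per_w * 1e9 * power_w


def energy_per_op_joules(efficiency_gops_per_w: float) -> float:
    """Energy per operation is the reciprocal of operations per joule."""
    return 1.0 / (efficiency_gops_per_w * 1e9)


if __name__ == "__main__":
    # Figures from the slide; treating them as one operating point is an assumption.
    ops = throughput_ops_per_s(411, 56e-6)
    pj = energy_per_op_joules(411) * 1e12
    print(f"about {ops / 1e6:.0f} million operations per second")
    print(f"about {pj:.2f} pJ per operation")
```

Under that assumption the accelerator delivers on the order of 23 million operations per second at about 2.4 pJ each.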

Page 30: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

What Really Matters?

Response Time

Time to Feature

Platform Energy Efficiency

Immersive Experience

Page 31: Micro-architecture IS DEAD, LONG LIVE Micro-architecture

Xie Xie (Thank You)