Reach To Teach
Intel Higher Education Program & Foundation for Advancement of Education and Research (FAER)

Introduction to Multi-Core

Baskaran Ganesan

[email protected]

Sr. Design Engineer, Digital Enterprise Group, Intel Corporation

Topics

1. CPU (semiconductor) HISTORY (SESSION-1)

a. Moore’s Law

b. Transistor scaling

c. Scaling limitations & impact

d. What then?

- Dual core

e. The new era

2. ARCHITECTURE (SESSION-2)

a. Core Architecture

- Core basics, Platform architecture, Core architecture

b. Multi-core architecture

c. Multi-core challenges

d. Closing notes

Moore’s Law

Moore’s law at work

Compute Power

SW/IT eco-system

Volume Market

CPU Cost

Manufacturing technology

CPU Arch technology

Transistor Size Transistor Count

Historical Driving Forces

[Figure: two log-scale trend plots, 1970 to 2020. "Shrinking Geometry": feature size (um), axis 0.01 to 10. "Increased Frequency": frequency (MHz), axis 1 to 100000. Milestones marked: 1971, 4004 Processor, 2300 transistors; 1978, 8086 Processor, IBM PC; 1986, i386 Processor, 32-bit; 1993, Pentium Processor, 3.1M transistors; 2005, Montecito, 1.7B transistors.]

Scale Factors (loosely defined)

Voltage scale-factor: Rate at which the transistor voltage decreases with respect to a change in transistor dimensions

Frequency scale-factor: Rate at which the transistor frequency increases with respect to a change in transistor dimensions

Cost scale-factor: Rate at which the per-transistor cost decreases with respect to a change in transistor dimensions

Count scale-factor: Rate at which the transistor count increases with respect to a change in transistor dimensions
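
As a point of reference, the four scale factors can be written down for one idealized case. The relations below assume classical constant-field scaling, which the slide does not name; κ > 1 is an assumed symbol for the linear shrink factor, and real process generations deviate from these ideal values. The cost scale-factor is not covered by this idealization.

```latex
L \to \frac{L}{\kappa}
\quad\Longrightarrow\quad
V \to \frac{V}{\kappa},
\qquad
f \to \kappa\, f,
\qquad
N_{\text{per unit area}} \to \kappa^{2}\, N_{\text{per unit area}}
```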

Scaling: More data

The Act of Balancing

Delivered Performance = Instructions Per Cycle (IPC) * Frequency

Power ∝ Cdynamic * V * V * Frequency

Goal is higher performance and lower power
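
The balancing act can be tried out numerically. The following is a minimal sketch of the two relations on this slide, with every quantity expressed relative to a baseline design; the function names and the example ratios are hypothetical and chosen only for illustration.

```python
def relative_performance(ipc_ratio: float, freq_ratio: float) -> float:
    """Delivered performance = IPC * frequency, relative to a baseline."""
    return ipc_ratio * freq_ratio


def relative_dynamic_power(cdyn_ratio: float, v_ratio: float, freq_ratio: float) -> float:
    """Dynamic power is proportional to Cdynamic * V * V * frequency."""
    return cdyn_ratio * v_ratio ** 2 * freq_ratio


# Hypothetical design change: 10% more frequency at unchanged IPC,
# paid for with 5% more supply voltage and unchanged switched capacitance.
perf = relative_performance(ipc_ratio=1.0, freq_ratio=1.10)
power = relative_dynamic_power(cdyn_ratio=1.0, v_ratio=1.05, freq_ratio=1.10)

print(f"performance {perf:.2f}x, power {power:.2f}x, perf/watt {perf / power:.2f}x")
```

In this made-up case power grows faster than performance, so performance per watt drops below 1.00x, which is the tension the slide points at.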

Scaling at its best

386 Processor: May 1986, @16 MHz core, 275,000 1.5µ transistors, ~1.2 SPECint2000

Pentium® 4 Processor: August 27, [email protected] GHz core, 55 Million 0.13µ transistors, 1249 SPECint2000

17 Years: 200x, 200x/11x, 1000x
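
The headline ratios can be checked against the figures printed on this slide. The sketch below uses only the transistor counts, feature sizes and SPECint2000 scores quoted for the two processors; which slide ratio corresponds to which quantity is an assumption read off the numbers, and the dictionary keys are illustrative names.

```python
# Figures as quoted on the slide for the two processors.
i386 = {"transistors": 275_000, "feature_um": 1.5, "specint2000": 1.2}
pentium4 = {"transistors": 55_000_000, "feature_um": 0.13, "specint2000": 1249}

# Growth in transistor count, shrink in feature size, growth in benchmark score.
transistor_ratio = pentium4["transistors"] / i386["transistors"]
feature_ratio = i386["feature_um"] / pentium4["feature_um"]
perf_ratio = pentium4["specint2000"] / i386["specint2000"]

print(f"transistor count: {transistor_ratio:.0f}x")
print(f"feature size:     {feature_ratio:.1f}x smaller")
print(f"SPECint2000:      {perf_ratio:.0f}x")
```

The results land close to the rounded 200x, 11x and 1000x shown on the slide.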

Architectural Innovations

• Serial, sequential execution

• Overlapped execution (pipelining)

• Multi-stage, deep pipelining

• Control-speculative execution

• Data-speculative execution

• Super-scalar execution

• Out-of-order execution

• Vector computing

• Addressing extensions

• Application specific instructions

• Multi-level on-chip caching

• Memory disambiguation

• Register renaming

• Score-boarding

• Hardware data prefetching

• …

Many decades of computer architecture focused on Instruction-Level Parallelism (ILP) enhancement

The Challenges

[Figure: "Diminishing Voltage Scaling". Supply voltage (V), log axis 0.1 to 10, plotted from 1990 to 2009 across process generations 0.7um, 0.5um, 0.35um, 0.25um, 0.18um, 0.13um, 90nm, 65nm, 45nm and 30nm; annotated "~30%" and "slowing".]

Power Limitations

Power = Capacitance x Voltage^2 x Frequency

also

Power ~ Voltage^3
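
The step between the two power relations can be made explicit. The derivation below assumes that the maximum operating frequency scales roughly linearly with supply voltage, an approximation the slide does not state; C, V and f are shorthand for capacitance, voltage and frequency.

```latex
P = C \cdot V^{2} \cdot f,
\qquad
f \propto V
\quad\Longrightarrow\quad
P \propto C \cdot V^{3}
```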

Heat Dissipation

[Figure: power density (W/cm2), log axis 1 to 10,000, from '70 to '10, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors. The projected trend is compared against reference levels labeled Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun's Surface.]

What then?

[Figure: relative Power and Performance at Max Frequency, both 1.00x.]

Over-clocking

[Figure: relative Power and Performance. Max Frequency: 1.00x. Over-clocked (+20%): Power 1.73x, Performance 1.13x.]

Under-clocking

[Figure: relative Power and Performance. Over-clocked (+20%): Power 1.73x, Performance 1.13x. Max Frequency: 1.00x. Under-clocked (-20%): Power 0.51x, Performance 0.87x.]

Multi-Core Energy-Efficient Performance

[Figure: relative single-core frequency and Vcc. Over-clocked (+20%): Power 1.73x, Performance 1.13x. Max Frequency: 1.00x. Dual-Core (-20%): Power 1.02x, Performance 1.73x.]
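
The power figures on the over-clocking, under-clocking and dual-core slides are consistent with the cubic approximation Power ~ Voltage^3. The sketch below is a rough check under that assumption (voltage tracking frequency). The 0.87x performance figure is taken from the slides as given, perfect scaling across two cores is assumed, and all names are illustrative.

```python
def relative_power(freq_ratio: float) -> float:
    """Relative power, assuming voltage tracks frequency (power ~ ratio cubed)."""
    return freq_ratio ** 3


over_clocked_power = relative_power(1.20)   # single core at +20% frequency
under_clocked_power = relative_power(0.80)  # single core at -20% frequency

# Two cores, each under-clocked by 20%.
dual_core_power = 2 * under_clocked_power

# Performance of one under-clocked core as quoted on the slides (0.87x),
# doubled under the assumption of perfect two-core scaling.
dual_core_perf = 2 * 0.87

print(f"over-clocked power:  {over_clocked_power:.2f}x")
print(f"under-clocked power: {under_clocked_power:.2f}x")
print(f"dual-core power:     {dual_core_power:.2f}x")
print(f"dual-core perf:      {dual_core_perf:.2f}x")
```

The computed power ratios round to the 1.73x, 0.51x and 1.02x shown on the slides; the dual-core performance estimate of 1.74x is close to the slide's 1.73x.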

Dual core with voltage scaling

RULE OF THUMB: A 15% reduction in voltage yields a 15% frequency reduction, a 45% power reduction and a 10% performance reduction.

SINGLE CORE: Area = 1, Voltage = 1, Freq = 1, Power = 1, Perf = 1

DUAL CORE: Area = 2, Voltage = 0.85, Freq = 0.85, Power = 1, Perf = ~1.8
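
The dual-core column follows from applying the rule of thumb to each of two cores. The sketch below takes the slide's reduction percentages as given and assumes both cores contribute equally; the constant and variable names are illustrative.

```python
# Rule-of-thumb reductions quoted on the slide for a 15% voltage reduction.
POWER_REDUCTION = 0.45
PERF_REDUCTION = 0.10

CORES = 2

# Each core runs at the reduced voltage; totals are the sum over both cores.
dual_core_power = CORES * (1 - POWER_REDUCTION)
dual_core_perf = CORES * (1 - PERF_REDUCTION)

# For comparison only: a pure cubic model (power ~ voltage cubed).
cubic_power_reduction = 1 - 0.85 ** 3

print(f"dual-core power: {dual_core_power:.2f}x")
print(f"dual-core perf:  {dual_core_perf:.2f}x")
print(f"cubic-model power reduction: {cubic_power_reduction:.0%}")
```

The totals come out at about 1.1x power and 1.8x performance, in line with the slide's "Power = 1" and "Perf = ~1.8". The pure cubic model gives a somewhat smaller power reduction than the 45% quoted, so the slide's figure is used as stated rather than derived.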

Intel: Dual & Quad Cores

A New Era…

THE OLD

Performance Equals Frequency

Unconstrained Power

Voltage Scaling

THE NEW

Performance Equals IPC

Multi-Core

Microarchitecture Advancements

Power Efficiency

Trade-off equations

- Power is costly; Transistors, relatively cheap

- Frequency alone is not important; Efficiency IS

- Performance-per-watt is critical; per-core performance, not quite as much

- Computation is relatively easy; Memory accesses are NOT

Q & A

Topics

1. CPU (semiconductor) HISTORY (SESSION-1)

a. Moore’s Law

b. Transistor scaling

c. Scaling limitations & impact

d. What then?

- Dual core

e. The new era

2. ARCHITECTURE (SESSION-2)

a. Core Architecture

- Core basics, Platform architecture, Core architecture

b. Multi-core architecture

c. Multi-core challenges

d. Closing notes

Typical PC Architecture

Processor Resources

- Caches: L0, L1, L2 etc (Different levels of caches)

- General Purpose Registers (For SW programming)

- Segment Registers & TLB (for memory management)

- FP registers, XMM registers

- System Flags

- Control and Data registers, Debug registers, MSRs

- Many more

CMP/SMT/HT

CMP: Chip Multi-Processing; refers to multiple physical core engines, each with its own unique resources

Unique: L0/L1 Cache, TLBs, Instruction Pointer, GP Regs

Shared: L2 Cache

SMT: Refers to multiple hardware threads that share all resources of one core (time muxed)

Shared: L0/L1/L2 Caches, TLBs

Unique: Instruction Pointer, GP Regs

Hyper Threading: Refers to multiple threads that share more resources (L0/L1 Cache for example); may or may not be part of a CMP core

SW Threading: Application (SW) level threading of processes on one or more physical core engines
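
To make the SW Threading entry concrete, the following is a minimal sketch of application-level threading: work is split into one chunk per worker and the partial results are combined. The function names are hypothetical. Note two caveats: `os.cpu_count()` reports logical processors, so hardware threads are counted alongside physical cores, and in CPython the global interpreter lock usually keeps pure-Python threads from running in parallel, so a process pool would be the choice when true parallel execution on multiple cores is needed.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds: tuple[int, int]) -> int:
    """Sum the integers in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(range(start, stop))


def parallel_sum(n: int, workers: int) -> int:
    """Split [0, n) into roughly one chunk per worker and add the partial sums."""
    step = max(1, -(-n // workers))  # ceiling division, never zero
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


# Logical processor count; falls back to 1 if it cannot be determined.
workers = os.cpu_count() or 1
total = parallel_sum(1_000_000, workers)
print(f"{workers} workers, total = {total}")
```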

Core Architecture (Prescott)

Core Architecture (Xeon – Dual Core)

Multi-core platform (Freescale: embedded)

Multi-Core platform (RMI-XLR: embedded)

Tilera – 64 core CPU

Tilera – Platform

Tera-scale Computing

[Figure: performance versus dataset size. Performance axis: KIPS, MIPS, GIPS, TIPS. Dataset size axis: Kilobytes, Megabytes, Gigabytes, Terabytes. Workloads progress from Text through Multi-Media and 3D & Video to RMS, with regions marked Single Core, Multi-core and Tera-scale. Application areas shown: Personal Media Creation and Management, Entertainment, Learning & Travel, Health.]

IPS = Instructions per second

"RMS" Applications: Recognition, Mining, Synthesis

Intel Polaris (80-core)

Multi-Core: what next?

Connecting multiple cores

Platform Architecture (multi-core)

External I/F

Multi-core: Architectural Challenges

- Instruction-level parallelism v/s Thread-level parallelism tradeoffs and balance

- Shared resource management (functional units, caches, TLB, BTB)

- Multi-threading v/s Multi-core tradeoffs

- On and Off-chip bandwidth requirements

- Latencies (execution, cache, and memory) reduction

- Memory Coherence/Consistency (for high speed on-die cache hierarchies)

- Multiple domains (and crossing) in clocking, voltage, reset,...

- Partitioning resources (between threads/cores)

- Fault tolerance (at device, storage, execution, core level) (aka reliability)

- On-die interconnect (optimized along latency, bw, modularity, power, ...)

- Integration (of system components, and/or fixed function devices)

Multi-core: Design Challenges

Design Complexity, Productivity Tools / Methods Advance

• …But at slower rate than Moore’s Law

• Replicating cores improves productivity

Visibility for Test & Debug

• Pin Bandwidth/Transistor continues to decline

• Shrinking dimensions, increasing speeds, …

• Increased test time adding to cost

Power

• Power Delivery – di/dt of Amps/nano-second

• Thermals: Overall power and thermal density

Multi-core: Eco-system challenges

Underlying Software assumptions on resource sharing

• Lack of standard mechanisms to share “resource sharing info” between HW and OS

Lack of “Resource sharing” aware SW

• Compilers, Schedulers, Configuration/Management (Power!) etc

Legacy SW architectural requirements left on Multi-Core CPUs

• Compatibility requirements

Many more…unknowns (to CPU Design world)

Multi-core: Software Challenges

Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, … not ready for 100s of CPUs / chip

- Scalability of O/S Data Structures and Policies

- Synchronization and locking, Scheduling, Process management, Data structure sizing and management limitations, Threading granularity and primitives

- Memory Hierarchy Awareness

- Impact of coherency policy, Efficiency of Data-sharing and Process migration effects, SW visibility to High speed on-die interconnect, SW control of Cache hierarchy, NUCA Awareness

- High Bandwidth I/O Support

- Light weight Interrupts, Data movement and transformation engines, I/O Affinity
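
One common way to see why software structure matters at 100s of CPUs per chip is Amdahl's law. It is not part of these slides and is brought in here only as an illustration: if a fraction of a program cannot be parallelized (for example, time spent holding a lock), that fraction bounds the achievable speedup no matter how many cores are added. The function name and the example fractions below are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads that are 95% and 99% parallelizable.
for fraction in (0.95, 0.99):
    for cores in (2, 8, 100):
        speedup = amdahl_speedup(fraction, cores)
        print(f"{fraction:.0%} parallel on {cores:>3} cores: {speedup:5.1f}x")
```

With 5% serial work, 100 cores yield well under a 20x speedup in this model, which is one reason synchronization, locking and data-sharing efficiency appear on the list above.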

More than the cores

Closing notes

• Single and Multi-core architectures presented

• Multi-Core CPU is the next generation CPU Architecture

– 2Core and Intel Quad-Core designs plenty on market already

– Many More are on their way

• Several old paradigms ineffective; Several new problems to be addressed

• Chip Level Multiprocessing and large caches can exploit Moore’s Law

• Thread/Core count in future microprocessor systems to increase

• Eco-system immature/non-existent

• Numerous domains in arch/design awaiting research & innovation and here is where you come in!!!

Multi-Core Architecture and Design ready for research, development and innovation!

Acknowledgements

Gautam Doshi [Principal Engineer, Digital Enterprise Group]

Ajay Bhatt [Intel Fellow, Digital Enterprise Group]

Dileep Bhandarkar [Architect, Digital Enterprise Group]

Sunit Tyagi [Sr. Principal Engineer, Digital Enterprise Group]

… and countless foil-wares

Resources

Intel Tech/Research: http://www.intel.com/technology/index.htm

Energy Efficient Performance: http://www.intel.com/technology/eep/index.htm

Intel Core Microarchitecture: http://www.intel.com/technology/architecture/coremicro/

Dual-core processor: http://www.intel.com/technology/computing/dual-core/index.htm

Multi/Many Core: http://www.intel.com/multi-core/index.htm

Intel Platforms: http://www.intel.com/platforms/index.htm

Threading: http://www3.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/index.htm

Q & A

Backup: Core uArch

Intel® Core™ Microarchitecture

Scalable: Low Power, High Performance

65nm

Merom (Core2 Duo): Mobile Optimized

Conroe (Core2 Duo): Desktop Optimized

Woodcrest (Xeon): Server Optimized

Intel® Wide Dynamic Execution

Intel® Intelligent Power Capability

Intel® Advanced Smart Cache

Intel® Smart Memory Access

Intel® Advanced Digital Media Boost

*Graphics not representative of actual die photo or relative size

Page 49:

Intel® Intelligent Power Capability

Ultra Fine Grained / Coarse Grained
• Aggressive Clock Gating
• Enhanced Speed-Step
• Low VCC Arrays
• Blocks Controlled Via Sleep Transistors

Transistor
• Low Leakage Transistors
• Sleep Transistors

Process
• 65nm
• Strained Silicon
• Low-K Dielectric
• More Metal Layers

ADVANTAGE
• Mobile-Level Power Management
• Energy Efficient Performance

Energy

*Graphics not representative of actual die photo or relative size

Page 50:

Intel® Wide Dynamic Execution

Figure: pipeline block diagram, identical for CORE 1 and CORE 2. Blocks: Instruction Fetch and Pre-Decode, Instruction Queue, Decode, Rename / Alloc, Retirement Unit (Reorder Buffer), Schedulers, Execute

EACH CORE
• 4 Wide - Decode to Execute
• 4 Wide - Micro-Op Execute
• Micro and Macro Fusion
• Deeper Buffers
• Efficient 14 Stage Pipeline
• Enhanced ALUs

ADVANTAGE
• 33% Wider Execution over Previous Gen
• Comprehensive Advancements
• Enabled In Each Core

Energy / Perf

Page 51:

Intel® Wide Dynamic Execution: Micro and Macro Fusion

Figure: Instructions 1 to 3 passing through DECODE (with uCODE ROM) and EXECUTE. WITHOUT MACRO FUSION, the three instructions become Internal Inst 1, 2 and 3. WITH MACRO FUSION, they become Internal Inst 1 plus Combined Inst 2 & 3. Completed Inst 1 to 3 leave EXECUTE in both cases. Panels: Micro Fusion, Macro Fusion

MACRO FUSION EXAMPLE
• CMP+JMP in 1 clock

ADVANTAGE
• Instruction Load Reduced ~ 15%**
• Micro-Ops Reduced ~ 10%**

Energy / Perf

*Graphics not representative of actual die photo or relative size

** Workload dependent
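The macro fusion example on this slide (CMP+JMP handled as one operation) can be illustrated with a toy decoder. This is only a sketch and not Intel's decoder: the function `decode`, the sets `FUSIBLE_FIRST` and `FUSIBLE_SECOND`, and the choice of mnemonics are all assumptions made for illustration. The slide abbreviates the pair as CMP+JMP; the sketch assumes the jump is a conditional jump such as `jne`.

```python
# Toy model of macro fusion. Not Intel's decoder: every name here is hypothetical.
# Assumption: a compare immediately followed by a conditional jump is emitted
# as ONE internal operation instead of two.

FUSIBLE_FIRST = {"cmp"}                      # hypothetical: may start a fused pair
FUSIBLE_SECOND = {"je", "jne", "jl", "jg"}   # hypothetical: may end a fused pair


def decode(instructions, macro_fusion=True):
    """Return the list of internal operations produced by the toy decoder."""
    internal_ops = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if macro_fusion and current in FUSIBLE_FIRST and following in FUSIBLE_SECOND:
            # Two instructions leave the decoder as one combined operation.
            internal_ops.append(current + "+" + following)
            i += 2
        else:
            internal_ops.append(current)
            i += 1
    return internal_ops


program = ["add", "cmp", "jne"]   # Instruction 1, 2, 3 in the slide's figure
print(decode(program, macro_fusion=False))   # three internal operations
print(decode(program, macro_fusion=True))    # two internal operations
```

With fusion enabled, the three-instruction program produces two internal operations, which mirrors the "Combined Inst 2 & 3" box in the figure. The actual savings on real code are workload dependent, as the slide's footnote says.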

Page 52:

Intel® Advanced Smart Cache: Dynamic L2 Cache Usage

Figure: CORE 1 and CORE 2, each with its own L1 CACHE, shown with two L2 arrangements

Core™ Microarchitecture: Shared L2
• Dynamically, Bi-Directionally Available
• Decreased Traffic

Independent L2
• Not Shareable
• Increased Traffic

ADVANTAGE
• Higher Cache Hit Rate
• Reduced BUS Traffic
• Lower Latency to Data

Energy / Perf

*Graphics not representative of actual die photo or relative size

Page 53:

Intel® Smart Memory Access: Hardware-based Memory Disambiguation

Figure: Inst 1 “STORE [X]” and Inst 2 “LOAD [Y]” passing through DECODE/SCHEDULE and EXECUTE on two designs

Core™ Microarchitecture (HARDWARE Mem. Dis. Predictor)
• Instructions enter in order and may execute out of order
• Inst. 2 “Load” Can Occur Before Inst. 1 “Store”

Other
• Execution stalls
• Inst. 2 Must Wait For Inst. 1 “Store” To Complete

ADVANTAGE
• Higher Utilization of Pipeline
• Masks latency to data access
• Higher Performance

Energy / Perf
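The difference between the two designs on this slide can be put into a small timing model. This is a sketch under loud assumptions: the cycle counts, the replay penalty and the function `load_done_cycle` are invented for illustration, and the model always speculates instead of using a real predictor.

```python
# Toy timing model of memory disambiguation. All numbers are made up.
STORE_ADDR_LATENCY = 3   # hypothetical: cycles until the store's address [X] is known
LOAD_LATENCY = 2         # hypothetical: cycles for the load to return data
REPLAY_PENALTY = 5       # hypothetical: extra cycles when the speculation was wrong


def load_done_cycle(store_addr, load_addr, speculate):
    """Cycle at which a load that follows an earlier store has its data."""
    if not speculate:
        # "Other" design: the load waits until the store address is resolved.
        return STORE_ADDR_LATENCY + LOAD_LATENCY
    if store_addr != load_addr:
        # Speculation was right: [X] and [Y] differ, the load ran at cycle 0.
        return LOAD_LATENCY
    # Speculation was wrong: the addresses alias, so the load is replayed
    # after the store address is known.
    return STORE_ADDR_LATENCY + REPLAY_PENALTY + LOAD_LATENCY


print(load_done_cycle(0x100, 0x200, speculate=False))  # load waits for the store
print(load_done_cycle(0x100, 0x200, speculate=True))   # load goes first
print(load_done_cycle(0x100, 0x100, speculate=True))   # aliasing case, replayed
```

The model shows why a predictor matters: hoisting the load pays off only when the two addresses usually differ, because a wrong guess costs more than waiting would have.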

Page 54:

Intel® Advanced Digital Media Boost: Single Cycle SSE

Figure: an SSE operation (SSE/SSE2/SSE3) on 128-bit operands (bits 127 to 0). SOURCE: X4 X3 X2 X1 and Y4 Y3 Y2 Y1. DEST: X4opY4, X3opY3, X2opY2, X1opY1
• Core™ µarch: all four results produced in CLOCK CYCLE 1
• Previous: results spread over CLOCK CYCLE 1 and CLOCK CYCLE 2

In Each Core
• Single Cycle SSE
• Fusion Support

ADVANTAGE
• Increased Performance
• 128 bit Single Cycle in each core
• Improved Energy Efficiency

Energy / Perf

*Graphics not representative of actual die photo or relative size
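The figure's point, one 128-bit packed operation finishing in one cycle instead of two, can be sketched by counting passes through an execution unit of a given width. The function `packed_op`, the 32-bit lane width and the 64-bit width used for the "previous" case are assumptions for illustration; the slide itself only shows one cycle versus two.

```python
import operator


def packed_op(xs, ys, op, datapath_bits, lane_bits=32):
    """Apply op lane by lane; return (results, cycles) for a toy execution unit.

    Each cycle the unit processes as many lanes as fit in its datapath.
    """
    lanes_per_cycle = datapath_bits // lane_bits
    results = []
    cycles = 0
    for start in range(0, len(xs), lanes_per_cycle):
        chunk_x = xs[start:start + lanes_per_cycle]
        chunk_y = ys[start:start + lanes_per_cycle]
        results.extend(op(x, y) for x, y in zip(chunk_x, chunk_y))
        cycles += 1
    return results, cycles


x = [1, 2, 3, 4]       # X1..X4 in the figure
y = [10, 20, 30, 40]   # Y1..Y4 in the figure

wide = packed_op(x, y, operator.add, datapath_bits=128)    # full-width unit
narrow = packed_op(x, y, operator.add, datapath_bits=64)   # assumed half-width unit
print(wide)
print(narrow)
```

Both calls compute the same four results; only the cycle count differs (1 for the 128-bit unit, 2 for the 64-bit unit).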

Page 55:

Backup: Next Gen Technologies

Page 56:

Traditional Operating Systems (Time-mux)

Page 57:

What is Virtualization?

Without VMs: Single OS owns all hardware resources (stack, top to bottom)
• App, App, App, ...
• Operating System
• Physical Host Hardware (Processors, Memory, Graphics, Network, Storage, Keyboard / Mouse)

With VMs: Multiple OSes share hardware resources (stack, top to bottom)
• VM0: Guest OS0 running App, App, ...
• VM1: Guest OS1 running App, App, ...
• VM Monitor (VMM): a new layer of software...
• Physical Host Hardware

Virtualization enables multiple operating systems to run on the same platform

Page 58:

Types of Virtualization

Hosted VMM

• Launched from within an OS, e.g., VMplayer, WSX, GSX, Virtual PC, Virtual Server

– Cheap but lower performance

Hypervisor: a bootable layer on top of the BIOS

• Thick: embeds all the drivers, e.g., ESX

• Thin: has a service VM, e.g., Xen derivatives

Virtual Appliances: dedicated virtual machines, e.g., MojoPC

Page 59:

Intel® Virtualization Technology (VT)

Intel® VT

First to market with native virtualization support

Broadest HW and SW ecosystem support

Core™ Microarchitecture based systems

• Significant increase in performance and improved VT performance across all segments

• Mobile - Intel® Core™2 Duo Mobile Processor for Intel® Centrino® Duo Mobile Technology

• Desktop - Intel® Core™2 Duo Desktop Processor E6000 sequence

• Server - Dual and Quad Core Intel® Xeon® Processor 5000 series

Get More Done On Every Server

Get More Capabilities On Client

Figure: several OS + App stacks running on a Virtual Machine Monitor, on top of Processors with Intel® Virtualization Technology

1st VT-based SW Solutions

and others …

Page 60:

Trusted Execution Technology

Page 61:

LT Hardware Ingredients

LT = CPU + Chipset + TPM + Protected I/O

Figure: platform block diagram with Intel CPU, Intel (G)MCH, RAM, ICH, USB, LPC and TPM. Figure legend: LT-specific enhancement

CPU Extensions
• Enables domain separation
• Sets policy for protected memory

Protected Graphics
• Trusted channel between graphics and trusted SW
• Integrated or third party discrete graphics

Protected Keyboard & Mouse
• Trusted channel between keyboard/mouse and trusted SW

Protected Memory Mgmt
• Enforces access policy to protected memory

Trusted Platform Module v1.2
• Protects keys, digital certificates & attestation credentials
• Provides platform authentication

Page 62:

Backup: Misc

Page 63:

Moore’s Law Moving Forward

Columns run from ACTUAL (earlier years) to FORECAST (latest years).

Production            1995    1997    1999    2001   2003   2005   2007   2009   2011
Generation            0.35µm  0.25µm  0.18µm  130nm  90nm   65nm   45nm   35nm   22nm
Gate Length           0.35µm  0.20µm  0.13µm  <70nm  <50nm  <35nm  <35nm  <35nm  <22nm
Wafer Size (mm)       200     200     200     300    300    300    300    300?   300?
Integration Capacity  <100M   100M    200M    500M   1B     >1B    >2B    >4B    >8B

“Another decade is probably straight-forward … There is certainly no end to creativity.”

- Gordon Moore, speaking of extending Moore’s Law at ISSCC, Feb 2003

Page 64:

Multi-Core Power Efficiency

Figure: a big core with cache compared with four small cores (C1 to C4) with cache; bar charts of Power and Performance (scale 1 to 4)

Power ~ area

Single thread performance ~ area**.5

Small core (relative to the big core): Power = 1/4, Performance = 1/2

Many core is more power efficient
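The slide's two scaling rules are enough to reproduce its numbers. The sketch below normalizes the big core to area 1 and assumes the small core has one quarter of that area and that the workload splits perfectly across four cores; both assumptions, and all function and variable names, are introduced here for illustration.

```python
import math


def relative_power(area):
    """Slide's rule: power scales with area."""
    return area


def relative_single_thread_perf(area):
    """Slide's rule: single thread performance scales with area ** 0.5."""
    return math.sqrt(area)


BIG_CORE_AREA = 1.0      # normalization chosen for this sketch
SMALL_CORE_AREA = 0.25   # assumption: a small core is 1/4 of the big core
SMALL_CORES = 4          # C1..C4 in the figure

small_power = relative_power(SMALL_CORE_AREA)
small_perf = relative_single_thread_perf(SMALL_CORE_AREA)

# Four small cores occupy the same area (and power) as one big core.
many_core_power = SMALL_CORES * small_power
# Assumption: the work is perfectly parallel, so throughput adds up.
many_core_throughput = SMALL_CORES * small_perf

print("small core power:", small_power, "performance:", small_perf)
print("4 small cores power:", many_core_power, "throughput:", many_core_throughput)
```

Under these assumptions four small cores draw the same power as the big core while delivering twice its throughput, at the price of half the single-thread performance. That trade-off is what "many core is more power efficient" refers to.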

Page 65:

Multi-Core and Memory Gap

Figure: "Growing Performance Gap" between LOGIC and MEMORY, 1992 to 2002

Figure: Peak Instructions Per DRAM Access (axis 0 to 700) for Pentium 66MHz, Pentium-Pro 200MHz, PentiumIII 1100MHz, Pentium4 2 GHz

Reduce DRAM access with large caches
• Extra benefit: power savings. Cache is lower power than logic

Tolerate memory latency with multiple threads
• Multiple cores
• Hyper-threading

Page 66:

Multi-threading tolerates memory latency

Serial Execution: Ai, Idle, Ai+1, then Bi, Idle, Bi+1

Multi-threaded Execution: Ai, Bi, Ai+1, Bi+1 (each thread's idle time is filled by the other thread)

Execute thread B while thread A waits for memory

Multi-core has a similar effect
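The timelines on this slide can be reproduced with a small scheduling model. It is a sketch under stated assumptions: each thread is a list of `("run", cycles)` and `("wait", cycles)` segments, memory waits take no core time, thread switches are free, and the function names are invented for illustration.

```python
def serial_time(threads):
    """One thread after the other; the core idles during every memory wait."""
    return sum(cycles for segments in threads for _, cycles in segments)


def multithreaded_time(threads):
    """One core; another thread runs while a thread waits for memory."""
    nxt = [0] * len(threads)     # index of each thread's next segment
    ready = [0] * len(threads)   # earliest time each thread can continue
    clock = 0                    # time at which the core becomes free
    while True:
        # Memory waits happen off-core: fold them into the ready time.
        for k, segments in enumerate(threads):
            while nxt[k] < len(segments) and segments[nxt[k]][0] == "wait":
                ready[k] += segments[nxt[k]][1]
                nxt[k] += 1
        runnable = [k for k, s in enumerate(threads) if nxt[k] < len(s)]
        if not runnable:
            break
        k = min(runnable, key=lambda j: ready[j])   # earliest-ready thread
        start = max(clock, ready[k])
        clock = start + threads[k][nxt[k]][1]
        ready[k] = clock
        nxt[k] += 1
    return max([clock] + ready)


def multicore_time(threads):
    """One core per thread: total time is that of the slowest thread."""
    return max(sum(cycles for _, cycles in segments) for segments in threads)


# Ai, Idle, Ai+1 and Bi, Idle, Bi+1 with made-up durations of 2 cycles each.
thread_a = [("run", 2), ("wait", 2), ("run", 2)]
thread_b = [("run", 2), ("wait", 2), ("run", 2)]

print(serial_time([thread_a, thread_b]))          # 12
print(multithreaded_time([thread_a, thread_b]))   # 8
print(multicore_time([thread_a, thread_b]))       # 6
```

With these made-up durations, multi-threading hides all idle time on a single core, and two cores finish sooner still because the two threads truly run at the same time.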

Page 67:

Multi-core tolerates memory latency

Serial Execution: Ai, Idle, Ai+1, then Bi, Idle, Bi+1

Multi-core Execution: one core runs Ai, Idle, Ai+1 while the other core runs Bi, Idle, Bi+1

Execute thread A and B simultaneously

Page 68:

How does Multicore Change Parallel Programming?

No change in fundamental programming model

Synchronization and communication costs greatly reduced

• Makes it practical to parallelize more programs

Resources now shared

• Caches

• Memory interface

• Optimization choices may be different

Figure: SMP: processors P1 to P4, each with its own cache, connected to Memory

Figure: CMP: cores C1 to C4 with their caches on one chip, connected to Memory

Page 69:

Art of the Possible

Billion transistors realized in 65nm Si process

Multi-Billion transistors possible in future Si process

Large die sizes can be built

– 400 to 600 square millimeters

What can fit on a single die?

– For 65nm (rough est)

• 30 mm² per proc.

• 15 mm² per MB

Die size (core + cache only) in mm²

               2 cores  4 cores  8 cores
16 MB cache    300      360      480
32 MB cache    540      600      720
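The table follows directly from the two rough 65nm estimates on this slide (30 mm² per processor core, 15 mm² per MB of cache). A minimal sketch of that arithmetic, with function and constant names chosen here for illustration:

```python
CORE_AREA_MM2 = 30           # from the slide: rough 65nm estimate per processor
CACHE_AREA_MM2_PER_MB = 15   # from the slide: rough 65nm estimate per MB of cache


def die_size_mm2(cores, cache_mb):
    """Core + cache area only, as in the slide's table."""
    return cores * CORE_AREA_MM2 + cache_mb * CACHE_AREA_MM2_PER_MB


for cache_mb in (16, 32):
    row = [die_size_mm2(cores, cache_mb) for cores in (2, 4, 8)]
    print(f"{cache_mb} MB cache:", row)
# 16 MB cache: [300, 360, 480]
# 32 MB cache: [540, 600, 720]
```

Set against the 400 to 600 mm² range quoted for large dies, only the largest configuration (8 cores with 32 MB, 720 mm²) falls outside it. Note also that cache dominates the area in every entry.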

Page 70:

Quad Cores – here a quarter ago already!

Page 71:

Multi-Core