multi-core processor technology: maximizing cpu performance in a power-constrained world paul teich...

23
Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich @ amd.com

Upload: warren-king

Post on 23-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Multi-Core Processor Technology:Maximizing CPU Performance in aPower-Constrained World

Paul TeichBusiness StrategyCPG Server/Workstation

AMDpaul.teich @ amd.com

Page 2: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

The IssuesThe Issues

Silicon designers can choose a variety of methods to increase processor performance

Commercial end-customers are demandingMore capable systems with more capable processors

That new systems stay within their existing power/thermal infrastructure

Processor frequency and power consumption seem to be scaling in lockstep

How can the industry-standard PC and Server industries stay on our historic performance curve without burning a hole in our motherboards?

This session is not about process technology

Page 3: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Session OutlineSession Outline

Definition: What is a processor?

Core Design

System Architecture

Manufacturing, Power, and Thermals

Multi-Core Processor Architecture

Performance Impacts

Page 4: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

What is a Processor? What is a Processor?

A single chip package that fits in a socket

≥1 core (not much point in <1 core…)Cores can have functional units, cache, etc.associated with them, just as today

Cores can be fast or slow, just as today

Shared resourcesMore cache

Other integration: Northbridge, memory controllers, high-speed serial links, etc.

One system interface no matter how many coresNumber of signal pins doesn’t scale with number of cores

Page 5: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

A Representative Multi-Core ProcessorA Representative Multi-Core Processor

Dual-core AMD Opteron™ processor is 199mm2 in 90nm

Single-core AMD Opteron processor is 193mm2 in 130nm

Page 6: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Multi-Core Processor ArchitectureMulti-Core Processor Architecture

Page 7: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Core DesignCore Design

FrequencyIs only as good as the rest of the core architecture

AGUAGU

Int Decode & Rename

FADD FMISCFMUL

44-entryLoad/Store

Queue

36-entry FP scheduler

FP Decode & Rename

ALU

AGU

ALU

MULT

ALU

Res Res Res

L1Icache64KB

L1Dcache64KB

FetchBranch

Prediction

Instruction Control Unit (72 entries)

Fastpath Microcode EngineScan/Align/Decode

µops

AMD Opteron processor core architecture

Page 8: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Core DesignCore Design

Functional unitsSuperscalar is known territory

Diminishing returns for adding more functional blocks

Alternatives like VLIW have been considered and rejected by the market

Single-threaded architectural performance is pegged

Data pathsIncreasing bandwidth between functional units in a core makes a difference

Such as comprehensive 64-bit design, but then where to?

Page 9: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Core DesignCore Design

PipelineDeeper pipeline buys frequency at expense of increased cache miss penalty and lower instructions per clock

Shallow pipeline gives better instructions per clock at the expense of frequency scaling

Max frequency per core requires deeper pipelines

Industry converging on middle ground…9 to 11 stagesSuccessful RISC CPUs are in the same range

CacheCache size buys performance at expense of die size, it’s a direct hit to manufacturing cost

Deep pipeline cache miss penalties are reduced by larger caches

Not always the best match for shallow pipeline cores, as cache misses penalties are not as steep

Page 10: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

ManufacturingManufacturing

Moore’s Law isn’t dead, more transistors for everyone!

But…it doesn’t really mention scaling transistor power

Chemistry and physics at nano-scaleStretching materials science

Voltage doesn’t scale yet

Transistor leakage current is increasing

As manufacturing economies and frequency increase, power consumption is increasing disproportionately

There are no process or architectural quick-fixes

Page 11: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Transistors Are Not FreeTransistors Are Not Free

The number of transistors in a core determines basic power consumption

Architectural efficiency matters a lot when designing new cores

More functional units means more transistors

Deeper pipelines mean more transistors

Larger caches mean more transistors

Page 12: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Static Current vs. FrequencyStatic Current vs. Frequency

Frequency

Sta

tic

Cu

rren

t

Embedded Embedded PartsParts

Very High LeakageVery High Leakageand Powerand Power

Fast, High Fast, High PowerPower

Fast, Low Fast, Low PowerPower

1.0 1.5

15

0

Non-linear as processors approach max frequency

Page 13: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Power vs. FrequencyPower vs. Frequency

In AMD’s process, for 200MHz frequency steps, two steps back on frequency cuts power consumption by ~40% from maximum frequency

(Gross relative numbers summarized from a mountain of real data)

0.5

1.0

1.5

2.0

n-5 n-4 n-3 n-2 n-1 n

Frequency

Po

wer

C

on

sum

pti

on

Page 14: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Thermal Density DecreasesThermal Density Decreases

Hot spotsTwice as many as in single-core

Farther apart than in single-core

With freq delta, cooler than in single-core

ΘCA same for single-core at n and dual-core at n-2

Larger die spreads heat more evenly in package

Use identical heat sink, slightly better cooling with dual-core

Works for this processor generation and next, ΘCA

changes over major generations

Thermal diode accuracy becomes an issue with dual-core

Page 15: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Total Effect on Dual-Core FrequenciesTotal Effect on Dual-Core Frequencies

Substantially lower power with lower frequency

Thermals easier to handle at any frequency

Result is dual-core running at n-2 in same thermal envelope as single-core running at top speed

Page 16: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Multi-Core Processor ArchitectureMulti-Core Processor Architecture

Why integrate?Most functions are really small compared to the cores and cacheAll integrated logic runs at core frequency regardless of I/O speeds

What to integrate?Northbridge crossbar switch is key

Look for innovation and differentiation in how cores areconnected on-chipMust integrate Northbridge to integrate anything else…

Memory controller to reduce memory latency and further reduce the need for cacheHigh-speed serial links for system I/O

What not to integrate?Most Southbridge functionsGraphics

Page 17: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

AMD Opteron Processor Integrated NorthbridgeAMD Opteron Processor Integrated Northbridge

SystemRequestInterface

(SRI)

AdvancedProgrammable

InterruptController

(APIC)

CPU 0Data

CPU 1Data

CPU 0Probes

CPU 1Probes

CPU 0Requests

CPU 1Requests

CPU 0Int

CPU 1Int

MemoryController

(MCT)

DRAMController

(DCT)

DRAM DataRAS/CAS/Cntl

Crossbar(XBAR)

HyperTransport™ Link 0

64-bit Data

64-bit Command/Address

16-bit Data/Command/Address

HyperTransport Link 2

HyperTransport Link 1

Page 18: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Multi-Core: Where Processor and System CollideMulti-Core: Where Processor and System Collide

Scales performanceDedicated resources for two simultaneous threadsMultiple cores will contend for memory and I/O bandwidth

Northbridge is the bottleneckIntegrating Northbridge eliminates much of bottleneckNorthbridge architecture has significant impact on performance

Cores, cache and Northbridge must be balanced for optimal performance

More aggregate performance for: Multi-threaded apps Transactions: many instances of same app Multi-tasking

Thread scheduling handled by OSBIOS notifies Windows of thread execution resources

Page 19: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Early Benchmark EstimatesEarly Benchmark Estimates

Decoder2P/2C – 2 proc. single-core

4P/4C – 4 proc. single-core

2P/4C – 2 proc. dual-core

4P/8C – 4 proc. dual-core

FrequenciesSingle-core = 2.4GHz

Dual-core = 2.0GHz

Identical system configsMemory, disks, network, etc.

Early dual-core validation system used, different motherboards

SPECint_rate2000 Peak Win SP1

100163

184308

2P/2C2P/4C4P/4C4P/8C

Pla

tfo

rms

% Comparison to Base 2P/2C

SPEC and the benchmark name SPECint are registered trademarks of the Standard Performance Evaluation Corporation. SPEC scores for AMD Opteron Model 270 and 870 based systems are estimated

OLTP Workload

100

148

165

244

2P/2C

2P/4C

4P/4C

4P/8C

Pla

tfo

rms

% Comparison to Base 2P/2C

Page 20: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Call to ActionCall to Action

Most application software doesn’t need to do anything to benefit from dual-coreBe aware that, for a processor within a given power envelope

Fewer cores will clock faster than more coresSingle-threaded performance-sensitive applications

More cores will out-perform fewer cores forMulti-threaded applicationsMulti-tasking response timesTransaction processing

Processor architecture impacts multi-core performance

Process technology is only the anteIntegration enables a balanced high-performance architecture

Page 21: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Community ResourcesCommunity Resources

Windows Hardware & Driver Central (WHDC)www.microsoft.com/whdc/default.mspx

Technical Communitieswww.microsoft.com/communities/products/default.mspx

Non-Microsoft Community Siteswww.microsoft.com/communities/related/default.mspx

Microsoft Public Newsgroupswww.microsoft.com/communities/newsgroups

Technical Chats and Webcastswww.microsoft.com/communities/chats/default.mspx

www.microsoft.com/webcasts

Microsoft Blogswww.microsoft.com/communities/blogs

Page 22: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich

Additional ResourcesAdditional Resources

Email: paul.teich @ amd.com

WinHEC Presentations“x86 Everywhere,” Chris Herring, AMD

“Maximizing Desktop Application Performance onDual-Core PC Platforms,” Rich Brunner, AMD

Web ResourcesAMD http://www.amd.com/

AMD Multi-Core http://www.amd.com/multicore/

AMD Opteron™ Processor http://www.amd.com/opteron/

AMD Multi-Core White Paper http://enterprise.amd.com/downloadables/33211A_Multi-Core_WP.pdf

HyperTransport™ Consortium http://www.hypertransport.org/

Page 23: Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World Paul Teich Business Strategy CPG Server/Workstation AMD paul.teich