intel technologies update - ieee · intel confidential 3 moore’s law continues 1,000 10,000...
TRANSCRIPT
1
Intel Technologies
March 22, 2004
Intel Confidential
2
Outline
Manufacturing Process
Processor Architectures
Processor Roadmaps
3
Moore’s Law Continues
[Chart: log scale from 1,000 to 10,000,000,000 versus year (1970 to 2010), plotting the 4004, 8008, 8080, 8086, 286, 386™ Processor, 486™ DX Processor, Pentium® Processor, Pentium® II Processor, Pentium® III Processor, Pentium® 4 Processor, Itanium® Processor and Itanium® 2 Processor]
Goal: Over 1 billion transistors by 2005
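The slide's goal can be turned into an implied doubling period with a little arithmetic. The sketch below is only a rough illustration: the starting point (the 4004 with roughly 2,300 transistors in 1971) is an assumption brought in from outside the slide, and the function name is hypothetical.

```python
import math

def doubling_period_years(n_start, year_start, n_end, year_end):
    """Average years per doubling, assuming steady exponential growth."""
    doublings = math.log2(n_end / n_start)
    return (year_end - year_start) / doublings

# Assumed start point (not on the slide): 4004, about 2,300 transistors, 1971.
# End point: the slide's stated goal of 1 billion transistors by 2005.
period = doubling_period_years(2_300, 1971, 1_000_000_000, 2005)
print(f"Implied doubling period: {period:.2f} years")
```

Under these assumptions the result lands a little under two years per doubling, which is consistent with the usual statement of Moore's Law.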
4
By End of the Decade …
“…30 gigahertz devices, 10 nanometer or less delivering a tera instruction of performance by 2010” (1)
1) Pat Gelsinger, Intel CTO, Spring 2002 IDF
[Chart: Frequency (MHz), log scale from 0.1 to 100,000, versus year (’70 to ’10), plotting the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, Pentium® Pro and Itanium® proc; projected points at 3 GHz, 6.5 GHz, 14 GHz and ~30 GHz]
[Chart: Transistors (MT), log scale from 0.001 to 10,000, versus year (’70 to ’10), plotting the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium proc, Pentium® Pro and Itanium®; projected points at 1.8B and ~2B Transistors]
5
World Class Manufacturing
Silicon Design and Manufacturing | Volume Manufacturing
Process 130nm, Wafer 200mm
Process 90nm, Wafer 300mm
Capacity and Resource Benefits:
– 240% more die/wafer
– 40% less energy and water/die
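The two geometric factors behind the die-per-wafer benefit can be sketched as follows. This is an idealized calculation, not a derivation of the slide's 240% figure: real die counts also depend on die size, edge exclusion and design changes, none of which are given here. The function name is hypothetical.

```python
import math

def wafer_area_mm2(diameter_mm):
    """Area of a circular wafer in square millimetres."""
    return math.pi * (diameter_mm / 2) ** 2

# Ratio of usable area when moving from 200mm to 300mm wafers.
area_ratio = wafer_area_mm2(300) / wafer_area_mm2(200)

# Ideal density gain if every linear dimension shrinks from 130nm to 90nm.
shrink_density_gain = (130 / 90) ** 2

print(f"Wafer area ratio (300mm vs 200mm): {area_ratio:.2f}x")
print(f"Ideal density gain (130nm to 90nm): {shrink_density_gain:.2f}x")
```

The larger wafer alone gives 2.25 times the area; the process shrink is a separate factor of roughly two in ideal transistor density.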
6
Intel’s Process Technology
[Figure: Pentium® Processor, Pentium® Pro Processor, Pentium® II Processor, Pentium® III Processor and Pentium® 4 Processor shown against process generations 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ and 0.13µ]
Source: Intel
Process technology advances mean:
• Higher transistor density
• Higher performance
• Less power per transistor
• Lower cost per transistor
7
Intel in Production With Nanotechnology (< 100nm)
[Chart: Nominal feature size and Gate Length versus year (1970 to 2020), log scale labeled in Micron (10 to 0.01) and Nanometer (10000 to 10); a region marked Nanotechnology; labeled points at 130nm, 90nm, 70nm and 50nm]
Intel has been in production with nanotechnology since 2000
8
Silicon Devices Shrink to Virus Size
[Figure: Transistor for 90nm Process, 50nm (Source: Intel), beside an Influenza virus, 100nm (Source: CDC)]
Intel is shrinking transistors to increase performance and density, and to reduce power
9
Ahead to 90nm: Ramping in Q4/2003
March 2002: Intel discloses first working 52Mb SRAM on 90nm process
August 2002: Intel announces first use of strained silicon on 90nm process
December 2002: Intel presents technical papers on 90nm processes for computing & communications
September 2003: Intel discloses details on implementation of strained silicon
[Figures: 52Mb SRAM with 1µm² Cell; Strained Silicon for Fastest Transistors]
10
Strained Silicon Transistors are Faster
[Figure: Normal Silicon Lattice with normal electron flow, beside Strained Silicon Lattice with faster electron flow; arrows mark Current Flow]
11
Intel’s Transistor Research in Deep Nanotechnology Space
[Figure: experimental transistors: 50nm Length (IEDM2002), 30nm Prototype (IEDM2000), 20nm Prototype (VLSI2001), 15nm Prototype (IEDM2001), 10nm Prototype (ITJ 2002); image scale labels 25 nm and 15nm]
Process nodes: 90nm Node 2003, 65nm Node 2005, 45nm Node 2007, 32nm Node 2009, 22nm Node 2011
Experimental transistors for future process generations
12
Technologies Supporting Moore’s Law
Process Name   1st Production   Process Generation   Wafer Size (mm)   Inter-connect   Channel       Gate dielectric   Gate electrode
P856           1997             0.25µm               200               Al              Si            SiO2              Poly-silicon
P858           1999             0.18µm               200               Al              Si            SiO2              Poly-silicon
Px60           2001             0.13µm               200/300           Cu              Si            SiO2              Poly-silicon
P1262          2003             90 nm                300               Cu              Strained Si   SiO2              Poly-silicon
P1264          2005             65 nm                300               Cu              Strained Si   SiO2              Poly-silicon
P1266          2007             45 nm                300               Cu              Strained Si   High-k            Metal
P1268          2009             32 nm                300               Cu              Strained Si   High-k            Metal
P1270          2011             22 nm                300               ?               Strained Si   High-k            Metal
Introduction targeted at this time. Subject to change.
Intel found a solution for High-k and metal gate
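The process generations in the table follow a fairly regular shrink from one node to the next. A minimal sketch of that arithmetic, using only the node sizes listed on the slide (converted to nanometres) and assuming ideal area scaling:

```python
# Process generations from the slide, oldest to newest, in nanometres.
nodes_nm = [250, 180, 130, 90, 65, 45, 32, 22]

# Linear shrink factor between each pair of consecutive generations.
ratios = [new / old for old, new in zip(nodes_nm, nodes_nm[1:])]

for old, new, ratio in zip(nodes_nm, nodes_nm[1:], ratios):
    # Ideal density gain is the inverse square of the linear shrink.
    density_gain = 1 / ratio ** 2
    print(f"{old} nm -> {new} nm: linear x{ratio:.2f}, ideal density x{density_gain:.2f}")
```

Each step comes out close to a 0.7x linear shrink, which corresponds to roughly double the ideal transistor density per generation.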
13
Outline
Manufacturing Process
Processor Architectures
Processor Roadmaps
14
Itanium® 2 Processor Block Diagram
Processor Structure
[Block diagram, blocks as labeled:
- L1 Instruction Cache and Fetch/Pre-fetch Engine; ITLB; Branch Prediction
- IA-32 Decode and Control
- Instruction Queue (8 bundles)
- 11 Issue Ports: B B B M M M M I I F F
- Register Stack Engine / Re-Mapping
- Branch & Predicate Registers; 128 Integer Registers; 128 FP Registers
- Branch Units; Integer and MM Units; Floating Point Units
- Scoreboard, Predicate, NaTs, Exceptions
- ALAT
- Quad-Port L1 Data Cache and DTLB
- L2 Cache – Quad Port
- L3 Cache
- Bus Controller
- ECC marked on several blocks]
15
Massive Register Set
128 Integer Registers (GR0 to GR127, bits 63 to 0, each with a NaT bit): 32 Static (GR0 to GR31), 96 Framed, Rotating (GR32 to GR127); GR0 = 0
128 FP Registers (FR0 to FR127, bits 81 to 0): 32 Static (FR0 to FR31), 96 Rotating (FR32 to FR127); FR0 = +0.0, FR1 = +1.0
64 Predicate Registers (PR0 to PR63, 1 bit each): 16 Static (PR0 to PR15), 48 Rotating (PR16 to PR63); PR0 = 1
8 Branch Registers (BR0 to BR7, bits 63 to 0)
Large number of registers enables flexibility and performance
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries
16
Itanium® 2 Processor Memory System
L1I: 16KB, 64B lines, 1 CLK
L1D: 16KB, 64B lines, 1 CLK
L2: 256KB, 8-way, 128B lines, 5-7 CLKS, Banked
L3: 3MB, 12-way, 128B lines, 12-15 CLKS
External Memory: 6.4 GB/s
[Diagram: Core Pipeline (functional units) with 128 FP Registers and 128 General Registers, connected to the caches; three on-chip data paths labeled 32 GB/s]
Intel, Itanium and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries
17
Itanium® Architecture Advancements
1. Predication
– Increases performance by reducing penalty due to mispredicted branches
[Diagram: Traditional: compare, branch, then / else paths, branch. Itanium® Architecture: cmp p1,p2, with the then and else instructions guarded by p1 and p2]
2. Control / Data Speculation
– Attacks memory latency bottleneck and exposes more instruction level parallelism
[Diagram: Traditional: instr 1, instr 2, Br / Store acting as a Barrier, then ld r1=, use =r1. Itanium® Architecture: ld.s / ld.a r1 placed before instr 1, instr 2, br (No Barrier), then Chk.s / ld.c r1 =, use =r1]
3. Software Pipelining
– Enables concurrent execution of several iterations of loop
[Diagram: serial execution of loops versus Whole loop computation in parallel]
Overcoming today’s bottlenecks with innovative architectural features
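The predication idea on this slide can be modeled in a few lines. The sketch below is only an illustration in Python, not Itanium code: the function names are hypothetical, and Python's own conditional expressions stand in for hardware that issues both guarded instructions and lets only the one with a true predicate take effect.

```python
def branchy_max(a, b):
    """Traditional form: compare, then branch to one of two paths."""
    if a > b:
        result = a
    else:
        result = b
    return result

def predicated_max(a, b):
    """Predicated form: one compare sets two complementary predicates."""
    # Models "cmp p1,p2": p1 is true when a > b, p2 is its complement.
    p1 = a > b
    p2 = not p1
    result = None
    # Both guarded operations are issued; each takes effect only if its
    # predicate is true, so no branch is needed to pick a path.
    result = a if p1 else result   # models "(p1) result = a"
    result = b if p2 else result   # models "(p2) result = b"
    return result

for a, b in [(1, 2), (2, 1), (3, 3), (-1, 0)]:
    print(a, b, branchy_max(a, b), predicated_max(a, b))
```

Both functions return the same value for every input; the difference the slide highlights is that the predicated form has no branch to mispredict.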
18
Itanium® Architecture: Explicit Parallelism
Performance through Parallelism
Traditional: Original Source Code -> compiler -> Sequential Machine Code -> Hardware (Implicitly parallel). Execution Units unused – reduced efficiency
Itanium™ Architecture: Original Source Code -> Itanium-based compiler -> Parallel Machine Code -> Massive Resources. Multiple execution units; resources used more efficiently
19
Intel NetBurst Microarchitecture
20
Out-Of-Order Core
21
Intel Pentium M Processor Microarchitecture
22
Intel Enterprise Micro-Architectures
Feature | Xeon™ Processor MP | Itanium® 2 Processor 6M
Approach | Performance via Megahertz | Performance via Parallelism
Core Frequency | 2.8 GHz | 1.5 GHz
Pipeline Stages | 20 | 8
Issue Ports | 4 | 11
Instructions / Clk | 3 Instructions / Cycle | 6 Instructions / Cycle
Execution Units | 2 2x Integer; 1 1x Integer, 1 MMx & SSE; 2 Floating Point | 6 Integer, 3 Branch; 2 FP, 1 SIMD; 2 Load and 2 Store
On-die Registers | 24 Registers | 264 Application Registers + 64 Predicate Registers*
On-die Cache | 2M | 6 MB
On-die multi-thread | Hyper-Threading Technology | -
System Bus Bandwidth | 3.2 GB/s | 6.4 GB/s
Memory Addressing | 64 GB (PAE-36) | 1024 TB
* Intel’s EPIC technology includes 64 single-bit predicate registers to accelerate loop unrolling and branch intensive code execution.
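Two of the comparisons on this slide reduce to simple arithmetic. The sketch below checks the memory addressing figures against address widths and multiplies instructions per cycle by core frequency to get a peak issue rate. The 36-bit width comes from the slide's PAE-36 label; the 50-bit width is an assumption inferred from the 1024 TB figure. Peak issue rate is an upper bound, not a measure of delivered performance, and the function names are hypothetical.

```python
GB = 2 ** 30  # bytes per gigabyte (binary)
TB = 2 ** 40  # bytes per terabyte (binary)

def addressable_bytes(address_bits):
    """Bytes reachable with a byte address of the given width."""
    return 2 ** address_bits

def peak_ginstr_per_s(instr_per_cycle, core_ghz):
    """Peak instructions issued per second, in billions."""
    return instr_per_cycle * core_ghz

xeon_mp_gb = addressable_bytes(36) // GB    # PAE-36 label on the slide
itanium2_tb = addressable_bytes(50) // TB   # 50 bits is an inferred assumption

xeon_mp_peak = peak_ginstr_per_s(3, 2.8)    # 3 instructions / cycle at 2.8 GHz
itanium2_peak = peak_ginstr_per_s(6, 1.5)   # 6 instructions / cycle at 1.5 GHz

print(f"Xeon MP: {xeon_mp_gb} GB addressable, peak {xeon_mp_peak:.1f} G instr/s")
print(f"Itanium 2: {itanium2_tb} TB addressable, peak {itanium2_peak:.1f} G instr/s")
```

On these figures the two peak issue rates are close despite the large gap in core frequency, which is the contrast the slide draws between megahertz and parallelism.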
23
Intel Enterprise Micro-Architectures
Feature | Xeon® Processor w/ 64-bit Extensions | Itanium® 2 Processor 9M
Approach | Performance via Megahertz | Performance via Parallelism
Core Frequency | 3.8 GHz | 1.8 GHz
Pipeline Stages | >20 | 8
Issue Ports | Up to 6 | 11
Instructions / Clk | 3 Instructions / Cycle | 6 Instructions / Cycle
Execution Units | 2 2x Integer; 1 1x Integer, 1 MMx & SSE; 2 Floating Point | 6 Integer, 3 Branch; 2 FP, 1 SIMD; 2 Load and 2 Store
On-die Registers | 40 Registers | 264 Application Registers + 64 Predicate Registers*
On-die Cache | 1 MB | 9 MB
On-die multi-thread | Hyper-Threading Technology | -
System Bus | 6.4 GB/s | 6.4 GB/s
Memory Addressing | 64 GB | 1024 TB
* Intel’s EPIC technology includes 64 single-bit predicate registers to accelerate loop unrolling and branch intensive code execution
24
Outline
Manufacturing Process
Processor Architectures
Processor Roadmaps
Public Server Roadmap
Segments: 8-way & above, 4-way, 2-way
Time frames: 1H’04, 2H’04
1H’04 systems:
- Intel® Itanium® 2 Processor, 6M iL3 Cache / 1.5 GHz (two entries)
- Intel® Itanium® 2 Processor, 1.5M iL3 Cache / 1.40 GHz; Intel® LV Itanium® 2 Processor, 1.5M iL3 Cache / 1.00 GHz
- Intel® Xeon™ Processor MP, 4M iL3 / 3.00 GHz
- Intel® Xeon™ Processor, > 3.40GHz / 1M
2H’04 systems:
- Intel® Itanium® 2 Processor, 9M iL3 Cache / > 1.5 GHz (two entries)
- Intel® Itanium® 2 Processor, > 1.5M iL3 Cache / > 1.40 GHz; Intel® LV Itanium® 2 Processor, > 1.5M iL3 Cache / > 1.00 GHz
- Intel® Xeon™ Processor MP, 4M iL3 / 3.00 GHz
- Intel® Xeon™ Processor, > 3.60 GHz / 1M
Chipsets shown:
- Intel® E8870 Chipset / Enabled Chipset (three entries)
- Intel® E7501 chipset
- Next Generation Chipset (two entries)
- Profusion®* chipset
- 3rd Party / Enabled Chipset
- Enabled Chipset
26
Performance and Price Over Time
Itanium® processor family: 1.5-2X better performance* than Intel® Xeon™ processor family by 2007
Intel plans to converge to common platform with goal of achieving platform cost parity
[Chart: Platform Performance*, ’04 to ’07+: Intel® Xeon™ Processor-Based Platforms follow Moore’s Law; Itanium®-Based Platforms are +30%-50% in ’04 and +50%-100% in ’07+]
[Chart: Platform Price**, ’04 to ’07+: Intel® Xeon™ Processor-Based Platforms = 1.0; Itanium®-Based Platforms = +30%-60% or higher in ’04**, +0% in ’07+]
* For Enterprise & Technical Computing Application Segments; data based on Intel projections
** 30%-60% or higher. Based on web pricing: 4P platforms from Ion Computer; 2P platform (4GB RAM) from Dell
Itanium®-Based Platforms: Performance Leadership Now over RISC and Intel® Xeon™ processors; Up to 2X Performance at expected Cost Parity with Intel® Xeon™ Processor-Based Platforms by 2007+