intel cornelius
TRANSCRIPT
-
8/13/2019 Intel Cornelius
1/125
Intel Itanium Architecture
28-Jan-2003
Herbert Cornelius, Technical Marketing Manager
Intel EMEA, [email protected]
-
EMEA HPTC Virtual Team
Intel Itanium Architecture
Copyright 2002-2003 Intel Corporation*Other brands and names are the property of their respective owners
Useful URLs
Intel Itanium 2 Processor:- www.intel.com/products/server/processors/server/itanium2/index.htm
Intel Software Products:- www.intel.com/products/software/
Intel Developer Services:- www.intel.com/ids/
Intel Technology Journal:- www.intel.com/technology/itj/index.htm
High-Performance Computing:- www.intel.com/ebusiness/trends/hpc.htm
-
Agenda
Intel Itanium Architecture
Intel Itanium Processor
Intel Itanium 2 Processor
Platforms
Software Tools
Some Tuning Tips
-
Extending Intel Architecture
[Roadmap figure: Intel Architecture product lines over time]
Extends IA for the Most Demanding Applications (performance, scalability, mission critical): Madison** (Perf), Deerfield** (Price/Perf)
Outstanding Performance for Volume Applications (IA-32): Gallatin**
**Codename. All dates specified are target dates provided for planning purposes only and are subject to change.
-
Intel Itanium Processor
First Implementation of the Intel Itanium Architecture
using innovative EPIC** Technology
**Explicit Parallel Instruction Computing
-
Intel Itanium 2 Processor
Second Generation of the Intel Itanium Architecture
using an enhanced Micro-Architecture
-
Product Features
Feature: Description
Available Speeds: 1 GHz, 900 MHz
Cache: Level 3: integrated 3 MB or 1.5 MB; Level 2: 256 KB; Level 1: 32 KB
Features: Based on EPIC architecture; Enhanced Machine Check Architecture (MCA) with extensive Error Correcting Code (ECC); Operating system support: HP-UX*, Linux*, Windows*
Chipset: Intel E8870 chipset; OEM custom chipsets
System Bus: 400 MHz, 128-bit wide, 6.4 GB/s bandwidth
-
Itanium 2 Block Diagram
Schematic overview
-
Itanium 2 Systems
High-end Itanium 2-based systems: >2X more than Itanium!
Racksaver: DP/1U (1H 2003)
Intel: 4P/4U, 2P/2U (Q4 2002 / Q2 2003)
Unisys: 16P (Q4 2002)
NEC: 32P (Shipping)
SGI: 64/512P (Early 2003)
IBM: 4P/8P/16P (Early 2003)
HP: DP/2U (Shipping)
HP: 2P WS (Shipping)
-
Initial Itanium 2 Application Areas
Enterprise solutions deployed on Itanium 2-based systems focus on the following:
Applications for Business Intelligence
Mechanical Computer-Aided Engineering (MCAE)
Electronic Design Automation (EDA)
Compute-intensive custom applications
Enterprise Resource Planning (ERP)
Supply Chain Management (SCM)
High Performance Computing (HPC)
Large databases
Security transactions
-
Itanium Application Areas
Large Memory Needs (>4GB direct memory access)
Large SMP Systems
Complex high-end F.P. Apps
64-bit Integer Applications
Customized Applications
Vector and Parallel Applications
Enterprise Unix* Needs
-
Itanium 2 Processor Micro-Architecture Enhancements
Itanium 2 processor builds on Itanium processor features
Increased Clock Frequency
Shorter Pipeline
Expanded Functional Units
Faster Floating Point
Improved Cache
Greater addressability
Enhanced TLB and ALAT
Improved System Bus
Long Branch Instruction
Enhanced Thermal Management
-
Performance Data
Benchmark: Performance Number
SPECint*_base2000: 810
SPECfp*_base2000: 1356
Stream TRIAD: 3700 MB/s
Linpack-1000 (single processor): 3534 MFLOPS
SAP 2-tier SD 4-way server: 600 SD users
4-way server TPC-C transactions: 80,495 tpmC at $4.83/tpmC
SPECweb99*_SSL: 1520 simultaneous connections
Linpack-HPC (32-way system): 101770 MFLOPS
2-way server TPC-C transactions: 40,621 tpmC at $5.72/tpmC
Linpack-10K (4-way system): 13940 MFLOPS
32-way server TPC transactions: 308,620 tpmC at $14.96/tpmC
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference http://www.intel.com/procs/perf/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
-
Itanium 2 Processor
Record setting Performance
Benchmark / Scale / Result
SAP (2 Tier) Sales and Distribution [1] / 4-way / 600 users, WORLD RECORD
TPC-C Transaction Processing [2] / 4-way / 80.4K tpmC, WORLD RECORD
Linpack High Performance Computing [3] / 32-way / 101 GFLOPS, WORLD RECORD
TPC-C Transaction Processing [4] / 32-way / 308K tpmC, IA SMP RECORD
Stream Platform Bandwidth [5] / 64-way / 120 GB/sec, WORLD RECORD
[1] Source: Itanium 2 processor results measured on HP Server rx5670 using 4 Itanium 2 processors 1GHz with integrated 3MB L3 cache, 24GB of memory, 528GB disk space, HP-UX 11.23, SAP rev 4.6D, Oracle 9i V.2
[2] Source www.tpc.org: Itanium 2 processor measurements done on a HP Server rx5670 using 4 Itanium 2 processors 1GHz with integrated 3MB L3 cache, 48GB memory, HP-UX 11.23, Oracle 9i V.2, at $4.83 per tpmC
[3] Source: Itanium 2 processor measurements done on a NEC Server TX7/i9510 using 32 Itanium 2 processors 1GHz with integrated 3MB L3 cache, 128GB memory, Linux OS.
[4] Source: Itanium 2 processor measurements done on a NEC TX7/i9510 Server using 32 Itanium 2 processors with integrated 3MB L3 cache, 256GB memory, Windows .NET Server 2003, Datacenter Edition, Microsoft SQL Server 2000 Enterprise Edition (64-bit) beta version, Availability date 12/31/02.
[5] Source: Itanium 2 processor measurements done on a SGI Scalable Linux System using 64 Itanium 2 processors, 128GB memory, Linux OS.
-
Intel Itanium Processor Family
2001: 800MHz, 4MB L3-Cache, 460GX Chip-set, OEM Chip-sets, 180nm
2002: 1GHz, 3MB iL3-Cache, E8870 Chip-set, OEM Chip-sets, 180nm
2003 (Madison**): 1.5GHz, 6MB iL3-Cache, E8870 Chip-set, OEM Chip-sets, 130nm
2004 (Enhanced Core): >1.5GHz, 9MB iL3-Cache, E8870 Chip-set, OEM Chip-sets, 130nm
2005 (Montecito**): >1.5GHz, larger L3-Cache, Enhanced Dual-Core, E8870 Chip-set, OEM Chip-sets, 90nm
common platform
**codename
All dates specified are target dates, are provided for planning purposes only and are subject to change
-
A new Architecture for Business Computing
RISC Technology
CISC Technology
New Architectural features: EPIC, Predication, Speculation
Enhanced floating point performance
Massive Resources
64-bit instruction set, registers & addressing
Enhanced reliability features
IA-32
Enterprise class OS
-
64-Bit
Is it new? Is it good or bad?
IA-32 already has 64-bit and more
- 64-bit buses
- 64-bit F.P. with 80-bit registers
- 64-bit Integer
- 64/128-bit MMX/XMM registers
- but only 32-bit address registers
Itanium has 64-bit address HW
- It is one of many features
How fast and how much data can you transfer/store
- 32-bit data items
- 64-bit data items
-
64-Bit Addressing
32-bit Addressing
- 1 cm
- one CD cover height
64-bit Addressing
- 429496 km
- distance between Earth and Moon
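The analogy rests on the factor 2^32 between a 32-bit and a 64-bit address space. The following is a minimal sketch of that arithmetic in Python; the helper name `scaled_length_km` is illustrative and not from the slides. Under this arithmetic a 1 cm base scales to roughly 42,950 km, while the 429496 km figure corresponds to a base of about 10 cm, so the base length in the analogy is best read as approximate.

```python
# Ratio between a 64-bit and a 32-bit address space.
SCALE = 2**64 // 2**32  # equals 2**32


def scaled_length_km(base_cm: float) -> float:
    """Length in km after multiplying a base length in cm by 2**32."""
    return base_cm * SCALE / 100 / 1000  # cm -> m -> km


for base_cm in (1, 10):
    print(f"{base_cm} cm x 2^32 = {scaled_length_km(base_cm):,.0f} km")
```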
-
Itanium Processor Architecture
Selected Features
64-bit Addressing
Flat Memory Model
Instruction Level Parallelism (6-way)
Large Register Files
Automatic Register Stack Engine
Predication
Software Pipelining Support
Register Rotation
Loop Control Hardware
Sophisticated Branch Architecture
Control & Data Speculation
Powerful 64-bit Integer Architecture
Advanced 82-bit Floating Point Architecture
Multimedia Support (MMX Technology)
-
User Benefits
More Capacity and Capability
- Big in-memory data structures and DB
- Large file system and data files
- Efficient large integer calculations
- Fast 64-bit F.P. calculations
- Fast Security processing
- More and faster transactions
- More services
- Higher throughput
- Improved availability and manageability
-
Broad Industry Investment
OEMs: ~20 OEMs worldwide shipping Itanium-based systems today, with 2X growth in high-end systems (8-32P+) expected with Itanium 2 processor
(Founder) (Langchao)
OSVs: 7 operating system versions available today from Windows* to HP-UX* and Linux, with more versions coming in 03/04
OpenVMS, NonStop Kernel
ISVs: More than 100 applications/tools available today, with 100s more in development for high-end enterprise and technical computing
Itanium Architecture has established broad industry investment providing solution choice to high-end computing
-
Linux* Supercomputer
With 1,400 next-generation Intel Itanium Family Processors that are code-named McKinley and Madison, the new HP supercomputer will have an expected total peak performance of more than 8.3 teraflops.
April 16, 2002
http://www.pnl.gov/news/2002/computer.htm
-
Performance Scaling
Itanium 2 running Itanium processor binaries
[Bar chart: Performance Scaling %, Itanium 800MHz/4MB to Itanium 2 1GHz/3MB; vertical axis from 1.00 to 2.25]
Workloads shown: Linpack 1000, Security 1, Linpack 10000 - 4P, SpecInt2000, SpecFp2000, CAE, ERP, Security 2, SpecJBB2000, IMDB, GAMESS
Itanium 2 delivers an average of 1.5-2X performance improvement
Source: Intel Labs
-
Itanium Processor Family Value Proposition
Intel Itanium 2 Processor / Intel E8870 Platform Advancements
Performance
- Up to ~1.5-2X performance increase over Itanium proc.
- 3X increase in FSB bandwidth
- 2X improvement in cache latencies
Scalability
- E8870 chipset scalability port for 8P+ systems
- Cache line size increased to 128 from 64
- Support for larger page size (4 GB), addressing (1024 TB)
Availability
- Hot Plug Processor Boards, Memory, I/O
- Fail-over redundancy
- Extensive error detection, correction and logging
Investment Protection
- Platform compatible w/ future Itanium processors
- Compatible with Itanium-based OS/software
- Common set of S/W tools for Itanium processor family
Choice
- Major OEMs worldwide shipping Itanium-based systems
- Support from broad list of leading OSVs
- S/W application and platform reach expands over time
-
Intel Itanium Architecture
-
Fundamental Architecture Challenges
Sequentiality inherent in traditional architectures
Complex hardware needed to (re)extract ILP
Limited ILP available within basic blocks
Branches make extracting ILP difficult
Memory dependencies further limit ILP
Increasing latency exacerbates ILP need
Limited resources: A fundamental constraint
Shared resources create more overhead
Loop ILP extraction costs code size
And the challenges continue ...
Itanium architecture overcomes these fundamental challenges!
-
Itanium Architecture
Performance Features
Parallelism - inherent in Itanium's EPIC architecture
Frees up hardware for parallel execution
Predication reduces branches, enhances ILP
Control Speculation breaks branch barrier, enhances ILP
Data Speculation breaks data dependence, increases ILP
Control and Data Speculation address memory latency
Itanium architecture has abundant register & memory resources
Stack/RSE reduces call overhead and management
Loop support yields performance w/o overhead
And the performance features continue ...
Itanium Architecture: Beyond RISC
-
Itanium Processor Block Diagram
(schematic overview)
-
Itanium Architecture: Explicitly Parallel
128 bits (bundle): Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (5 bits)
e.g. (MMI): Memory (M), Memory (M), Integer (I)
M=Memory, F=Floating-point, I=Integer, L=Long Immed., B=Branch
Basis for increased parallelism
Template specifies instruction types
- MFI, MMI, MII, MLX, MIB, MMF, MFB, MMB, MBB, BBB
Stops specify group breaks (dependencies)
- Intra-bundle (M;;MI or MI;;I) and Inter-bundle stop
Most common template combinations covered
Headroom for additional templates
Simplifies hardware requirements
Scales compatibly to future generations
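The bundle arithmetic above (3 x 41 + 5 = 128 bits) can be sketched in Python as follows. The function names are illustrative, and the bit placement (template in the low 5 bits, followed by slots 0, 1 and 2) is an assumption of this sketch rather than something the slide states; the template value is treated as an opaque 5-bit number.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
BUNDLE_BITS = 3 * SLOT_BITS + TEMPLATE_BITS  # 128
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a 5-bit template and three 41-bit slots into one integer.

    Assumed layout: template in bits 0-4, then slot 0, slot 1, slot 2.
    """
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction slot must fit in 41 bits")
    return (template
            | slot0 << TEMPLATE_BITS
            | slot1 << (TEMPLATE_BITS + SLOT_BITS)
            | slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS))


def unpack_bundle(bundle):
    """Split a bundle back into (template, slot0, slot1, slot2)."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple((bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
                  for i in range(3))
    return (template,) + slots


bundle = pack_bundle(0x08, 0x123, 0x456, SLOT_MASK)
print(BUNDLE_BITS, unpack_bundle(bundle))
```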
-
EPIC (Explicit Parallel Instruction Computing)
Source Code -> Compiler -> Instructions
Instruction Bundles (3 Instr. each, 128 bit wide)
Instruction Groups (series of bundles)
Up to 6 instructions executed per clock
Michael S. Schlansker, B. Ramakrishna Rau: EPIC: Explicit Parallel Instruction Computing; IEEE Computer, February 2000, pp. 37-45
-
Breakthrough Parallelism
MFI + MFI
- Load 4 DP (8 SP) ops via 2 ld-pair
- 2 ALU ops (post++)
- 4 DP FLOPS (8 SP FLOPS)
- 2 ALU ops
6 instructions provide 12 parallel ops/clock (SP: 20 parallel ops/clock) for digital content creation & scientific computing
MIB + MIB
- 2 Loads + 2 ALU ops (post++)
- 2 ALU ops
- 1 Branch Hint + 1 Branch instr
6 instructions provide 8 parallel ops/clock for enterprise & Internet applications
Itanium processor delivers greater ILP than any contemporary processor
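The ops/clock totals quoted for the two bundle pairs follow from adding up the per-slot operation counts. A small Python tally, using only the counts given on the slide (the dictionary names and key labels are illustrative):

```python
# Operations per clock for an MFI + MFI bundle pair, double precision.
MFI_MFI_DP = {
    "operands loaded via 2 ld-pair": 4,
    "post-increment ALU ops": 2,
    "FLOPs": 4,
    "ALU ops": 2,
}

# Same pair in single precision: loads and FLOPs double.
MFI_MFI_SP = dict(MFI_MFI_DP, **{"operands loaded via 2 ld-pair": 8, "FLOPs": 8})

# Operations per clock for an MIB + MIB bundle pair.
MIB_MIB = {
    "loads": 2,
    "post-increment ALU ops": 2,
    "ALU ops": 2,
    "branch hint": 1,
    "branch": 1,
}

for name, ops in (("MFI+MFI (DP)", MFI_MFI_DP),
                  ("MFI+MFI (SP)", MFI_MFI_SP),
                  ("MIB+MIB", MIB_MIB)):
    print(name, sum(ops.values()))
```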
-
Floating-Point Architecture
Floating-Point: High performance and High precision
Fused Multiply Add Operation
- An efficient core computation unit
Abundant Register resources
- 128 registers (32 static, 96 rotating)
High Precision Data computations
- 82-bit unified internal format for all data types
Software divide/square-root
- High throughput achieved via pipelining
-
Floating Point Features
- Native 82-bit hardware provides support for multiple numeric models
- 2 Extended precision pipelined FMACs deliver 4 EP / DP FLOPs/cycle
- Performance for security, efficient use of hardware: Integer mul-add, s/w divide
- Balanced with plenty of operand bandwidth from registers / memory
[Data-path figure: 4 Mbyte L3 Cache, L2 Cache, 128 entry 82-bit RF (even/odd); labels: 6 x 82-bit operands, 2 x 82-bit results, 4 DP Ops/clk (2 x Fld-pair), 2 DP Ops/clk, 2 stores/clk]
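The "4 FLOPs/cycle" figure comes from two FMAC units each completing one fused multiply-add (two floating-point operations) per cycle. A minimal Python sketch of the resulting peak rate at the two clock speeds mentioned in this deck (800 MHz and 1 GHz); the function name and parameter defaults are illustrative:

```python
def peak_gflops(clock_ghz, fmac_units=2, flops_per_fma=2):
    """Peak floating-point rate: clock x units x (multiply + add)."""
    return clock_ghz * fmac_units * flops_per_fma


for clock_ghz in (0.8, 1.0):
    print(f"{clock_ghz} GHz -> {peak_gflops(clock_ghz):.1f} GFLOPS peak")
```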
-
Itanium Processor Pipeline
Parallel, deep, and dynamic pipeline designed for maximum throughput
6-Wide EPIC hardware under compiler control
- Parallel hardware and control for predication & speculation
- Efficient mechanism for enabling register stacking & rotation
- Software-enhanced branch prediction
10-stage in-order pipeline designed for:
- Single cycle ALU (4 ALUs globally bypassed)
- Low latency from data cache
Dynamic support for run-time optimization
- Decoupled front end with prefetch to hide fetch latency
- Aggressive branch prediction to reduce branch penalty
- Non-blocking caches, register scoreboard to hide load latency
-
Predication
Control Flow to Data Flow
Traditional Arch. (if): cmp; br; then-block; br; else-block
Itanium Architecture (if): cmp p1,p2; then-block predicated on p1; else-block predicated on p2
Removes/Reduces Branches and Enables Parallel Execution
64 predicate registers
Can be combined with logical ops
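The control-flow to data-flow conversion can be illustrated with a small Python model. This is only a sketch of the idea, not Itanium code: `predicated_max` mimics a compare that writes two complementary predicates, after which both guarded operations are issued and only the one with a true predicate takes effect. The function names are illustrative.

```python
def branchy_max(a, b):
    """Traditional form: the compare feeds a branch."""
    if a > b:
        r = a
    else:
        r = b
    return r


def predicated_max(a, b):
    """Predicated form: one compare sets p1 and p2, no branch between the ops."""
    p1 = a > b          # cmp sets p1 ...
    p2 = not p1         # ... and its complement p2
    r = None
    # Both guarded moves are issued; only the one whose predicate is
    # true commits its result, which is what the loop body models.
    for predicate, value in ((p1, a), (p2, b)):
        if predicate:
            r = value
    return r


print(predicated_max(3, 7), predicated_max(9, 2))
```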
-
Software Pipelining Support
Loop support: ILP+++, Overhead---
High performance loops without code size overhead
- No prologue/epilogue
- Register rotation (rrb)
- Predication
- Loop control registers (LC, EC)
- Loop branches (br.ctop, br.wtop)
Especially valuable for integer loops with small trip counts
Whole loop computation in parallel
-
Software Pipelining (cont.)
Traditional architectures use loop unrolling
- Results in code expansion and increased cache misses
Itanium-Processor Software Pipelining uses rotating registers
- Allows overlapping execution of multiple loop instances
Predication controls the pipeline stages
[Figure: Sequential Loop vs. Software-Pipelined Loop over Time; stages load, compute, store]
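The overlap of loop instances can be sketched as a schedule in Python. The sketch assumes three one-cycle stages (load, compute, store) as in the figure; the stage predicate is modelled by the range check that switches a stage on only while a valid iteration occupies it. Names are illustrative.

```python
STAGES = ("load", "compute", "store")


def pipelined_schedule(n_iters):
    """Return, per cycle, the (stage, iteration) pairs that are active."""
    cycles = []
    for t in range(n_iters + len(STAGES) - 1):
        active = []
        for s, name in enumerate(STAGES):
            iteration = t - s
            # Stage predicate: true only for iterations inside the loop range,
            # which fills the pipeline at the start and drains it at the end.
            if 0 <= iteration < n_iters:
                active.append((name, iteration))
        cycles.append(active)
    return cycles


for t, active in enumerate(pipelined_schedule(4)):
    print(t, active)
```

Under these assumptions a sequential loop of n iterations takes 3n cycles, while the pipelined schedule takes n + 2.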
-
Register Rotation
GR32-127 and FR32-127 can rotate (specified range)
Separate rotating register base for each set (GR, FR)
Loop branches decrement all register rotating bases (RRB)
Instructions contain a virtual register number
- physical register # = RRB + virtual register #
Predicate register range also rotates.
[Figure: iterations i=0 to i=7; same phys. reg., diff. virtual number]
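The renaming rule can be tried out with a short Python model. It is a sketch under two assumptions not spelled out on the slide: the whole range 32-127 rotates (96 registers), and the sum RRB + virtual register # wraps around within that range. The names are illustrative.

```python
ROT_BASE = 32   # first rotating register (GR32 / FR32)
ROT_SIZE = 96   # assumed: the full range 32-127 rotates


def physical_reg(virtual, rrb):
    """Map a virtual register number to a physical one for a given RRB."""
    if virtual < ROT_BASE:
        return virtual  # static registers are not renamed
    return ROT_BASE + (virtual - ROT_BASE + rrb) % ROT_SIZE


def loop_branch(rrb):
    """A loop branch decrements the rotating register base (with wrap-around)."""
    return (rrb - 1) % ROT_SIZE


rrb = 0
written = physical_reg(32, rrb)   # iteration i writes virtual r32
rrb = loop_branch(rrb)
read = physical_reg(33, rrb)      # iteration i+1 reads it as virtual r33
print(written, read)
```

The value written under one virtual number is found under the next virtual number one iteration later, which is what lets loop instances overlap without copying registers.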
-
Control & Data Speculation
Control Speculation moves loads above branches / calls
- Original order: instr. 1; instr. 2; branch (Barrier); ld r1=; use = r1
Data Speculation moves loads above possibly conflicting stores
- Original order: instr. 1; instr. 2; st[?] (Barrier); ld r1=; use = r1
Speculation reduces the impact of memory latency
-
Control Speculation
Control Speculation moves loads above branches
- Detected exception indicated using NaT bit / NaTVal
- Check raises detected exceptions
Branch barrier broken to minimize memory latency
Traditional Arch.: instr. 1; instr. 2; branch (Barrier); ld r1=; use = r1
Itanium: ld.s r1= (Detect exception); instr. 1; instr. 2; branch; chk.s r1 (Deliver exception); use = r1
Between ld.s and chk.s: Propagate exception
-
Hoisting Uses
Traditional Arch.: instr. 1; instr. 2; branch (Barrier); ld r1=; use = r1
Itanium: ld.s r1=; use = r1 (Speculative use); instr. 1; instr. 2; branch; chk.s r1
Recovery code: ld r1=; use = r1; branch
All computation instructions propagate NaTs, which reduces the number of checks and allows a single check on results
Compares also propagate NaTs when writing predicates
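NaT propagation and the single final check can be modelled in a few lines of Python. This is a toy sketch, not the hardware mechanism: a missing dictionary key stands in for a faulting load, and the function names `ld_s`, `add` and `chk_s` only echo the mnemonics.

```python
NAT = object()  # stands in for the NaT marker


def ld_s(memory, addr):
    """Speculative load: defer a fault by returning NaT instead of raising."""
    return memory.get(addr, NAT)


def add(x, y):
    """A computation propagates NaT from either operand."""
    return NAT if x is NAT or y is NAT else x + y


def chk_s(value, recovery):
    """Check: run recovery code only if the speculative chain produced NaT."""
    return recovery() if value is NAT else value


memory = {0x10: 5}
good = chk_s(add(ld_s(memory, 0x10), 1), recovery=lambda: -1)
bad = chk_s(add(ld_s(memory, 0x99), 1), recovery=lambda: -1)
print(good, bad)
```

Only one check is needed at the end of the hoisted chain, since every intermediate result carries the NaT forward.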
-
Data Speculation
Data Speculation moves loads above possibly conflicting stores
- Keeps track of load addresses used in advance (ALAT)
Advanced-loaded data can be used speculatively
Traditional Arch.: instr. 1; instr. 2; st[?] (Barrier); ld r1=; use = r1
Itanium: ld.a r1=; instr. 1; instr. 2; st[?]; ld.c r1; use = r1
-
Advanced Load Address Table: ALAT
ld.a inserts entries
Conflicting stores remove entries
- also ld.c.clr, chk.a.clr
Presence of entry indicates success
- chk.a branches when no entry is found
[Figure: ALAT as a table of (reg#, addr) entries; ld.a reg# = inserts an entry, st[addr] is compared against the addr column, chk.a reg# ? looks up the reg# column]
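The insert / remove / check behaviour of the ALAT can be sketched with a small Python class. It is a simplified model under stated assumptions: entries are keyed by register name and compared on exact address equality, whereas real hardware has a fixed number of entries and compares address ranges. The class and method names are illustrative.

```python
class ALAT:
    """Toy model: maps a target register to the address it advance-loaded."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, reg, addr, memory):
        """Advanced load: read memory and record (reg, addr)."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr, value, memory):
        """Store: write memory and drop every entry with a conflicting address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def chk_a(self, reg):
        """Check: True if the entry survived, False if recovery code must run."""
        return reg in self.entries


memory = {0x100: 1, 0x200: 2}
alat = ALAT()
r1 = alat.ld_a("r1", 0x100, memory)   # load hoisted above the stores
alat.store(0x200, 9, memory)          # different address: entry survives
ok_before = alat.chk_a("r1")
alat.store(0x100, 7, memory)          # same address: entry removed
ok_after = alat.chk_a("r1")
print(ok_before, ok_after)
```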
-
Hoisting Uses
Data and Control Speculation can be combined
Traditional Arch.: instr. 1; instr. 2; st[?] (Barrier); ld r1=; use = r1
Itanium: ld.a r1=; use = r1 (Speculative use); instr. 1; instr. 2; st[?]; chk.a r1
Recovery code: ld r1=; use = r1; branch
-
Intel Itanium 2 Processor
Architecture
-
Intel Itanium 2 Processor
Codename McKinley
Target for 2H2002
Enhanced Itanium design
100% Itanium binary compatible
1.0GHz clock-rate
6 Integer units
256KB L2 cache
1.5MB or 3MB iL3 cache
6.4GB/s system bus
1.5-2x Performance increase over Itanium based systems
-
Itanium 2 Optimizations
Improved dynamic properties
Production frequency is 1 GHz
Reduced L1, L2, L3 latencies
L3 cache has been incorporated on die
Improved L2 cache capacity
Improved FSB bandwidth
Lower branch prediction penalties
Itanium 2 provides significant speed-ups on existing Itanium processor binaries
-
Itanium 2 Optimizations
Reduced execution paths
More parallelism/resources
More integer, multi-media units and memory ports
Short latencies
Fully bypassed functional units
Very Low L1D/L2/L3 Cache Latencies
Low latency FP execution
Many more ways to issue/execute 6 insts/clk
Itanium 2 provides performance headroom for re-optimized binaries
-
Itanium Processor
- Core: 800 MHz
- System Bus: 64 bits wide, 133MHz/266 MT/s, 2.1 GB/s
- Width: 2 bundles per clock, 4 integer units, 2 loads or stores per clock, 9 issue ports
- Caches: L1 2x16KB, 2 clock latency; L2 96K, 9 clock latency; L3 4MB external, 21 clk, 12.8 GB/s bandwidth (L3 Cache BSB)
- Addressing: 44 bit physical addressing, 50 bit virtual addressing, maximum page size of 256MB
Itanium 2 Processor
- Core: 1 GHz
- System Bus: 128 bits wide, 200MHz/400 MT/s, 6.4 GB/s
- Width: 2 bundles per clock, 6 integer units, 2 loads and 2 stores per clock, 11 issue ports
- Caches: L1 2x16KB, 1 clock latency; L2 256K, 5 clock latency; L3 3MB, 12 clk, 32 GB/s bandwidth
- Addressing: 50 bit physical addressing, 64 bit virtual addressing, maximum page size of 4GB
Improvement factors marked in the figure: 2X, 3X, 1.5X, 2X
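The bus bandwidth and addressing figures in this comparison can be checked with a few lines of Python. The sketch uses only the widths, transfer rates and address sizes quoted above; the function name is illustrative, and GB is taken as 10^9 bytes for bandwidth and TB as 2^40 bytes for address space.

```python
def bus_bandwidth_gb_s(width_bits, mega_transfers_per_s):
    """Peak bus bandwidth in GB/s: bytes per transfer x transfers per second."""
    return width_bits / 8 * mega_transfers_per_s / 1000


itanium_bus = bus_bandwidth_gb_s(64, 266)     # about 2.1 GB/s
itanium2_bus = bus_bandwidth_gb_s(128, 400)   # 6.4 GB/s
bus_ratio = itanium2_bus / itanium_bus        # about 3X

itanium_phys_tb = 2**44 // 2**40              # 44-bit physical addressing
itanium2_phys_tb = 2**50 // 2**40             # 50-bit physical addressing

print(f"bus: {itanium_bus:.1f} -> {itanium2_bus:.1f} GB/s ({bus_ratio:.1f}X)")
print(f"physical address space: {itanium_phys_tb} TB -> {itanium2_phys_tb} TB")
```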
-
Itanium 2 Processor Block Diagram
(schematic overview)
-
Architectural Changes
Beneficial to compilers
- Improved data/control speculation support
- ALAT fully associative = minimize thrashing
- Processor directly vectors to recovery code for reduced processor speculation costs
- 64-bit Long Branch Instruction
Beneficial to OS and System designs
- Full 64-bit virtual addressing
- Full 2**24 virtual address spaces
- 4GB virtual pages = reduced TLB pressure
- 50-bit Physical addressing = very large memory/IO spaces
More flexibility for compiler, OS and system designs
-
Memory Cache Hierarchy
Itanium 2 Processor (1GHz)
- L1D: 16KB, 64B CL, 1 CLK
- L1I: 16KB, 64B CL, 1 CLK
- L2-Cache: 256KB, 128B CL, 8-way, 5-7 CLKS
- L3-Cache: 1.5/3MB, 128B CL, 12-way, 12-15 CLKS, 32GB/s
- Memory (Controller): 6.4 GB/s
Itanium Processor (800MHz)
- L1D: 16KB, 32B CL, 2 CLK
- L1I: 16KB, 32B CL, 2 CLK
- L2-Cache: 96KB, 64B CL, 6-way, 6-9 CLKS
- L3-Cache: 2/4MB, 64B CL, 4-way, 20 CLKS, 12.8GB/s
- Memory (Controller): 2.1 GB/s
Other labels in the figure: 25.6GB/s (twice), 32GB/s (twice), 210 CLKS
-
Itanium 2 Cache Hierarchy
3 level caching on Itanium 2 processor
- 1st level cache optimized for latency
- 2nd level cache optimized for bandwidth
- 3rd level cache optimized for size
Large Register Set
- Integer Registers (bits 63..0, plus NaT bit): GR0-GR127
  - 32 Static: GR0-GR31 (GR0 = 0)
  - 96 Framed, Rotating: GR32-GR127
- F.P. Registers (bits 81..0): FR0-FR127
  - 32 Static: FR0-FR31 (FR0 = +0.0, FR1 = +1.0)
  - 96 Rotating: FR32-FR127
- Predicate Registers: PR0-PR63
  - 16 Static: PR0-PR15 (PR0 = 1)
  - 48 Rotating: PR16-PR63
- Branch Registers (bits 63..0): BR0-BR7
Functional Units
(Figure: Issue Ports/Units, Itanium vs. Itanium 2; unit types: Integer, F.P., Multimedia, Load/Store, Branch)
Units shown: 2 x F.P. MAC, 2 x ALU/INT/MM, 4 x ALU/MM/MEM, 3 x BRANCH
Itanium 2 Dispersal Matrix
(Figure: matrix of bundle-template pairs over MII, MLI, MMI, MFI, MMF, MIB*, MBB, BBB, MMB*, MFB*; cells marked either "possible Itanium 2 full issue" or "possible Itanium processor and Itanium 2 full issue"; * = hint in first bundle)
Itanium 2 allows more compiler dispersal options
A simple Example

    ..
    double precision, dimension(10000) :: a,b,c,d
    do i=1,10000
       a(i)=a(i)*b(i)+c(i)*d(i)
    enddo
    ..

DAXPY-like loop over floating-point vectors can be optimized differently for Itanium and Itanium 2
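As a language-neutral restatement of the loop body, the following Python sketch (not part of the original deck; the name `fused_update` is purely illustrative) spells out the work done per iteration: four loads, two multiply-adds and one store. Python floats do not fuse the multiply and the add, so this mirrors only the structure of the two multiply-add operations, not their rounding behaviour.

```python
def fused_update(a, b, c, d):
    """Return a new list holding a[i]*b[i] + c[i]*d[i] for every index."""
    out = []
    for ai, bi, ci, di in zip(a, b, c, d):   # 4 loads per iteration
        t = ci * di + 0.0                    # first multiply-add: c*d + 0.0
        out.append(ai * bi + t)              # second multiply-add, then 1 store
    return out


# Operation counts per iteration, read off the loop body above.
OPS_PER_ITERATION = {"loads": 4, "stores": 1, "fma": 2}

if __name__ == "__main__":
    print(fused_update([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]))
```

The operation counts are what matter for scheduling: the memory operations, not the arithmetic, dominate this loop.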
Itanium vs. Itanium 2 Assembly Code

3 clockticks on Itanium

    .b1_2:
    { .mmf
      (p16) ldfd f37=[r8],8
      (p16) ldfd f45=[r3],8
      (p19) fma.d f52=f40,f48,f0 ;;
    }
    { .mmi
      (p16) ldfd f32=[r33]
      (p16) ldfd f40=[r2],8
      nop.i 0 ;;
    }
    { .mfi
      (p23) stfd [r40]=f51
      (p20) fma.d f48=f36,f44,f53
      nop.i 0
    }
    { .mib
      (p16) add r32=8,r33
      nop.i 0
      br.ctop.sptk .b1_2 ;;
    }

2 clockticks on Itanium 2 !

    .b1_2:
    { .mfi
      (p16) ldfd f43=[r8],8
      (p19) fma.d f51=f46,f50,f0
      nop.i 0
    }
    { .mmf
      (p16) ldfd f47=[r3],8
      (p23) stfd [r32]=f56
      (p21) fma.d f54=f37,f42,f53 ;;
    }
    { .mii
      (p16) ldfd f32=[r33]
      nop.i 0
      nop.i 0
    }
    { .mmb
      (p16) ldfd f37=[r2],8
      (p16) add r32=8,r33
      br.ctop.sptk .b1_2 ;;
    }
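The 3 versus 2 clocktick figures can be reproduced with a rough resource bound. The sketch below is a simplification under stated assumptions, not a model of the real dispersal rules: it uses the memory-port limits quoted elsewhere in this deck (Itanium: 2 loads or 2 stores per clock; Itanium 2: 2 loads and 2 stores per clock), assumes two FP units on both processors, and ignores bundle templates, the address `add` and the loop branch. The function names are hypothetical.

```python
from math import ceil

# Per-iteration work of the loop: 4 ldfd, 1 stfd, 2 fma.d.
LOOP_OPS = {"loads": 4, "stores": 1, "fma": 2}


def min_cycles_itanium(ops):
    """Lower bound when loads and stores share 2 memory ports per clock."""
    mem = ceil((ops["loads"] + ops["stores"]) / 2)
    fp = ceil(ops["fma"] / 2)
    return max(mem, fp)


def min_cycles_itanium2(ops):
    """Lower bound with 2 load ports AND 2 store ports per clock."""
    loads = ceil(ops["loads"] / 2)
    stores = ceil(ops["stores"] / 2)
    fp = ceil(ops["fma"] / 2)
    return max(loads, stores, fp)


if __name__ == "__main__":
    print("Itanium  :", min_cycles_itanium(LOOP_OPS), "cycles per iteration")
    print("Itanium 2:", min_cycles_itanium2(LOOP_OPS), "cycles per iteration")
```

Under these assumptions the five memory operations need three clocks on two shared ports, while separate load and store ports bring the bound down to two clocks, matching the listings.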
Itanium Processor vs. Itanium 2 Processor

                      Itanium Processor                         Itanium 2 Processor
Core frequency        800 MHz                                   1 GHz
System bus            2.1 GB/s, 64 bits wide, 266 MHz           6.4 GB/s, 128 bits wide, 400 MHz
Pipeline stages       10                                        8
Issue ports           9                                         11
Execution units       4 Integer, 3 Branch, 2 FP, 2 SIMD,        6 Integer, 3 Branch, 2 FP, 1 SIMD,
                      2 Load or 2 Store                         2 Load & 2 Store
Registers             328 on-board                              328 on-board
Instructions / cycle  6                                         6
Caches                4 MB L3 on board, 96k L2, 32k L1 on-die   3 MB L3, 256k L2, 32k L1 all on-die

Itanium 2: 221 million transistors total, 25 million in CPU core

Itanium 2 callouts:
- Large on-die cache, reduced latency
- Increased core frequency
- Additional execution units
- Additional issue ports
- 3X increase in system bus bandwidth

McKinley delivers performance through:
- Bandwidth and cache improvements
- Micro-architecture enhancements
- Increased frequency
Itanium 2 Pipelines

Core pipeline stages:
  IPG      IP Generate, L1I Cache (6 inst) and TLB access
  ROT      Instruction Rotate and Buffer (6 inst)
  EXP      Expand, Port Assignment and Routing
  REN      Integer and FP Register Rename (6 inst)
  REG      Integer and FP Register File read (6)
  EXE      ALU Execute (6), L1D Cache and TLB access + L2 Cache Tag Access (4)
  DET      Exception Detect, Branch Correction
  WB       Writeback, Integer Register update
FPU pipeline stages:
  FP1-WB   FP FMAC pipeline (2) + reg write
L2 pipeline stages:
  L2N-L2I  L2 Queue Nominate/Issue (4)
  L2A-W    L2 Access, Rotate, Correct, Write (4)

Stage order:
  Core: IPG ROT EXP REN REG EXE DET WB
  FPU:  FP1 FP2 FP3 FP4 WB
  L2:   L2N L2I L2A L2M L2D L2C L2W

Short 8-stage in-order main pipeline
- In-order issue, out-of-order completion
- Reduced branch misprediction penalties
- Fully interlocked, no way-prediction or flush/replay mechanism
Pipelines are designed for very low latency
Itanium 2 Issue Ports
Issue ports
- 4 Mem/ALU/Multi-Media
- 2 Integer/ALU/Multi-Media
- 2 FMAC
- 3 branch
4 memory ports
- Integer: allow 2 load AND 2 store per clk
- FP: 2 FP load pairs AND 2 store per clk to feed 2 FMACs
(Figure: the L1 instruction cache delivers two instruction bundles to ports ALU/MEM 1-4 and ALU/INT 1-2, i.e. six arithmetic logic units; two load ports and two store ports (1 cycle latency) connect to the L1 data cache)
Substantial performance headroom for FP and integer kernels
Itanium 2 Unit Latencies

                               Consuming Class Instruction
Producing Class Instruction    Integer   Multimedia   Load Address   Store Data
Mem/integer ports ALU             1          2             1             1
Integer only ports ALU            1          2             1             1
Multimedia                        3          2             3             3
Integer Loads (L1D hit)           1          2             2             1

Short latencies and full bypasses improve performance for re-optimized code
Floating Point Latencies

Operation                 Itanium 2 Latency
FP Load (L2 Cache hit)    6
FMAC                      4
FP -> INT (getf)          5
FMISC                     4
INT -> FP (setf)          6

Short latencies = performance upside for re-optimized FP code
Floating Point Architecture
DIV and SQRT are done in software to enable
- better ILP
- full pipelining
- higher throughput
- more flexibility
- support for full IEEE 754 compliance
- versions optimized for latency and throughput
- also available for SIMD F.P. operations
Source: Intel Technology Journal Q4, 1999
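To illustrate how a division can be built from multiply-add steps alone, here is a minimal Newton-Raphson sketch in Python. It is a generic textbook scheme, not the instruction sequence Intel ships: the linear seed stands in for the hardware reciprocal approximation (the architecture's `frcpa` instruction), the iteration count is an assumption, and no attempt is made at the final correctly-rounded step that an IEEE-compliant sequence needs. The names `reciprocal_newton` and `divide` are hypothetical.

```python
import math


def reciprocal_newton(b, iterations=4):
    """Approximate 1/b for b > 0 using only multiplies and adds."""
    if b <= 0.0:
        raise ValueError("this sketch handles positive divisors only")
    m, e = math.frexp(b)                    # b = m * 2**e with 0.5 <= m < 1
    y = 48.0 / 17.0 - (32.0 / 17.0) * m     # linear seed, relative error <= 1/17
    for _ in range(iterations):
        err = 1.0 - m * y                   # residual (one multiply-add)
        y = y + y * err                     # refinement (one multiply-add)
    return math.ldexp(y, -e)                # 1/b = (1/m) * 2**(-e)


def divide(a, b):
    """a / b as a multiply by the refined reciprocal (not correctly rounded)."""
    return a * reciprocal_newton(b)


if __name__ == "__main__":
    print(divide(355.0, 113.0), 355.0 / 113.0)
```

Each refinement squares the relative error, so a handful of independent multiply-adds replaces one long-latency divide and the steps of several divisions can be interleaved in the pipeline.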
Floating-Point DIV, Throughput Optimized
Floating-Point SQRT, Throughput Optimized
Integer DIV
Itanium 2 Branch Prediction
Zero clock branch prediction
- 2 level branch prediction hierarchy
  - L1IBR: Level 1 Branch Cache
    - Part of the L1 I-cache
    - 1K trigger predictions + 0.5K target addresses
  - L2B: Level 2 Branch Cache (12K histories)
  - PHT: Pattern History Table (16K counters)
Reduced prediction penalties
- IP-relative branch w/correct prediction: 0 cycle
- IP-relative branch w/wrong target: 1 cycle
- Return branch w/correct prediction: 1 cycle
- Last branch in counted loop prediction: 0 cycle
- Branch misprediction: 6 cycle
Reduced branch penalties speed up existing code
Instruction Prefetching
Streaming prefetching
- Initiated by br.many (hint on branch inst)
- CPU prefetches ahead of the sequential execution stream
- Streaming prefetch is cancelled by:
  - a predicted-taken branch in the front-end
  - a branch misprediction on the back-end
  - software cancelling the prefetch with a brp instruction
Branch Prefetching Hints
- Initiated by brp.few, brp.many or mov_to_br
- One time prefetch for the target
- Two hint prefetches can be initiated per cycle
Software initiated instruction prefetching improves performance by lowering instruction fetch penalties
Itanium 2 Caches

                        L1I           L1D            L2              L3
Size                    16K           16K            256K            3M on die
Line Size               64B           64B            128B            128B
Ways                    4             4              8               12
Replacement             LRU           NRU            NRU             NRU
Latency (load to use)   I-Fetch: 1    INT: 1         INT: 5, FP: 6   12
Write Policy            -             WT (RA)        WB (WA + RA)    WB (WA)
Bandwidth               R: 32 GB/s    R: 16 GB/s     R: 32 GB/s      R: 32 GB/s
                                      W: 16 GB/s     W: 32 GB/s      W: 32 GB/s

All caches are physically indexed, pipelined, and non-blocking: score boarded registers allow continued execution until load use
L1D (1 clock Integer Data Cache)
- High Performance: 32GB/s, 2 ld AND 2 st ports
- Write Through: all stores are pushed to the L2
- FP loads force miss, FP stores invalidate
- True dual-ported read access: no load conflicts
- Pseudo-dual store port write access
  - 2 store coalescing buffers/port hold data until L1D update
- Store to load forwarding
One clock data cache provides a significant performance benefit
L2 and L3 Cache
L2: 256KB, 32GB/s, 5 clk
- Data array is pseudo-4 ported: 16 banks of 16KB each
- Non-blocking/out-of-order
  - L2 queue (32 entries): holds all in-flight load/stores
  - out-of-order service: smoothes over load/store/bank conflicts, fills
- Can issue/retire 4 stores/loads per clock
- Can bypass L2 queue (5,7,9 clk bypass) if
  - no address or bank conflicts in same issue group
  - no prior ops in L2 queue want access to L2 data arrays
L3: 3MB, 32GB/s, 12 clk, large cache on die
- Single ported, full cache line transfers
Large on-die L2 and L3 caches provide significant performance potential
TLBs
2-level TLB hierarchy
- DTC/ITC (32/32 entry, fully associative, .5 clk)
  - Small fast translation caches tied to L1D/L1I
  - Key to achieving very fast 1-clk L1D, L1I cache accesses
- DTLB/ITLB (128/128 entry, fully associative, 1 clk)
  - All architected page sizes (4K to 4GB)
  - Supports up to 64/64 ITR/DTRs
  - TLB miss starts hardware page walker
Small fast TLBs enable low latency caches
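The benefit of large pages for TLB pressure can be made concrete with a "TLB reach" calculation, entries multiplied by page size. The sketch below uses the 128-entry second-level TLB and the 4K and 4GB page sizes quoted on this slide; the helper name `tlb_reach_bytes` is hypothetical, and the calculation ignores translation registers and mixed page sizes.

```python
KB = 1024
GB = 1024 ** 3


def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size_bytes


reach_4k = tlb_reach_bytes(128, 4 * KB)   # smallest architected page size
reach_4g = tlb_reach_bytes(128, 4 * GB)   # largest architected page size

if __name__ == "__main__":
    print("128 entries x 4KB pages:", reach_4k // KB, "KB")
    print("128 entries x 4GB pages:", reach_4g // GB, "GB")
```

With 4KB pages the 128 entries cover only half a megabyte, less than the on-die L3, so large data sets benefit from larger pages.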
System Bus Enhancements
Extension of the Itanium processor bus
- Same protocol with minor extensions
- Increased to 6.4GB/s bandwidth
  - frequency 200MHz, 400MHz data, 128-bit data bus
- Bus is non-blocking and out of order
  - Most transactions can be deferred for later service
- Buffering
  - 18 bus requests/CPU are allowed to be outstanding
  - 16 Read Line + 6 Write Line + two 128 byte WC buffers
Itanium 2 significantly extends the system bus performance level
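The quoted peak bandwidth follows directly from bus width times transfer rate. The short Python sketch below redoes that arithmetic for the Itanium 2 bus (128 bits at 400 MT/s) and, for comparison, the Itanium bus figures given elsewhere in this deck (64 bits at 266 MHz); the function name is hypothetical and the result is a theoretical peak, not a sustained rate.

```python
def peak_bandwidth_gb_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s (10**9 bytes/s) for a bus of the given width."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


itanium2_bus = peak_bandwidth_gb_s(128, 400)   # 16 bytes x 400 MT/s
itanium_bus = peak_bandwidth_gb_s(64, 266)     # 8 bytes x 266 MT/s

if __name__ == "__main__":
    print(f"Itanium 2: {itanium2_bus:.1f} GB/s")
    print(f"Itanium  : {itanium_bus:.1f} GB/s")
    print(f"Ratio    : {itanium2_bus / itanium_bus:.1f}x")
```

The ratio of the two results is where the deck's "3X" system bus claim comes from.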
New Bus Transactions
- L3 cast-outs (normally silent L3 replacement (E->I, S->I))
  - Reduces snoop traffic in directory based systems
  - Backward inquiry for L2, L1 coherency
- Memory read current
  - non-destructive (non-coherent) snoop of CPU lines
  - Used in high bandwidth graphic based systems
- Cache Cleanse
  - writes all modified lines to memory (M->E)
  - Used in fault tolerant systems
  - invoked via PAL
Itanium 2 provides several new bus transactions to improve performance/reliability
Error Features
Error detection on all major arrays
- Parity coverage on L1D, L1I, and TLBs
- ECC on L2 and L3
  - double bit detection
  - single bit correction: out of path repair
  - all errors are fully contained
Bus is covered with parity/ECC
- double bit detection
- single bit correction on transmission
Error Isolation (end-to-end error detection)
- From memory: unique FSB 2xECC syndrome encoding can tolerate additional single bit errors in transmission
- Error not reported until referenced by a consuming process
Itanium 2 provides extensive error detection/correction/containment
Thermal Management
Programmable fail-safe thermal trip
- Itanium 2 will reduce power consumption
  - Reduce power consumption to ~60% of peak
  - Execution rate dropped to 1 inst per clock
  - Corrected Machine Check notification posted to OS
  - Full speed execution resumes when temperature drops
- Never invoked in properly designed and operating cooling systems
  - even on worst case power code
Itanium 2 provides a thermal fail-safe mechanism in the event of a cooling failure
Itanium 2 Processor
Itanium 2 builds on and extends the Itanium processor family to meet the needs of the most demanding enterprise and technical computing environments
- Enhanced Itanium 2 features are a result of extensible Itanium architecture
- Itanium 2 is binary compatible with Itanium processor software
Major enhancements include:
- Increased frequency
- Enhanced micro-architecture: more execution units, issue ports
- Efficient data handling: higher bandwidth and reduced latencies
Intel Itanium 2 Processor
Platforms
Performance Scaling
Scale-Out (Cluster)
Scale-Up (SMP, ccNUMA)
Platform sizes shown: DP/1U, DP/2U, 4P/4U, 4P/8P/16P, 16P, 32P, 64-512P
Increase Capacity and Capability
Scaling Out and Scaling Up
Scaling Right
Do more, better and faster at lower costs.
Itanium Processor Family: OEM Server Designs
4P 8-16P >16P
17 OEMs Shipping
>20 OEM Platforms 10 OEM Designs
4 OEMs Shipping
6 OEM Designs
1 OEM Shipping
Itanium 2 (Madison)
Itanium Processor
Substantial investment by OEMs in custom high-end platforms and growing
Itanium 2 Systems
High-end Itanium 2-based systems: >2X more than Itanium!
- Racksaver: DP/1U, 1H 2003
- Intel: 4P/4U, 2P/2U, Q4 2002 / Q2 2003
- Unisys: 16P, Q4 2002
- NEC: 32P, Shipping
- SGI: 64/512P, Early 2003
- IBM: 4P/8P/16P, Early 2003
- HP: DP/2U, Shipping
- HP: 2P WS, Shipping
Itanium 2-based Servers
Bringing High-End Capabilities to Intel Architecture
- Large Memory Capacity: ex. 4P node w/48GB, 512P+ system w/512GB
- Scalable to High-End Multi-Processing: 32P+ SMP systems, 512P+ clustered configurations
- High-Bandwidth, Flexible I/O: large qty PCI-X slots, dual GbE LAN, Ultra 320 SCSI, remote I/O capabilities
- Partitioning: multiple system images, static/dynamic domains
- High-End RAS: Intelligent Platform Management, hardware redundancy for fault-tolerance, modular and hot-plug capabilities
Selected examples of some high-end OEM platform capabilities. Not all capabilities found on all platforms.
OEMs will offer datacenter computing capabilities with their Itanium 2-based servers
The Chipset
(Figure: blocks shown are Processors, Memory & I/O Controller, Memory Bridge, Memory modules, I/O Bridge and I/O Devices; the controller and bridges form the Chipset)
The chipset is a key ingredient to platform design and performance
Hewlett-Packard zx1 Chipset
HP scalable processor chipset zx1: 3 modular components
- HP zx1 memory & I/O controller
- HP zx1 I/O adapter
- HP zx1 scalable memory expander
Optimized for:
- 1-2P workstations
- 2-4P servers
Designed for great cost & performance
- Great developer desktops
- High-performance clusters
Features:
- 6.4 GB/s processor bandwidth
- 12.8 GB/s memory bandwidth
- 4.0 GB/s I/O bandwidth
- Extremely low latency
(Figure: HP zx1 chipset with Intel Itanium 2 processors (1-4), DIMMs, and PCI, PCI-X and AGP buses)
HP Itanium 2 Processor based Systems
- 1 GHz Itanium 2, 4-way, HP zx1 chipset
- 900MHz/1GHz Itanium 2, 1-2 way, HP zx1 chipset, AGP4X OEM graphics
- 900MHz Itanium 2, 1-way, HP zx1 chipset, AGP4X OEM graphics
Itanium 2 Workstations
HP zx6000 HP zx2000
NEC Itanium 2-based Server
TX7 Series "TX7/i6010, i6510, i9010/i9510"
LINPACK HPC of 101.77GFLOPS on 32 CPUs
http://www.nec.co.jp/press/en/0207/0901.html
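To put the LINPACK figure in context, the following Python sketch compares it with a theoretical peak. The peak assumes 1 GHz Itanium 2 processors (the clock of the NEC system is not stated on this slide) and counts two FMAC units at two floating-point operations each per clock, as described earlier in the deck. Both function names are hypothetical.

```python
def peak_gflops(cpus, clock_ghz, fmac_units=2, flops_per_fmac=2):
    """Theoretical peak: every FMAC retires a multiply and an add each clock."""
    return cpus * clock_ghz * fmac_units * flops_per_fmac


def efficiency(measured_gflops, peak):
    """Fraction of theoretical peak actually achieved."""
    return measured_gflops / peak


# Assumption: 32 CPUs at 1.0 GHz; the clock is NOT given on the NEC slide.
nec_peak = peak_gflops(cpus=32, clock_ghz=1.0)
nec_efficiency = efficiency(101.77, nec_peak)

if __name__ == "__main__":
    print(f"Peak: {nec_peak:.0f} GFLOPS, efficiency: {nec_efficiency:.1%}")
```

Under the 1 GHz assumption the quoted result corresponds to roughly four fifths of peak.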
Shared Memory via ccNUMA
http://www.sgi.com/features/2003/jan/altix/
Intel E8870 Chip-Set
Intel E8870 Block Diagram
System Bus:
- 16 Bytes Wide
- Double Pumped 200MHz/400MT/s
- 6.4 GB/sec
Memory:
- Quad Memory Channels
- 6.4 GB/sec peak
- 16 DDR DIMM Sites
- 32 GB max
I/O Busses:
- Hot Plug PCI-X up to 133MHz
- Direct Attached InfiniBand*
Hub Interface 2.0:
- 4 pt-to-pt Busses
- 16 Bits Wide
- A Total of 4 GB/sec
Scalability Ports:
- 2 pt-to-pt Connects
- 16 Bits Wide
- 6.4 GB/sec Full Duplex
(Figure: components shown are four Processors on the System Bus / Data Bus, 870SNC with 4 Memory Channels and four MRHD devices, scalability ports SP0 and SP1, 870SIOH, three 870P64H2 bridges (133/100), 870VXB (InfiniBand*), 870ICH on HL1 @ 266MB/s, HL2 links, PCI 32/33, Video, SCSI, LAN, BMC, LPC and FWH)
Intel E8870 Chipset Architecture
Key Features
- Open platform architecture
  - Efficient use of building blocks
  - End user ease of upgrade value
- Versatile chipset spanning multiple segments
  - 4 and 8 way Servers
  - Scalability port building block enables up to 512 way configurations
- Balanced system performance
  - Memory, scalability port, I/O bandwidth
  - Maximizes system throughput
- Persistent/scalable interfaces
  - Reuse spans processor generations
  - Systems scalability headroom
- Robust RAS features
(Figure: two nodes, each with Processors, Memory, a Scalability Node Controller, an I/O Hub, PCI-(X) Bridges and Legacy I/O, joined by Scalability Port Switches)
4-Way, 4U, High Performance, Modular Platform
- 4-way Itanium 2
- Intel E8870 chipset
- 16 DDR DIMMs (32GB)
- PCI-X up to 133MHz
- Lower MTBR
- Tool Less Insertion Extraction
- Blind Mate Modules
- No Cable Assembly
Intel Itanium 2 Processor
Software Environment
C/C++ Data Model
OS Implements the Data Models
- ILP32
  - int, long and ptr are 32 bits
  - Used by 32-bit OSs
- LP64
  - int is 32 bits
  - long and pointer are 64 bits
  - Used by 64-bit UNIX/Linux OSs
- P64
  - int and long are 32 bits; pointer is 64 bits
  - Used by Win64* and Modesto*

Default settings:

Type       ILP32 size (bits)   LP64 size (bits)   P64 size (bits)
int               32                  32                 32
long              32                  64                 32
pointer           32                  64                 64
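A quick way to see which data model a given platform uses is to query the sizes of the C types at run time. The Python sketch below does this through the standard `ctypes` module; the helper names `classify_data_model` and `native_data_model` are hypothetical, and the classification covers only the three models listed on this slide.

```python
import ctypes


def classify_data_model(int_bits, long_bits, pointer_bits):
    """Map (int, long, pointer) widths in bits to the model names used above."""
    models = {
        (32, 32, 32): "ILP32",
        (32, 64, 64): "LP64",
        (32, 32, 64): "P64",
    }
    return models.get((int_bits, long_bits, pointer_bits), "other")


def _bits(ctype):
    """Width in bits of a ctypes type on the running platform."""
    return 8 * ctypes.sizeof(ctype)


def native_data_model():
    """Data model of the C ABI that the running Python was built against."""
    return classify_data_model(
        _bits(ctypes.c_int), _bits(ctypes.c_long), _bits(ctypes.c_void_p)
    )


if __name__ == "__main__":
    print("This platform uses:", native_data_model())
```

Code that stores a pointer in a `long` works under ILP32 and LP64 but truncates under P64, which is a common porting pitfall between 64-bit UNIX/Linux and Win64.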
OSV Support for Itanium Processor Family
- HP-UX*: Fully supported 1.5 release now, Version 1.6 update planned for 2H '02
- Linux: Red Hat*, SuSE*, Caldera*, Turbolinux* Linux in production today
- Windows*:
  - Windows* XP 64 bit for 1-2 way workstations in production today
  - 64-bit version of Windows* Advanced Server, Limited Edition available for early adopters now
  - Windows .Net Server scheduled for 1H03
- OpenVMS, NonStop Kernel, Converged Enterprise UNIX:
  - Port to Itanium architecture underway
  - Developer versions target 03, production versions in 04
Software Solution Support
High-End Enterprise Applications (Databases, Business Intelligence, ERP / SCM)
- Beta version since Q1 02, focusing on optimization
- Developer version available since Q4 01
- DB2 early adopter release available since 2H 01
- Engaged with early adopter end-users, strong performance
- Production version targets mid-02, performance for large data sets
- Initial porting work complete, optimization on-going
Future product plans from: Ariba, Autonomy, BEA, BMC Software, Check Point Software, Citrix, Commvault, Computer Associates, Covalent, Entrust, IBM WebSphere, Informix, Intershop, JD Edwards, Manugistics, MigraTEC, Network Associates, Nuance, Oasis, Oblix, Openshop, TimesTen, Tivoli Systems, Verisign, Veritas, Zeus
Intel Software Tools
Optimized for on
Intel Software Development Tools
SW Products
- Compilers
- Intel Threading Tools
- VTune Performance Analyzer
- Performance Libraries
Developer Services
- www.intel.com/ids
Intel Compilers
- Targeted for Intel Architecture based Windows* and Linux* platforms
- Optimized for the latest Intel microprocessors:
  - Intel Pentium 4 Processor
  - Intel Xeon Processor
  - Intel Itanium Processor
  - Intel Itanium 2 Processor
- Auto-vectorization and OpenMP support
- Integration of CVF technologies in 2003
http://developer.intel.com/software/products/compilers
Development Tools for Windows*
Compilers
- MSFT C/C++ Platform SDK
- Intel C/C++
- Intel Fortran95
Performance Tools
- Intel IPP Library
- Intel MKL Library
- Intel VTune Performance Analyzer
- Intel KAI KAP/Pro* Toolset
Java
- IBM JDK
- BEA JRockit* JDK
- TowerJ*
Development Tools for Linux*
Compilers
- GNU gcc
- Intel C/C++
- Intel Fortran95
Performance Tools
- Intel IPP Library
- Intel MKL Library
- Intel VTune Performance Analyzer Collector
- Intel KAI KAP/Pro* Toolset
- Linux glibc Library
Java
- IBM JDK
Intel Software Toolset
- KAI OpenMP, Intel Compilers, Intel Performance Libraries, Intel VTune Perf. Analyzer
- KAI Assure, Intel Thread Checker
- KAI GuideView, Intel Thread Profiler
- being integrated during 2003
Intel Compiler Architecture
Front ends
- C/C++ Front End
- FORTRAN 77/95 Front End
Optimization phases
- Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to
- Loop optimizations: data deps, prefetch, scalar repl, unroll/interchange/fusion/dist, auto-parallel/OpenMP
- Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim
- Code generation: predication, software pipelining, global scheduling, register allocation, code generation
Shared components
- Disambiguation: types, array, pointer, structure, directives, ld safety
- Machine Model
- Profiler
Intel Compilers Version 7.0
- Released November 2002
- Improved stability and optimization
- Supports Itanium 2 (-tpp2)
- More OpenMP 2.0 support
- Improved C99 standard support
- Improved gcc compatibility
- More and better reporting switches
- New Fortran directives (e.g. PREFETCH)
- Bridge to Version 8.0 (CVF to IVF), improved compatibility with CVF
Performance Counter
- Light-weight performance analysis tool to complement VTune
- Leverages HP's excellent pfmon on Itanium Architecture Linux64
Intel Performance Libraries
-
8/13/2019 Intel Cornelius
114/125
114
Copyright 2002-2003 Intel Corporation
*Other brands and names are the property of their respective owners
Intel MKL (Math Kernel Library)
  Highly optimized library to provide high performance on critical kernel operations in science and engineering
  Parallelism built into the library for automatic SMP support
  Vector Math Library (VML)
Intel IPP (Integrated Performance Primitives)
  Highly optimized functions to provide high performance on critical kernel operations for multimedia data types
  Available on multiple platforms to increase the portability of performance-based applications
http://developer.intel.com/software/products/perflib
VML Performance
Chart legend: X87, Pentium 4 Processor, Pentium III Processor, Itanium Processor
Worldwide Support & Solution Centers
Diagram labels: OEM, OEM, SI/SP
General Optimizations
-O0: disables optimization
-O1: optimizes for speed without increasing code size
-O2: optimizes for speed (default)
-O3: enables -O2 plus more aggressive optimizations, may not improve performance for all programs
-tpp2: Itanium 2 code generation (instruction mix)
-fno-alias: assumes no aliasing in program (may be unsafe)
-align: analyzes and reorders memory layout for variables and arrays (FTN only)
-pad: enables changing variable and array memory layout (FTN only)
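The aliasing caveat attached to -fno-alias can be illustrated with a minimal C sketch. The function names add_scale and add_scale_restrict are hypothetical and do not come from the slides. The sketch only shows why a compiler has to be conservative when two pointers might refer to the same memory, and why asserting the opposite is unsafe if it is not true.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: adds *scale to each of the n elements of dst.
 * Without alias information the compiler must assume that a store to
 * dst[i] may change *scale, so it reloads *scale on every iteration. */
void add_scale(double *dst, const double *scale, size_t n)
{
    assert(dst != NULL && scale != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += *scale;
}

/* Same loop, but the C99 restrict qualifier promises that dst and scale
 * do not overlap, so the compiler may keep *scale in a register.
 * Calling this with overlapping arguments is undefined behavior. */
void add_scale_restrict(double *restrict dst, const double *restrict scale,
                        size_t n)
{
    assert(dst != NULL && scale != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += *scale;
}
```

With double a[3] = {1, 2, 3}, the call add_scale(a, &a[0], 3) legally passes overlapping pointers and yields {2, 4, 5}, because a[0] changes after the first iteration. A build that assumes no aliasing could hoist the load of *scale out of the loop and produce {2, 3, 4} instead, which is presumably the sense in which the slide calls the option potentially unsafe.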
Interprocedural Optimization
Extends optimizations across file boundaries.
Without IPO (or with -ip): file1.c, file2.c, file3.c and file4.c are each compiled and optimized separately.
With IPO: file1.c, file2.c, file3.c and file4.c are compiled and optimized together.
How IPO Works
1. Compile program: icc -c -ipo foo.c
   Produces foo.o (fake object file) and foo.il (un-optimized intermediate language files)
2. Link program: icc -o foo -ipo foo.o
   2a. Compiler performs whole-program optimizations
   2b. Compiler invokes linker to produce executable
   Produces foo (optimized executable)
How PGO Works
Compile+link to add instrumentation: icc -o foo -prof_gen foo.c
  Produces foo (instrumented executable)
Execute instrumented program: ./foo
  Produces 12345678.dyn (dynamic profile)
Compile+link using feedback: icc -o foo -prof_use foo.c
  Uses pgopti.dpi (merged .dyn files)
  Produces foo (optimized executable)
Itanium Tuning Tips
Enable the Compiler
  Software pipelining of key loops
  Pointer disambiguation in C codes
  Interprocedural Optimization
  Profile guided optimizations
Utilize Cache Hierarchy (spatial & temporal locality)
Use tuned libraries
Use tuning tools
  Intel VTune Performance Analyzer
Use Web resources
  http://developer.intel.com/itanium
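The cache hierarchy tip can be sketched in C as follows. This is an illustrative example, not material from the slides: the names fill, sum_row_major and sum_col_major and the dimension N are hypothetical. Both traversals compute the same sum, but only the row-major one touches memory with unit stride.

```c
#include <assert.h>
#include <stddef.h>

#define N 64  /* hypothetical matrix dimension chosen for the sketch */

/* Fills the matrix with m[i][j] = i + j so the sums are easy to check. */
void fill(double m[N][N])
{
    assert(m != NULL);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            m[i][j] = (double)(i + j);
}

/* Row-major traversal: the inner loop walks consecutive addresses, so
 * each cache line that is fetched is fully used (spatial locality). */
double sum_row_major(double m[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal of the same data: the inner loop strides by
 * N * sizeof(double) bytes and lands on a different cache line each step. */
double sum_col_major(double m[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```

The two functions return identical results. Any speed difference depends on the matrix size relative to the cache sizes of the machine, so no timings are claimed here; a matrix this small is likely to fit in cache, and the effect would only be expected to show with much larger N.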
Thank You.
www.intel.com