Post on 26-Mar-2015
-- Satya P. Vedula
Intel® Itanium™ Architecture
Intel – Itanium Architecture
Agenda
1. History
2. Introduction
3. Block Diagram
4. Pipeline
5. Register Set
6. Instruction Set
7. EPIC
8. x86 Compatibility
9. Database on Itanium
10. Security & Itanium
11. Itanium and Java
12. Itanium and Win64
History
Generation  Years    Processor    Transistors  FPU            L1 cache
1           1978-81  8086/8088    29k          8087           -
2           1984     80286        134k         80287          -
3           1987-88  80386 DX/SX  275k         80387          -
4           1990-92  80486 SX/DX  1.2M         None/built-in  8k
5           1993-95  Pentium      3.1M         built-in       16k
5+          1997     Pentium MMX  4.5M         built-in       32k
History contd..
Year  Processor
1995  Pentium Pro
1997  Pentium II
1997  Mobile Pentium
1999  Pentium III
2001  Pentium 4
2001  Itanium
Generation: 6, 6+, 7, 8
Transistors: 5.5M - 7.5M, 27.4M, 9.3M, 42M, 25M
Cache: 16k L1 and 512k L2; 32k L1, 96k L2 and 4M L3; 32k L1
Introduction - Itanium
The Intel® Itanium™ processor is the first in a family of processors based on the new Itanium architecture.
Product Highlights
- Explicitly Parallel Instruction Computing (EPIC) technology enables up to 20 operations/clock.
- Three levels of cache reduce memory latency: 2MB or 4MB Level 3 cache, 96K Level 2 cache, and 32K Level 1 cache.
- Operating frequencies of 733MHz and 800MHz.
- 266MHz data bus enables fast system bus transactions with 2.1 GB/sec bandwidth.
- Advanced error detection, correction and containment provided by Machine Check Architecture (MCA), comprehensive error logging, and Error Correcting Code (ECC) on caches and the system bus.
- IA-32 instruction binary compatibility in hardware.
- 6.4 gigaflops at peak performance.
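The bandwidth and peak-performance figures in the highlights can be checked with simple arithmetic. The sketch below assumes a 64-bit (8-byte) wide data bus and 8 floating-point operations per clock at 800MHz; neither value is stated on the slide, so both are labeled as assumptions.

```python
# Figures taken from the slide
BUS_TRANSFERS_PER_SEC = 266e6   # 266MHz data bus
CLOCK_HZ = 800e6                # faster of the two operating frequencies

# Assumptions (not stated on the slide)
BUS_WIDTH_BYTES = 8             # assumed 64-bit wide data bus
FLOPS_PER_CLOCK = 8             # assumed peak floating-point operations per clock

# Bandwidth = transfers per second * bytes per transfer
bandwidth_gb_per_s = BUS_TRANSFERS_PER_SEC * BUS_WIDTH_BYTES / 1e9

# Peak rate = clock frequency * operations per clock
peak_gflops = CLOCK_HZ * FLOPS_PER_CLOCK / 1e9

print(f"bus bandwidth: {bandwidth_gb_per_s:.1f} GB/sec")
print(f"peak rate:     {peak_gflops:.1f} gigaflops")
```

Under these assumptions the results round to the 2.1 GB/sec and 6.4 gigaflops quoted above.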
2. Block Diagram
Complex block diagram
Simple block diagram
Pipeline
10-stage in-order pipeline
Comparison with others:
Itanium - 10 stages
Pentium III - 12 stages
Alpha 21264 - 8 stages
Pentium 4 - 20 stages
Athlon - 10 stages
Register Set
128 general-purpose integer registers (each 64 bits wide)
128 floating-point registers (each 82 bits wide)
64 predicate registers (each 1 bit wide)
8 branch registers
Each task can have its own set of registers.
Instruction Set
Instructions are 41 bits long.
- It takes 7 bits to specify one of the 128 general-purpose registers (GPRs).
- Two source-operand fields and a destination field = 21 bits.
- Predication = 6 bits (selects one of 64 predicate registers).
Instructions are issued in bundles; 1 bundle = 128 bits:
- three 41-bit instructions (making 123 bits), plus one 5-bit template.
Instruction categories = 4: integer, load/store, floating-point, and branch operations.
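The bit counts above can be made concrete by packing three 41-bit instruction slots and a 5-bit template into one 128-bit bundle. This is a minimal sketch: the function names are hypothetical, and placing the template in the low-order bits with the slots above it is an assumption made for illustration, not something the slide specifies.

```python
SLOT_BITS = 41          # one instruction
TEMPLATE_BITS = 5       # one template per bundle
SLOTS_PER_BUNDLE = 3

SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

# 3 register fields of 7 bits plus a 6-bit predicate field leave this many
# bits of each instruction for the opcode and other fields.
REMAINING_BITS = SLOT_BITS - 3 * 7 - 6


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer.

    Assumed layout: template in bits 0-4, then slot 0, slot 1, slot 2.
    """
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for index, instruction in enumerate(slots):
        if not 0 <= instruction <= SLOT_MASK:
            raise ValueError("instruction must fit in 41 bits")
        bundle |= instruction << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


example = pack_bundle(0x10, [1, 2, SLOT_MASK])
print(example.bit_length(), unpack_bundle(example))
```

The sizes add up exactly: 3 x 41 + 5 = 128, so no bit of the bundle is unused.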
EPIC
EPIC: Explicitly Parallel Instruction Computing
It is a combination of features from RISC and VLIW.
Advantages
-Conditional (predicated) execution
-Hinted and speculative loads
(LD.A: Load Advanced, uses a special buffer, the ALAT)
-64 free-form predicate bits
(Earlier chips have Z (zero), V (overflow), S (sign), and N (negative) flags)
-One conditional branch with 64 predicate bits
-VLIW features
-Groups of independent instructions
-Simple hardware
-Exploit Instruction Level Parallelism (ILP) with the compiler
Disadvantages
-Large increase in code size
-Blocking caches
EPIC - Power to Compilers
C source code:
if (x == 4) z = 9;
else z = 0;
Compiled on Pentium (32-bit compiled code):
1. Compare x to 4
2. If not equal go to line 5
3. z = 9
4. go to line 6
5. z = 0
6. // Program continues from here
Compiled on Itanium (64-bit compiled code):
1. Compare x to 4 and store result in a predicate bit (we'll call it A)
2. If A==1; z = 9
3. If A==0; z = 0
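The two compiled forms can be modeled side by side. The sketch below is only an illustration of the idea: `commit` is a hypothetical helper standing in for a predicated instruction, which is always issued but writes its result only when its predicate is true.

```python
def commit(predicate, new_value, old_value):
    """Model a predicated instruction: keep the result only if the predicate is set."""
    return new_value if predicate else old_value


def branching_version(x):
    """Pentium-style code: a compare followed by a conditional jump."""
    if x == 4:
        z = 9
    else:
        z = 0
    return z


def predicated_version(x):
    """Itanium-style code: one compare, then two predicated assignments, no jump."""
    a = (x == 4)              # step 1: compare and store the result in predicate A
    z = None
    z = commit(a, 9, z)       # step 2: if A==1, z = 9
    z = commit(not a, 0, z)   # step 3: if A==0, z = 0
    return z


for value in (3, 4, 5):
    print(value, branching_version(value), predicated_version(value))
```

Because steps 2 and 3 do not depend on each other, they could be issued in the same cycle, which is the point of replacing the branch.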
EPIC Features
Data speculation
A sequence of instructions consisting of an advanced load, zero or more instructions dependent on the value of that load, and a check instruction.
Code speculation
It is a compiler concept. An instruction or a sequence of instructions is executed before it is known that the dynamic control flow of the program will actually reach the point in the program where the sequence of instructions is needed.
Preprocessing
1) Register use, 2) loop optimization, 3) instruction execution order, and 4) logical program layout
Prediction
Branch prediction hints can now be given by programmers, to assist dynamic runtime branch prediction.
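The data speculation sequence (advanced load, dependent work, check) can be sketched with a toy model. Everything here is hypothetical and simplified: `ToyMachine`, its method names, and the one-entry-per-register table are illustrative stand-ins for the hardware buffer (ALAT) that tracks advanced loads, not a description of the real design.

```python
class ToyMachine:
    """Toy model of advanced loads tracked in an ALAT-like table."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> value
        self.regs = {}               # register name -> value
        self.alat = {}               # register name -> address it was loaded from

    def load_advanced(self, reg, addr):
        """Advanced load: read early and remember the address in the table."""
        self.regs[reg] = self.memory[addr]
        self.alat[reg] = addr

    def store(self, addr, value):
        """Store: write memory and drop any table entry for the same address."""
        self.memory[addr] = value
        self.alat = {r: a for r, a in self.alat.items() if a != addr}

    def check_load(self, reg, addr):
        """Check: if the entry survived, keep the early value; otherwise reload.

        Returns True when the speculation held, False when a reload was needed.
        """
        if self.alat.get(reg) == addr:
            return True
        self.regs[reg] = self.memory[addr]
        return False


machine = ToyMachine({0x100: 1, 0x200: 2})

# Load hoisted above a store to a different address: speculation holds.
machine.load_advanced("r1", 0x100)
machine.store(0x200, 7)
first_ok = machine.check_load("r1", 0x100)

# Load hoisted above a store to the same address: the check reloads the value.
machine.load_advanced("r2", 0x200)
machine.store(0x200, 9)
second_ok = machine.check_load("r2", 0x200)

print(first_ok, machine.regs["r1"], second_ok, machine.regs["r2"])
```

The check costs little when the store did not conflict, and still produces the correct value when it did.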
EPIC Features contd..
Compiler advantages
-Complexity shifts to compilers
-Methods to express compile-time information
-Optimized FPUs for multimedia applications
-Reliability and performance on the server side
x86 compatibility
- Supports all x86 instructions including MMX and SSE (not SSE2), and Protected, Virtual 8086, and Real mode features
- Run the entire OS in x86 mode, or run the applications under a new IA-64 OS.
- x86-compatible registers: AR24 through AR31
- JMPE: instruction to switch between x86 and the new mode
x86 - Register compatibility
What does it look like?
Transistors: 325 million
- Processor chip: 25 million (including L1 and L2 caches)
- Each of the four L3 cache chips: 75 million
For comparison: Pentium III: 24 million; Pentium 4: 42 million
Itanium code size: 2x Pentium (estimated), 30% more than other RISC
Itanium - anatomy
Other 64-bit processors
Photographs: Alpha 21264 Slot B module, UltraSPARC-III chips, MIPS 20K processor, IBM Power4 module
Overview of the processors
Itanium code names
It's just beginning
Years shown: 2001, 2003, 2004, 2006
Code names: Merced, McKinley, Madison, Deerfield
Databases
A quantum leap
Databases - Storage needs Contd..
The Coming Content "Big Bang"
Chart: content volume in gigabytes over time
40,000 BCE: cave paintings, bone tools
3500 BCE: writing
0 C.E.
105: paper
1450: printing
1870: electricity, telephone
1947: transistor
1950: computing
Late 1960s: Internet (DARPA)
1993: the web
1999
2000: 3B
2001: 6B
2002: 12B
2003: 24B
Source: IBM Informix Conference, 2001 Las Vegas
Databases - Storage - Requirements
Data Explosion!
• We are in the midst of a data explosion: "The Big Bang"!
• Terabytes of data
  - Common corporate expression
  - Petabytes (10^15) and Exabytes (10^18) are fast approaching
• 2-3 Exabytes = total volume of all information generated worldwide annually
• Storage capacities are growing
  - 72 GB Hard Drive (HD) becoming industry standard
  - 180 GB High Density HD in production
Source: IBM Informix Conference, 2001 Las Vegas
Databases contd..
The Need for Speed
• Memory access speeds desired: long term
  - Memory latency averaging 235-360 nanoseconds
  - Max = 256 GB of RAM
  - 64 bit => about 18 Exabytes (2^64 bytes) of addressing capability
• Disk access speeds are the reality: near term
  - Disk latency averaging 3-4 milliseconds
  - 4 "orders of magnitude slower"
• DW tables contain billions of rows
• Light table scan: 100 byte row @ 1 GB/s
  - ~9 million rows/sec
  - ~540 million rows/minute
  - 5.4 billion rows (500 GB) ~ 10 minutes
Source: IBM Informix Conference, 2001 Las Vegas
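The figures on this slide can be reproduced with a few lines of arithmetic. The sketch below takes the slide's own numbers as inputs; the latency midpoints (300 ns and 3.5 ms) and the use of decimal gigabytes are assumptions made here for illustration.

```python
import math

# Table scan: figures from the slide
ROW_BYTES = 100
SCAN_BYTES_PER_SEC = 1e9            # 1 GB/s, taken as decimal gigabytes (assumption)
SLIDE_ROWS_PER_SEC = 9e6            # the slide's "~9 million rows/sec"

ideal_rows_per_sec = SCAN_BYTES_PER_SEC / ROW_BYTES   # upper bound with no overhead
rows_per_minute = SLIDE_ROWS_PER_SEC * 60
rows_in_ten_minutes = rows_per_minute * 10
table_bytes = rows_in_ten_minutes * ROW_BYTES

# Latency gap: midpoints of the slide's ranges (assumed midpoints)
MEMORY_LATENCY_SEC = 300e-9         # between 235 and 360 nanoseconds
DISK_LATENCY_SEC = 3.5e-3           # between 3 and 4 milliseconds
orders_of_magnitude = math.log10(DISK_LATENCY_SEC / MEMORY_LATENCY_SEC)

# 64-bit byte addressing
address_space_bytes = 2 ** 64

print(f"ideal scan rate:   {ideal_rows_per_sec / 1e6:.0f} million rows/sec")
print(f"rows per minute:   {rows_per_minute / 1e6:.0f} million")
print(f"rows in 10 min:    {rows_in_ten_minutes / 1e9:.1f} billion")
print(f"table size:        {table_bytes / 1e9:.0f} GB")
print(f"disk vs memory:    about {orders_of_magnitude:.1f} orders of magnitude")
print(f"64-bit addressing: {address_space_bytes / 1e18:.1f} x 10^18 bytes")
```

The ideal rate of 10 million rows/sec sits slightly above the slide's ~9 million, and 5.4 billion 100-byte rows come to about 540 GB, in line with the slide's rounded 500 GB.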
Databases – Itanium advantages
64-bit addressing
Tens of Gigabytes to thousands of Terabytes stored in nanosecond access main memory eliminates millisecond disk access times thus improving application response time.
Large number of registers and innovative register model
Data and intermediate calculations stored in on-chip registers reduce the repetitive load and store of intermediate data values, thus improving the response time of an application's database request.
Instruction set parallelism
The ability to execute instructions in parallel allows quick, simultaneous access to and manipulation of data derived from multiple rows and columns of a large in-memory database table or tables.
Predication
Predication allows the conditional execution of instructions before it is known whether the execution is needed. Predication allows more code to execute in parallel, the performance penalty of branch-dependent code is lower, and applications with heavy branching speed up.
Databases – Itanium advantages contd..
Control/Data Speculation
Control speculation allows certain load instructions to be scheduled before conditional branch instructions, rather than after. Data speculation is similar to control speculation but allows loads to be scheduled above stores. Both allow a reduction in the CPU wait states generated by branch-intensive code with high-latency RAM accesses, thus speeding application performance.
Instruction/Data Prefetch
Instruction prefetches can be signaled on branch instructions. Data can be prefetched with explicit prefetch instructions. Both prefetches speed application performance by reducing wait states.
Advantages
Big databases like:
-Data warehousing
-Decision Support
-Web-Enabled ERP
Security
-Common encryption algorithms run 3-5 times faster
-EPIC parallelism with register rotation makes algorithms faster
-Performance boost to CAD/CAE applications due to the increased number of floating-point registers
-Performance boost to 3D applications
-82-bit floating-point unit offers high precision
-RSA computations are 512 to 1024 bits in length
-New multiply-add instruction comes to aid
-Parallelism comes to aid (two 128-bit computations are performed in parallel)
-Predication eliminates branches (if) from RSA computations
-RSA, AES, SHA-1 algorithms are improved, as they use only counted loops utilizing register rotation
-Vast number of registers
-Large physical memory for security cache: directory services can be stored in memory
-Network traffic can be encrypted
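To see why a multiply-add instruction matters for 512- to 1024-bit RSA operands, consider schoolbook multi-precision multiplication over 64-bit limbs. This is a generic sketch, not code from the slides: the helper names are hypothetical, and Python integers stand in for the 128-bit intermediate that a hardware multiply-add would produce.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(number, count):
    """Split a non-negative integer into `count` 64-bit limbs, least significant first."""
    return [(number >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs, least significant first."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def multiply(a_limbs, b_limbs):
    """Schoolbook multiplication; the inner step is one multiply-add with carry."""
    result = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            # Multiply-add step: product + accumulator + carry always fits in 128 bits.
            total = a * b + result[i + j] + carry
            result[i + j] = total & LIMB_MASK
            carry = total >> LIMB_BITS
        result[i + len(b_limbs)] = carry
    return result


x = 2 ** 512 - 12345          # a 512-bit operand (8 limbs)
y = 3 ** 300                  # a 476-bit operand (fits in 8 limbs)
product = from_limbs(multiply(to_limbs(x, 8), to_limbs(y, 8)))
print(product == x * y)
```

Every inner iteration is a multiply followed by additions, and the loops have fixed trip counts, which fits the counted-loop pattern the slide associates with register rotation.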
Security contd..
Performance statistics - Encryption algorithms
Algorithms: RSA, ECC, AES, DES, RC6, SHA
Multi-precision arithmetic: X X X X
Multi-precision logical operation: X X X X X
Fixed data rotate: X X
Variable data rotate: X X X X
Integer multiplication: X X X X
Sbox lookup: X X X
Logical operation: X X
Java
Common Java Limitations (J2SE 1.3)
-Garbage Collection
-Object-oriented programming (OOP)
-Byte code vs. native machine code
-Variability of performance because of interpretation
-Multithreaded applications
-Java Native Interface vs. Native Method Interface
-Network Performance
-Limitations with current architectures
-EJB involves frequent invocation of method calls
-Java needs dynamic bounds checking, null checking, exception handling
-Java has a 64-bit integer data type: long
-Java Object Handles (ObjId) are 64-bit
Java Contd..
Advantages using IBM Java2
-Streamlined Garbage Collection reduces pause time
-OOP: IBM Java uses Thread Local Heaps, allowing variable-sized thread-local heaps
-Just-In-Time compiler translates to optimized native code
-Mixed Mode Interpreter does Selective Compilation
-Multi-threading now has lightweight and full-power modes
-JNI enhanced and NMI removed in Java 2
-Network performance: Java Socket API overhead removed
Java Contd..
Advantages using Itanium
-Predication: Benefits the branching caused by Java technology's bounds checking
-Speculation: Multiway branching allows address locations and data needed for Java's bounds and null checks to be prefetched, increasing performance
-Instruction Parallelism: Multiple execution units run instructions concurrently, increasing performance
-Register Set: Smaller methods need not contend for registers, as more registers are available
Win64
Win64 data types
Type Name                       What it is
LONG32, INT32                   32-bit signed
LONG64, INT64                   64-bit signed
ULONG32, UINT32, DWORD32        32-bit unsigned
ULONG64, UINT64, DWORD64        64-bit unsigned
INT_PTR, LONG_PTR               Signed int, pointer precision
UINT_PTR, ULONG_PTR, DWORD_PTR  Unsigned int, pointer precision
SIZE_T                          Unsigned count, pointer precision
SSIZE_T                         Signed count, pointer precision
Win64 Contd..
Win64 Issues
-LLP64 issues
-Porting issues (32-bit to 64-bit)
-Polymorphic data usage
-Pointer/length combinations
-RPC and COM
-Supports RPC between IA-32 and IA-64
-Supports LocalServer-style (out-of-proc) COM between IA-32 and IA-64 processes
-An IA-32 DLL cannot be loaded into a 64-bit process
-An IA-64 DLL cannot be loaded into a 32-bit process
-Use COM as out-of-proc (solves the previous two problems)
-PnP should be RPC-enabled
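A typical LLP64 porting issue is a pointer stored in a 32-bit integer type. Under LLP64, int and long stay 32 bits while pointers and long long become 64 bits. The sketch below models this with plain integers; `store_as` and the sample address are hypothetical and only illustrate the truncation.

```python
# Type sizes in bytes under the LLP64 data model
LLP64_SIZES = {"int": 4, "long": 4, "long long": 8, "pointer": 8}


def store_as(type_name, value):
    """Model storing an unsigned value in a variable of the given type.

    Bits that do not fit in the type are silently dropped.
    """
    bits = LLP64_SIZES[type_name] * 8
    return value & ((1 << bits) - 1)


address = 0x0000_0001_8000_1000      # hypothetical address above the 4 GB line

as_long = store_as("long", address)           # 32-bit code habit: pointer kept in a long
as_pointer = store_as("pointer", address)     # pointer-precision storage

print(hex(as_long), hex(as_pointer))
```

The value kept in the 32-bit type loses its upper half, which is why the pointer-precision types listed earlier (INT_PTR, ULONG_PTR and so on) exist.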
Questions?