Laboratorio de Tecnologías de Información
Arquitectura de Computadoras Organization- 1
Computer Organization
Arquitectura de Computadoras (Computer Architecture)
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
[email protected]@cinvestav.mx
Levels of Organization
[Figure: the SPARCstation 20 viewed as a computer made up of a processor (control and datapath), memory, and devices (input and output)]
Workstation design target:
♦ 25% of cost on the processor
♦ 25% of cost on memory (minimum memory size)
♦ Rest on I/O devices, power supplies, and box
The SPARCstation 20
[Figure: SPARCstation 20 block diagram. Labels: memory controller, SIMM bus, memory SIMMs; MBus with MBus slots 0 and 1; MSBI; SBus with SBus slots 0 to 3; SEC and MACIO; SCSI bus with disk and tape; external bus with keyboard & mouse and floppy disk]
The Underlying Interconnect
[Figure: SPARCstation 20 interconnect, showing the memory controller, SIMM bus, MSBI, SEC, and MACIO]
♦ Processor/memory bus: MBus
♦ Standard I/O bus: SCSI bus
♦ Sun's high-speed I/O bus: SBus
♦ Low-speed I/O bus: external bus
Processor and Caches
[Figure: SPARCstation 20 MBus with MBus slots 0 and 1. An MBus module holds the external cache and the processor; the processor contains the datapath, registers, internal cache, and control]
Memory
[Figure: SPARCstation 20 memory system. The memory controller drives the memory SIMM bus, which has SIMM slots 0 to 7; each DRAM SIMM carries several DRAM chips]
Input and Output (I/O) Devices
[Figure: SPARCstation 20 I/O. Labels: SBus with SBus slots 0 to 3; SEC and MACIO; SCSI bus with disk and tape; external bus with keyboard & mouse and floppy disk]
♦ SCSI bus: standard I/O devices
♦ SBus: high-speed I/O devices
♦ External bus: low-speed I/O devices
Standard I/O Devices
[Figure: SPARCstation 20 SCSI bus with disk and tape]
♦ SCSI = Small Computer System Interface
♦ A standard interface (IBM, Apple, HP, Sun, etc.)
♦ Computers and I/O devices communicate with each other over it
♦ The hard disk is one I/O device that resides on the SCSI bus
High Speed I/O Devices
[Figure: SPARCstation 20 SBus with SBus slots 0 to 3]
♦ SBus is Sun's own high-speed I/O bus
♦ The SS20 has four SBus slots where we can plug in I/O devices
♦ Examples: graphics accelerator, video adaptor, etc.
♦ High speed and low speed are relative terms
Slow Speed I/O Devices
[Figure: SPARCstation 20 external bus with keyboard & mouse and floppy disk]
♦ There are only four SBus slots in the SS20: "seats" are expensive
♦ The speed of some I/O devices is limited by human reaction time, which is very slow by computer standards
♦ Examples: keyboard and mouse
♦ No reason to use up one of the expensive SBus slots
Summary
♦ All computers consist of five components
  ■ Processor: (1) datapath and (2) control
  ■ (3) Memory
  ■ (4) Input devices and (5) output devices
♦ Not all "memory" is created equal
  ■ Cache: fast (expensive) memory placed closer to the processor
  ■ Main memory: less expensive memory, so we can have more of it
♦ Interfaces are where the problems are: between functional units, and between the computer and the outside world
♦ Need to design against constraints of performance, power, area, and cost
Summary: Computer System Components
[Figure: processor (Proc), caches, busses, memory, and I/O devices (controllers, adapters, disks, displays, keyboards, networks)]
♦ All have interfaces and organizations
Processor Architecture Review
Levels of Representation
High-level language program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
(Compiler)
Assembly language program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
(Assembler)
Machine language program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
(Machine interpretation)
Control signal specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
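To check that the four assembly instructions really implement the swap, the sketch below interprets them in Python. The register numbers and offsets come from the slide; the `run` helper, the dictionary-based memory, the addresses, and the sample values are assumptions made only for illustration.

```python
def run(program, regs, mem):
    """Interpret a tiny MIPS subset: lw and sw with base + offset addressing."""
    for op, rt, offset, base in program:
        addr = regs[base] + offset        # effective address = base register + offset
        if op == "lw":
            regs[rt] = mem[addr]          # load a word from memory into register rt
        elif op == "sw":
            mem[addr] = regs[rt]          # store register rt into memory
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs, mem


# The slide's sequence; register $2 is assumed to hold the address of v[k].
swap = [
    ("lw", 15, 0, 2),
    ("lw", 16, 4, 2),
    ("sw", 16, 0, 2),
    ("sw", 15, 4, 2),
]

# Hypothetical layout: v[k] at byte address 100, v[k+1] at byte address 104.
memory = {100: 7, 104: 9}
registers = {2: 100, 15: 0, 16: 0}
run(swap, registers, memory)
print(memory)
```

Both loads happen before either store, so no separate `temp` location is needed: registers $15 and $16 play that role.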
Execution Cycle
♦ Instruction fetch: obtain instruction from program storage
♦ Instruction decode: determine required actions and instruction size
♦ Operand fetch: locate and obtain operand data
♦ Execute: compute result value or status
♦ Result store: deposit results in storage for later use
♦ Next instruction: determine successor instruction
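The six steps of the cycle can be sketched as a small interpreter loop. This is a toy model, not any real instruction set: the three-operand tuple format, the opcode names, and the use of named memory cells are all assumptions chosen to keep each step visible.

```python
def execute_program(program, memory):
    """Run a toy program, making one pass through the execution cycle per instruction."""
    pc = 0
    while pc < len(program):
        instruction = program[pc]                  # 1. instruction fetch
        op, dst, src1, src2 = instruction          # 2. instruction decode
        if op == "halt":
            break
        a, b = memory[src1], memory[src2]          # 3. operand fetch
        if op == "add":                            # 4. execute
            result = a + b
        elif op == "sub":
            result = a - b
        elif op == "beq":
            result = (a == b)                      # a status rather than a value
        else:
            raise ValueError(f"unknown opcode: {op}")
        if op != "beq":
            memory[dst] = result                   # 5. result store
        # 6. next instruction: the branch target if taken, otherwise fall through
        pc = dst if (op == "beq" and result) else pc + 1
    return memory


mem = {"x": 5, "y": 3, "z": 0, "zero": 0}
prog = [
    ("add", "z", "x", "y"),          # z = x + y
    ("beq", 3, "z", "zero"),         # skip the sub if z == 0 (not taken here)
    ("sub", "z", "z", "y"),          # z = z - y
    ("halt", None, None, None),
]
execute_program(prog, mem)
print(mem["z"])
```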
Top 10 80x86 Instructions

Rank  Instruction             Integer average (% of total executed)
1     load                    22%
2     conditional branch      20%
3     compare                 16%
4     store                   12%
5     add                     8%
6     and                     6%
7     sub                     5%
8     move register-register  4%
9     call                    1%
10    return                  1%
      Total                   96%

° Simple instructions dominate instruction frequency
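As a small worked example on the table above, the following Python snippet adds up a few shares of the instruction mix. The percentages are copied from the slide; the variable names are illustrative. Note that the rounded per-instruction figures need not add up exactly to the total the slide reports.

```python
# Instruction mix from the table, as a percentage of total executed instructions.
mix = {
    "load": 22,
    "conditional branch": 20,
    "compare": 16,
    "store": 12,
    "add": 8,
    "and": 6,
    "sub": 5,
    "move register-register": 4,
    "call": 1,
    "return": 1,
}

# Share of instructions that access memory (loads plus stores).
memory_share = mix["load"] + mix["store"]

# Share covered by the four most frequent instructions alone.
top_four_share = sum(sorted(mix.values(), reverse=True)[:4])

print(f"memory access: {memory_share}%")
print(f"top four instructions: {top_four_share}%")
```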
Machine Organization
Basic Processor Architecture
What is in a microprocessor today?
♦ Integer unit (was 32 bits, going to 64)
  ■ Register file
  ■ ALU
  ■ Logical / shifts
  ■ PC unit
♦ Floating-point unit (64 bits)
  ■ Register file
  ■ Adder / multiplier / divider
♦ Virtual memory support
  ■ TLB
♦ Memory system (split I/D)
  ■ Fast cache memory and associated controller
Block Diagram
[Figure: processor block diagram with ICache, ITLB, DTLB, DCache, integer unit, and floating-point unit, connected by the PC bus, address bus, instruction bus, and data bus]
♦ Quite simplified
♦ No external interface
Integer Unit
Core of the machine
♦ Main part is a 32-bit (moving to 64-bit) dataflow
♦ Register file
  ■ Holds intermediate results
  ■ Almost all machines have at least 32 registers
  ■ Some have register windows (SPARC, 2900)
  ■ Multi-ported
  ■ Needs 2 reads / 1 write for each instruction
  ■ Bypass logic for pipelining
Integer Unit
♦ Execute unit
  ■ Shifter (bits / bytes)
  ■ ALU
  ■ Integer mult / div
♦ Ld/St interface
  ■ Address generation
  ■ MDRout, MDRin, Addr registers
♦ Sequences instructions (program counter)
  ■ Needs an incrementer and adder
  ■ Ports to transfer PC to / from registers
  ■ Some registers for holding state
Floating-Point Unit
Usually performs IEEE-compatible floating point
Has lots of hard stuff in it
  ■ Denormals, FP exceptions, rounding modes
  ■ Hardware often handles only the common case and traps to software for the rest
♦ Register file
  ■ 16 to 32 double-precision (64-bit) registers
♦ Adder
  ■ Often pipelined
  ■ Contains large shifters to align numbers as well as an adder
♦ Multiplier
  ■ Built as tree multipliers / pipelined
  ■ Sometimes uses partial trees and iterates
♦ Divider
  ■ Either SRT algorithm, or iterative using the multiplier
Virtual Memory
All modern processors use virtual addresses
♦ Internal operations generate a virtual address
  ■ Address needs to be mapped to a physical memory location
  ■ Mapping is done by the operating system
    » Contains protection information too
    » Allows the OS to move virtual memory to disk
    » Allows the OS to run multiple programs on the same machine
♦ Problems it causes for the hardware
  ■ Need to translate the address before the memory fetch
    » Need to store all the translations
    » Translation must be fast
  ■ Sometimes the requested address is not in memory
  ■ Sometimes the requested translation is not where you want it
    » Both cause a machine exception that needs to be handled
Memory Translation
Address translation is usually done using a small cache
♦ Store frequently used translations in a Translation Lookaside Buffer (TLB)
  ■ Really a translation cache
  ■ Usually pretty small, 64 to 1K entries
  ■ Stores the mapping from virtual page number to physical page number
  ■ Pages are 4 KB and getting larger
  ■ Newer TLBs support superpages (very large pages)
[Figure: TLB organization. A CAM holds the virtual page; a RAM holds the physical page and protection bits]
Memory Translation
  ■ The problem is what happens on a TLB miss
  ■ Take an exception, or use a hardware FSM to reload?
[Figure: TLB organization. A CAM holds the virtual page; a RAM holds the physical page and protection bits]
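The TLB lookup and the reload on a miss can be sketched in Python as follows. The 4 KB page size comes from the slides; the `TLB` class, its FIFO replacement policy, the page-table contents, and the sample addresses are assumptions for illustration. The reload here is a plain function call, standing in for either an exception handler or a hardware state machine.

```python
PAGE_SIZE = 4096  # 4 KB pages


class TLB:
    """Toy translation cache: virtual page -> (physical page, protection bits)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # assumed OS-managed table with the same layout
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)     # split into page number and offset
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            if vpn not in self.page_table:
                raise KeyError(f"page fault: virtual page {vpn} is not in memory")
            if len(self.entries) >= self.capacity:
                # Evict the oldest entry (FIFO); dicts preserve insertion order.
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = self.page_table[vpn]   # reload from the page table
        ppn, _protection = self.entries[vpn]
        return ppn * PAGE_SIZE + offset            # the offset is not translated


page_table = {0: (5, "r-x"), 1: (9, "rw-"), 2: (3, "rw-")}
tlb = TLB(capacity=2, page_table=page_table)
addresses = [0x0010, 0x1004, 0x0020, 0x2008, 0x0030]
physical = [tlb.translate(a) for a in addresses]
print(physical, tlb.hits, tlb.misses)
```

With only two entries, the third distinct page evicts the first, so the final access to page 0 misses again even though it was translated earlier.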
Memory System
♦ Usually multi-level with caches
♦ Need memory to keep up with the processor
  ■ Can't use DRAM (too slow)
  ■ Use a fast SRAM to hold the working set of the program
♦ Most accesses go to this fast memory
♦ If the data is not present, the cache misses
  ■ Fetch data from memory (or a larger cache)
♦ First-level cache often integrated on chip
  ■ Separate I/D caches for more bandwidth
[Figure: cache line with a tag and four words (Word0 to Word3). The physical address feeds a comparator (Cmp) on the tag, which produces Hit?, and a mux that selects the data word]
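The tag compare and word select in the figure can be sketched as a direct-mapped cache model. The four words per line follow the figure; the 4-byte word size, the number of lines, the direct-mapped organization, and the sample access pattern are assumptions for illustration.

```python
WORDS_PER_LINE = 4   # Word0..Word3, as in the figure
WORD_BYTES = 4       # assumed word size


class DirectMappedCache:
    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory                     # backing store: a list of words
        self.tags = [None] * num_lines           # None marks an empty line
        self.lines = [[0] * WORDS_PER_LINE for _ in range(num_lines)]
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        word_addr = addr // WORD_BYTES
        word_in_line = word_addr % WORDS_PER_LINE   # drives the mux
        line_addr = word_addr // WORDS_PER_LINE
        index = line_addr % self.num_lines          # selects the cache line
        tag = line_addr // self.num_lines           # remaining high-order bits
        if self.tags[index] == tag:                 # the "Cmp" box: hit?
            self.hits += 1
        else:
            self.misses += 1
            start = line_addr * WORDS_PER_LINE
            # Fetch the whole line from memory and replace the old contents.
            self.lines[index] = self.memory[start:start + WORDS_PER_LINE]
            self.tags[index] = tag
        return self.lines[index][word_in_line]


memory = list(range(100, 164))   # 64 words; word i holds the value 100 + i
cache = DirectMappedCache(num_lines=4, memory=memory)
values = [cache.read(a) for a in (0, 4, 8, 12, 16, 0, 64, 0)]
print(values, cache.hits, cache.misses)
```

Addresses 0 and 64 map to the same line in this configuration, so alternating between them causes repeated misses even though the cache is mostly empty.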
Machine Performance
♦ Depends on the average time between instruction fetches
  Execution time: $T_{exec} = N_{inst} \times CPI \times T_{cycle}$
♦ Indirectly related to how long it takes to complete an instruction
  ■ Can start the next instruction before the previous one is finished
  ■ The relation is set by the amount of ILP and the pipeline structure
♦ Also depends on the memory system design
  ■ What percentage of references hit in the cache
  ■ How long it takes when they miss
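A short numeric sketch shows how the three factors and the memory system combine. The first function is the formula from the slide; the second uses the common model in which cache misses add stall cycles to a base CPI. All input values below (instruction count, clock period, miss rate, miss penalty, references per instruction) are hypothetical.

```python
def execution_time(n_inst, cpi, t_cycle):
    """Execution time = Ninst * CPI * Tcycle."""
    return n_inst * cpi * t_cycle


def effective_cpi(base_cpi, refs_per_inst, miss_rate, miss_penalty_cycles):
    """Base CPI plus the average memory stall cycles per instruction."""
    return base_cpi + refs_per_inst * miss_rate * miss_penalty_cycles


# Hypothetical machine: 1 million instructions, 5 ns cycle, base CPI of 1.
n_inst = 1_000_000
t_cycle = 5e-9

# Hypothetical memory behavior: 1.5 references per instruction
# (one instruction fetch plus 0.5 data accesses), 5% misses, 40-cycle penalty.
cpi_perfect = effective_cpi(1.0, 1.5, 0.0, 40)
cpi_real = effective_cpi(1.0, 1.5, 0.05, 40)

print(execution_time(n_inst, cpi_perfect, t_cycle))
print(execution_time(n_inst, cpi_real, t_cycle))
```

Under these assumed numbers the memory system, not the pipeline, accounts for most of the execution time, which is why the hit rate and miss penalty appear on the slide alongside CPI.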
Pipelining
[Figure: several instructions, each passing through stages 1 to 4, overlapped in time]
A way of exploiting instruction-level parallelism
Pipelining
♦ Time per instruction
  ■ TPI = CPI * CPU cycle time
♦ Speedup
  $\text{Speedup} = \dfrac{\text{TPI}_{\text{without pipeline}}}{\text{TPI}_{\text{with pipeline}}} = \text{number of pipeline stages}$
  ■ Requires all stages to be perfectly balanced
  ■ No latch overhead
  ■ Real speedup will be less
♦ Not visible to the programmer
  ■ That is in the ideal case
  ■ Instruction scheduling depends on the pipeline
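The effect of unbalanced stages and latch overhead on the speedup can be checked with a few lines of Python. This is a simplified model that assumes one instruction completes per cycle in both machines; the stage delays and the latch overhead are hypothetical values in arbitrary time units.

```python
def pipeline_speedup(stage_delays, latch_overhead=0.0):
    """Ratio of time per instruction without and with pipelining."""
    unpipelined = sum(stage_delays)                   # one long cycle covers every stage
    pipelined = max(stage_delays) + latch_overhead    # clock set by the slowest stage
    return unpipelined / pipelined


balanced = pipeline_speedup([2, 2, 2, 2, 2])
unbalanced = pipeline_speedup([1, 2, 3, 2, 2])
with_latches = pipeline_speedup([2, 2, 2, 2, 2], latch_overhead=0.5)

print(balanced, unbalanced, with_latches)
```

Only the perfectly balanced, zero-overhead case reaches a speedup equal to the number of stages; the other two cases fall short, as the slide states.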
Pipelined Execution
♦ Ideally we get this:

                   Clock number
Instruction        1    2    3    4    5    6    7    8    9
Instruction i      IF   ID   EX   MEM  WB
Instruction i+1         IF   ID   EX   MEM  WB
Instruction i+2              IF   ID   EX   MEM  WB
Instruction i+3                   IF   ID   EX   MEM  WB
Instruction i+4                        IF   ID   EX   MEM  WB

♦ But in real life there are pipeline hazards:
  ■ Structural
    » Some resource is not available this cycle
  ■ Data
    » Data needed has not been produced yet
  ■ Control
    » Which instruction to execute is not known
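The ideal schedule, and the cost of a hazard, can be reproduced with a short Python sketch. The five stage names come from the slide; the stall model is a deliberate simplification (a hazard delays the start of an instruction and everything after it by whole cycles), and the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions, stalls=None):
    """Return one {clock: stage} dict per instruction.

    stalls maps an instruction index to the number of bubble cycles
    inserted before that instruction enters the pipeline.
    """
    stalls = stalls or {}
    rows = []
    start = 0
    for i in range(num_instructions):
        start += stalls.get(i, 0)                  # hazard: delay this instruction
        rows.append({start + s + 1: STAGES[s] for s in range(len(STAGES))})
        start += 1                                 # next instruction enters one cycle later
    return rows


def total_cycles(rows):
    return max(max(row) for row in rows)


def render(rows):
    width = total_cycles(rows)
    lines = []
    for i, row in enumerate(rows):
        cells = "".join(row.get(c, "").ljust(4) for c in range(1, width + 1))
        lines.append(f"i+{i}".ljust(6) + cells.rstrip())
    return "\n".join(lines)


ideal = pipeline_chart(5)
stalled = pipeline_chart(5, stalls={2: 2})   # hypothetical 2-cycle stall before i+2
print(render(ideal))
print(render(stalled))
```

In this model every stall cycle adds directly to the total cycle count, which is how hazards raise the effective CPI above 1.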
Modern Processor Architecture
[Figure: R10000, 233 MHz]
Today's Conventional Microprocessors
♦ Instruction sets
  ■ CISC, RISC
♦ Advanced memory systems
  ■ (caches, memory, virtual memory)
♦ Advanced instruction-level parallelism
  ■ (pipelining, superscalar, vectors, VLIW)
♦ Storage systems (I/O)
♦ Interconnection technology
♦ Basic parallel processing
  ■ Dual core
  ■ Quad core
In 10 Years!
[Figure: R10000, 233 MHz]
Using the Silicon
[Figure: alternative ways to use the available silicon, drawn with processing elements (PE) and memory (M). Labels: MPP, more cache, CISC, MMX, FFT / VIZ / RC5, 64-way superscalar, vector, reconfigurable processor, reconfigurable logic]
Summary
♦ Modern processors have a pipelined architecture
♦ All stages in a pipeline must be balanced
♦ Several resources are used for different purposes
♦ Not all instructions take the same time
  ■ Clocks per instruction
♦ Memory access instructions are among the most frequent (34%)
♦ The main idea for increasing performance is to exploit instruction-level parallelism
♦ Performance is a major concern in designing modern processors, since more space is available