TRANSCRIPT
SYNAR (Systems Networking and Architecture Group)
CMPT 886: Computer Architecture Primer
Dr. Alexandra Fedorova, School of Computing Science
SFU
Outline
• Caches
• Branch prediction
• Out-of-order execution
• Instruction Level Parallelism
Caches
• Level 1 / Level 2 / Level 3
• Instruction/Data or unified
Direct-Mapped Cache
Line size = 32 bytes
Cache eviction
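In a direct-mapped cache each address maps to exactly one line, so two addresses that share an index evict one another. A minimal sketch of the address split, keeping the slide's 32-byte lines and assuming a hypothetical 32 KB cache (1024 lines):

```python
LINE_SIZE = 32    # bytes per line, as on the slide
NUM_LINES = 1024  # assumption: a 32 KB direct-mapped cache

def decompose(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr % LINE_SIZE                # byte within the line
    index = (addr // LINE_SIZE) % NUM_LINES  # which cache line it maps to
    tag = addr // (LINE_SIZE * NUM_LINES)    # identifies the memory block
    return tag, index, offset

# Addresses 0x0000 and 0x8000 share index 0 but differ in tag,
# so loading one evicts the other: a conflict miss.
```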
Set-Associative Cache
• 4-way set-associative cache
• The data can go into any of the four locations
• When the entire set is full, which line should we replace?
• LRU – least recently used (LRU stack)
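The LRU stack can be sketched as a small class modelling one set of the 4-way cache (a sketch for illustration, not a full cache model):

```python
class LRUSet:
    """One set of a 4-way set-associative cache with LRU replacement."""

    def __init__(self, ways=4):
        self.ways = ways
        self.stack = []  # LRU stack: front = most recently used tag

    def access(self, tag):
        """Return True on a hit; on a miss, insert tag, evicting the LRU line."""
        if tag in self.stack:
            self.stack.remove(tag)        # hit: move tag to the top
            self.stack.insert(0, tag)
            return True
        if len(self.stack) == self.ways:  # set full: drop least recently used
            self.stack.pop()
        self.stack.insert(0, tag)
        return False
```

For example, after misses on A, B, C, D fill the set, touching A again and then loading E evicts B, the least recently used line.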
Cache Hit/Miss
• Cache hit – the data is found in the cache
• Cache miss – the data is not in the cache
• Miss rate:
– misses per instruction
– misses per cycle
– misses per access (also miss ratio)
• Hit rate:
– the opposite (1 − miss rate, for the same denominator)
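With some hypothetical counts for one run, the three denominators give different-looking numbers for the same cache behaviour:

```python
# Hypothetical counts for one program run (assumed for illustration)
instructions = 1000
cycles = 1500
accesses = 300   # loads and stores that reached the cache
misses = 30

misses_per_instruction = misses / instructions  # 0.03
misses_per_cycle = misses / cycles              # 0.02
miss_ratio = misses / accesses                  # 0.10 (misses per access)
hit_rate = 1 - miss_ratio                       # 0.90
```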
Cache Miss Latency
• How long you have to wait if you miss in the cache
• Miss in L1 → go to L2 (~20 cycles)
• Miss in L2 → go to memory (~300 cycles, if there is no L3)
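These latencies combine into an average memory access time. A sketch using the slide's ~20 and ~300 cycle figures; the hit time and miss rates are assumptions for illustration:

```python
# Latencies from the slide; hit time and miss rates are assumed
L1_HIT = 2         # cycles to hit in L1 (assumption)
L2_LATENCY = 20    # cycles paid on an L1 miss
MEM_LATENCY = 300  # cycles paid on an L2 miss (no L3)

l1_miss_rate = 0.05  # assumed fraction of accesses that miss in L1
l2_miss_rate = 0.20  # assumed fraction of those that also miss in L2

amat = L1_HIT + l1_miss_rate * (L2_LATENCY + l2_miss_rate * MEM_LATENCY)
# 2 + 0.05 * (20 + 0.20 * 300) = 2 + 0.05 * 80 = 6 cycles on average
```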
Writing in Cache
• Write through – write directly to memory
• Write back – write to memory later, when the line is evicted
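The difference can be sketched with a toy backing store and a dirty bit; the names here are hypothetical, chosen only to illustrate the two policies:

```python
memory = {}  # toy backing store: address -> value

class CacheLine:
    def __init__(self, addr, value):
        self.addr, self.value = addr, value
        self.dirty = False  # only meaningful under write-back

def write_through(line, value):
    """Write through: update the cache line and memory immediately."""
    line.value = value
    memory[line.addr] = value

def write_back(line, value):
    """Write back: update only the cache line and mark it dirty."""
    line.value = value
    line.dirty = True

def evict(line):
    """On eviction, a dirty write-back line must be flushed to memory."""
    if line.dirty:
        memory[line.addr] = line.value
        line.dirty = False
```

Write-back defers the memory traffic: repeated writes to the same line cost one memory update at eviction instead of one per write.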
Caches on Multiprocessor Systems
[Figure: processors with private caches connected by a bus to shared memory. © Herlihy-Shavit 2007]
Processor Issues Load Request
[Figure: a processor issues a load; the data travels over the bus from memory into its cache. © Herlihy-Shavit 2007]
Another Processor Issues Load Request
[Figure: a second processor announces "I want data" on the bus; the first cache responds "I got data" and supplies it, so both caches now hold copies. © Herlihy-Shavit 2007]
Processor Modifies Data

[Figure: one processor writes its cached copy; the copies in the other caches and in memory are now invalid. © Herlihy-Shavit 2007]
Send Invalidation Message to Others
[Figure: the writing cache broadcasts "Invalidate!" on the bus; the other caches lose read permission and discard their copies. Memory need not be updated now: this cache can provide valid data. © Herlihy-Shavit 2007]
Processor Asks for Data
[Figure: another processor announces "I want data"; the cache holding the modified copy supplies it over the bus. © Herlihy-Shavit 2007]
Shared Caches
• Filled on demand
• No control over cache shares
• An aggressive thread can grab a large cache share and hurt others
[Figure: a shared cache whose lines are mostly occupied by Thread 1, leaving Thread 2 only a small share.]
Outline
• Caches
• Branch prediction
• Out-of-order execution
• Instruction Level Parallelism
Branching and CPU Pipeline
[Figure: the stages of a CPU pipeline.]
Branching Hurts Pipelining
Branch Prediction
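Why prediction matters can be seen with a classic illustration: the same code runs much faster on real hardware when its branch outcomes follow a pattern the predictor can learn. The function and data below are hypothetical, chosen only to make the branch behaviour visible:

```python
import random

def count_big(xs, threshold=128):
    """Count elements at or above a threshold (hypothetical example)."""
    count = 0
    for x in xs:
        if x >= threshold:  # the branch a hardware predictor must guess
            count += 1
    return count

data = [random.randrange(256) for _ in range(10_000)]
# On real hardware (the effect is invisible in interpreted Python),
# count_big(sorted(data)) runs much faster than count_big(data):
# sorted input makes the branch a long run of "not taken" followed by
# "taken", which predictors learn, while random input is mispredicted
# roughly half the time, stalling the pipeline on each mistake.
```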
Outline
• Caches
• Branch prediction
• Out-of-order execution
• Instruction Level Parallelism
Out-of-order Execution
• Modern CPUs are super-scalar
• They can issue more than one instruction per clock cycle
• If consecutive instructions depend on each other, instruction-level parallelism is limited
• To keep the processor going at full speed, issue instructions out of order
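The dependence point can be illustrated with a reduction: a single accumulator forms a serial chain of adds, while several independent accumulators expose work a super-scalar core can issue in parallel. A sketch (the speedup appears on real hardware, not in interpreted Python):

```python
def sum_serial(xs):
    """One accumulator: every add depends on the previous one."""
    total = 0
    for x in xs:
        total += x  # serial dependence chain limits ILP
    return total

def sum_unrolled(xs):
    """Four independent accumulators: the adds within one iteration do not
    depend on each other, so they can issue in the same cycle."""
    a = b = c = d = 0
    n = len(xs) - len(xs) % 4
    for i in range(0, n, 4):
        a += xs[i]
        b += xs[i + 1]
        c += xs[i + 2]
        d += xs[i + 3]
    for x in xs[n:]:  # leftover elements
        a += x
    return a + b + c + d
```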
Speculative Execution
• Out-of-order execution is limited to basic blocks
• To go beyond basic blocks, use speculative execution
Outline
• Caches
• Branch prediction
• Out-of-order execution
• Instruction Level Parallelism
Instruction-Level Parallelism
• Many programs fail to keep the processor busy:
– Code with lots of loads
– Code with frequent and unpredictable branches
• CPU cycles are wasted: power is consumed, but no useful work is done
• Running multiple threads on the chip helps with this