Source: meseec.ce.rit.edu/551-projects/fall2015/3-3.pdf

Jaguar Microarchitecture

Alex Avery, Cody Smith

Agenda

● AMD Processors
● Jaguar Overview
● Example Hardware
● Core Pipeline
● Instruction Fetch and Cache
● Instruction Decoding
● Scheduling
● Integer & FP Execution
● Memory
● Cache

What is a Microarchitecture?

Microarchitecture is the Computer Organization

Microarchitecture + Instruction Set Architecture = Computer Architecture

A microarchitecture describes how the ISA is implemented in hardware: the organization of the device's pipelines, caches, and execution units.

AMD Processors
● Bobcat (2011)
● Piledriver (2012)
● Jaguar (2013)
● Steamroller (2014)
● Puma (2014)
● Excavator (2015)

Jaguar Overview

● Targets 2-25 W devices
● Low cost
● 28 nm technology
● Up to 4 cores
● Split L1 cache: 32 KiB instruction and 32 KiB data per core
● Unified L2 cache: 1-2 MiB, 16-way
● Out-of-order and speculative execution
● Integrated memory controller
● Two-way integer execution
● Two-way 128-bit floating-point execution

Example Hardware
● Gaming Consoles
  ○ Xbox One
  ○ PS4
● Desktop Processors
  ○ Athlon 5350
  ○ Sempron 3850
● Laptops/Mini PCs
  ○ A6-5200
  ○ E2-3000
● Tablets
  ○ A6-1450
● Embedded Processors
  ○ GX-420CA

Jaguar Core Pipeline

Instruction Fetch and Cache
● 6 stages
● 32 KB, 2-way set-associative L1 instruction cache
● Pseudo least-recently-used (LRU) replacement algorithm
● 32-byte instruction fetch window
● Branch predictors exploit the characteristics of both direct and indirect branches, as well as branch density
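To make the L1 instruction cache parameters concrete, here is a minimal sketch of how a fetch address could be split into tag, set index, and byte offset. The 64-byte line size and the natural alignment of 32-byte fetch windows are assumptions made for illustration, not figures from the slides, and the helper names are hypothetical.

```python
from __future__ import annotations

# From the slide: 32 KB, 2-way set associative, 32-byte fetch window.
CACHE_BYTES = 32 * 1024
WAYS = 2
FETCH_WINDOW = 32
# ASSUMPTION: 64-byte cache lines (not stated on the slide).
LINE_BYTES = 64

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 256 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 6 bits select the byte in a line
INDEX_BITS = NUM_SETS.bit_length() - 1          # 8 bits select the set


def decompose(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def fetch_window_start(addr: int) -> int:
    """Start of the 32-byte fetch window holding addr (assumes aligned windows)."""
    return addr & ~(FETCH_WINDOW - 1)


pc = 0x0040_1A37
print(NUM_SETS, OFFSET_BITS, INDEX_BITS)   # 256 6 8
print(decompose(pc))                       # (256, 104, 55)
print(hex(fetch_window_start(pc)))         # 0x401a20
```

Under these assumptions, 32 KB / (2 ways × 64 B) gives 256 sets, so the low 6 address bits pick the byte within a line and the next 8 bits pick the set.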

Instruction Decoding
● Can decode two x86 instructions per cycle
● Variable-length x86 instructions are decoded into complex micro-operations (COPs)
● Handles 128-bit vector (SSE) instructions as well as x86 Advanced Vector Extensions (AVX)
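Because the vector hardware is 128 bits wide, a 256-bit AVX instruction cannot be processed in a single 128-bit operation; Jaguar is commonly described as splitting such instructions into two 128-bit halves. The sketch below models only that data split for a packed single-precision add. The function names are hypothetical, and timing and register renaming are ignored.

```python
from __future__ import annotations

LANES_128 = 4  # a 128-bit register holds four 32-bit floats


def add_128(a: list[float], b: list[float]) -> list[float]:
    """One 128-bit packed single-precision add (4 lanes)."""
    assert len(a) == len(b) == LANES_128
    return [x + y for x, y in zip(a, b)]


def add_256_as_two_halves(a: list[float], b: list[float]) -> tuple[list[float], int]:
    """Model a 256-bit (8-lane) add as a low and a high 128-bit operation.

    Returns the 8-lane result and the number of 128-bit operations used.
    """
    assert len(a) == len(b) == 2 * LANES_128
    low = add_128(a[:LANES_128], b[:LANES_128])
    high = add_128(a[LANES_128:], b[LANES_128:])
    return low + high, 2


result, ops = add_256_as_two_halves([1.0] * 8, [float(i) for i in range(8)])
print(result, ops)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] 2
```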

Scheduling
● Out-of-order execution
● After instructions are decoded into COPs, they are dispatched
● Each COP allocates a Retire Control Unit (RCU) entry, which tracks it until it retires in program order
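The RCU behaves like a reorder buffer: entries are allocated in program order at dispatch, COPs may finish out of order, and retirement proceeds only from the oldest entry. Below is a minimal toy model of that rule. The class and method names are hypothetical, and the default capacity of 64 entries is an assumption, not a figure from the slides.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class RcuEntry:
    label: str          # hypothetical label for the COP, for illustration only
    done: bool = False  # set when the COP finishes executing


class RetireControlUnit:
    """Toy reorder buffer: allocate in program order, complete out of order,
    retire in program order. The default capacity of 64 is an assumption."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.entries: deque[RcuEntry] = deque()

    def allocate(self, label: str) -> RcuEntry | None:
        """Dispatch one COP; return None (a dispatch stall) when the RCU is full."""
        if len(self.entries) >= self.capacity:
            return None
        entry = RcuEntry(label)
        self.entries.append(entry)
        return entry

    def retire(self) -> list[str]:
        """Retire finished COPs from the oldest end, stopping at the first unfinished one."""
        retired = []
        while self.entries and self.entries[0].done:
            retired.append(self.entries.popleft().label)
        return retired


rcu = RetireControlUnit(capacity=4)
load, add, mul = (rcu.allocate(name) for name in ("load", "add", "mul"))
mul.done = True          # the youngest COP finishes first...
print(rcu.retire())      # [] because the oldest COP ("load") is still executing
load.done = True
add.done = True
print(rcu.retire())      # ['load', 'add', 'mul']
```

Keeping retirement in order is what lets an out-of-order core discard speculative work cleanly: nothing younger than an unfinished or faulting COP has been committed yet.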

Integer Execution
● Separate integer and floating-point units
● 2 symmetrical integer pipelines
● An integer addition/subtraction passes through 3 pipeline stages:
  ○ Read operands
  ○ Execute
  ○ Write back
● 6-cycle multiplication
● Separate hardware divider
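To show how three stages and two pipelines interact, here is a minimal timing sketch for a run of independent integer add/sub operations. It assumes no dependences, no stalls, and that each pipeline accepts one new operation per cycle; real scheduling is more involved, and the function name is hypothetical.

```python
from __future__ import annotations

STAGES = ("read operands", "execute", "write back")
PIPES = 2


def schedule(num_ops: int) -> list[tuple[int, int, int]]:
    """Return (op, pipe, writeback_cycle) for each op, numbering cycles from 1.

    Ops are assigned round-robin to the pipes, so two ops enter per cycle.
    """
    result = []
    for op in range(num_ops):
        issue_cycle = op // PIPES + 1
        pipe = op % PIPES
        writeback_cycle = issue_cycle + len(STAGES) - 1
        result.append((op, pipe, writeback_cycle))
    return result


for op, pipe, wb in schedule(6):
    print(f"op{op}: pipe {pipe}, written back in cycle {wb}")
```

Each operation still spends three cycles in its pipeline, but once both pipelines are full two operations complete per cycle, so six independent adds are all written back by cycle 5.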

Floating Point Execution
● Designed for 128-bit wide execution
● Targets the SSE and AVX vector extensions
● 2 asymmetrical FP pipelines
● 4-7 cycles per addition/subtraction
  ○ Read operands (2 cycles)
  ○ Execute (1-4 cycles)
  ○ Write back (1 cycle)
● Co-processor architecture
  ○ Dedicated decode, rename, out-of-order scheduler and retire queue
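The 4-7 cycle range for an FP addition/subtraction follows from summing the per-stage figures listed above:

```latex
\begin{aligned}
T_{\text{add}} &= t_{\text{read}} + t_{\text{exec}} + t_{\text{wb}} \\
T_{\min} &= 2 + 1 + 1 = 4 \text{ cycles} \\
T_{\max} &= 2 + 4 + 1 = 7 \text{ cycles}
\end{aligned}
```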

Memory
● Separate load and store pipelines
● Aggressive re-ordering
  ○ Loads can execute out of order
  ○ Loads can be moved ahead of older stores before those stores' addresses are resolved
● A Memory Ordering Queue and a Store Queue enforce correct memory ordering
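Letting a load run ahead of an older store whose address is unknown is a bet that the two do not overlap, so the hardware must check the bet once the store address resolves. The sketch below models only that ordering check in the spirit of a memory ordering queue: when a store's address becomes known, any younger load that already read overlapping bytes is reported for replay. The class names and the 8-byte default access size are hypothetical, and store-to-load forwarding is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Store:
    seq: int                   # program-order sequence number
    size: int = 8              # access size in bytes (assumed default)
    addr: int | None = None    # None until the store's address is computed


@dataclass
class OrderingTracker:
    """Hypothetical tracker of loads that executed speculatively."""
    loads: list[tuple[int, int, int]] = field(default_factory=list)  # (seq, addr, size)

    def execute_load(self, seq: int, addr: int, size: int = 8) -> None:
        # The load runs immediately, even if older stores have unknown addresses.
        self.loads.append((seq, addr, size))

    def resolve_store(self, store: Store, addr: int) -> list[int]:
        """Set the store address; return younger loads that read overlapping bytes too early."""
        store.addr = addr
        return [
            seq
            for seq, l_addr, l_size in self.loads
            if seq > store.seq and l_addr < addr + store.size and addr < l_addr + l_size
        ]


st = Store(seq=1)                    # older store, address not yet known
mot = OrderingTracker()
mot.execute_load(seq=2, addr=0x100)  # independent load, hoisted above the store
mot.execute_load(seq=3, addr=0x200)  # will turn out to alias the store
print(mot.resolve_store(st, 0x200))  # [3] -> load 3 must be replayed
```

When aliasing is rare, most hoisted loads pass this check and their early results are kept, which is why the reordering pays off.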

L1 Data Cache
● 32 KB
● 8-way set associative
● Parity-protected write-back cache
● Pseudo-LRU replacement algorithm
● Can handle a 128-bit read and a 128-bit write each cycle
● Average latency of 3 cycles for an L1 hit
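The slides do not say which pseudo-LRU variant is used; a common choice for an 8-way set is tree PLRU, which needs only 7 bits per set instead of full age ordering. The sketch below implements that technique for one set as an illustration, not as Jaguar's documented design; the class name is hypothetical.

```python
class TreePLRU:
    """Tree pseudo-LRU state for one set of an N-way cache (N a power of two).

    Each internal node holds one bit: 0 -> the victim search goes left,
    1 -> it goes right. Touching a way sets the bits on its path to point away.
    """

    def __init__(self, ways: int = 8) -> None:
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        self.bits = [0] * (ways - 1)  # heap layout: node i has children 2i+1, 2i+2

    def touch(self, way: int) -> None:
        """Update the tree after a hit on, or a fill into, `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:               # accessed way is in the left half,
                self.bits[node] = 1     # so steer future victims to the right
                node, hi = 2 * node + 1, mid
            else:                       # accessed way is in the right half,
                self.bits[node] = 0     # so steer future victims to the left
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Follow the bits from the root to the pseudo-least-recently-used way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(8)
for way in range(8):      # fill ways 0..7 in order
    plru.touch(way)
print(plru.victim())      # 0
plru.touch(0)             # reuse way 0
print(plru.victim())      # 4 (true LRU would pick way 1)
```

The last line shows why the scheme is only "pseudo" LRU: it approximates recency with far less state. For a 2-way set, as in the instruction cache, the single tree bit gives exact LRU.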

L2 Cache
● 1-2 MB (depending on the application)
● 16-way set associative
● Unified, shared by 2 to 4 cores
● ECC (error-correcting code) protection for the tag and data arrays
● Forms an EDC/ECC (error detection / error correction) cache structure
● Minimum hit latency of 25 cycles
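The L1 and L2 hit latencies combine through the standard average memory access time (AMAT) formula. The sketch below plugs in the 3-cycle L1 hit and the 25-cycle minimum L2 hit from the slides; the miss rates and the 200-cycle memory latency are hypothetical placeholders, not measured Jaguar figures.

```python
def amat(l1_hit: float, l1_miss_rate: float, l2_hit: float,
         l2_miss_rate: float, memory_latency: float) -> float:
    """Average memory access time in cycles for a two-level cache hierarchy.

    AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 local miss rate * memory latency)
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory_latency)


# Latencies from the slides: 3-cycle L1 hit, 25-cycle (minimum) L2 hit.
# HYPOTHETICAL: 5% L1 miss rate, 20% L2 local miss rate, 200-cycle memory latency.
print(amat(l1_hit=3, l1_miss_rate=0.05, l2_hit=25, l2_miss_rate=0.2, memory_latency=200))  # ≈ 6.25
```

Because the L2 latency is paid only on L1 misses, even a fairly slow shared L2 adds just a few cycles to the average access when the L1 hit rate is high.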

Jaguar Benchmarks
● Athlon 5350
● Athlon 5150
● Sempron 3850

Athlon 5350 vs. Intel Core i3-3220 vs. Celeron J1900

Athlon 5350 vs. Intel Core i7-5930K
The Athlon 5350 delivers much lower performance; however, it offers:
● Much better efficiency
● Much lower cost
● Better performance per watt
● Better performance per dollar

Zen
● Entirely new core design
● New design family ‘Summit Ridge’
● Simultaneous Multithreading
● New cache system
● FinFET manufacturing process
