![Page 1: NVIDIA KEPLER ARCHITECTUREmeseec.ce.rit.edu/722-projects/spring2015/3-2.pdf · NVIDIA’S KEPLER ARCHITECTURE Tony Chen 2015 . Overview 1. Fermi 2. Kepler a. SMX Architecture b. Memory](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f1f75353927f46e761fd010/html5/thumbnails/1.jpg)
NVIDIA’S KEPLER ARCHITECTURE
Tony Chen
2015
Overview
1. Fermi
2. Kepler
   a. SMX Architecture
   b. Memory Hierarchy
   c. Features
3. Improvements
4. Conclusion
5. Brief look at Maxwell
Fermi
● ~2010
● 40 nm TSMC process (some mobile parts used 28 nm)
● 16 Streaming Multiprocessors (SMs), each with:
  o 32 CUDA cores
  o 16 load/store units
  o 4 Special Function Units (SFUs) for sine, cosine, reciprocal, and square root
● Each CUDA core contains:
  o One integer ALU and one floating-point unit (FPU)
Kepler
● ~2012 - 2014
● 28 nm TSMC process
● Used in most GeForce 600, 700, and 800M series GPUs
● Designed with energy efficiency in mind
  o Two Kepler CUDA cores use about 90% of the power of one Fermi CUDA core
● Unified GPU clock (no separate shader clock)
SMX Architecture
● 15 SMX units (Next Generation Streaming Multiprocessors), each with:
  o 192 single-precision CUDA cores
  o 64 double-precision units
  o 32 load/store units
  o 32 SFUs
  o 16 texture units
  o 65,536 32-bit registers
  o 4 warp schedulers
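The per-SMX figures above multiply out to chip-level totals. The following is a minimal sketch of that arithmetic, assuming the full 15-SMX configuration listed on the slide; the constant and dictionary names are illustrative only and do not come from any NVIDIA API.

```python
# Per-SMX resources as listed on the slide (a full chip with 15 SMX units is assumed).
SMX_COUNT = 15
PER_SMX = {
    "single_precision_cores": 192,
    "double_precision_units": 64,
    "load_store_units": 32,
    "special_function_units": 32,
    "texture_units": 16,
    "registers_32bit": 65_536,
}

# Chip-wide totals are the per-SMX counts times the number of SMX units.
totals = {name: count * SMX_COUNT for name, count in PER_SMX.items()}

# Register file size per SMX: 65,536 registers of 4 bytes each, expressed in KiB.
register_file_kib = PER_SMX["registers_32bit"] * 4 // 1024

print(totals["single_precision_cores"])  # 2880
print(totals["double_precision_units"])  # 960
print(register_file_kib)                 # 256
```

Shipping products may enable fewer than 15 SMX units, in which case `SMX_COUNT` would need to change accordingly.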
Feature Overview
• Quad Warp Scheduler
• Shuffle Instructions
• Texture Improvements
• Atomic Operations
• Memory Hierarchy
• Dynamic Parallelism
• Hyper-Q
• Grid Management Unit
• GPU Direct
• NVENC
• General improvements/features
Quad Warp Scheduler
● A warp is a group of 32 parallel threads
● Each SMX contains 4 warp schedulers
  o Each scheduler has 2 instruction dispatch units, allowing 2 independent instructions per warp to be issued each cycle
  o Allows double-precision operations to be issued alongside other operations (Fermi did not allow this)
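The scheduler counts above give an upper bound on how much an SMX can issue per cycle. A small sketch of that bound, using only the numbers on the slide; it is an issue-rate ceiling, not a measured throughput, since real code also needs enough independent instructions and free execution units.

```python
# Figures from the slide.
WARP_SIZE = 32
WARP_SCHEDULERS_PER_SMX = 4
DISPATCH_UNITS_PER_SCHEDULER = 2

# Each scheduler selects one warp per cycle.
warps_selected_per_cycle = WARP_SCHEDULERS_PER_SMX

# Each scheduler can dispatch two independent instructions from its selected warp.
peak_warp_instructions_per_cycle = (
    WARP_SCHEDULERS_PER_SMX * DISPATCH_UNITS_PER_SCHEDULER
)

# Every warp instruction covers 32 threads.
peak_thread_instructions_per_cycle = peak_warp_instructions_per_cycle * WARP_SIZE

print(warps_selected_per_cycle)            # 4
print(peak_warp_instructions_per_cycle)    # 8
print(peak_thread_instructions_per_cycle)  # 256
```

The peak of 256 thread-level instructions exceeds the 192 single-precision cores in an SMX, so dual issue only pays off when the instruction mix spreads across different unit types (for example, arithmetic alongside load/store or SFU work).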
Quad Warp Scheduler (Cont.)
● Removes the complex hardware previously used to prevent data hazards
  o Multi-port register scoreboard
  o Dependency checker block
● The compiler now determines possible hazards ahead of time
  o A simple hardware block provides this pre-determined information for each instruction
● Replaces a power-hungry hardware stage with a simple hardware block
● Frees up die space
Shuffle Instructions
● Allows threads within a warp to share data
  o Previously needed separate store and load operations to pass data through shared memory
● Instead, a thread can read a value directly from another thread’s register
● The store and load are carried out in a single step
● Reduces the amount of shared memory needed
● 6% performance gain in FFT using shuffle
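To make the register-to-register exchange concrete, here is a host-side Python model of a shuffle-down and a warp-wide sum built on it. It is a sketch, not GPU code: the list stands in for one register per lane, and `shfl_down` and `warp_reduce_sum` are illustrative names chosen for this example.

```python
WARP_SIZE = 32

def shfl_down(values, delta):
    """Model of a shuffle-down: lane i receives the register of lane i + delta.

    Lanes whose source lane would fall outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]

def warp_reduce_sum(values):
    """Sum 32 per-lane values in log2(32) = 5 shuffle steps, with no shared memory.

    Only lane 0 is guaranteed to hold the full sum at the end.
    """
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [own + other for own, other in zip(vals, shifted)]
        offset //= 2
    return vals[0]

print(warp_reduce_sum(range(WARP_SIZE)))  # 496
```

Each loop iteration corresponds to one shuffle plus one add per lane; the equivalent shared-memory version would need a store, a synchronization, and a load at every step.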
Texture Improvements
● Texture state is now saved in memory
  o Fermi used a fixed-size binding table
  o An entry was assigned whenever the GPU needed to reference a texture
  o This effectively limited a program to 128 textures
● Texture state is obtained on demand
● Reduces CPU overhead and improves GPU access efficiency
Atomic Operations
● Read-modify-write operations performed without interruption from other threads
● Important for parallel programming
● Added 64-bit versions of the atomicMin, atomicMax, atomicAnd, atomicOr, and atomicXor operations
● Native support for 64-bit atomic operations
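The read-modify-write guarantee can be illustrated on the host. The sketch below uses Python threads and a lock as a stand-in for the hardware atomic units; `AtomicCell` and its methods are hypothetical names for this example and only mirror the semantics of operations such as atomicAdd and atomicMax (each returns the old value).

```python
import threading

class AtomicCell:
    """A value whose read-modify-write updates cannot be interleaved."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for the hardware atomic unit

    def atomic_add(self, amount):
        with self._lock:
            old = self._value
            self._value = old + amount
            return old

    def atomic_max(self, candidate):
        with self._lock:
            old = self._value
            self._value = max(old, candidate)
            return old

    def load(self):
        with self._lock:
            return self._value

def run_workers(num_threads=8, per_thread=1000):
    counter = AtomicCell(0)
    largest = AtomicCell(-1)

    def work(tid):
        for i in range(per_thread):
            counter.atomic_add(1)                     # count every item
            largest.atomic_max(tid * per_thread + i)  # track the largest id seen

    threads = [threading.Thread(target=work, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load(), largest.load()

print(run_workers())  # (8000, 7999)
```

Without the lock, two threads could read the same old value and one update would be lost; the atomic versions make the result independent of how the threads interleave.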
Memory Hierarchy
● 64 KB of configurable on-chip memory per SMX, split between L1 cache and shared memory
  o 16/32/48 KB L1 cache
  o 48/32/16 KB shared memory
● 48 KB read-only cache
● 1536 KB L2 cache
● Protected by Single-Error Correct Double-Error Detect (SECDED) ECC
● More bandwidth at each level than in Fermi
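The three L1/shared-memory options are simply the ways of dividing the same 64 KB block. A short sketch that enumerates them from the slide's figures; the variable names are illustrative.

```python
# Figures from the slide: 64 KB of on-chip memory, with three selectable L1 sizes.
ON_CHIP_KB = 64
L1_OPTIONS_KB = (16, 32, 48)

# Whatever is not used as L1 cache is available as shared memory.
configs = [(l1_kb, ON_CHIP_KB - l1_kb) for l1_kb in L1_OPTIONS_KB]

for l1_kb, shared_kb in configs:
    print(f"L1 = {l1_kb} KB, shared = {shared_kb} KB")
```

A kernel that relies heavily on shared memory would favor the 16/48 split, while one dominated by cached loads would favor 48/16.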
Dynamic Parallelism
● Allows the GPU to generate, synchronize, and control new work for itself
● Traditionally, the CPU issues all work to the GPU
● New work no longer needs to involve the CPU
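The difference shows up in how many launches the CPU has to issue. Below is a toy Python model, not GPU code: `Gpu`, `refine`, and `run` are hypothetical names, and the "kernel" is an ordinary function that needs two child tasks per level of refinement.

```python
class Gpu:
    """Toy device that counts where each kernel launch came from."""

    def __init__(self, dynamic_parallelism):
        self.dynamic_parallelism = dynamic_parallelism
        self.host_launches = 0
        self.device_launches = 0

    def launch_from_host(self, kernel, *args):
        self.host_launches += 1
        return kernel(self, *args)

    def launch_from_device(self, kernel, *args):
        if not self.dynamic_parallelism:
            raise RuntimeError("device-side launch needs dynamic parallelism")
        self.device_launches += 1
        return kernel(self, *args)

def refine(gpu, depth):
    """Hypothetical kernel: handles one region, then needs two child regions refined."""
    if depth == 0:
        return []
    children = [depth - 1, depth - 1]
    if gpu.dynamic_parallelism:
        for child in children:
            gpu.launch_from_device(refine, child)  # GPU creates its own work
        return []
    return children  # otherwise hand the follow-up work back to the CPU

def run(dynamic_parallelism, depth=3):
    gpu = Gpu(dynamic_parallelism)
    pending = [depth]
    while pending:
        pending.extend(gpu.launch_from_host(refine, pending.pop()))
    return gpu.host_launches, gpu.device_launches

print(run(dynamic_parallelism=False))  # (15, 0)
print(run(dynamic_parallelism=True))   # (1, 14)
```

The same 15 kernels run in both cases; with dynamic parallelism the CPU issues only the first one and the round trips between CPU and GPU disappear.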
Hyper-Q
● Fermi had 16 concurrent work streams, but all were multiplexed into 1 hardware work queue
● This created false dependencies between streams
● Kepler increases the number of hardware-managed connections (work queues) to 32
  o Each CUDA stream is managed within its own hardware work queue, and inter-stream dependencies are optimized
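A toy timing model shows how a single queue creates false dependencies. It is a deliberately simplified sketch with assumptions that are not from the slides: execution resources are unlimited, kernels in one stream run back to back, and in the single-queue case a kernel cannot be issued until the kernel ahead of it in the queue has started.

```python
def makespan_single_queue(tasks):
    """Finish time when all streams share one in-order hardware queue.

    tasks: list of (stream_name, duration) in the order they were enqueued.
    """
    issue_time = 0       # earliest time the head of the queue may be issued
    stream_ready = {}    # time at which each stream's previous kernel finishes
    finish_times = []
    for stream, duration in tasks:
        start = max(issue_time, stream_ready.get(stream, 0))
        finish = start + duration
        stream_ready[stream] = finish
        issue_time = start  # later queue entries wait until this one has started
        finish_times.append(finish)
    return max(finish_times, default=0)

def makespan_per_stream_queues(tasks):
    """Finish time when every stream has its own hardware queue."""
    totals = {}
    for stream, duration in tasks:
        totals[stream] = totals.get(stream, 0) + duration
    return max(totals.values(), default=0)

# Three independent streams, two unit-length kernels each, enqueued stream by stream.
tasks = [("A", 1), ("A", 1), ("B", 1), ("B", 1), ("C", 1), ("C", 1)]

print(makespan_single_queue(tasks))       # 4
print(makespan_per_stream_queues(tasks))  # 2
```

In the single-queue case, stream B's first kernel is stuck behind stream A's second kernel even though the two are unrelated; with one queue per stream the three streams overlap fully.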
Grid Management Unit (GMU)
● Grid = group of blocks
  o Block = group of threads
● Manages and prioritizes grids that are passed to the CUDA Work Distributor (CWD), which sends them to the SMX units for execution
● Keeps the GPU efficiently utilized
GPU Direct
● Allows third-party devices to access GPU memory directly
  o NICs, SSDs, etc.
● Uses Remote Direct Memory Access (RDMA)
● Does not need to involve the CPU
NVENC
● New hardware-based H.264 video encoder
● Previous models used the CUDA cores for encoding
● 4 times faster than the previous encoder while using less power
● Up to 4096x4096 encode
● A 16-minute, 1080p, 30 fps video takes approximately 2 minutes to encode
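The last bullet implies an encode rate well above real time. A quick calculation using only the slide's figures (the 2-minute encode time is approximate, so the results are approximate too):

```python
# Figures from the slide: a 16-minute 1080p clip at 30 fps encodes in about 2 minutes.
VIDEO_MINUTES = 16
FRAMES_PER_SECOND = 30
ENCODE_MINUTES = 2  # approximate, per the slide

total_frames = VIDEO_MINUTES * 60 * FRAMES_PER_SECOND
encode_fps = total_frames / (ENCODE_MINUTES * 60)
realtime_factor = VIDEO_MINUTES / ENCODE_MINUTES

print(total_frames)     # 28800
print(encode_fps)       # 240.0
print(realtime_factor)  # 8.0
```

That is roughly 240 encoded frames per second, or about 8x faster than real-time playback.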
Improvements of Kepler
● Access to up to 255 registers per thread (compared to 63 for Fermi)
● Removal of the shader clock
  o Fermi used a shader clock, typically 2x the GPU clock, which achieved higher throughput but used more power
  o Kepler cores run off the GPU clock
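The higher per-thread register limit comes with a trade-off, because all resident threads share the SMX's 65,536 registers. A rough sketch of the register-file bound on resident warps follows; it ignores register allocation granularity and every other per-SMX limit, so treat the numbers as an upper bound only. Both calls model the Kepler register file; the 63-register case is included only for contrast.

```python
# Figures from the slides: 65,536 32-bit registers per SMX, 32 threads per warp.
REGISTERS_PER_SMX = 65_536
WARP_SIZE = 32

def max_resident_warps(registers_per_thread):
    """Upper bound on whole warps that fit in one SMX register file."""
    threads = REGISTERS_PER_SMX // registers_per_thread
    return threads // WARP_SIZE

print(max_resident_warps(255))  # 8
print(max_resident_warps(63))   # 32
```

A kernel that uses all 255 registers per thread avoids spilling to memory but leaves far fewer warps available to hide latency.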
Improvements of Kepler (Cont.)
● Up to 4 displays on one card
● 4K support
● GPU Boost
  o Dynamically scales the GPU clock based on operating conditions
● Adaptive V-sync
  o Turns V-sync off when the frame rate drops below 60 fps
  o Turns V-sync on when the frame rate is above 60 fps
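The Adaptive V-sync rule reduces to a single comparison. A minimal sketch, assuming the 60 fps threshold from the slide corresponds to the display refresh rate and that V-sync stays on at exactly that rate (the slide does not say which way the boundary goes); the function and parameter names are illustrative.

```python
def vsync_enabled(fps, refresh_hz=60.0):
    """Return True if V-sync should be on for the current frame rate.

    Below the refresh rate, V-sync is turned off to avoid stutter;
    at or above it, V-sync is turned on to avoid tearing.
    """
    return fps >= refresh_hz

for fps in (45, 59.9, 60, 75):
    print(fps, vsync_enabled(fps))
```

Passing a different `refresh_hz` adapts the same rule to a display that does not run at 60 Hz.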
Improvements of Kepler (Cont.)
● FXAA (Fast Approximate Anti-Aliasing)
  o Comparable sharpness to MSAA (Multisample Anti-Aliasing)
  o Uses less computation power
  o Smooths edges using the rendered pixels rather than the 3D model
Improvements of Kepler (Cont.)
● TXAA (Temporal Anti-Aliasing)
  o Mix of hardware anti-aliasing and a custom CG-film-style AA resolve
  o High-quality resolve filter designed to work with the HDR-correct post-processing pipeline
  o TXAA 1 offers visual quality on par with 8xMSAA with the performance hit of 2xMSAA, while TXAA 2 offers image quality superior to 8xMSAA with performance comparable to 4xMSAA
Benchmarks
In Conclusion
● Improved performance
● Improved energy efficiency
● “Many hands make light work”
Maxwell
• 28nm TSMC
• Early 2014 (version 1)
• Late 2014 (version 2, the current version)
• GTX 980, 970
• New SM architecture (SMM)
• Efficiency - more active threads per SMM
• Larger shared memory
• Larger L2 cache
Questions?