nvidia’s tegra k1 system-on-chip€¦ · nvidia’s tegra k1 system-on-chip . tegra k1 battery...

Michael Ditty, Tegra Architecture

Co-authors: John Montrym, Craig Wittenbrink

NVIDIA’S TEGRA K1 SYSTEM-ON-CHIP

Tegra K1

Battery Saver Core

2x ISP

ARM7 2160p30 VIDEO

ENCODER

2160p30 VIDEO

DECODER AUDIO

USB 3.0

SECURITY ENGINE

HDMI Dual

DISPLAY UART

MIPI DSI/CSI/

E,MMC 4.5

DDR3L LPDDR2 LPDDR3

SPI SDIO

I2S I2C

Kepler

GPU Kepler GPU (192 CUDA Cores)

Open GL 4.4, OpenGL ES3.1+AEP, DX12, CUDA 6

Quad Core Cortex A15 “r3”

With 5th Battery-Saver Core; 2MB L2 cache

Dual Denver CPU

CAMERA Dual High Performance ISP

1.2 Gigapixel throughput, 100MP sensor

POWER Lower Power

28HPM, Battery Saver Core

DISPLAY 4K panel, 4K HDMI

DSI, eDP, LVDS, High Speed HDMI 1.4a

Overview

Kepler into Mobile

Tegra ISP

Power Management

Mobile Enablement

Demo Intro

A Major Discontinuity in Mobile Graphics

ES3.1+AEP, OGL4.4, DX12 Tessellation, Compute Shaders,

ASTC, GPGPU

ES2.0, DX9 Programmable Pixel

Shaders

Mobile Roadmap Meets GeForce

MOBILE ARCHITECTURE

Maxwell

Kepler

Tegra 3

Tegra 4

Tegra K1

GEFORCE ARCHITECTURE

Advancem

Tegra K1

Metric Tegra 4 Tegra K1 Units

FP32 ops 48 384 Per clock

Z-only Primitives 0.1 1 Per clock

Zcull - 256 Pixels/clk

Raster 8 64 Samples/clk

Texture 4 8 Bilinear filters/clk

ZROP 8 64 Samples/clk

L2 size 32 128 KBytes

L2 Cache

GigaThread Engine

Memory Interface

ROP ROP ROP ROP

Raster Engine

Polymorph Engine 2.0

Tegra K1 / Kepler

Graphics Core Architecture 192 CUDA cores

Unified Memory Cache

Dedicated Accelerators Geom / Tessellation

Z Cull

Z / Color ROP

Power Efficiency

Clock and power gating Multi-level Clock Gating

Power Gating

Rail Gating

Architectural power improvements Interconnect and Data Paths architected for mobile

Shader Bypass

GPU L2 Cache and Compression

Work reduction Aggressive Culling Of Z, Stencil, Attribute Fetch

Early Z

DuaI Next Gen ISP

Performance 1.2Gp total pixel throughput

600Mp each ISP

4096 simultaneous focus points

14 bits input

100Mp camera support

Interoperability Reconfigurable ISP fabric

Full GPGPU interoperability

Memory or Isochronous sourcing

Tegra K1 Computational Photography Architecture

Kernels

ATO LS NR LAC0 AP

H1 FB AT1 DS FX

ATO LS NR LAC0 AP

H1 FB AT1 DS FX

Kernels

CPU Kernels

Frame/Image Bus

State Bus

F0 F1 Fn

S0 S1 Sn

GPU + ISP + CPU

GPU Power Management

GPU Idle State Transitions

Active

Idle Transition

Power gating

Rail gating

Multi-core Gaming

Kepler GPU

Encode

Decode

ISP ISP

Display

Multi-core gaming power management

Balance power & performance

across cores and power rails.

Clocking policies must look at

more than active time.

Power optimization must be

done globally, not locally to

each unit.

Multi-core video processing “Live” Local Tone Mapping

Original

Kepler GPGPU Processing

Kepler GPU

Encode

Decode

ISP ISP

Display

Multi-core video processing

Utilize burst performance for latency reduction

Tegra K1 Benchmarks

GFXBench 3.0Manhattan

GFXBench 3.0Trex-HD

AndEBench-Pro

ance r

tive t

Shield Tablet

Competitor X

Shield Portable

Scalability Across Platforms

Mobile Compute

NV JETSON

Tango Tablet Automotive Computer Vision

VisionWorks Toolkit

Renderscript

Tegra K1 Compute Benchmark

Compubench RS (Geometric Mean)

ance r

tive t

Shield Tablet

Competitor X

Consumer Devices

Xiaomi MiPad Shield Tablet

Acer Chromebook 13

NVIDIA Dabbler Improving the user experience with Tegra K1

Watercolor

GPGPU simulates realistic water

Oil painting

3D modeling enables realistic

lighting

Low Pen-to-Ink latency

Optimized GPU rendering paths to

reduce latency.

Conclusion

New capabilities in mobile

Compute, OpenGL 4.4, Advanced Imaging Pipeline

Great performance

Over 2x the performance of current mobile devices

Enabling new platforms and ecosystems

Acknowledgment

We would like to thank the GPU & Tegra teams across NVIDIA who

collaborated to make this chip possible.

nvidia’s tegra k1 system-on-chip€¦ · nvidia’s tegra k1 system-on-chip . tegra k1 battery...

Documents

nvidia’s tegra line of processors for mobile devices2 2

nvidia: a pinmux configuration tool for...

mobile 3d mapping with tegra k1 - gpu technology...

nvidia tegra k1

whitepaper nvidia tegra x1 -...

computer vision - king tec 2169 · nvidia jetson tegra k1...

collision avoidance on nvidia tegra®€¦ · nvidia tegra...

arxiv:1611.01642v1 [cs.cv] 5 nov 2016 · of embedded...

nvidia® tegra software license agreement linux for tegra

tegra k1 processor module en

tegra k1 and the automotive industry - gtc on-demand...

the architecture of graphic processor unit – gpu and cuda...

tegra k1 is the first-ever console-class mobile processor,...

exploring task parallelism for heterogeneous systems using...

tegra k1 embedded platform design guide

beyond exascale (?) - unical › hpc2014 › pdfs ›...

whitepaper nvidia tegra 4 family cpu architecture … figure...

image and vision processing on tegra k1 | gtc...

testdroid helps mobile game developers access nvidia tegra...

cutting edge graphics for android -...