NVIDIA Research Overview

David Luebke

© NVIDIA Corporation 2009

Background

Founded 2006 by NVIDIA Chief Scientist David Kirk

Mission: long-term strategic research

Discover & invent new markets

Influence product roadmaps

Follow, support, and focus academic research

Improve parallel computing education

Topics

Visual computing: real-time rendering, cinematic rendering, animation, modeling, visualization, computational photography

Parallel computing: programming languages, compilers, numerics, HPC applications, architecture, circuit design, interconnects

Mobile computing: low-power computing, networks, HCI

Personnel

Currently 25 full-time researchers in CA, NC, MI, MN, VA, UT, Berlin, Helsinki

2 National Academy members

1 Academy Award

5 recent former faculty

External Research Collaborations

UC Berkeley – parallel programming

UC Davis – parallel algorithms

U British Columbia – imaging, architecture

U North Carolina – ray tracing, hybrid rendering

U Virginia – architecture, perceptual psychology

UCLA – oceanography

U Massachusetts – real-time rendering

Chalmers University – real-time rendering

U Utah – HPC, ray tracing

NC State – rendering algorithms

Johns Hopkins – data-intensive computing

Brown – computer vision

Saarland U – ray tracing

U Illinois – parallel programming

Weta – cinematic rendering

Williams College – real-time rendering

Example: Skin Rendering

Real-time subsurface scattering

Multilayer translucent materials

[Figure labels: ~5 minutes; ~11 ms]

No precomputation

Key insight: project diffusion profiles onto sum-of-Gaussians basis
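One reason a sum-of-Gaussians basis is convenient is that a 2D Gaussian factors into a product of two 1D Gaussians, so each term can in principle be applied as a horizontal pass followed by a vertical pass. The sketch below only checks that identity numerically. The weights and variances are hypothetical values chosen for illustration, not a fit to any measured skin profile, and the function names are not from the original work.

```python
import math

def gaussian_2d(variance, r):
    """Normalized 2D Gaussian of the given variance, evaluated at radius r."""
    return math.exp(-r * r / (2.0 * variance)) / (2.0 * math.pi * variance)

def gaussian_1d(variance, x):
    """Normalized 1D Gaussian of the given variance, evaluated at offset x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

# Hypothetical weights and variances, for illustration only.
WEIGHTS = [0.5, 0.3, 0.2]
VARIANCES = [0.01, 0.1, 1.0]

def profile(r):
    """Radial profile approximated as a weighted sum of 2D Gaussians."""
    return sum(w * gaussian_2d(v, r) for w, v in zip(WEIGHTS, VARIANCES))

def profile_separable(x, y):
    """The same profile written as products of 1D Gaussians in x and y."""
    return sum(w * gaussian_1d(v, x) * gaussian_1d(v, y)
               for w, v in zip(WEIGHTS, VARIANCES))
```

Evaluating `profile(math.hypot(x, y))` and `profile_separable(x, y)` at the same point should agree to within floating-point rounding.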

Ray Tracing

NVIRT: CUDA Ray Tracing API

Example: Programming Languages

Copperhead: Cu + Python

Copperhead is a subset of Python, designed for data parallelism

Python: extant, well-accepted, high-level scripting language

Already understands things like map and reduce

Comes with a parser & lexer

The current Copperhead compiler takes a subset of Python and produces CUDA code

Copperhead is not Pure Python

Copperhead is not for arbitrary Python code

Most features of Python are unsupported

Connecting Python & Copperhead code will require binding similar to Python-C interaction

Copperhead is compiled, not interpreted

Statically typed

[Diagram labels: Python, Copperhead]

Saxpy: Hello world

Some things to notice:

Types are implicit

The Copperhead compiler uses a Hindley-Milner type system with typeclasses similar to Haskell

Typeclasses are fully resolved in CUDA via C++ templates

Functional programming: map, lambda (or the equivalent in list comprehensions)

You can pass functions around to other functions

Closure: the variable ‘a’ is free in the lambda function, but bound to the ‘a’ in its enclosing scope

def saxpy(a, x, y):
    return map(lambda xi, yi: a*xi + yi, x, y)
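The same function also runs as ordinary Python, which makes the closure and the use of map easy to try out. This is a minimal sketch in plain Python 3 rather than output from the Copperhead compiler. The `list(...)` wrapper is an addition needed only because Python 3's `map` returns a lazy iterator.

```python
def saxpy(a, x, y):
    # 'a' is free in the lambda and bound in the enclosing function scope.
    # list() is needed in plain Python 3, where map returns a lazy iterator.
    return list(map(lambda xi, yi: a * xi + yi, x, y))

result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```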

Example: Parallel Programming

Thrust is a library of data-parallel algorithms and data structures for CUDA, with an interface similar to the C++ Standard Template Library

C++ template metaprogramming automatically chooses the fastest code path at compile time

Data Structures

• thrust::device_vector
• thrust::host_vector
• thrust::device_ptr
• Etc.

Algorithms

• thrust::sort
• thrust::reduce
• thrust::exclusive_scan
• Etc.
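For readers unfamiliar with these primitives, the following sequential Python sketch shows what reduce and exclusive scan compute on a small input. It illustrates the semantics only and is not Thrust code. The helper names `total` and `exclusive_scan` are defined here for illustration.

```python
from functools import reduce
import operator

def total(values):
    """Reduction with +: combines all elements into a single sum."""
    return reduce(operator.add, values, 0)

def exclusive_scan(values, init=0):
    """Element i of the result is init plus the sum of values[0:i]."""
    out, running = [], init
    for v in values:
        out.append(running)  # excludes the current element
        running += v
    return out

data = [3, 1, 4, 1, 5]
print(sorted(data))          # [1, 1, 3, 4, 5]
print(total(data))           # 14
print(exclusive_scan(data))  # [0, 3, 4, 8, 9]
```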

thrust::sort

sort.cu

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <cstdlib>

    int main(void)
    {
        // generate random data on the host
        thrust::host_vector<int> h_vec(1000000);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);

        // transfer to device and sort
        thrust::device_vector<int> d_vec = h_vec;

        // sort 140M 32b keys/sec on GT200
        thrust::sort(d_vec.begin(), d_vec.end());

        return 0;
    }

thrust::reduce

reduce.cu

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>
    #include <cstdlib>

    int main(void)
    {
        // generate random data on the host
        thrust::host_vector<int> h_vec(1000000);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);

        // transfer to device and compute sum
        thrust::device_vector<int> d_vec = h_vec;
        int x = thrust::reduce(d_vec.begin(), d_vec.end(),
                               0, thrust::plus<int>());

        return 0;
    }

Thrust

thrust.googlecode.com

Open source (Apache2 license)

Example: Sparse Matrix-Vector

CPU results from “Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms”, Williams et al., Supercomputing 2007

Example: Sort

[Chart: Radix Sorting Rate. Vertical axis: radix sorting rate (pairs/sec), 0 to 160,000,000. Horizontal axis: sequence size (key-value pairs), 1,000 to 10,000,000. Series: GTX 280, 9800 GTX+, 8800 Ultra, 8800 GT, 8600 GTS]
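As a reminder of what is being measured, here is a sequential Python sketch of a least-significant-digit radix sort on key-value pairs. It is not the GPU implementation benchmarked above. The function name and the 8-bit digit size are assumptions made for illustration. Each pass has a histogram, prefix-sum, and scatter structure, which are the steps a data-parallel version would typically parallelize.

```python
def radix_sort_pairs(pairs, key_bits=32, digit_bits=8):
    """Stable LSD radix sort of (key, value) pairs with non-negative integer keys."""
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):
        # Histogram of the current digit.
        counts = [0] * radix
        for key, _ in pairs:
            counts[(key >> shift) & mask] += 1
        # Exclusive prefix sum turns counts into starting offsets.
        offsets, running = [0] * radix, 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # Stable scatter into the output buffer.
        out = [None] * len(pairs)
        for pair in pairs:
            d = (pair[0] >> shift) & mask
            out[offsets[d]] = pair
            offsets[d] += 1
        pairs = out
    return pairs

print(radix_sort_pairs([(170, "a"), (45, "b"), (75, "c"), (45, "d"), (2, "e")]))
# [(2, 'e'), (45, 'b'), (45, 'd'), (75, 'c'), (170, 'a')]
```

Because every pass is stable, pairs with equal keys keep their original relative order, as the two entries with key 45 show.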

Example: Fluid Dynamics

Rayleigh-Bénard Convection

[Figure labels: hot, cold, circulating cells, initial temperature]

Rayleigh-Bénard Results

Double precision

384 × 384 × 192 grid (the largest that fits in 4 GB)

Vertical slice of temperature at y=0

Transition from stratified (left) to turbulent (right)

Regime depends on Rayleigh number: Ra = gαΔT/(κν)

8.5x speedup versus Fortran code running on 8-core 2.5 GHz Xeon
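To give a sense of why this grid size approaches the memory limit, the arithmetic below works out the storage for one double-precision field on the 384 × 384 × 192 grid. It assumes "4GB" means 4 GiB. How many full-grid arrays the solver actually keeps is not stated in the slides, so the last figure is only an upper bound on that count.

```python
NX, NY, NZ = 384, 384, 192
BYTES_PER_DOUBLE = 8

cells = NX * NY * NZ
bytes_per_field = cells * BYTES_PER_DOUBLE

# Assumption: "4GB" is read as 4 GiB (4 * 2**30 bytes).
MEMORY_BYTES = 4 * 2**30
fields_that_fit = MEMORY_BYTES // bytes_per_field

print(cells)            # 28311552
print(bytes_per_field)  # 226492416
print(fields_that_fit)  # 18
```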

Mission: Support Academic Research

Serve as academic liaison

Follow, inform, and influence external research

Direct support – funding and equipment

Sponsored Research

Donate and discount equipment

Professor Partnerships

Ph.D. Fellowships

CUDA Centers of Excellence

New programs:

CUDA Fellows

CUDA Research Awards

Mission: Support Parallel Computing Education

Supporting courses & curricular efforts

Creating & gathering online training materials

Teaching courses (and putting them online)

Writing textbooks

Final Thoughts – Education

We should teach parallel computing in CS 1 or CS 2

Computers don’t get faster, just wider

Manycore is the future of computing

Insertion sort, heap sort, merge sort: which goes faster on large data?

ALL students need to understand this! Now! Early!

Questions?

[email protected]

http://nvidia.com/cuda

NVIDIA Research Summit

Sept 30 – Oct 2, 2009 – The Fairmont, San Jose, California

A cross-disciplinary forum for researchers using GPUs across science and engineering

Join your colleagues, researchers in other fields, and the NVIDIA Research team for this valuable opportunity to gather, learn, and collaborate.

Share your work with peers from many disciplines; learn from experts at NVIDIA and elsewhere.

In-depth sessions on numeric computing, computational science, visual computing trends, and advanced CUDA programming & optimization

Opportunities:

Call for Posters open. Showcase your work, learn from your peers.

Research Roundtables: moderated discussions led by your peers. Submit a roundtable to shape the hot topics in GPU computing!

Co-located with the GPU Technology Conference, a technical event focused on developers, engineers, researchers, senior executives, venture capitalists, press and analysts