Background
Founded 2006 by NVIDIA Chief Scientist David Kirk
Mission: long-term strategic research
Discover & invent new markets
Influence product roadmaps
Follow, support, and focus academic research
Improve parallel computing education
Topics
Visual computing: real-time rendering, cinematic rendering, animation, modeling, visualization, computational photography
Parallel computing: programming languages, compilers, numerics, HPC applications, architecture, circuit design, interconnects
Mobile computing: low-power computing, networks, HCI
Personnel
Currently 25 full-time researchers in CA, NC, MI, MN, VA, UT, Berlin, Helsinki
2 National Academy members
1 Academy Award
5 researchers recently recruited from university faculty positions
External Research Collaborations
UC Berkeley – parallel programming
UC Davis – parallel algorithms
U British Columbia – imaging, architecture
U North Carolina – ray tracing, hybrid rendering
U Virginia – architecture, perceptual psychology
UCLA – oceanography
U Massachusetts – real-time rendering
Chalmers University – real-time rendering
U Utah – HPC, ray tracing
NC State – rendering algorithms
Johns Hopkins – data-intensive computing
Brown – computer vision
Saarland U – ray tracing
U Illinois – parallel programming
Weta – cinematic rendering
Williams College – real-time rendering
Example: Skin Rendering
Real-time subsurface scattering
Multilayer translucent materials
Render time: ~5 minutes (offline) vs. ~11 ms (real time)
No precomputation
Key insight: project diffusion profiles onto sum-of-Gaussians basis
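As a sketch of that insight (the published sum-of-Gaussians form; the number of terms k and the fitted weights are material-dependent and not shown here), the radial diffusion profile R(r) is approximated as

R(r) \approx \sum_{i=1}^{k} w_i \, G(v_i, r), \qquad G(v, r) = \frac{1}{2\pi v} \, e^{-r^2/(2v)}

Because each Gaussian is separable, the profile can be applied as a small set of cheap texture-space blurs instead of a full 2D convolution.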
Example: Programming Languages
Copperhead: Cu + Python
Copperhead is a subset of Python, designed for data parallelism
Python: an established, widely accepted high-level scripting language
Already understands things like map and reduce
Comes with a parser & lexer
The current Copperhead compiler takes a subset of Python and produces CUDA code
Copperhead is not Pure Python
Copperhead is not for arbitrary Python code; most features of Python are unsupported
Connecting Python and Copperhead code will require bindings, similar to Python–C interaction
Copperhead is compiled, not interpreted
Statically typed
[Diagram: Copperhead as a small subset of the Python language]
Saxpy: Hello world
Some things to notice:
Types are implicit
The Copperhead compiler uses a Hindley-Milner type system with typeclasses, similar to Haskell's
Typeclasses are fully resolved in CUDA via C++ templates
Functional programming: map, lambda (or the equivalent in list comprehensions)
You can pass functions around to other functions
Closure: the variable 'a' is free in the lambda function, but bound to the 'a' in its enclosing scope

def saxpy(a, x, y):
    return map(lambda xi, yi: a*xi + yi, x, y)
Example: Parallel Programming
thrust is a CUDA library of data-parallel algorithms and data structures with an interface similar to the C++ Standard Template Library (STL)
C++ template metaprogramming automatically chooses the fastest code path at compile time
Data structures: thrust::device_vector, thrust::host_vector, thrust::device_ptr, etc.
Algorithms: thrust::sort, thrust::reduce, thrust::exclusive_scan, etc.
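As an extra illustration (not in the original slides), here is how the earlier saxpy example might look in thrust, using thrust::transform with a user-defined functor; the functor name and vector sizes are illustrative assumptions:

saxpy.cu

#include <thrust/device_vector.h>
#include <thrust/transform.h>

// functor computing a*x + y for one pair of elements
struct saxpy_functor
{
    float a;
    saxpy_functor(float a_) : a(a_) {}
    __host__ __device__ float operator()(float x, float y) const
    {
        return a * x + y;
    }
};

int main(void)
{
    // x = [1, 1, ...], y = [2, 2, ...]
    thrust::device_vector<float> x(1000, 1.0f);
    thrust::device_vector<float> y(1000, 2.0f);

    // y <- 2*x + y, computed on the device
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      saxpy_functor(2.0f));
    return 0;
}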
thrust::sort
sort.cu

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <cstdlib>

int main(void)
{
    // generate random data on the host
    thrust::host_vector<int> h_vec(1000000);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device and sort
    thrust::device_vector<int> d_vec = h_vec;

    // sort 140M 32b keys/sec on GT200
    thrust::sort(d_vec.begin(), d_vec.end());

    return 0;
}
thrust::reduce
reduce.cu

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <cstdlib>

int main(void)
{
    // generate random data on the host
    thrust::host_vector<int> h_vec(1000000);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device
    thrust::device_vector<int> d_vec = h_vec;

    // compute sum on the device (initial value 0)
    int x = thrust::reduce(d_vec.begin(), d_vec.end(),
                           0, thrust::plus<int>());
    return 0;
}
Example: Sparse Matrix-Vector
CPU results from "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Williams et al., Supercomputing 2007
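To make the GPU side of this comparison concrete, here is a minimal sketch of a sparse matrix-vector (SpMV) kernel: a scalar CSR kernel with one thread per row. The kernel name and signature are illustrative assumptions, not the benchmarked code:

spmv_csr.cu

// y = A*x for a sparse matrix A stored in CSR format; one thread per row
__global__ void spmv_csr_scalar(int num_rows,
                                const int   *row_ptr, // row offsets, length num_rows + 1
                                const int   *col_idx, // column index of each nonzero
                                const float *values,  // value of each nonzero
                                const float *x,       // dense input vector
                                float       *y)       // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows)
    {
        float dot = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            dot += values[j] * x[col_idx[j]];
        y[row] = dot;
    }
}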
Example: Sort – Radix Sorting Rate
[Chart: radix sorting rate (key-value pairs/sec) vs. sequence size, from 1,000 to 10,000,000 pairs, for GTX 280, 9800 GTX+, 8800 Ultra, 8800 GT, and 8600 GTS; y-axis runs from 0 to 160,000,000 pairs/sec]
Example: Fluid Dynamics
Rayleigh-Bénard convection
[Diagram: a fluid layer heated from below (HOT) and cooled from above (COLD); circulating cells develop from the initial temperature profile]
Rayleigh-Bénard Results
Double precision
384 x 384 x 192 grid (max that fits in 4GB)
Vertical slice of temperature at y=0
Transition from stratified (left) to turbulent (right)
Regime depends on the Rayleigh number: Ra = gαΔT·H³ / (νκ), where H is the layer depth
8.5x speedup versus Fortran code running on 8-core 2.5 GHz Xeon
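Written out with the conventional symbols (the length scale H must appear cubed for Ra to be dimensionless; the slide's original formula omitted it):

\mathrm{Ra} = \frac{g \, \alpha \, \Delta T \, H^{3}}{\nu \, \kappa}

where g is gravitational acceleration, α the thermal expansion coefficient, ΔT the temperature difference across the layer, ν the kinematic viscosity, and κ the thermal diffusivity.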
Mission: Support Academic Research
Serve as academic liaison
Follow, inform, and influence external research
Direct support – funding and equipment
Sponsored Research
Donate and discount equipment
Professor Partnerships
Ph.D. Fellowships
CUDA Centers of Excellence
New programs:
CUDA Fellows
CUDA Research Awards
Mission: Support Parallel Computing Education
Supporting courses & curricular efforts
Creating & gathering online training materials
Teaching courses (and putting them online)
Writing textbooks
Final Thoughts – Education
We should teach parallel computing in CS 1 or CS 2 – now, to ALL students, early
Computers don't get faster, just wider
Manycore is the future of computing
Insertion sort, heap sort, merge sort: which goes fastest on large data? Students need to understand this!
NVIDIA Research Summit
Sept 30 – Oct 2, 2009 – The Fairmont, San Jose, California
A cross-disciplinary forum for researchers using GPUs across science and engineering
Join your colleagues, researchers in other fields, and the NVIDIA Research team for this valuable opportunity to gather, learn, and collaborate.
Share your work with peers from many disciplines; learn from experts at NVIDIA and elsewhere.
In-depth sessions on numeric computing, computational science, visual computing trends, and advanced CUDA programming & optimization
Opportunities:
Call for Posters (now open): showcase your work and learn from your peers.
Research Roundtables: moderated discussions led by your peers. Submit a roundtable to shape the hot topics in GPU computing!
Co-located with the GPU Technology Conference, a technical event focused on developers, engineers, researchers, senior executives, venture capitalists, press and analysts