accelerating rigid body simulation on today’s...

36
1 | Accelerating Rigid Body Simulation on Today’s GPUs Accelerating Rigid Body Simulation on Today’s GPUs Takahiro Harada [email protected]

Upload: others

Post on 24-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

  • 1 | Accelerating Rigid Body Simulation on Today’s GPUs

    Accelerating Rigid Body Simulation

    on Today’s GPUs

    Takahiro Harada

    [email protected]

  • 2 | Accelerating Rigid Body Simulation on Today’s GPUs

    RIGID BODY SIMULATION PIPELINE

    Broad phase collision detection

    – Quick check using bounding volumes

    Narrow phase collision detection

    – Detailed check using geometry

    Solve A

    B

    C

    D

    E

    F

  • 3 | Accelerating Rigid Body Simulation on Today’s GPUs

    WHY USE GPU?

    Large scale simulation requires a lot of bodies

    – Destruction

    Increase the cost of a simulation

    GPU has high-

    – Peak performance

    – Memory bandwidth

    GPU is different from CPU

  • 4 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

  • 5 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Rahul Sathe, Rigid Body Collision

    Detection on the GPU, Sig

    Poster(2006)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    Cube map

    Sathe(2006)

    Cube map

    ! "#"$%&' $( %) ' **"+"' , %- . /. 0/"' , %' , %/1. %2 34%! "#$%&' "(#)&*+"#$%,- ,. "(#) / 01()%,234 56&7 8"4 &9": )&*"8"4 ,(,%": ) / 01()%,234 5&

    ; "4 01/ 4"56 ($&

    #

    @)#%/ , 5)-6 "C#)+"#$%&41"6 #-' #)-, 2#-0#)B&4A"*)' #(%"#-, )"%' "*)-, 37#

    N"):' #*&, ' -2"%#)B&4A"*)' #@#( , 2#O7#D-3/ %"#J#' +&B' #)+"#(13&%-)+6 #-, #(*)-&, 7#! "#' +&&)#%(9' #0%&6 #*", )%&-2#&0#&4A"*)#@#))+"#?"%)"E#&0#

    &4A"*)#O7#! "#*(1*/ 1()"#)+"#2-' )( , *"#4")B"", #)+"' "#)B$&-, )' #P2JQ#-, #B&%12#' $(*"7#= ' -, 3#)+"#2-' )( , *"#*/ 4"56 ($#1&&R5/ $C#B"#*( , #0-, 2#

    &/ )#+&B#0(%#&4A"*)#@:' #4&/ , 2(%9#-' #0%&6 #-)' #*", )%&-2#-, #)+"#

    2-%"*)-&, #&0#)+"#?"%)"E#&0#&4A"*)#O#P2HQ7#! "#)+", #*&6 $(%"#2H#( , 2#2J7#G0#2H#ST#2JC#)+", #)+"#(*)/ (1#2-' )( , *"#4")B"", #)+"#)B$&-, )' #-' #

    1"' ' #)+(, #)+"#2-' )( , *"#0%&6 #*", )%&-2#)-)' #4&/ , 2(%97#8+-' #6 "( , ' #

    )+()#B+", #2H#ST#2J#)+"#&4A"*)' #(%"#-, )"%' "*)-, 37#! "#%"$"( )#)+-' #' )"$#

    0&%# (11# )+"# ?"%)-*"' # ( , 2# )+", # B-)+# )+"# &4A"*)' # @# ( , 2# O#-, )"%*+(, 3"2C#/ , )-1#)+-' #*&, 2-)-&, #-' #)%/ "7##

    #

    ! "

    # " $

    # " %

    # " &

    # " '

    # $'

    # $$

    # $"

    ! $

    # " "

    , $, "

    , $

    , "-. /012345/0, 6, 78/963:6; 01/03. 6"

    :13? 6! " 6@87. >68A@510B? 526433CB@2

    , $ DE/@546, 78/96F0/G00. 6! " 65. , 6#$"

    H710E/73. 63:6; 01/7E0863:6234=>3. 6$6:13? 6E0. /016! " 63:6234=>3. 6"

    , "

    , " I 6/0

    #C0

  • 6 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Takahiro Harada et al. , Smoothed

    Particle Hydrodynamics on GPUs,

    Proc. of CGI(2007)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Cube map

    Sathe(2006)

    Cube map

    Uniform

    Grid

    (Atomics)

    Penalty

  • 7 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Takahiro Harada, Real-time Rigid

    Body Simulation on GPUs, GPU

    Gems 3 (2007)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007) Cube map

    Sathe(2006)

    Cube map Particles

    Uniform

    Grid

    (Atomics)

    Penalty

  • 8 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Scott Le Grand, Broad-phase

    Collision Detection with CUDA,

    GPU Gems3(2007)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007) Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Particles

    Uniform

    Grid

    (Sort)

    Uniform

    Grid

    (Atomics)

    Penalty

  • 9 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Peter Kipfer, LCP Algorithms for

    Collision Detection using CUDA,

    GPU Gems 3(2007)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Uniform

    Grid

    (Sort)

    Uniform

    Grid

    (Atomics)

    Penalty

  • 10 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Takahiro Harada, Parallelizing the

    Physics Pipeline, GDC(2009)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Constraint

    Partially

    Serial

    Batching

    Batching

    Harada(2009)

    Uniform

    Grid

    (Sort)

    Uniform

    Grid

    (Atomics)

    Penalty

  • 11 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Richard Tonge, PhysX GPU Rigid

    Bodies in Batman, Game

    Programming Gems 8(2010)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Uniform

    Grid

    (Sort)

    Uniform

    Grid

    (Atomics)

    Penalty

  • 12 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Liu et al., Real-time Collision Culling

    of a Million Bodies on Graphics

    Processing Units, Siggraph

    Asia(2010)

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    GPU Sweep & Prune

    Liu(2010)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Uniform

    Grid

    (Atomics)

    Penalty Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Uniform

    Grid

    (Sort)

    Sweep &

    Prune

  • 13 | Accelerating Rigid Body Simulation on Today’s GPUs

    Architecture & Algorithm

  • 14 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU ARCHITECTURE

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    Global Memory

  • 15 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Used GPU as a processor with

    many ALUs

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    GPU Sweep & Prune

    Liu(2010)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Uniform

    Grid

    (Sort)

    Sweep &

    Prune

    Uniform

    Grid

    (Atomics)

    Penalty

  • 16 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU ARCHITECTURE

    Global Memory

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    ALU

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    AL

    U

    LDS

    Compute Unit (CU)

  • 17 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    LCP

    – A CU processes a pair

    – Synchronization

    – LDS

    Broad phase collision

    – Radix sort

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    GPU Sweep & Prune

    Liu(2010)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Particles

    Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Geometry

    Uniform

    Grid

    (Sort)

    Sweep &

    Prune

    Uniform

    Grid

    (Atomics)

    Penalty

  • 18 | Accelerating Rigid Body Simulation on Today’s GPUs

    GPU RIGID BODY TECHNIQUES

    Solver using CU

    Approximated narrowphase using

    CU

    – Performance

    GPU Graphics GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    GPU Sweep & Prune

    Liu(2010)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Uniform

    Grid

    (Sort)

    Sweep &

    Prune

    Batching

    Constraint

    Representati

    on for GPU

    Uniform

    Grid

    (Atomics)

    Penalty

  • 19 | Accelerating Rigid Body Simulation on Today’s GPUs

    Compute Unit Aware

    Rigid Body Simulation

  • 20 | Accelerating Rigid Body Simulation on Today’s GPUs

    PIPELINE

    Copy body and pair buffer

    GPU allocates big buffers

    – Contact

    – Constraints

    Narrow phase and solve is done on the GPU

    Don’t have to read back big buffers

    Bo

    dy

    Pai

    r

    Bo

    dy

    Co

    nta

    ct

    Co

    nst

    rain

    t

    Pai

    r

    Merge

    Dispatch Logic

    CPU

    Broad phase Collision

    GPU

    NP Collision

    Solve

    Body, Pair

    Body Copy

    Copy

    Copy

    Copy

  • 21 | Accelerating Rigid Body Simulation on Today’s GPUs

    NARROWPHASE – CU LEVEL PARALLELIZATION

    Collision of a pair is independent

    Use a SIMD lane for a pair

    – Not enough resource for a SIMD lane

    Use a CU for a pair

    A CU processes and “append” colliding pairs

    – HW accelerated append operation on AMD GPU

    Load balancing

    – CU can fetch a pair from queue

    – Explicit pair split

    CU level parallelization

    Contact buffer

    Pair buffer

    Narrowphase

    CU CU CU CU CU CU CU CU

  • 22 | Accelerating Rigid Body Simulation on Today’s GPUs

    NARROWPHASE – SIMD LANE PARALLELIZATION

    A collision has to be split into parallel works

    Parallel collision detection

    Support arbitrary convex shapes

    Too many contact points

    – Increase cost of solver

    Redundancy

    – Use sync and LDS for vector wide operation

    – Eliminate redundant contacts

    Contact buffer

    Pair buffer

    Narrowphase

    Read

    Reduce

    Write

    Collide

  • 23 | Accelerating Rigid Body Simulation on Today’s GPUs

    SHAPE REPRESENTATION

  • 24 | Accelerating Rigid Body Simulation on Today’s GPUs

    SHAPE REPRESENTATION

  • 25 | Accelerating Rigid Body Simulation on Today’s GPUs

    SOLVER

    Constraint based solver uses a Gauss Seidel method

    – Velocity is input and output

    – Penalty method

    Sequential solve

    Parallel solve is possible only if

    – Bodies are not shared among constraints

    When a body is colliding to several bodies, parallel

    solve cannot be used

    Solution

    – Split constraints into batches

    – Constraints in a batch doesn’t share bodies

  • 26 | Accelerating Rigid Body Simulation on Today’s GPUs

    BATCHING

    2 level batching

    – Global batching

    Split into independent sections

    – Local batching

    Batch in each section

    Contact buffer

    Contact buffer

    Local batching

    CU CU CU CU CU CU CU CU

    Global batching

    CU CU CU CU CU CU CU CU

    Contact buffer

  • 27 | Accelerating Rigid Body Simulation on Today’s GPUs

    ITERATIVE LOCAL BATCHING

    A CU processes a section of contact buffer

    – Local batching Global batching

    – Extract independent pairs in a set

    Stream processing pair buffer

    – Read to local, reorder

    Pairs are localized

    – X Parallel batching

    – X Serial batching

    – O Iterative batching

    Contact buffer

    Local batching

    Read

    Extract Batch

    Write

    Contact buffer

  • 28 | Accelerating Rigid Body Simulation on Today’s GPUs

    SOLVER

    A CU processes a section of contact buffer

    GPU dispatches works by itself

    – Read constraints

    – Fill batch buffer

    – Solve

    – If the batch is done, go to the next batch

    Previous work

    – CPU had to dispatch a kernel per batch

    Our method

    – CPU dispatches some

    – GPU dispatches works by itself

    Solver

    Read

    Solve

    Write

    Contact buffer

    Fill local batch

  • 29 | Accelerating Rigid Body Simulation on Today’s GPUs

    DEMO

    OpenCL

    Radeon HD5870

    20K objects

    About 30fps

    GPU collide and solve ≈ 30ms

    (Inc. data transfer)

  • 30 | Accelerating Rigid Body Simulation on Today’s GPUs

    DEMO

    OpenCL

    Radeon HD5870

    12K objects

    About 30fps

    GPU collide and solve ≈ 30ms

    (Inc. data transfer)

  • 31 | Accelerating Rigid Body Simulation on Today’s GPUs

    OFFLINE DEMO

    400K objects

    GPU collide and solve < 1s

  • 32 | Accelerating Rigid Body Simulation on Today’s GPUs

    LIMITATIONS

    GPU prefers uniform work granularity

    Some works are better to use CPUs

  • 33 | Accelerating Rigid Body Simulation on Today’s GPUs

    HETEROGENEOUS ERA

    GPU Graphics Heterogeneous GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    GPU Sweep & Prune

    Liu(2010)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Uniform Grid

    (Atomics)

    Penalty Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Uniform

    Grid

    (Sort)

    Sweep &

    Prune

    Batching?

    Constraint?

    Representati

    on for GPU

  • 34 | Accelerating Rigid Body Simulation on Today’s GPUs

    HETEROGENEOUS ERA

    GPU Graphics Heterogeneous GPU Compute

    Bro

    ad p

    hase

    N

    arr

    ow

    phase

    Solv

    e

    SPH

    Harada(2007)

    Particle based rigid body

    Harada(2007)

    LCP

    Kipfer(2007)

    GPU Sweep & Prune

    Liu(2010)

    Cube map

    Sathe(2006)

    Sort based grid

    Le Grand(2007)

    Cube map Geometry Particles

    Uniform Grid

    (Atomics)

    Penalty Constraint

    Partially

    Serial

    Batching

    Parallel

    Batching

    Constraint

    Batching

    Harada(2009) Batching

    Tonge(2010)

    Uniform

    Grid

    (Sort)

    Sweep &

    Prune

    Batching?

    Constraint?

    Representati

    on for GPU

    AMD Fusion

    – Llano, Zacate

    Advantages

    – Shared memory

    – CPU & GPU can work together

  • 35 | Accelerating Rigid Body Simulation on Today’s GPUs

    HETEROGENEOUS PARTICLE BASED SIMULATION G

    PU

    CP

    U0

    CP

    U1

    CP

    U2

    Bu

    ild

    Ac

    c. S

    truc

    ture

    SS

    Co

    llisio

    n

    S

    Integ.

    LS

    Co

    llisio

    n

    LS

    Co

    llisio

    n

    LS

    Co

    llisio

    n

    Synchronization

    Merge Merge Merge

    LL Coll.

    L

    Integ.

    Synchronization

    S S

    ortin

    g

    S S

    ortin

    g

    S S

    ortin

    g

    Synchronization

    Takahiro Harada, Physics Simulation on Fusion Architecture, AMD Fusion

    Developer Summit(2011)

  • 36 | Accelerating Rigid Body Simulation on Today’s GPUs

    REFERENCES

    Peter Kipfer, LCP Algorithms for Collision Detection using CUDA, GPU Gems 3(2007)

    Takahiro Harada et al. , Smoothed Particle Hydrodynamics on GPUs, Proc. of CGI(2007)

    Takahiro Harada, Real-time Rigid Body Simulation on GPUs, GPU Gems 3 (2007)

    Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units, Siggraph

    Asia(2010)

    Takahiro Harada, Parallelizing the Physics Pipeline, GDC(2009)

    Richard Tonge, PhysX GPU Rigid Bodies in Batman, Game Programming Gems 8(2010)

    Rahul Sathe, Rigid Body Collision Detection on the GPU, Sig Poster(2006)

    Scott Le Grand, Broad-phase Collision Detection with CUDA, GPU Gems3(2007)

    Takahiro Harada, Physics Simulation on Fusion Architecture, AMD Fusion Developer Summint(2011)