Rendering on the GPU

Tom Fili · 2010. 11. 19.


  • Rendering on the GPU

    Tom Fili

  • Agenda

    Global Illumination using Radiosity
    Ray Tracing
    Global Illumination using Rasterization
    Photon Mapping
    Rendering with CUDA

  • Global Illumination using Radiosity

    Global Illumination Using Progressive Refinement Radiosity by Greg Coombe and Mark Harris (GPU Gems 2, Chapter 39).
    The radiosity energy is stored in texels, and fragment programs are used to do the computation.

  • Global Illumination using Radiosity

    It breaks the scene into many small elements and calculates how much energy is transferred between the elements.

    The form factor between two elements is a function of their distance and relative orientation.
    V is 0 if the objects are occluded from each other and 1 if they are fully visible.

  • Global Illumination using Radiosity

    This form factor only works if the elements are very small.
    To increase speed we use larger areas and approximate them with oriented discs.
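    The equation shown on these slides did not survive extraction. As a hedged reconstruction, based on the FormFactorEnergy listing later in the deck rather than on the lost slide itself, the form factor for very small elements and its oriented-disc approximation can be written as below. The symbol names are assumptions: r is the distance between receiver i and shooter j, the angles are measured between each element's normal and the line joining the elements, A_j is the shooter's area, and V_ij is the visibility term.

```latex
% Very small (differential) elements:
F_{ij} = \frac{\cos\theta_i \, \cos\theta_j}{\pi r^2} \, V_{ij} \, dA_j

% Oriented-disc approximation for a shooter of finite area A_j:
F_{ij} \approx A_j \, \frac{\max(\cos\theta_i \cos\theta_j,\ 0)}{\pi r^2 + A_j} \, V_{ij}
```

    The extra A_j in the denominator keeps the disc form bounded as r approaches 0, which is why it tolerates larger elements.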

  • Global Illumination using Radiosity

    The classic radiosity algorithm solves a large system of linear equations composed of the pairwise form factors.
    These equations describe the radiosity of an element as a function of the energy from every other element, weighted by their form factors and the element's reflectance, r.
    The classical linear system requires O(N^2) storage, which is prohibitive for large scenes.
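    To make the classical formulation concrete, here is a minimal CPU sketch in Python that solves B_i = E_i + r_i * sum_j F_ij * B_j by Jacobi iteration. The function name, the dense N x N list of form factors, and the fixed iteration count are illustrative assumptions, not part of the original talk.

```python
def solve_radiosity(emission, reflectance, form_factors, iterations=100):
    """Jacobi iteration on B_i = E_i + r_i * sum_j F_ij * B_j.

    form_factors is a dense N x N list of lists, which is exactly the
    O(N^2) storage the slide warns about.
    """
    n = len(emission)
    radiosity = list(emission)
    for _ in range(iterations):
        radiosity = [
            emission[i]
            + reflectance[i]
            * sum(form_factors[i][j] * radiosity[j] for j in range(n))
            for i in range(n)
        ]
    return radiosity


# Two facing patches: patch 0 emits light, patch 1 only reflects it.
result = solve_radiosity(
    emission=[1.0, 0.0],
    reflectance=[0.5, 0.5],
    form_factors=[[0.0, 0.5], [0.5, 0.0]],
)
```

    For this tiny scene the exact solution is B = (16/15, 4/15), which the iteration approaches quickly because each bounce loses energy.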

  • Progressive Refinement

    Instead we use progressive refinement.
    Each element in the scene maintains two energy values: an accumulated energy value and a residual (or "unshot") energy value.
    All energy values are set to 0 except the residual energy of the light sources.

  • Progressive Refinement

    To implement this on the GPU we use two textures (accumulated and residual) for each element.
    We render the scene from the point of view of the shooter.
    Then we iterate over the receiving elements and test each one for visibility.
    We then draw each visible element into the frame buffer and use a fragment program to compute the form factor.

  • Progressive Refinement

    initialize shooter residual E
    while not converged
    {
        render scene from POV of shooter
        for each receiving element
        {
            if element is visible
            {
                compute form factor FF
                DE = r * FF * E
                add DE to residual texture
                add DE to radiosity texture
            }
        }
        shooter's residual E = 0
        compute next shooter
    }
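    The loop above can be sketched on the CPU as follows. This is a minimal Python illustration, not the GPU implementation: it assumes the form factors (with visibility already folded in, so 0 means "not visible") are given as a matrix, that all elements have equal area, and that the next shooter is simply the element with the most residual energy. All names are hypothetical.

```python
def progressive_refinement(emission, reflectance, form_factors,
                           tolerance=1e-9, max_steps=1000):
    """Shoot residual energy until every element has less than `tolerance` left."""
    n = len(emission)
    accumulated = [0.0] * n      # the "radiosity texture" (starts at 0)
    residual = list(emission)    # the "residual texture" (light sources only)
    for _ in range(max_steps):
        # compute next shooter: the element with the most unshot energy
        shooter = max(range(n), key=lambda i: residual[i])
        energy = residual[shooter]
        if energy < tolerance:
            break                # converged
        for receiver in range(n):
            if receiver == shooter:
                continue
            ff = form_factors[receiver][shooter]
            if ff <= 0.0:
                continue         # receiver is not visible from the shooter
            delta = reflectance[receiver] * ff * energy   # DE = r * FF * E
            residual[receiver] += delta
            accumulated[receiver] += delta
        residual[shooter] = 0.0
    return accumulated


reflected = progressive_refinement(
    emission=[1.0, 0.0],
    reflectance=[0.5, 0.5],
    form_factors=[[0.0, 0.5], [0.5, 0.0]],
    tolerance=1e-12,
)
```

    Because the accumulated values start at 0, the result holds only reflected energy; adding the emission back gives the same answer as solving the full linear system, without ever storing the N x N matrix on the GPU.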

  • Visibility

    The visibility term of the form factor equation is usually computed using a hemicube.

    The scene is rendered onto the five faces of a cube map, which is then used to test visibility.

    Instead, we can avoid rendering the scene five times by using a vertex program to project the vertices onto a hemisphere.

    The hemispherical projection, also known as a stereographic projection, allows us to compute the visibility in only one rendering pass.
    The objects must be tessellated at a higher level to conform to the hemisphere.

  • Visibility

    Projection Vertex Program:

    void hemiwarp(float4 Position : POSITION,     // World Pos
                  uniform half4x4 ModelView,      // Modelview Matrix
                  uniform half2 NearFar,          // Near/Far planes
                  out float4 ProjPos : POSITION)  // Projected Pos
    {
        // transform the geometry to camera space
        half4 mpos = mul(ModelView, Position);

        // project to a point on a unit hemisphere
        half3 hemi_pt = normalize(mpos.xyz);

        // Compute (f-n), but let the hardware divide z by this
        // in the w component (so premultiply x and y)
        half f_minus_n = NearFar.y - NearFar.x;
        ProjPos.xy = hemi_pt.xy * f_minus_n;

        // compute depth proj. independently,
        // using OpenGL orthographic
        ProjPos.z = (-2.0 * mpos.z - NearFar.y - NearFar.x);
        ProjPos.w = f_minus_n;
    }

    Visibility Test Fragment Program:

    bool Visible(half3 ProjPos,             // camera-space pos
                 uniform fixed3 RecvID,     // ID of receiver
                 sampler2D HemiItemBuffer)
    {
        // Project the texel element onto the hemisphere
        half3 proj = normalize(ProjPos);

        // Vector is in [-1,1], scale to [0..1] for texture lookup
        proj.xy = proj.xy * 0.5 + 0.5;

        // Look up projected point in hemisphere item buffer
        fixed3 xtex = tex2D(HemiItemBuffer, proj.xy);

        // Compare the value in item buffer to the
        // ID of the fragment
        return all(xtex == RecvID);
    }
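    To see what the two programs compute, here is a small Python re-statement of the same arithmetic, including the perspective divide the hardware performs afterwards. It is a sketch for checking values by hand; the function names and the tuple-based vectors are assumptions, and the camera is assumed to look down the negative z axis as in OpenGL.

```python
import math


def hemiwarp(camera_pos, near, far):
    """Project a camera-space point onto the unit hemisphere; return NDC x, y, z."""
    x, y, z = camera_pos
    length = math.sqrt(x * x + y * y + z * z)
    hemi_x, hemi_y = x / length, y / length
    f_minus_n = far - near
    # Clip-space position, as written by the vertex program.
    clip = (hemi_x * f_minus_n,
            hemi_y * f_minus_n,
            -2.0 * z - far - near,
            f_minus_n)
    # The hardware divides x, y and z by w.
    return tuple(c / clip[3] for c in clip[:3])


def item_buffer_uv(camera_pos):
    """Texture coordinates used by the visibility test for the same point."""
    x, y, z = camera_pos
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length * 0.5 + 0.5, y / length * 0.5 + 0.5)


ndc = hemiwarp((3.0, 0.0, -4.0), near=1.0, far=5.0)
uv = item_buffer_uv((3.0, 0.0, -4.0))
```

    A point on the near plane straight ahead maps to depth -1 and one on the far plane to depth +1, so the depth follows an orthographic mapping while x and y follow the hemisphere.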

  • Form Factor Computation

    half3 FormFactorEnergy(
        half3 RecvPos,              // world-space position of this element
        uniform half3 ShootPos,     // world-space position of shooter
        half3 RecvNormal,           // world-space normal of this element
        uniform half3 ShootNormal,  // world-space normal of shooter
        uniform half3 ShootEnergy,  // energy from shooter residual texture
        uniform half ShootDArea,    // the delta area of the shooter
        uniform fixed3 RecvColor)   // the reflectivity of this element
    {
        // a normalized vector from receiver to shooter
        half3 r = ShootPos - RecvPos;
        half distance2 = dot(r, r);
        r = normalize(r);

        // the angles of the receiver and the shooter from r
        half cosi = dot(RecvNormal, r);
        half cosj = -dot(ShootNormal, r);

        // compute the disc approximation form factor
        const half pi = 3.1415926535;
        half Fij = max(cosi * cosj, 0) / (pi * distance2 + ShootDArea);
        Fij *= Visible(); // returns visibility as 0 or 1

        // Modulate shooter's energy by the receiver's reflectivity
        // and the area of the shooter.
        half3 delta = ShootEnergy * RecvColor * ShootDArea * Fij;

        return delta;
    }

  • Adaptive Subdivision

    We create smaller elements along areas that need more detail (e.g. shadow edges).
    We reuse the same algorithms, except that we compute visibility on the leaf nodes.
    We evaluate the gradient of the radiosity, and if it is above a certain threshold we discard the fragment.
    If we discard enough fragments, then we subdivide the current node.
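    The subdivision test can be sketched on the CPU as below. This is only an illustration of the decision rule: the forward-difference gradient, the two thresholds, and the function name are assumptions, and on the GPU the count of discarded fragments would come from the hardware rather than from a loop.

```python
def should_subdivide(radiosity, gradient_threshold, max_discarded):
    """Count texels whose radiosity gradient is too steep; subdivide if too many."""
    rows, cols = len(radiosity), len(radiosity[0])
    discarded = 0
    for y in range(rows):
        for x in range(cols):
            # Forward differences, clamped at the texture border.
            dx = radiosity[y][min(x + 1, cols - 1)] - radiosity[y][x]
            dy = radiosity[min(y + 1, rows - 1)][x] - radiosity[y][x]
            if (dx * dx + dy * dy) ** 0.5 > gradient_threshold:
                discarded += 1   # this fragment would be discarded
    return discarded > max_discarded


flat_patch = [[0.5] * 4 for _ in range(4)]
shadow_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
```

    A uniformly lit patch is left alone, while a patch crossed by a hard shadow edge exceeds the budget and is split.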

  • Performance

    Can render a 10,000-element version of the Cornell Box at 2 fps.
    To get this we need to make some optimizations:

    Use occlusion queries in the visibility pass.
    Shoot rays at a lower resolution than the texture.
    Batch together multiple shooters.
    Use lower-resolution textures to compute indirect lighting. Compute direct lighting separately and add it in later.

  • Global Illumination using Radiosity

  • Ray Tracing

    Ray Tracing on Programmable Graphics Hardware by Timothy J. Purcell, et al., SIGGRAPH 2002.
    Shows how to design a streaming ray tracer that runs on parallel graphics hardware.

  • Streaming Ray Tracer

    Multi-pass algorithm.
    Divides the scene into a uniform grid, which is represented by a 3D texture.
    Splits the operation into 4 kernels executed as fragment programs.
    Uses the stencil buffer to keep track of which pass a ray is on.

  • Storage

    Grid Texture: 3D texture
    Triangle List: 1D texture, single channel
    Triangle-Vertex List: 1D texture, 3 channels (RGB)
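    One way to picture how these textures chain together is the Python sketch below. It is an assumption-laden illustration: the grid is flattened to one dimension, and the sentinel values marking an empty cell and the end of a cell's list are hypothetical. Each grid cell stores an offset into the triangle list, and each triangle list entry is an id that would index the triangle-vertex textures.

```python
EMPTY_CELL = -1    # hypothetical marker: the voxel contains no triangles
END_OF_LIST = -1   # hypothetical marker: end of one voxel's triangle list


def build_grid_storage(cell_triangles, num_cells):
    """Pack per-cell triangle ids into a grid array plus one flat triangle list."""
    grid = [EMPTY_CELL] * num_cells
    triangle_list = []
    for cell in range(num_cells):
        ids = cell_triangles.get(cell, [])
        if ids:
            grid[cell] = len(triangle_list)   # offset of this cell's first entry
            triangle_list.extend(ids)
            triangle_list.append(END_OF_LIST)
    return grid, triangle_list


def triangles_in_cell(grid, triangle_list, cell):
    """Walk one cell's list, as the intersector would one entry per pass."""
    ids = []
    pos = grid[cell]
    if pos == EMPTY_CELL:
        return ids
    while triangle_list[pos] != END_OF_LIST:
        ids.append(triangle_list[pos])
        pos += 1
    return ids


grid, triangle_list = build_grid_storage({0: [2, 5], 3: [5]}, num_cells=4)
```

    Triangle 5 overlaps two voxels here, so its id appears twice in the list while its vertices would be stored only once.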

  • Eye Ray Generator

    Simplest of the kernels.
    Given the camera parameters, it generates a ray for each screen pixel.
    A fragment program is invoked for each pixel, which generates a ray.
    It also tests rays against the scene's bounding volume and terminates the ones outside the volume.
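    A CPU sketch of what this kernel does per pixel is shown below. It assumes a pinhole camera at the origin looking down the negative z axis and an axis-aligned bounding box tested with the slab method; the function names and camera model are illustrative choices, not taken from the paper.

```python
import math


def eye_ray(px, py, width, height, fov_y_degrees):
    """Return (origin, unit direction) of the ray through the centre of pixel (px, py)."""
    aspect = width / height
    half_height = math.tan(math.radians(fov_y_degrees) / 2.0)
    # Pixel centre in [-1, 1] with y pointing up.
    ndc_x = (px + 0.5) / width * 2.0 - 1.0
    ndc_y = 1.0 - (py + 0.5) / height * 2.0
    d = (ndc_x * half_height * aspect, ndc_y * half_height, -1.0)
    length = math.sqrt(sum(c * c for c in d))
    return (0.0, 0.0, 0.0), tuple(c / length for c in d)


def hits_box(origin, direction, box_min, box_max):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False     # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


origin, direction = eye_ray(1, 1, width=3, height=3, fov_y_degrees=60.0)
```

    Rays for which hits_box returns False correspond to the ones the kernel terminates immediately.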

  • Traverser

    For each ray, it steps through the grid.
    A pass is required for each step through the grid.
    If a voxel contains triangles, then the ray is marked to run the intersection kernel on the triangles in that voxel.
    If not, then it continues stepping through the grid.
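    The stepping can be illustrated with a standard 3D-DDA grid walk. The Python sketch below assumes unit-sized voxels, a ray origin that already lies inside the grid, and a set of occupied voxel coordinates; each loop iteration stands in for one rendering pass. Names are hypothetical.

```python
import math


def traverse_grid(origin, direction, grid_size, occupied):
    """Step voxel by voxel; return (first occupied voxel or None, voxels visited)."""
    if not any(direction):
        raise ValueError("direction must be non-zero")
    cell = [int(math.floor(c)) for c in origin]
    step, t_max, t_delta = [], [], []
    for axis in range(3):
        d = direction[axis]
        if d > 0:
            step.append(1)
            t_max.append((cell[axis] + 1 - origin[axis]) / d)
            t_delta.append(1.0 / d)
        elif d < 0:
            step.append(-1)
            t_max.append((cell[axis] - origin[axis]) / d)
            t_delta.append(-1.0 / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    visited = []
    while all(0 <= cell[a] < grid_size[a] for a in range(3)):
        visited.append(tuple(cell))
        if tuple(cell) in occupied:
            return tuple(cell), visited      # hand the ray to the intersector
        axis = t_max.index(min(t_max))       # nearest voxel boundary
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return None, visited                     # the ray left the grid


hit, path = traverse_grid((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), (4, 4, 4), {(2, 0, 0)})
miss, diagonal = traverse_grid((0.5, 0.5, 0.5), (1.0, 1.0, 0.0), (2, 2, 1), set())
```

    If the intersector later finds no hit in the returned voxel, traversal would resume from that voxel rather than start over.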

  • Intersector

    Tests the ray for intersection with all triangles within a voxel.
    A pass is required for each ray-triangle intersection test.
    If an intersection occurs, then the ray is marked for execution in the shading stage.
    If not, the ray continues in the traversal stage.

  • Intersection Shader (Pseudo)Code

    float4 IntersectTriangle( float3 ro, float3 rd, int list_pos, float4 h )
    {
        float tri_id = texture( list_pos, trilist );
        float3 v0 = texture( tri_id, v0 );
        float3 v1 = texture( tri_id, v1 );
        float3 v2 = texture( tri_id, v2 );
        float3 edge1 = v1 - v0;
        float3 edge2 = v2 - v0;
        float3 pvec = Cross( rd, edge2 );
        float det = Dot( edge1, pvec );
        float inv_det = 1/det;
        float3 tvec = ro - v0;
        float u = Dot( tvec, pvec ) * inv_det;
        float3 qvec = Cross( tvec, edge1 );
        float v = Dot( rd, qvec ) * inv_det;
        float t = Dot( edge2, qvec ) * inv_det;
        // the hit is valid only inside the triangle and closer than the best hit h
        bool validhit = select( u >= 0.0f, true, false );
        validhit = select( v >= 0, validhit, false );
        validhit = select( u+v <= 1, validhit, false );
        validhit = select( t < h[0], validhit, false );
        validhit = select( t >= 0, validhit, false );
        // keep the previous best hit h when this test fails
        t = select( validhit, t, h[0] );
        u = select( validhit, u, h[1] );
        v = select( validhit, v, h[2] );
        float id = select( validhit, tri_id, h[3] );
        return float4( t, u, v, id );
    }
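    The pseudocode follows the well-known Möller-Trumbore ray-triangle test. A runnable Python version is sketched below for checking values by hand; unlike the shader it branches instead of using select, and it returns None for a miss. The helper names and the best_t parameter (the distance of the closest hit found so far) are illustrative.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def intersect_triangle(origin, direction, v0, v1, v2, best_t=float("inf")):
    """Return (t, u, v) for a hit closer than best_t, otherwise None."""
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < 1e-12:
        return None              # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    t = dot(edge2, qvec) * inv_det
    if u < 0.0 or v < 0.0 or u + v > 1.0 or t < 0.0 or t >= best_t:
        return None
    return t, u, v


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = intersect_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
```

    The values u and v are the barycentric coordinates of the hit point, which the shading stage can use to interpolate normals and colors.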

  • Shader

    This adds the shading for the pixel.
    It also generates new rays and marks them for processing in a future rendering pass.
    It also gives new rays a weight so that their color can simply be added.

  • Global Illumination using Rasterization

    High-Quality Global Illumination Rendering Using Rasterization by Toshiya Hachisuka (GPU Gems 2, Chapter 38).
    Instead of adapting global illumination algorithms to the GPU, it makes use of the GPU's rasterization hardware.

  • Two-pass methods

    The first pass uses photon mapping or radiosity to compute a rough approximation of the illumination.
    In the second pass, the first-pass result is refined and rendered.
    The most common way to use the first pass is as a source of indirect illumination.

  • Final Gathering

    Final gathering computes the amount of indirect light by shooting a large number of rays.

    This can be the bottleneck.
    Sampling and interpolation are used to speed it up.

    This can lead to rendering artifacts.

  • Final Gathering via Rasterization

    Precomputes the directions and traces all of the rays at once using rasterization.
    This is done with a parallel projection of the scene along the current direction, called the global ray direction.

  • Depth Peeling

    Each depth layer is a subsection of the scene.
    Shoot a ray in the opposite direction of the global ray direction.
    This can be achieved by rendering multiple times using a greater-than depth test.

  • Depth Peeling

    Step through the depth layers, computing the indirect illumination until no fragments are rendered.
    Repeat with another global ray direction until the number of samplings is sufficient.
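    The peeling loop for one global ray direction can be sketched on the CPU as follows. The input format (a dictionary from pixel to the depths of all fragments that rasterize there under the parallel projection) and the function name are assumptions made for illustration. Each pass keeps, per pixel, the nearest fragment that lies strictly behind the previous layer.

```python
def peel_layers(fragments):
    """Return the list of depth layers, front to back, for one ray direction."""
    layers = []
    previous = {pixel: float("-inf") for pixel in fragments}
    while True:
        layer = {}
        for pixel, depths in fragments.items():
            # "Greater than" test against the previous layer...
            behind = [d for d in depths if d > previous[pixel]]
            if behind:
                # ...followed by the usual nearest-fragment depth test.
                layer[pixel] = min(behind)
        if not layer:
            break                # no fragments rendered: this direction is done
        layers.append(layer)
        previous.update(layer)
    return layers


layers = peel_layers({(0, 0): [3.0, 1.0, 2.0], (1, 0): [5.0]})
```

    Fragments at the same pixel in consecutive layers are neighbours along the ray direction, which is what lets the method transfer illumination between them without tracing individual rays.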

  • Rendering

    This method only computes indirect illumination.
    The first rendering pass can be done with any CPU or GPU method that computes the irradiance distribution.

    They suggest grid photon mapping.
    We use this in the final gathering pass.
    Direct illumination must be computed with a real-time shadowing technique.

    They suggest shadow mapping and stencil shadows.
    Direct and indirect illumination are summed before the final rendering.

  • Performance

    It's hard to compare performance because the algorithms are very different.
    Performance is similar to CPU-based sampling/interpolation methods.
    It is much faster than a CPU method that would sample all pixels.

  • Global Illumination using Rasterization

  • Photon Mapping

    Photon Mapping on Programmable Graphics Hardware by Timothy J. Purcell, et al., SIGGRAPH 2003.

  • Photon Tracing

    Each pass of the photon tracing reads from the previous frame.
    At each surface interaction, a photon is written to the texture and another is emitted.
    The initial frame has the photons on the light sources and their random directions.
    The direction of each photon bounce is computed from a random number texture.

  • Photon Map Data Structure

    The original photon map algorithm uses a balanced k-d tree for locating the nearest photons.
    This structure makes it possible to quickly locate the nearest photons at any point.
    It requires random-access writes to construct efficiently.

    This can be slow on the GPU.
    Instead we use a uniform grid for storing the photons.

    Bitonic Merge Sort – fragment program
    Stencil Routing – vertex program

  • Fragment Program Method

    We can index the photons by grid cell and sort them by cell.
    Then we find the index of the first photon in each cell using a binary search.
    Bitonic Merge Sort is a parallel sorting algorithm that takes O(log^2 n) steps.
    It can be implemented as a fragment program, with each rendering pass being one stage of the sort.

  • Bitonic Merge Sort

    [Figure sequence: step-by-step animation of a bitonic merge sort on eight values, ending with the sorted list 1 to 8. Step captions:]

    8x monotonic lists: (3) (7) (4) (8) (6) (2) (1) (5); 4x bitonic lists: (3,7) (4,8) (6,2) (1,5)
    Sort the bitonic lists
    4x monotonic lists: (3,7) (8,4) (2,6) (5,1); 2x bitonic lists: (3,7,8,4) (2,6,5,1)
    Sort the bitonic lists
    2x monotonic lists: (3,4,7,8) (6,5,2,1); 1x bitonic list: (3,4,7,8, 6,5,2,1)
    Sort the bitonic list
    Done!
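    The same sort can be written as a short Python sketch. It is a CPU illustration of the sorting network, assuming the input length is a power of two; every trip through the inner while loop reads the old array and writes a new one, mimicking one rendering pass in which each fragment compares itself with a single partner. The function name and the returned pass count are illustrative.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list; return (sorted list, number of passes)."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    passes = 0
    k = 2                        # size of the bitonic lists being merged
    while k <= n:
        j = k // 2               # distance to the comparison partner
        while j >= 1:
            result = list(data)  # "render" into a fresh texture
            for i in range(n):
                partner = i ^ j
                ascending = (i & k) == 0
                keep_min = (i < partner) == ascending
                a, b = data[i], data[partner]
                result[i] = min(a, b) if keep_min else max(a, b)
            data = result
            passes += 1
            j //= 2
        k *= 2
    return data, passes


sorted_values, passes = bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])
```

    For eight values this takes 1 + 2 + 3 = 6 passes, and the intermediate arrays after the first and third passes are (3,7,8,4,2,6,5,1) and (3,4,7,8,6,5,2,1), matching the captions above.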

  • Fragment Program Method

    Binary search can be used to locate the contiguous block of photons occupying a given grid cell.
    We compute an array of the indices of the first photon in every cell.

    If no photon is found for a cell, the first photon in the next grid cell is located.

    The simple fragment program implementation of binary search requires O(log n) photon lookups.
    All of the photon lookups can be unrolled into a single rendering pass.
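    A minimal Python sketch of this lookup table is shown below. It assumes the photons have already been sorted by cell index and uses the standard library's bisect_left as the binary search; the function names are hypothetical.

```python
from bisect import bisect_left


def first_photon_indices(sorted_cells, num_cells):
    """For every grid cell, binary-search the index of its first photon.

    sorted_cells[i] is the cell index of photon i, in ascending order.
    For an empty cell the search lands on the first photon of the next
    non-empty cell, as described above.
    """
    return [bisect_left(sorted_cells, cell) for cell in range(num_cells)]


def photons_in_cell(sorted_cells, starts, cell):
    """Range of photon indices stored in one cell (empty for an empty cell)."""
    begin = starts[cell]
    end = starts[cell + 1] if cell + 1 < len(starts) else len(sorted_cells)
    return range(begin, end)


cells = [0, 0, 2, 2, 2, 3]       # cell index of each sorted photon
starts = first_photon_indices(cells, num_cells=4)
```

    Because an empty cell's start equals the next cell's start, the length of every block falls out of two neighbouring entries with no special case.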

  • Fragment Program Method

  • Vertex Program Method

    Since the Bitonic Merge Sort can add many rendering passes, it may not be useful for interactive rendering.
    You can use stencil routing to route photons to each grid cell in one rendering pass.
    Each grid cell covers an m x m set of pixels.
    Draw a point with a point size of m and then use the stencil buffer to send the photon to the correct fragment.

  • Vertex Program Method

  • Vertex Program Method

    There are two drawbacks to this method:
    We must read from a photon texture, which requires a readback.
    We allocate a fixed amount of memory, so we must redistribute the power for cells with more than m^2 photons, and space is wasted if there are fewer.
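    The routing idea can be simulated on the CPU as below. This is a sketch under stated assumptions, not the exact stencil setup from the paper: each cell's m x m pixels start with distinct stencil values 0 to m^2 - 1, a fragment is written only where the stencil equals m^2 - 1, and every fragment increments its stencil value. The per-cell photon count and the power scale factor are likewise illustrative.

```python
def stencil_route(photon_cells, num_cells, m):
    """Route photon i (belonging to cell photon_cells[i]) into one pixel of its cell."""
    slots = m * m
    reference = slots - 1
    stencil = [list(range(slots)) for _ in range(num_cells)]
    routed = [[None] * slots for _ in range(num_cells)]
    counts = [0] * num_cells
    for photon, cell in enumerate(photon_cells):
        counts[cell] += 1
        for pixel in range(slots):            # the point covers the whole cell
            if stencil[cell][pixel] == reference:
                routed[cell][pixel] = photon  # the test passes for one pixel at most
            stencil[cell][pixel] += 1         # stencil op: always increment
    return routed, counts


def power_scale(count, m):
    """Factor for the stored photons' power when a cell overflowed."""
    return max(1.0, count / (m * m))


routed, counts = stencil_route([0, 1, 0, 0, 0, 0], num_cells=2, m=2)
```

    Cell 0 receives five photons but has room for only four, so the fifth is dropped and the stored photons' power would be scaled by 5/4; cell 1 uses one of its four slots and wastes the rest.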

  • Radiance Estimate

    We accumulate a radiance value based on a predefined number of nearest photons.
    We search all photons in the cell.

    If the photon is in the search range, then we add it.
    If not, then we ignore it unless we don't have enough photons. Then we add it and expand the range.
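    The accumulation rule can be sketched in Python as follows. It assumes scalar photon power, photons given as (position, power) pairs, and a density estimate of total power divided by the area of the search disc, with the surface reflectance left out; these simplifications and all names are assumptions. It requires Python 3.8 or later for math.dist.

```python
import math


def radiance_estimate(point, photons, k, initial_radius):
    """Single pass over a cell's photons; returns (estimate, count, final radius)."""
    radius = initial_radius
    total_power = 0.0
    count = 0
    for position, power in photons:
        distance = math.dist(point, position)
        if distance <= radius:
            total_power += power     # inside the search range: add it
            count += 1
        elif count < k:
            total_power += power     # not enough photons yet: add it...
            count += 1
            radius = distance        # ...and expand the range to include it
        # otherwise the photon is ignored
    return total_power / (math.pi * radius * radius), count, radius


photons = [((0.5, 0.0, 0.0), 1.0), ((3.0, 0.0, 0.0), 2.0), ((1.0, 0.0, 0.0), 4.0)]
estimate, count, radius = radiance_estimate((0.0, 0.0, 0.0), photons, k=2, initial_radius=1.0)
```

    The result depends on the order in which photons are visited, so this finds an approximate rather than an exact set of nearest photons; here all three photons end up inside the expanded radius of 3.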

  • Rendering

    Use a stochastic ray tracer written as a fragment program to output a texture with all the hit points, normals, and colors for a given ray depth.
    This texture is used as input to several additional fragment programs:

    One computes the direct illumination, using one or more shadow rays to estimate the visibility of the light sources.
    One invokes the ray tracer to compute reflections and refractions.
    One computes the radiance.

  • Video

  • CUDA Rendering

    All of these rendering techniques can be done with CUDA.
    They are simpler to implement because you don't have to store everything in textures and you can use shared memory.

  • CUDA Rendering Demo

  • References

    GPU Gems 2 – Chapters 38 & 39
    Ray Tracing on Programmable Graphics Hardware by Timothy J. Purcell, et al., SIGGRAPH 2002
    Photon Mapping on Programmable Graphics Hardware by Timothy J. Purcell, et al., SIGGRAPH 2003
    Jon Olick Video: http://www.youtube.com/watch?v=VpEpAFGplnI
    CUDA Voxel Demo: http://www.geeks3d.com/20090317/cuda-voxel-rendering-engine/