TEXTURE MEMORY IN CUDA PERSPECTIVE
VINAY MANCHIRAJU
TEXTURE MEMORY
• Read-only memory available to CUDA programs.
• Used in general-purpose computing for accuracy and efficiency.
• Originally designed for DirectX and OpenGL rendering pipelines.
• Can cache non-consecutive memory locations, unlike typical CPU caching schemes.
• Designed to accelerate access patterns that exhibit spatial locality.
WHY USE TEXTURES?
• Texture memory is cached on chip.
• Provides higher effective bandwidth.
• Reduces memory requests to the off-chip DRAM.
• Improves performance of graphics applications whose memory access patterns exhibit a great deal of spatial locality.
PARALLELIZING PHYSICAL SIMULATIONS
• Results are more accurate, with reduced computational complexity and less time to solve.
• Textures play a significant role in simulation problems.
HEAT SIMULATION EXAMPLE
• A rectangular room is divided into a grid of cells.
• Heaters with fixed temperatures are scattered among the cells of the grid.
FLOW OF HEAT
Warmer cells tend to cool as their heat dissipates to cooler regions, and cooler cells warm up in turn.
AS A FUNCTION OF HEAT LOSS/GAIN
• Imagine that there are 4 neighbors for a given cell.
• k -> Rate of heat flow from one cell to another.
• A large value of k will drive the system to a constant temperature quickly, while a small value will allow the solution to retain large temperature gradients longer.
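Written out, the update described here can be sketched as follows. This assumes exactly four neighbors (top, bottom, left, right) and the same k for all of them; the symbol names are illustrative and not taken from the slides.

```latex
T_{\text{new}}
  = T_{\text{old}} + k \sum_{n \in \{\text{top},\,\text{bottom},\,\text{left},\,\text{right}\}} \bigl(T_{n} - T_{\text{old}}\bigr)
  = T_{\text{old}} + k \,\bigl(T_{\text{top}} + T_{\text{bottom}} + T_{\text{left}} + T_{\text{right}} - 4\,T_{\text{old}}\bigr)
```

Each neighbor contributes its temperature difference scaled by k, so a cell surrounded by neighbors at its own temperature does not change.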
THREE STEPS TO COMPUTE TEMPERATURE UPDATES
• copy_const_kernel(): Copy the heater temperatures to their respective grid cells. This enforces the restriction that the temperatures of cells with heaters stay constant.
• blend_kernel(): Calculate the output temperatures from the input temperatures of the grid using the update equation.
• Swap the input and output temperatures for the calculation in the next step.
copy_const_kernel()
• Convert threadIdx and blockIdx into an x and y coordinate.
• Compute a linear offset into the constant and input buffers.
• If the cell in the constant grid is nonzero, copy the heater temperature in cptr[] to the input grid in iptr[].
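The three bullets above can be sketched as a kernel operating on plain global memory. This is a minimal sketch, not the presentation's original program: the host helper apply_heaters() and its dim parameter are hypothetical names added for illustration, and the grid side is assumed to be a multiple of 16 so that the launch covers the grid exactly.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Overwrite every input cell that holds a heater with the heater's temperature.
// iptr: input temperature grid, cptr: constant heater grid (0 means "no heater").
__global__ void copy_const_kernel(float *iptr, const float *cptr) {
    // Convert threadIdx and blockIdx into an (x, y) cell coordinate.
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // Linear offset into both buffers (assumes the launch covers the grid exactly).
    int offset = x + y * blockDim.x * gridDim.x;
    if (cptr[offset] != 0) iptr[offset] = cptr[offset];
}

// Hypothetical host helper: runs the kernel on dim x dim host grids.
// Assumes dim is a multiple of 16; error checking is omitted for brevity.
std::vector<float> apply_heaters(std::vector<float> in,
                                 const std::vector<float> &heaters, int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_const = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_const, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_const, heaters.data(), bytes, cudaMemcpyHostToDevice);

    dim3 blocks(dim / 16, dim / 16);
    dim3 threads(16, 16);  // 256 threads per block
    copy_const_kernel<<<blocks, threads>>>(d_in, d_const);

    cudaMemcpy(in.data(), d_in, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_const);
    return in;
}
```

Calling apply_heaters(grid, heaters, 16) on a 16 x 16 grid should leave every non-heater cell untouched and set each heater cell to its fixed temperature.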
blend_kernel()
• 1 thread for every cell.
• Offsets of the neighbors in all 4 directions are computed to read the temperatures of those cells.
• Each thread reads its cell's temperature and the temperatures of its neighboring cells, performs the update computation described earlier, and then updates its temperature with the new value.
• The updated temperature is the old temperature plus the scaled differences between the neighboring cell temperatures and the old temperature.
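A minimal global-memory sketch of this kernel is shown below. Several details are assumptions rather than facts from the slides: the constant SPEED stands in for k with an arbitrary value of 0.25, cells on the border reuse their own temperature for the missing neighbor, the grid side dim is passed as a parameter, and blend_once() is a hypothetical host helper.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define SPEED 0.25f  // assumed value for the heat-flow rate k

// One thread per cell: new = old + k * (top + bottom + left + right - 4 * old).
__global__ void blend_kernel(float *outSrc, const float *inSrc, int dim) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    // Neighbor offsets; at the borders the cell uses itself as the neighbor.
    int left = offset - 1;
    int right = offset + 1;
    if (x == 0) left++;
    if (x == dim - 1) right--;
    int top = offset - dim;
    int bottom = offset + dim;
    if (y == 0) top += dim;
    if (y == dim - 1) bottom -= dim;

    outSrc[offset] = inSrc[offset] +
        SPEED * (inSrc[top] + inSrc[bottom] + inSrc[left] + inSrc[right] -
                 4.0f * inSrc[offset]);
}

// Hypothetical host helper: one update step on a dim x dim grid.
// Assumes dim is a multiple of 16; error checking is omitted for brevity.
std::vector<float> blend_once(const std::vector<float> &in, int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 blocks(dim / 16, dim / 16);
    dim3 threads(16, 16);
    blend_kernel<<<blocks, threads>>>(d_out, d_in, dim);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

In a full simulation the host would call this step repeatedly and swap the roles of the input and output buffers between steps instead of reallocating them.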
anim_kernel()
We use a DataBlock that contains the constant buffer of heaters and the updated temperatures.
Arguments: a pointer to the DataBlock and the number of ticks of animation that have elapsed (not used).
We use a 16 x 16 grid and blocks of 256 threads.
anim_kernel()
After each iteration we swap the input and output buffers, so the output of one step becomes the input of the next and the last output holds the final temperatures.
The temperatures are converted into colors and the bitmap image is transferred from GPU to CPU.
The Program.
USING TEXTURES
• Declare the inputs as texture references.
• Use references to floating-point textures.
• Allocate GPU memory for these textures and then bind the references to it using cudaBindTexture().
cudaBindTexture()
• Tells the CUDA runtime to use the specified buffer as a texture and the texture reference as the texture's name.
• See the cudaBindTexture() documentation listed under REFERENCES.
tex1Dfetch()
• A compiler intrinsic function.
• Used inside the kernels to read from the texIn, texOut, and texConstSrc textures.
• Fetches the texture value into a floating-point variable.
copy_const_kernel()
cudaUnbindTexture()
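The declare, bind, fetch, and unbind sequence can be sketched end to end with a texture version of copy_const_kernel(). This is a sketch under stated assumptions: it uses the legacy texture reference API that these slides describe, which is deprecated and, as far as I know, no longer available from CUDA 12 onward, so it needs an older toolkit. The helper apply_heaters_tex() and its dim parameter are hypothetical names, and dim is assumed to be a multiple of 16.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Texture references must be declared at file scope (legacy API).
texture<float> texConstSrc;

// Same logic as the global-memory version, but the heater grid is read
// through the texture with tex1Dfetch() instead of a pointer.
__global__ void copy_const_kernel(float *iptr) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    float c = tex1Dfetch(texConstSrc, offset);
    if (c != 0) iptr[offset] = c;
}

// Hypothetical host helper; error checking is omitted for brevity.
std::vector<float> apply_heaters_tex(std::vector<float> in,
                                     const std::vector<float> &heaters,
                                     int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_const = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_const, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_const, heaters.data(), bytes, cudaMemcpyHostToDevice);

    // Use d_const as the texture's storage and texConstSrc as its name.
    cudaBindTexture(NULL, texConstSrc, d_const, bytes);

    dim3 blocks(dim / 16, dim / 16);
    dim3 threads(16, 16);
    copy_const_kernel<<<blocks, threads>>>(d_in);

    cudaMemcpy(in.data(), d_in, bytes, cudaMemcpyDeviceToHost);

    // Unbind before releasing the memory that backs the texture.
    cudaUnbindTexture(texConstSrc);
    cudaFree(d_in);
    cudaFree(d_const);
    return in;
}
```

Because the texture reference is a global name, the kernel no longer takes the constant buffer as an argument; only the writable input grid is passed as a pointer.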
USING 2D-TEXTURES
• Reference Declaration:
• Instead of computing offsets for left, right, top, and bottom, we use x and y directly to access the texture.
USING 2-D TEXTURES
• Accesses that fall outside the bounds of the grid are taken care of.
• If one of x or y is less than zero, tex2D() will return the value at zero.
• If one of these values reaches or exceeds the width, tex2D() will return the value at width - 1.
tex2D()
cudaBindTexture2D()
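A two-dimensional version of the blend step might look like the sketch below. As before, it relies on the legacy texture reference API (deprecated and, as far as I know, unavailable from CUDA 12 onward). SPEED, blend_once_2d(), and dim are assumed names for illustration, and dim is assumed to be a multiple of 16, which also keeps the row pitch aligned for binding. The texture reference keeps its default settings, which clamp out-of-range coordinates.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define SPEED 0.25f  // assumed value for the heat-flow rate k

// Two-dimensional texture reference, declared at file scope (legacy API).
texture<float, 2> texIn;

// Neighbors are addressed by (x, y); no border checks are needed because
// out-of-range coordinates are clamped to the nearest valid cell.
__global__ void blend_kernel_2d(float *dst) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    float t = tex2D(texIn, x, y - 1);
    float l = tex2D(texIn, x - 1, y);
    float c = tex2D(texIn, x, y);
    float r = tex2D(texIn, x + 1, y);
    float b = tex2D(texIn, x, y + 1);

    dst[offset] = c + SPEED * (t + b + l + r - 4.0f * c);
}

// Hypothetical host helper: one update step on a dim x dim grid.
// Error checking is omitted for brevity.
std::vector<float> blend_once_2d(const std::vector<float> &in, int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the element type, then bind with width, height, and row pitch.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaBindTexture2D(NULL, texIn, d_in, desc, dim, dim, sizeof(float) * dim);

    dim3 blocks(dim / 16, dim / 16);
    dim3 threads(16, 16);
    blend_kernel_2d<<<blocks, threads>>>(d_out);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaUnbindTexture(texIn);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Compared with a one-dimensional version, the explicit border adjustments disappear from the kernel, at the cost of a slightly longer bind call on the host.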
Tradeoffs 1D vs 2D
• From a performance standpoint, the decision between one- and two-dimensional textures is likely to be inconsequential.
• For our particular application, the code is a little simpler with two-dimensional textures because we happen to be simulating a two-dimensional domain. Since this is not always the case, we suggest deciding between one- and two-dimensional textures on a case-by-case basis.
REFERENCES
• http://http.developer.nvidia.com/Cg/tex1Dfetch.html
• http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__HIGHLEVEL_g2aeb95eab6b9d90bb00b26406a27c515.html
• http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__HIGHLEVEL_g67660ae3e9a1ff520575394f78087bea.html#g67660ae3e9a1ff520575394f78087bea
THANK YOU…