Small-Scale Raster Map Projection using the Compute Unified Device Architecture (CUDA)

U.S. Department of the Interior, U.S. Geological Survey

Michael P. Finn, Jing Li, and David Mattli

ISPRS Technical Commission IV Symposium on Geospatial Databases and Location Based Services, Suzhou, China, 14 – 16 May 2014




HPC-Research/ Motivation

• Prime test case: Map projection/reprojection for large raster datasets (“Big” Data?)
• pRasterBlaster: mapIMG in an HPC environment
• Solve problems using multiple processors
• Currently testing within the NSF CyberGIS Project leveraging XSEDE (a more traditional supercomputing (SC) environment)
• How does the same problem compare, in a computational sense, between a CPU-dominated SC environment and a more lightweight general-purpose GPU-dominated environment?


CUDA

• A parallel computing platform and programming model invented by NVIDIA
• Allows GPUs to be used for general purpose processing (not exclusively graphics)
• GPUs have a parallel throughput architecture that allows executing many concurrent threads slowly (rather than executing a single thread very quickly)
• Accessible to software developers through libraries, compiler directives, and extensions to programming languages, including C, C++ and Fortran


Accurate Raster Reprojection in Three (primary) Steps

• Step 1: Calculate and Partition Output Space
• Step 2: Read Input and Reproject
• Step 3: Combine Temporary Files
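
Step 1 can be illustrated with a minimal sketch that splits an output raster into fixed-size windows. This is not the pRasterBlaster partitioning code; the names `Window` and `partition_output` are hypothetical, the 1024-cell tile edge is an assumption, and the example reads the dimension 4003*2003 as columns by rows.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    """One rectangular piece of the output raster (0-based offsets)."""
    row_off: int
    col_off: int
    rows: int
    cols: int


def partition_output(rows: int, cols: int, chunk: int = 1024) -> Iterator[Window]:
    """Yield windows that cover a rows x cols output raster exactly once.

    Windows on the bottom and right edges are smaller when the raster
    size is not a multiple of the chunk size.
    """
    for row_off in range(0, rows, chunk):
        for col_off in range(0, cols, chunk):
            yield Window(
                row_off,
                col_off,
                min(chunk, rows - row_off),
                min(chunk, cols - col_off),
            )


# Hypothetical example: an output raster 4003 columns wide and 2003 rows high.
windows = list(partition_output(rows=2003, cols=4003))
print(len(windows), "windows; last one:", windows[-1])
```

Each window can then be read, reprojected, and written independently, which is what makes Steps 2 and 3 possible on a machine that cannot hold the whole raster.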


The Equations

• Projection Transformation Process: Framing
– The frame of a raster dataset defines the extent of the dataset in the projection space. It also defines the alignment of projection space with the input (often) image coordinate system.
• X = ULprojX + ((sample – 1) * pixelSizeX) (1)
• Y = ULprojY – ((line – 1) * pixelSizeY) (2)
– Alternatively:
• sample = ((X – ULprojX) / pixelSizeX) + 1 (3)
• line = ((ULprojY – Y) / pixelSizeY) + 1 (4)
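
Equations (1) to (4) translate directly into code. The sketch below is illustrative only: the class name `Frame` and the example frame values are assumptions, and sample and line are 1-based as in the equations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Frame:
    """Frame of a raster: upper-left corner in projection space plus pixel size."""
    ul_proj_x: float
    ul_proj_y: float
    pixel_size_x: float
    pixel_size_y: float

    def to_projected(self, sample: float, line: float) -> Tuple[float, float]:
        """Equations (1) and (2): image (sample, line) to projected (X, Y)."""
        x = self.ul_proj_x + (sample - 1) * self.pixel_size_x
        y = self.ul_proj_y - (line - 1) * self.pixel_size_y
        return x, y

    def to_image(self, x: float, y: float) -> Tuple[float, float]:
        """Equations (3) and (4): projected (X, Y) to image (sample, line)."""
        sample = (x - self.ul_proj_x) / self.pixel_size_x + 1
        line = (self.ul_proj_y - y) / self.pixel_size_y + 1
        return sample, line


# Hypothetical frame: upper-left corner at (-2,000,000, 3,000,000) with 1000-unit pixels.
frame = Frame(-2_000_000.0, 3_000_000.0, 1000.0, 1000.0)
xy = frame.to_projected(sample=11, line=21)
print(xy)                   # projected coordinates of that pixel
print(frame.to_image(*xy))  # round trip back to (sample, line)
```

Y decreases as the line number grows because image lines count downward from the upper-left corner while projected Y counts upward.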


CUDA implementation

Four-corner-point-based map projection using CUDA


Raster Chunk Handling

Output chunks cannot be merged because of limited computing resources


Results

• Configuration of the testing machine
– Intel quad-core CPU (i5-3450)
– GeForce GT 640, 384 GPU cores
– 8 GB RAM
– NVIDIA CUDA SDK 5.5
– Visual Studio 2010


Results

• CUDA configuration
– Block size: 256*1
– Chunk dimension: 1024
• Resample GLC (original: ~900 MB)

Equirectangular to Albers

File  Res    Dimension    Volume     CPU       GPU                Ratio
1     10000  4003*2003    7.64 MB    257       5.54               46
2     5000   8006*4003    30.56 MB   896 (C)   21.635/17.553 (C)  51
3     2000   20015*10008  191.03 MB  5548 (C)  105.173 (C)        52
4     1000   40030*20015  764.08 MB  NA        407.65 (C)

NA = Out of memory (8 GB) on test machine


Results

• CUDA configuration
– Block size: 256*1
– Chunk dimension: 1024
• Resample NLCD (original: 15.6 GB)

Albers to Equirectangular

File  Res   Dimension    Volume     CPU   GPU     Ratio
1     1200  4030*2611    10.03 MB   321   3.06    104
2 C   600   8060*5221    40.13 MB   1091  12.58   86
3 C   300   16119*10442  160.52 MB  4350  52.01   83
4 C   150   32238*20885  642.10 MB  NA    204.93
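
The Ratio column appears to be the CPU time divided by the GPU time with the fractional part dropped. That reading is inferred from the numbers rather than stated on the slide, and the slide does not give the time unit. A short check on the NLCD rows, with the values copied from the table:

```python
import math

# (file label, CPU time, GPU time) copied from the NLCD table; units as on the slide.
nlcd_runs = [
    ("1", 321.0, 3.06),
    ("2 C", 1091.0, 12.58),
    ("3 C", 4350.0, 52.01),
]


def speedup(cpu_time: float, gpu_time: float) -> float:
    """Speedup of the GPU run relative to the CPU run."""
    return cpu_time / gpu_time


# Assumption: the table reports the whole-number part of the speedup.
whole_parts = [math.floor(speedup(cpu, gpu)) for _, cpu, gpu in nlcd_runs]

for (label, cpu, gpu), whole in zip(nlcd_runs, whole_parts):
    print(f"File {label}: {speedup(cpu, gpu):.2f} (whole part {whole})")
```

File 4 has no ratio because the CPU run ended with NA, so there is no CPU time to divide.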


Issues (1 of 2)

• The inverse/forward map projection for Mollweide is not accurate
– Need to find the reasons why (should be a minor fix)
– Therefore, restricted the current testing to Equirectangular and Albers
• The results of map projection were inaccurate due to a misapplied resampling method (minor fix)
• The way the input data chunk is retrieved, based on the bounding box of the output chunk, may not be quite accurate
– Problem identified: chunks near the edges of the dataset need to have some overlap retrieved (negative coordinates)
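
One way to handle the edge problem is to pad the input window by a small overlap and then clamp it to the input raster, so that negative coordinates never reach the reader. The sketch below is an assumption about how this could be done, not the implementation used in these tests: `input_window` and `to_input` are hypothetical names, pixel indices are 0-based, and the mapping in the example is a simple stand-in rather than a real inverse projection.

```python
import math
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float]


def input_window(
    corners: Iterable[Point],
    to_input: Callable[[float, float], Point],
    in_cols: int,
    in_rows: int,
    overlap: int = 2,
) -> Tuple[int, int, int, int]:
    """Return (col_min, row_min, col_max, row_max) of input cells to read.

    corners  : corner coordinates of the output chunk
    to_input : maps an output coordinate to input (col, row); stand-in for
               an inverse projection followed by the framing equations
    overlap  : extra cells added on every side before clamping
    """
    mapped = [to_input(x, y) for x, y in corners]
    cols = [c for c, _ in mapped]
    rows = [r for _, r in mapped]

    # Bounding box of the mapped corners, grown by the overlap.
    col_min = math.floor(min(cols)) - overlap
    col_max = math.ceil(max(cols)) + overlap
    row_min = math.floor(min(rows)) - overlap
    row_max = math.ceil(max(rows)) + overlap

    # Clamp to the input raster so edge chunks never ask for negative indices.
    return (
        max(col_min, 0),
        max(row_min, 0),
        min(col_max, in_cols - 1),
        min(row_max, in_rows - 1),
    )


# Stand-in mapping that pushes the upper-left corner slightly outside the input.
corners = [(0, 0), (1024, 0), (0, 1024), (1024, 1024)]
window = input_window(corners, lambda x, y: (0.5 * x - 3.2, 0.5 * y - 1.7), 4003, 2003)
print(window)
```

For curved projections the edges of a chunk can bulge beyond the box spanned by its four corners, which is one reason a margin is useful; sampling extra points along each edge would tighten the estimate further.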


Issues (2 of 2)

• Needs better memory management
– CPU: out-of-memory error even with chunking
– Suspect the test machine is not releasing memory in a timely fashion
• GPU: not always stable; kernels may fail during execution; grid/block setup
– Workload may not be balanced very well
– Kernels can fail when sending too much data
– Using remote desktop to manipulate the data may cause issues


Conclusion

• CUDA provides a lightweight, less expensive alternative to CPU parallel environments like supercomputers
• Raster map projection behaves similarly in initial test to established pRasterBlaster testing in CPU-dominated HPC environments
– Greater than one order of magnitude faster
• More work necessary/issues remain



Other Collaborators (primarily on the CyberGIS project)

• Shaowen Wang, Anand Padmanabhan, Yan Liu
– University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
• David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel
– USGS, Center of Excellence for Geospatial Information Science (CEGIS)
• Kristina H. Yamamoto
– USGS, National Geospatial Technical Operations Center
• Babak Behzad
– UIUC, Department of Computer Science
• Eric Shook
– Kent State University, Department of Geography
• Qingfeng (Gene) Guan
– China University of Geosciences


Disclaimer

• Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government.


QUESTIONS?


• The block size is not directly related to the chunking concept.
• Block size is the number of threads within each GPU block. Another concept is grid size. CUDA can launch multiple threads at the same time (e.g., 512 threads). All threads in a block will be sent to the GPU processors at the same time but may not launch at the same time (depending on how many GPU cores are available).
• In my implementation, I assign each cell of the output image/chunk to a thread.
• If the output image has a dimension of 256*256 and the block size is 16*16, then the grid size is (256/16)*(256/16) = 16*16.
• If the output image has a dimension of 256*250 and the block size is 16*16, then the grid size is (256/16)*(250/16) = 16*15.625, which rounds up to 16*16. This implies that the last few blocks have less data (e.g., 16*10).
• So the selection of the block size is determined by the number of GPU cores as well as the dimension of the image.
• When dealing with a large image that cannot be read into CPU main memory all at once, the image should be divided into chunks. One chunk then becomes an input image. Then the GPU starts processing the chunk.
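
The grid-size arithmetic in these notes is a ceiling division, and the partially filled blocks are why each thread has to check that its cell lies inside the image. The sketch below reproduces that arithmetic on the host side in Python; the function names are hypothetical and no GPU is involved.

```python
from typing import Tuple


def grid_size(width: int, height: int, block_w: int = 16, block_h: int = 16) -> Tuple[int, int]:
    """Blocks needed along x and y so that every cell gets a thread (ceiling division)."""
    return (width + block_w - 1) // block_w, (height + block_h - 1) // block_h


def active_extent(
    block_x: int, block_y: int, width: int, height: int,
    block_w: int = 16, block_h: int = 16,
) -> Tuple[int, int]:
    """Columns and rows of a block that fall inside the image.

    Threads outside this extent have no cell to process; a kernel would
    guard them with a bounds check such as `if col < width and row < height`.
    """
    cols = max(0, min(block_w, width - block_x * block_w))
    rows = max(0, min(block_h, height - block_y * block_h))
    return cols, rows


print(grid_size(256, 256))                  # image that divides evenly into 16*16 blocks
print(grid_size(256, 250))                  # height is not a multiple of 16
print(active_extent(0, 15, 256, 250))       # a block in the last row of the grid
```

For an image 256 cells wide and 250 cells high, the last row of blocks covers only 10 of its 16 rows, which matches the 16*10 example in the notes.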