Small-Scale Raster Map Projection using the Compute Unified Device Architecture (CUDA)

U.S. Department of the Interior
U.S. Geological Survey

Michael P. Finn, Jing Li, and David Mattli

ISPRS Technical Commission IV Symposium on Geospatial Databases and Location Based Services
Suzhou, China
14 – 16 May 2014

HPC Research / Motivation

• Prime test case: map projection/reprojection for large raster datasets (“Big” Data?)
• pRasterBlaster: mapIMG in an HPC environment
• Solve problems using multiple processors
• Currently testing within the NSF CyberGIS Project, leveraging XSEDE (a more traditional supercomputing (SC) environment)
• How does the same problem compare, in a computational sense, between a CPU-dominated SC environment and a more lightweight, general-purpose GPU-dominated environment?

CUDA

• A parallel computing platform and programming model invented by Nvidia
• Allows GPUs to be used for general-purpose processing (not exclusively graphics)
• GPUs have a parallel throughput architecture that allows executing many concurrent threads slowly (rather than executing a single thread very quickly)
• Accessible to software developers through libraries, compiler directives, and extensions to programming languages, including C, C++, and Fortran

Accurate Raster Reprojection in Three (primary) Steps

• Step 1: Calculate and Partition Output Space
• Step 2: Read Input and Reproject
• Step 3: Combine Temporary Files
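As a rough illustration of Step 1, the output raster can be cut into rectangular chunks before any pixel is reprojected. The slides do not say how the partitioning is done, so the function name `partition_output`, the square-tile layout, and the default of 1024 (borrowed from the chunk dimension reported in the results) are assumptions made for this sketch only.

```python
def partition_output(rows, cols, chunk_dim=1024):
    """Yield (row_start, col_start, n_rows, n_cols) tiles covering the raster.

    Hypothetical helper: square tiles of side chunk_dim are an assumption;
    tiles on the bottom and right edges are trimmed to fit the raster.
    """
    for r0 in range(0, rows, chunk_dim):
        for c0 in range(0, cols, chunk_dim):
            n_rows = min(chunk_dim, rows - r0)
            n_cols = min(chunk_dim, cols - c0)
            yield (r0, c0, n_rows, n_cols)


if __name__ == "__main__":
    # A 4003 x 2003 output (columns x rows), as in the first GLC test file.
    tiles = list(partition_output(rows=2003, cols=4003))
    print(len(tiles), tiles[-1])
```

Each tile can then be reprojected independently (Step 2) and written to its own temporary file for merging (Step 3).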

The Equations

• Projection Transformation Process: Framing
  – The frame of a raster dataset defines the extent of the dataset in the projection space. It also defines the alignment of projection space with the input (often) image coordinate system.
    • X = ULprojX + ((sample – 1) * pixelSizeX)   (1)
    • Y = ULprojY – ((line – 1) * pixelSizeY)   (2)
  – Alternatively:
    • sample = ((X – ULprojX) / pixelSizeX) + 1   (3)
    • line = ((ULprojY – Y) / pixelSizeY) + 1   (4)
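Equations (1) to (4) translate directly into code. The sketch below is a plain-Python transcription, not the presenters' implementation. The function names and the example frame (upper-left corner at -180, 90 with 0.5-unit pixels) are made up for illustration. Sample and line are 1-based, as in the equations.

```python
def pixel_to_projection(sample, line, ul_x, ul_y, pixel_size_x, pixel_size_y):
    """Equations (1) and (2): 1-based (sample, line) to projection (X, Y)."""
    x = ul_x + (sample - 1) * pixel_size_x
    y = ul_y - (line - 1) * pixel_size_y
    return x, y


def projection_to_pixel(x, y, ul_x, ul_y, pixel_size_x, pixel_size_y):
    """Equations (3) and (4): projection (X, Y) to 1-based (sample, line)."""
    sample = (x - ul_x) / pixel_size_x + 1
    line = (ul_y - y) / pixel_size_y + 1
    return sample, line


if __name__ == "__main__":
    frame = dict(ul_x=-180.0, ul_y=90.0, pixel_size_x=0.5, pixel_size_y=0.5)
    x, y = pixel_to_projection(3, 5, **frame)
    print(x, y)                                # -179.0 88.0
    print(projection_to_pixel(x, y, **frame))  # (3.0, 5.0)
```

The two functions are inverses of each other, so a round trip returns the original sample and line. In a reprojection pipeline the result of (3) and (4) is generally fractional, and a resampling step has to pick or interpolate the input cell.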

CUDA implementation

Four-corner-point-based map projection using CUDA

Raster Chunk Handling

Output chunks cannot be merged because of limited computing resources

Results

• Configuration of the testing machine
  – Intel quad-core CPU (i5-3450 [email protected])
  – GeForce GT 640, 384 GPU cores
  – 8 GB RAM
  – NVIDIA CUDA SDK 5.5
  – Visual Studio 2010

Results

• CUDA configuration
  – Block size: 256*1
  – Chunk dimension: 1024
• Resample GLC (original: ~900 MB)

Equirectangular to Albers

File  Res    Dimension    Volume     CPU       GPU                   Ratio
1     10000  4003*2003    7.64 MB    257       5.54                  46
2     5000   8006*4003    30.56 MB   896 (C)   21.635 / 17.553 (C)   51
3     2000   20015*10008  191.03 MB  5548 (C)  105.173 (C)           52
4     1000   40030*20015  764.08 MB  NA        407.65 (C)

NA = Out of memory (8 GB) on test machine

Results

• CUDA configuration
  – Block size: 256*1
  – Chunk dimension: 1024
• Resample NLCD (original: 15.6 GB)

Albers to Equirectangular

File  Res   Dimension    Volume     CPU   GPU     Ratio
1     1200  4030*2611    10.03 MB   321   3.06    104
2 C   600   8060*5221    40.13 MB   1091  12.58   86
3 C   300   16119*10442  160.52 MB  4350  52.01   83
4 C   150   32238*20885  642.10 MB  NA    204.93

Issues (1 of 2)

• The inverse/forward map projection for Mollweide is not accurate
  – Need to find the reasons why (should be a minor fix)
  – Therefore, restricted the current testing to Equirectangular and Albers
• The results of map projection were inaccurate due to a misapplied resampling method (minor fix)
• The way input data chunks are retrieved, based on the bounding box of the output chunk, may not be quite accurate
  – Problem identified: chunks near the edges of the dataset need to have some overlap retrieved (negative coordinates)
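One way to picture the edge problem: when the corners of an output chunk are mapped back into the input image, some of them can land at negative sample or line positions. The slides do not show how the input window is computed, so the sketch below is a hypothetical approach. The function name `input_window`, the 0-based indexing, and the one-cell `overlap` margin are all assumptions. It pads the corner bounding box and then clamps it to the input extent.

```python
import math


def input_window(corner_pixels, in_rows, in_cols, overlap=1):
    """Return (row_start, col_start, n_rows, n_cols) of input cells to read.

    corner_pixels: (sample, line) positions of the output chunk corners,
    already mapped into 0-based input image coordinates (assumption).
    Returns None when the padded box misses the input raster entirely.
    """
    samples = [s for s, _ in corner_pixels]
    lines = [l for _, l in corner_pixels]
    # Pad the bounding box so resampling near chunk borders has neighbors.
    c0 = math.floor(min(samples)) - overlap
    c1 = math.ceil(max(samples)) + overlap
    r0 = math.floor(min(lines)) - overlap
    r1 = math.ceil(max(lines)) + overlap
    # Clamp to the valid input extent; negative coordinates become 0.
    c0, r0 = max(c0, 0), max(r0, 0)
    c1, r1 = min(c1, in_cols - 1), min(r1, in_rows - 1)
    if c0 > c1 or r0 > r1:
        return None
    return (r0, c0, r1 - r0 + 1, c1 - c0 + 1)


if __name__ == "__main__":
    corners = [(-2.3, -1.5), (10.2, -0.5), (-1.0, 8.7), (11.9, 9.1)]
    print(input_window(corners, in_rows=100, in_cols=100))  # (0, 0, 12, 14)
```

A box built from four corners can still miss input cells when chunk edges curve under the projection, so sampling extra points along each edge may be needed in practice.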

Issues (2 of 2)

• Needs better memory management
  – CPU: out-of-memory error even with chunking
  – Suspect the test machine is not releasing memory in a timely fashion
• GPU: not always stable; kernels may fail during execution; grid/block setup
  – Workload may not be balanced very well
  – Kernels can fail when sending too much data
  – Using remote desktop to manipulate the data may cause issues

Conclusion

• CUDA provides a lightweight, less expensive alternative to CPU parallel environments like supercomputers
• In initial tests, raster map projection behaves similarly to established pRasterBlaster testing in CPU-dominated HPC environments
  – Greater than one order of magnitude faster
• More work necessary; issues remain

References

• Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at the Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.

• Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.

• Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia.

• Finn, Michael P., Daniel R. Steinwand, Jason R. Trent, Robert A. Buehler, David Mattli, and Kristina H. Yamamoto (2012). A Program for Handling Map Projections of Small Scale Geospatial Raster Data. Cartographic Perspectives, Number 71, pages 53 – 67.

• Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference. Baltimore, Maryland.

• Liu, Yan, Anand Padmanabhan, and Shaowen Wang, (2014) CyberGIS Gateway for enabling data-rich geospatial research and education, Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256.

• Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.

• http://cegis.usgs.gov/
• http://www.du.edu/nsm/departments/geography/
• http://nationalmap.gov/3DEP/
• http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php
• http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page
• http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster


Other Collaborators (primarily on the CyberGIS project)

• Shaowen Wang, Anand Padmanabhan, Yan Liu
  – University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
• David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel
  – USGS, Center of Excellence for Geospatial Information Science (CEGIS)
• Kristina H. Yamamoto
  – USGS, National Geospatial Technical Operations Center
• Babak Behzad
  – UIUC, Department of Computer Science
• Eric Shook
  – Kent State University, Department of Geography
• Qingfeng (Gene) Guan
  – China University of Geosciences

Disclaimer

• Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government.


QUESTIONS?


• The block size is not directly related to the chunking concept.
• Block size is the number of threads within each GPU block. Another concept is grid size. CUDA can launch multiple threads at the same time (e.g., 512 threads). All threads in a block are sent to the GPU processors at the same time but may not launch at the same time (depending on how many GPU cores are available).
• In my implementation, I assign each cell of the output image/chunk to a thread.
• If the output image has a dimension of 256*256 and the block size is 16*16, then the grid size is (256/16)*(256/16) = 16*16.
• If the output image has a dimension of 250*250 and the block size is 16*16, then the grid size is (250/16)*(250/16) = 15.x*15.x, which rounds up to 16*16. This implies that the last few blocks have less data (e.g., 16*10).
• So the selection of the block size is determined by the number of GPU cores as well as the dimension of the image.
• When dealing with a large image that cannot be read into CPU main memory all at once, the image should be divided into chunks. Each chunk then becomes an input image, and the GPU starts processing the chunk.
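The grid-size arithmetic in these notes is a ceiling division in each dimension. The sketch below reproduces it on the CPU in Python, with illustrative helper names (`grid_size`, `last_block_extent`) that do not come from the presentation. A CUDA kernel launched with such a grid normally also needs a bounds check so that threads in the partially filled edge blocks do nothing.

```python
def grid_size(width, height, block=(16, 16)):
    """Number of blocks per dimension, rounding up so every cell is covered."""
    bx, by = block
    return ((width + bx - 1) // bx, (height + by - 1) // by)


def last_block_extent(length, block_len):
    """How many cells of the final block along one dimension hold real data."""
    rem = length % block_len
    return rem if rem else block_len


if __name__ == "__main__":
    print(grid_size(256, 256))        # (16, 16)
    print(grid_size(250, 250))        # (16, 16)
    print(last_block_extent(250, 16)) # 10
    # Block size 256*1 with a 1024 x 1024 chunk, as in the reported setup.
    print(grid_size(1024, 1024, block=(256, 1)))  # (4, 1024)
```

For the 250*250 case, the last block in each dimension covers only 10 of its 16 cells, which matches the "16*10" example in the notes.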
