Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs

Visiting Faculty: Jianting Zhang, The City College of New York (CCNY), [email protected]; ORNL Host: Dali Wang, Climate Change Science Institute (CCSI), [email protected]

Overview
As the spatial and temporal resolutions of Earth observatory data and Earth system simulation outputs increase, in-situ and/or post-processing of such large amounts of geospatial data is increasingly becoming a bottleneck in scientific inquiries into Earth systems and their human impacts. Existing geospatial techniques based on outdated computing models (e.g., serial algorithms and disk-resident systems), as implemented in many commercial and open-source packages, cannot process large-scale geospatial data at the desired level of performance. Partially supported by DOE's Visiting Faculty Program (VFP), we are investigating a set of parallel data structures and algorithms that can exploit the massively data-parallel computing power of commodity Graphics Processing Units (GPUs). We have designed and implemented a popular geospatial technique called Zonal Statistics, i.e., deriving histograms of points or raster cells in polygonal zones. Our GPU-based parallel Zonal Statistics technique, applied to 3,000+ US counties over 20+ billion NASA SRTM 30-meter-resolution Digital Elevation Model (DEM) raster cells, achieves impressive end-to-end runtimes: 101 seconds (cold cache) and 46 seconds (hot cache) on a low-end workstation equipped with an Nvidia GTX Titan GPU, 60-70 seconds on a single OLCF Titan compute node, and 10-15 seconds on 8 nodes.

[Workflow figure: high-resolution satellite imagery, in-situ observation sensor data, and global/regional climate model outputs are assimilated and fed to Zonal Statistics over ecological, environmental, and administrative zones on a high-end computing facility, yielding temporal trends; GPU thread blocks are mapped to regions of interest (ROIs).]

Background & Motivation

Zonal Statistics on NASA Shuttle Radar Topography Mission (SRTM) Data


SQL: SELECT COUNT(*) FROM T_O, T_Z WHERE ST_WITHIN(T_O.the_geom, T_Z.the_geom) GROUP BY T_Z.z_id;

Point-in-Polygon Test

For each county, derive its histogram of elevations from raster cells that are in the polygon.

• SRTM: 20×10⁹ (billion) raster cells (~40 GB raw, ~15 GB compressed TIFF)
• Zones: 3,141 counties, 87,097 vertices

Brute-force point-in-polygon test
• RT = (# of points) × (# of vertices) × (ops per point-in-polygon test) / (ops per second) = 20×10⁹ × 87,097 × 20 / (10×10⁹) ≈ 3.5×10⁶ seconds ≈ 40 days
• Using all of Titan's 18,688 nodes: ~200 seconds
• Flops utilization is typically low: it can be <1% for data-intensive applications (typically <10% in HPC)
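To make the cost model concrete, below is a minimal CUDA sketch of the brute-force approach: a standard ray-crossing point-in-polygon test, one thread per raster cell against one polygon. The kernel name, array layout, and the `pnpoly` helper are illustrative assumptions, not the poster's actual implementation.

```cuda
// Minimal brute-force sketch (assumed names/layout, not the poster's code):
// standard ray-crossing point-in-polygon test, one thread per raster cell.
__device__ bool pnpoly(float x, float y,
                       const float* vx, const float* vy, int nv)
{
    bool inside = false;
    for (int i = 0, j = nv - 1; i < nv; j = i++) {
        // Toggle on each polygon edge crossed by a horizontal ray from (x, y).
        if (((vy[i] > y) != (vy[j] > y)) &&
            (x < (vx[j] - vx[i]) * (y - vy[i]) / (vy[j] - vy[i]) + vx[i]))
            inside = !inside;
    }
    return inside;
}

// Every cell is tested against every vertex of one polygon, which is exactly
// the (#points x #vertices x ops-per-test) term in the RT estimate above.
__global__ void brute_force_count(const float* cx, const float* cy, int ncells,
                                  const float* vx, const float* vy, int nv,
                                  unsigned int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < ncells && pnpoly(cx[i], cy[i], vx, vy, nv))
        atomicAdd(count, 1u);   // COUNT(*) per zone, as in the SQL formulation
}
```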

Hybrid Spatial Databases + HPC Approach
Observation: only points/cells that are close enough to polygons need to be tested.
Question: how do we pair neighboring points/cells with polygons?

Minimum Bounding Boxes (MBRs)

Step 1: divide a raster tile into blocks and generate per-block histograms.
Step 2: derive polygon MBRs and pair them with blocks through a box-in-polygon test (inside/intersect); see the sketch below.
Step 3: aggregate per-block histograms into per-polygon histograms for blocks that lie completely within polygons.
Step 4: for each intersecting polygon/block pair, perform a point(cell)-in-polygon test for all raster cells in the block and update the respective polygon histogram.

Block vs. polygon cases: (A) Intersect, (B) Inside, (C) Outside
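A minimal sketch of the coarse pairing filter in Step 2 is shown below, assuming a simple axis-aligned box representation; a raster block only becomes a candidate for the finer inside/intersect/outside classification if its box overlaps the polygon's MBR. The struct and function names are placeholders for illustration, not the poster's actual code.

```cuda
// Hedged sketch of the Step-2 coarse filter (assumed names).
enum BlockClass { INTERSECT, INSIDE, OUTSIDE };   // cases (A), (B), (C) above

struct Box { float xmin, ymin, xmax, ymax; };     // block extent or polygon MBR

// A block can only be case (A) or (B) if its box overlaps the polygon's MBR;
// non-overlapping blocks are classified (C) without any per-cell work.
__host__ __device__ inline bool overlaps(const Box& a, const Box& b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}
```

Pairs that survive this filter are then resolved into case (A) or (B) with the corner and edge tests described under the GPU implementation below.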

GPU Implementations

Identifying parallelisms and mapping them to hardware:
(1) Deriving per-block histograms
(2) Block-in-polygon test
(3) Aggregating histograms for "within" blocks
(4) Point-in-polygon test for individual cells and histogram update

[Figure: each GPU thread block processes one raster block; histogram bins are updated with atomicAdd; MBRs (M1, M2, ...) are paired with blocks/cells (C1, C2, C3, ...).]
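For parallelism (1), a hedged CUDA sketch of per-block histogramming is given below, using the parameter values reported later on this poster (thread block size 256, up to 5,000 histogram bins, 360×360-cell raster blocks); bins are accumulated in shared memory with atomicAdd, as in the figure above. The kernel name, element type, and one-bin-per-meter binning are assumptions.

```cuda
// Hedged sketch of Step 1 (assumed names/binning, not the poster's code):
// one CUDA thread block builds the elevation histogram of one raster block.
#define NBINS 5000                 // "maximum histogram bins" parameter
#define BLOCK_CELLS (360 * 360)    // 0.1 x 0.1 degree block at 1 arc-second

__global__ void per_block_histogram(const short* elev,     // cells, block-major
                                    unsigned int* hist)     // NBINS per raster block
{
    __shared__ unsigned int sh[NBINS];                  // ~20 KB of shared memory
    for (int b = threadIdx.x; b < NBINS; b += blockDim.x) sh[b] = 0;
    __syncthreads();

    const short* cells = elev + (size_t)blockIdx.x * BLOCK_CELLS;
    for (int i = threadIdx.x; i < BLOCK_CELLS; i += blockDim.x) {
        int bin = cells[i];                             // assume 1 m elevation bins
        if (bin >= 0 && bin < NBINS) atomicAdd(&sh[bin], 1u);
    }
    __syncthreads();

    unsigned int* out = hist + (size_t)blockIdx.x * NBINS;
    for (int b = threadIdx.x; b < NBINS; b += blockDim.x)
        out[b] = sh[b];                                 // each block owns its slice
}
```

A launch such as `per_block_histogram<<<num_raster_blocks, 256>>>(d_elev, d_hist)` (names assumed) would process all raster blocks of a decompressed chunk in one pass.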

• Point-in-polygon test for each of the cell's 4 corners
• All-pair edge intersection tests between the polygon and the cell
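The edge intersection part of this test can be a standard orientation-based segment predicate, sketched below under assumed names; the poster does not give its exact formulation, and this strict version ignores collinear/touching cases for brevity.

```cuda
// Hedged sketch (assumed names): proper-intersection test between one polygon
// edge (p0-p1) and one cell/block edge (q0-q1), usable from host or device.
__host__ __device__ inline float cross2(float ax, float ay, float bx, float by)
{
    return ax * by - ay * bx;            // z-component of the 2D cross product
}

__host__ __device__ bool segments_intersect(float p0x, float p0y, float p1x, float p1y,
                                            float q0x, float q0y, float q1x, float q1y)
{
    float d1 = cross2(p1x - p0x, p1y - p0y, q0x - p0x, q0y - p0y);
    float d2 = cross2(p1x - p0x, p1y - p0y, q1x - p0x, q1y - p0y);
    float d3 = cross2(q1x - q0x, q1y - q0y, p0x - q0x, p0y - q0y);
    float d4 = cross2(q1x - q0x, q1y - q0y, p1x - q0x, p1y - q0y);
    // Proper crossing: each segment's endpoints lie on opposite sides of the other.
    return ((d1 > 0.0f) != (d2 > 0.0f)) && ((d3 > 0.0f) != (d4 > 0.0f));
}
```

If any polygon edge intersects a cell/block edge, the pair is case (A); if none intersects and all 4 corners are inside the polygon, case (B); otherwise case (C) (modulo the corner case of a polygon wholly contained in the block).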

[Figure: layout of polygon vertices and cells.]
• Perfectly coalesced memory accesses
• Utilizing GPU floating-point power


BPQ-Tree based raster compression to save disk I/O
• Idea: chop an M-bit raster into M binary bitmaps and then build a quadtree for each bitmap (Zhang et al. 2011)

• BPQ-Tree achieves a competitive compression ratio but is much more parallelization-friendly on GPUs
• Advantage 1: compressed data is streamed from disk to the GPU without requiring decompression on CPUs, reducing CPU memory footprint and data transfer time
• Advantage 2: it can be used in conjunction with CPU-based compression to further improve the compression ratio
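As a rough illustration of the bit-plane decomposition behind BPQ-Tree (quadtree construction and bit packing omitted), a hedged CUDA sketch is shown below; the names and the one-byte-per-bit output layout are assumptions made for clarity, and a real encoder would pack 8 cells per byte.

```cuda
// Hedged sketch (assumed names/layout): split an M-bit raster into M binary
// bitplanes, one thread per cell; each plane would then be quadtree-encoded.
__global__ void split_bitplanes(const unsigned short* raster,  // M <= 16 bits/cell
                                unsigned char* planes,         // M planes, 1 byte/cell here
                                int ncells, int M)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ncells) return;
    unsigned short v = raster[i];
    for (int b = 0; b < M; ++b)
        planes[(size_t)b * ncells + i] = (v >> b) & 1;   // bit b of cell i -> plane b
}
```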

Experiments and Evaluations

Data and Pre-processing

Tile #   Dimension        Partition schema
1        54000×43200      2×2
2        50400×43200      2×2
3        50400×43200      2×2
4        82800×36000      2×2
5        61200×46800      2×2
6        68400×111600     4×4
Total    20,165,760,000 cells    36 partitions

Data format                    Volume (GB)
Original (raw)                 38
TIFF compression               15
gzip compression               8.3
BPQ-Tree compression           7.3
BPQ-Tree + gzip compression    5.5

Single-node configuration 1: Dell T5400 workstation
• Intel Xeon E5405 dual quad-core processors (2.00 GHz), 16 GB RAM, PCI-E Gen2, 3×500 GB 7200 RPM disks with 32 MB cache ($5,000)
• Nvidia Quadro 6000 GPU: 448 Fermi cores (574 MHz), 6 GB, 144 GB/s ($4,500)

Single-node configuration 2: do-it-yourself workstation
• Intel Core i5-650 dual-core, 8 GB RAM, PCI-E Gen3, 500 GB 7200 RPM disk with 32 MB cache (recycled) ($1,000)
• Nvidia GTX Titan GPU: 2,688 Kepler cores (837 MHz), 6 GB, 288 GB/s ($1,000)

Parameters:
• Raster chunk size (coding): 4096×4096
• Thread block size: 256
• Maximum histogram bins: 5,000
• Raster block size: 0.1×0.1 degree, i.e., 360×360 cells (resolution is 1×1 arc-second)

GPU cluster: OLCF Titan

Results

Single node   Cold cache   Hot cache
Config 1      180 s        78 s
Config 2      101 s        46 s

Per-step runtimes (seconds)                          Quadro 6000   GTX Titan   8-core CPU
(Step 0) Raster decompression                        16.2          8.30        131.9
Step 1: Per-block histogramming                      21.5          13.4        /
Step 2: Block-in-polygon test                        0.11          0.07        /
Step 3: "Within-block" histogram aggregation         0.14          0.11        /
Step 4: Cell-in-polygon test and histogram update    29.7          11.4        /
Total, major steps                                   67.7          33.3        /
Wall-clock end-to-end                                85            46          /

Titan
# of nodes    1      2      4      8      16
Runtime (s)   60.7   31.3   17.9   10.2   7.6
