
Large-Scale Spatial Query Processing

on GPU-Accelerated Big Data Systems

Jianting Zhang1,2 Simin You2

1 Department of Computer Science, CUNY City College (CCNY)

2 Department of Computer Science, CUNY Graduate Center

http://www-cs.ccny.cuny.edu/~jzhang/

Outline

• Introduction
  • Spatial data, GIS, Big Data and HPC
  • Taxi trip data in NYC and Global Biodiversity Applications
  • Spatial query processing on GPUs
• ISP-GPU
  • Architecture and Implementations
  • Experiment Results
• Alternative Techniques
  • SpatialSpark
  • Lightweight Distributed Execution (LDE) Engine
• Summary and Future Work

[Diagram: GIS at the intersection of computational geometry; spatial databases (data modeling, indexing, query processing); scientific data/information visualization; statistics/machine learning; computer graphics; and image processing/computer vision, with application domains spanning environmental modeling (air quality, remote sensing, hydrology, ecology) and social-economic modeling (urban planning, transportation, census/taxation, social studies).]

Big Geospatial Data Challenges

• Event locations, trajectories and O-D data
  • E.g., taxi trip records (GPS traces or O-D locations)
  • 0.5 million trips per day in NYC (medallion taxi cabs only) and 1.2 million per day in Beijing
  • From O-D locations to trajectories to frequent patterns
• Satellite imagery: e.g., from GOES to GOES-R (2015/2016) [$11B]
  • http://www.goes-r.gov/downloads/GOES-R-Tri-10-06-09_v7.pdf
  • Spectral (3X) × spatial (4X) × temporal (5X) = 60X
  • At 2 km × 2 km resolution, 5-minute cadence and 16 bands: (360×60) × (180×60) × (12×24) × 16 ≈ 1+ trillion pixels per day
  • Derived thematic data products (vector)
    • http://www.goes-r.gov/products/baseline.html
    • http://www.goes-r.gov/products/option2.html
• Species distributions
  • E.g., 400+ million occurrence records (GBIF)
  • E.g., 717,057 polygons with 78,929,697 vertices for 4,148 bird species' distribution data (NatureServe)

Cloud computing + MapReduce + Hadoop

[Diagram: a heterogeneous compute node: a CPU host (chip multiprocessor with per-core local caches, a shared cache, DRAM and HDD/SSD storage) connected via PCI-E to a GPU (SIMD cores organized into thread blocks, with GDRAM) and to a MIC coprocessor (many 4-thread in-order cores with local caches on a ring bus).]

16 Intel Sandy Bridge CPU cores + 128 GB RAM + 8 TB disk + GTX TITAN + Xeon Phi 3120A ≈ $9,994

ASCI Red (1997): the first system to sustain 1 teraflops, with 9,298 Intel Pentium II Xeon processors in 72 cabinets.

Nvidia GTX Titan (Feb. 2013):
• 7.1 billion transistors (551 mm²)
• 2,688 processors
• 4.5 TFLOPS SP and 1.3 TFLOPS DP
• Max bandwidth 288.4 GB/s
• PCI-E peripheral device
• 250 W (17.98 GFLOPS/W SP)
• Suggested retail price: $999

What can we do today using a device that is more powerful than ASCI Red was 16 years ago?

Geospatial Technologies and Environmental CyberInfrastructure (GeoTECI) Lab

Dr. Jianting Zhang
Department of Computer Science
The City College of New York

Affiliated institutions / collaborating institutions

Students: Simin You (Ph.D., 2009-), Siyu Liao (Ph.D., 2014-), Costin Vicoveanu (undergraduate, 2014-), Bharat Rosanlall (undergraduate, 2014), Jay Yao (MS thesis, 2011-2012), Chandrashekar Singh (MS, 2013), Agniva Banerjee (MS, 2012), Roger King (MS, 2012), Wahyu Nugroho (MS, 2011), Xiao Quan Cen Feng (MS, 2011), Chetram Dasrat (undergraduate, 2008)

GeoTECI@CCNY

HIGHEST-DB: HIgh-performance GrapHics units based Engine for Spatial-Temporal data
• Spatial and spatiotemporal indexing, query processing and optimization
• Trajectory data management on GPUs
  • Segmentation, simplification, compression, aggregation and warehousing
  • Map matching with road networks
  • Data mining (moving clusters, convoys, swarms, ...)

... when yellow cabs, green cabs and MTA buses meet with multi-core CPUs, GPUs and MICs in NYC ...

$449,845/4yr (08/01/2013-07/31/2017)

GeoTECI@CCNY

[Diagram: a zonal-statistics workflow: high-resolution satellite imagery, in-situ observation sensor data and global/regional climate model outputs feed data assimilation and zonal statistics over ecological, environmental and administrative zones (ROIs), producing temporal trends on a high-end computing facility.]

... when GOES-R satellites, extratropical cyclones and hummingbirds meet with TITAN ...

GeoTECI@CCNY — CCNY Computer Science LAN

• Microway: dual 8-core, 128 GB memory, Nvidia GTX Titan, Intel Xeon Phi 3120A, 8 TB storage
• DIY (×2): dual-core, 8 GB memory, Nvidia GTX Titan, 3 TB storage
• SGI Octane III: dual quad-core, 48 GB memory, Nvidia C2050 ×2, 8 TB storage
• Dell T5400: dual quad-core, 16 GB memory, Nvidia Quadro 6000, 1.5 TB storage
• Dell T5400: dual quad-core, 16 GB memory, Nvidia FX3700 ×2
• Dell T7500: dual 6-core, 24 GB memory, Nvidia Quadro 6000
• Dell T7500: dual 6-core, 24 GB memory, Nvidia GTX 480
• DIY: quad-core (Haswell), 16 GB memory, AMD/ATI 7970
• HP 8740w (×2): quad-core, 8 GB memory, Nvidia Quadro 5000m
• Lenovo T400s
• CUNY HPCC: KVM, "brawny" GPU cluster, "wimpy" GPU cluster
• Web server / Linux app server / Windows app server

... building a highly-configurable experimental computing environment for innovative Big Data technologies ...

Taxi trip data in NYC

[Figure: four histograms over the 2009 NYC taxi trip records, plotting trip counts (up to ~20-30 million per bin) against trip distance (miles, binned up to 20), trip time (minutes, binned up to 50+), speed (MPH, binned up to 50) and fare ($, binned up to 50).]

Overall distributions of trip distance, time, speed and fare (2009)

Taxi trip data in NYC

• How to manage taxi trip data?
  • Geographical Information Systems (GIS)
  • Spatial Databases (SDB)
  • Moving Object Databases (MOD)
• How good are they?
  • Pretty good for small amounts of data
  • But rather poor for large-scale data

Taxi trip data in NYC

• Example 1: loading 170 million taxi pickup locations into PostgreSQL
  UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"), 4326);
  105.8 hours!
• Example 2: finding the nearest tax blocks for 170 million taxi pickup locations using the open-source libspatialindex + GDAL
  30.5 hours!

I do not have time to wait... Can we do better?

(Timings on Intel Xeon 2.26 GHz processors with 48 GB memory.)
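For reference, a minimal sketch of the kind of CPU baseline used in Example 2, built on the stock libspatialindex C++ API; the real pipeline's data loading via GDAL and exact distance refinement are omitted, and NNVisitor, the placeholder MBR and the grid of IDs are illustrative:

#include <spatialindex/SpatialIndex.h>
#include <vector>
using namespace SpatialIndex;

// Collects the identifier of the nearest indexed entry.
class NNVisitor : public IVisitor {
public:
    id_type nearest = -1;
    void visitNode(const INode&) override {}
    void visitData(const IData& d) override { nearest = d.getIdentifier(); }
    void visitData(std::vector<const IData*>&) override {}
};

int main() {
    // In-memory R*-tree over tax-block MBRs (2-D, fill factor 0.7, node capacity 100).
    IStorageManager* sm = StorageManager::createNewMemoryStorageManager();
    id_type indexId;
    ISpatialIndex* tree = RTree::createNewRTree(*sm, 0.7, 100, 100, 2,
                                                RTree::RV_RSTAR, indexId);
    double lo[2] = {0.0, 0.0}, hi[2] = {1.0, 1.0};   // one placeholder MBR
    tree->insertData(0, nullptr, Region(lo, hi, 2), /*id=*/42);

    // 1-NN query for a single pickup location; the real workload repeats
    // this for each of the 170 million points.
    double pt[2] = {0.5, 0.5};
    NNVisitor v;
    tree->nearestNeighborQuery(1, Point(pt, 2), v);

    delete tree;
    delete sm;
    return 0;
}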

Global Biodiversity Data at GBIF (http://gbif.org)

Zonal statistics over species range maps: for each (area of interest, species) pair, sum the areas of the intersections between species polygons and query-window polygons, keeping pairs whose total intersection area exceeds a threshold T:

SELECT aoi_id, sp_id, sum(ST_Area(inter_geom))
FROM (
    SELECT aoi_id, sp_id,
           ST_Intersection(sp_geom, qw_geom) AS inter_geom
    FROM SP_TB, QW_TB
    WHERE ST_Intersects(sp_geom, qw_geom)
) AS t
GROUP BY aoi_id, sp_id
HAVING sum(ST_Area(inter_geom)) > T;

Spatial Data Processing on GPUs

http://www-cs.ccny.cuny.edu/~jzhang/papers/gpu_spatial_tr.pdf

Spatial query processing on GPUs

Single-level grid-file based spatial filtering: vertices (of polygons/polylines) and points are both assigned to the cells of a uniform grid, so candidate pairs can be generated per cell with
• perfectly coalesced memory accesses, and
• full use of GPU floating-point computing power.

Nested-loop based refinement then tests each candidate pair exactly; a CPU sketch of this filter-and-refine pattern follows the reference below.

J. Zhang, S. You and L. Gruenwald, "Parallel Online Spatial and Temporal Aggregations on Multi-core CPUs and Many-Core GPUs," Information Systems, vol. 44, pp. 134-154, 2014.
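A minimal serial C++ sketch of that filter-and-refine pattern, under stated assumptions: a fixed cell size, cell indices that fit in 32 bits, and a ray-crossing point-in-polygon test as the refinement step. It illustrates the idea only and is not the paper's GPU kernel:

#include <cmath>
#include <unordered_map>
#include <utility>
#include <vector>

struct Pt { double x, y; };
struct Poly { std::vector<Pt> ring; double minx, miny, maxx, maxy; };

// Ray-crossing point-in-polygon test (the refinement step).
bool contains(const Poly& p, Pt q) {
    bool in = false;
    for (size_t i = 0, j = p.ring.size() - 1; i < p.ring.size(); j = i++) {
        const Pt a = p.ring[i], b = p.ring[j];
        if ((a.y > q.y) != (b.y > q.y) &&
            q.x < (b.x - a.x) * (q.y - a.y) / (b.y - a.y) + a.x)
            in = !in;
    }
    return in;
}

// Filter: bucket points by grid cell, expand each polygon MBR to the cells
// it overlaps, and refine only point/polygon pairs that share a cell --
// the single-level grid-file filtering described above.
std::vector<std::pair<size_t, size_t>>
gridJoin(const std::vector<Pt>& pts, const std::vector<Poly>& polys,
         double cell) {
    // Pack the (cx, cy) cell index into one 64-bit key (assumes 32-bit indices).
    auto key = [](long long cx, long long cy) {
        return (cx << 32) | (cy & 0xffffffffLL);
    };
    std::unordered_map<long long, std::vector<size_t>> buckets;
    for (size_t i = 0; i < pts.size(); ++i)
        buckets[key((long long)std::floor(pts[i].x / cell),
                    (long long)std::floor(pts[i].y / cell))].push_back(i);

    std::vector<std::pair<size_t, size_t>> out;  // (point id, polygon id)
    for (size_t j = 0; j < polys.size(); ++j) {
        const Poly& p = polys[j];
        for (long long cx = (long long)std::floor(p.minx / cell);
             cx <= (long long)std::floor(p.maxx / cell); ++cx)
            for (long long cy = (long long)std::floor(p.miny / cell);
                 cy <= (long long)std::floor(p.maxy / cell); ++cy) {
                auto it = buckets.find(key(cx, cy));
                if (it == buckets.end()) continue;
                for (size_t i : it->second)        // nested-loop refinement
                    if (contains(p, pts[i])) out.push_back({i, j});
            }
    }
    return out;
}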

Spatial query processing on GPUs

[Figure: spatially aggregated taxi data. Top: grid size 256×256, resolution 128 feet; right: grid size 8192×8192, resolution 4 feet.]

Spatial aggregation: 9,424 / 326 = ~30X speedup (8192×8192 grid)
Temporal aggregation: 1,709 / 198 = ~8.6X (by minute); 1,598 / 165 = ~9.7X (by hour)
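The spatial aggregation above is essentially a histogram over grid cells; a minimal serial C++ sketch may help fix the idea (the function name and grid parameters are illustrative, and the GPU version runs the loop in parallel with one thread per point and atomic increments):

#include <vector>

// Count pickups per cell of a uniform nx-by-ny grid anchored at (minx, miny).
std::vector<long long> spatialAggregate(const std::vector<double>& xs,
                                        const std::vector<double>& ys,
                                        double minx, double miny,
                                        double cell, int nx, int ny) {
    std::vector<long long> counts((size_t)nx * ny, 0);
    for (size_t i = 0; i < xs.size(); ++i) {
        int cx = (int)((xs[i] - minx) / cell);
        int cy = (int)((ys[i] - miny) / cell);
        if (cx < 0 || cx >= nx || cy < 0 || cy >= ny) continue;  // outside grid
        ++counts[(size_t)cy * nx + cx];
    }
    return counts;
}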

Spatial query processing on GPUs

Datasets joined against ~170 million taxi pickup locations:
• 147,011 street segments
• 38,794 census blocks (470,941 points)
• 735,488 tax blocks (4,698,986 points)

            P2N-D     P2P-T     P2P-D
CPU time    -         15.2 h    30.5 h
GPU time    10.9 s    11.2 s    33.1 s
Speedup     -         4,900X    3,200X

Sources of the speedup: algorithmic improvement (3.7X), main-memory data structures (37.4X) and GPU acceleration (24.3X).

Outline

• Introduction
  • Spatial data, GIS, Big Data and HPC
  • Taxi trip data in NYC and Global Biodiversity Applications
  • Spatial query processing on GPUs
• ISP-GPU
  • Architecture and Implementations
  • Experiment Results
• Alternative Techniques
  • SpatialSpark
  • Lightweight Distributed Execution (LDE) Engine
• Summary and Future Work

ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters

http://www.slideshare.net/hadooparchbook/impala-architecture-presentation

Attractive features of Cloudera Impala:
• SQL frontend: translates SQL queries into execution plans
• C/C++ backend with SSE4 support (for string operations)
• Efficient implementations of hash joins (partitioned and non-partitioned)
• LLVM-based JIT compilation
• ...

Extending it is challenging!

ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters

// Impala's BlockingJoinNode extended into a spatial join plan node:
class SpatialJoinNode : public BlockingJoinNode {
 public:
  SpatialJoinNode(ObjectPool* pool, const TPlanNode& tnode,
                  const DescriptorTbl& descs);
  virtual Status Prepare(RuntimeState* state);
  virtual Status GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos);
  virtual void Close(RuntimeState* state);

 protected:
  virtual Status InitGetNext(TupleRow* first_left_row);
  virtual Status ConstructBuildSide(RuntimeState* state);

 private:
  boost::scoped_ptr<TPlanNode> thrift_plan_node_;
  RuntimeState* runtime_state_;
};

// GPU-side spatial primitives invoked by the node:
create_rtree(…)
pip_join(…)
nearest_join(…)
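To make the control flow concrete, here is a self-contained toy C++ sketch of the build-once/probe-per-batch pattern the node follows (ConstructBuildSide indexes the polygon side, GetNext joins each incoming batch of points); create_index and the box-based pip_join below are illustrative stand-ins, not ISP-GPU's actual primitives:

#include <cstdio>
#include <utility>
#include <vector>

struct Point { double x, y; };
struct Box { double minx, miny, maxx, maxy; };

// Stand-in for create_rtree: here the "index" is just the MBR list itself.
std::vector<Box> create_index(const std::vector<Box>& polys) { return polys; }

// Stand-in for pip_join: report (point, box) pairs where the point falls
// inside the box; a real implementation refines against exact polygons.
std::vector<std::pair<int, int>> pip_join(const std::vector<Box>& index,
                                          const std::vector<Point>& pts) {
    std::vector<std::pair<int, int>> out;
    for (int i = 0; i < (int)pts.size(); ++i)
        for (int j = 0; j < (int)index.size(); ++j)
            if (pts[i].x >= index[j].minx && pts[i].x <= index[j].maxx &&
                pts[i].y >= index[j].miny && pts[i].y <= index[j].maxy)
                out.push_back({i, j});
    return out;
}

int main() {
    // ConstructBuildSide(): index the polygon side once.
    std::vector<Box> polys = {{0, 0, 1, 1}, {2, 2, 3, 3}};
    auto index = create_index(polys);

    // GetNext(): join each probe batch of points against the built index.
    std::vector<Point> batch = {{0.5, 0.5}, {2.5, 2.9}, {9, 9}};
    for (auto& m : pip_join(index, batch))
        std::printf("point %d in polygon %d\n", m.first, m.second);
    return 0;
}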

ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters

Scalable and Efficient Spatial Data Management on Multi-Core CPU and GPU Clusters, IEEE HardBD'15 Workshop
http://www-cs.ccny.cuny.edu/~jzhang/papers/isp_gpu_tr.pdf

End-to-end runtimes (seconds):

              ISP-GPU   ISP-MC+   GPU-Standalone   MC-Standalone
taxi-nycb     96        130       50               89
GBIF-WWF      1,822     2,816     1,498            2,664

• taxi-nycb: ~170 million points, ~40 thousand polygons (9 vertices/polygon)
• GBIF-WWF: ~375 million points, ~15 thousand polygons (279 vertices/polygon)
• Single-node (standalone) results: 16-core CPU, 128 GB RAM, GTX Titan
• Cluster results: 2-10 nodes, each with 8 vCPU cores/15 GB RAM and 1,536 CUDA cores/4 GB (50 million species locations used due to the memory constraint)

Outline

• Introduction
  • Spatial data, GIS, Big Data and HPC
  • Taxi trip data in NYC and Global Biodiversity Applications
  • Spatial query processing on GPUs
• ISP-GPU
  • Architecture and Implementations
  • Experiment Results
• Alternative Techniques
  • SpatialSpark
  • Lightweight Distributed Execution (LDE) Engine
• Summary and Future Work

Alternative Techniques

// Assumed imports (Spark core, JTS and SpatialSpark; the exact SpatialSpark
// package paths for SpatialOperator and BroadcastSpatialJoin are omitted here):
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.vividsolutions.jts.io.WKTReader
import scala.util.Try

val sc = new SparkContext(conf)

// Read left-side data from HDFS, split fields, and key each record by index.
val leftData = sc.textFile(leftFile, numPartitions)
  .map(x => x.split(SEPARATOR)).zipWithIndex()

// Parse WKT geometries, dropping malformed records via Try.
val leftGeometryById = leftData
  .map(x => (x._2, Try(new WKTReader().read(x._1.apply(leftGeometryIndex)))))
  .filter(_._2.isSuccess).map(x => (x._1, x._2.get))

// ... similarly for right-side data ...

// Broadcast-based spatial join; NearestD can be applied similarly.
val joinPredicate = SpatialOperator.Within
var matchedPairs: RDD[(Long, Long)] =
  BroadcastSpatialJoin(sc, leftGeometryById, rightGeometryById, joinPredicate)

SpatialSpark: just open-sourced
http://simin.me/projects/spatialspark/

Large-Scale Spatial Join Query Processing in Cloud (comparison with ISP-MC), IEEE CloudDM'15 Workshop
http://www-cs.ccny.cuny.edu/~jzhang/papers/spatial_cc_tr.pdf

Alternative Techniques

Lightweight Distributed Execution (LDE) Engine for Large-Scale Spatial Join Query Processing
http://www-cs.engr.ccny.cuny.edu/~jzhang/papers/lde_spatial_tr.pdf

Summary and Future Work

• Designs and implementations of an in-memory spatial data management system on multi-core CPU and many-core GPU clusters, by extending Cloudera Impala for distributed spatial join query processing.
• Experiments on the initial implementations have revealed both advantages and disadvantages of extending a tightly-coupled big data system to support spatial data types and their operations.
• Alternative techniques are being developed to further improve efficiency, scalability, extensibility and portability.

Q&A

jzhang@cs.ccny.cuny.edu

http://www-cs.ccny.cuny.edu/~jzhang/
