
A Fault-Tolerant Environment for Large-Scale Query Processing

Mehmet Can Kurt Gagan AgrawalDepartment of Computer Science and

Engineering The Ohio State University

HiPC’12 Pune, India 1

Motivation

• “big data” problem
– Walmart handles 1 million customer transactions every hour; estimated data volume is 2.5 petabytes
– Facebook handles more than 40 billion images
– LSST generates 6 petabytes every year

• massive parallelism is the key


Motivation

• Mean Time To Failure (MTTF) decreases as systems grow larger

• Typical first year for a new cluster*
– 1000 individual machine failures
– 1 PDU failure (~500-1000 machines suddenly disappear)
– 20 rack failures (40-80 machines disappear, 1-6 hours to get back)


* taken from Jeff Dean’s talk at Google I/O (http://perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx)

Our Work

• supporting fault-tolerant query processing and data analysis for a massive scientific dataset

• focusing on two specific query types:
1. Range Queries on Spatial Datasets
2. Aggregation Queries on Point Datasets

• supported failure types: single-machine failures and rack failures


* rack: a number of machines connected to the same hardware (network switch, …)

Our Work

• Primary Goals
1) high efficiency of execution when there are no failures (indexing if applicable, ensuring load balance)
2) handling failures efficiently up to a certain number of nodes (low-overhead fault tolerance through data replication)
3) a modest slowdown in processing times after recovering from a failure (preserving load balance)


Range Queries on Spatial Data

• nature of the task:
– each data object is a rectangle in 2D space
– each query is defined by a rectangle
– return the data rectangles that intersect the query rectangle

• computational model:
– master/worker model
– master serves as coordinator
– each worker is responsible for a portion of the data
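The intersection test behind a range query can be sketched in C; the `rect_t` layout here is hypothetical, since the slides do not show the actual data structures:

```c
#include <stdbool.h>

/* Hypothetical rectangle layout; the paper's actual data structures are
   not shown in the slides. */
typedef struct { double xmin, ymin, xmax, ymax; } rect_t;

/* Two axis-aligned rectangles intersect iff their projections overlap
   on both the X and Y axes; a range query returns every data rectangle
   for which this test succeeds against the query rectangle. */
static bool intersects(rect_t a, rect_t b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}
```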


[Figure: master/worker model — the master routes incoming queries to the workers, each of which holds a portion of the data in 2D (X, Y) space.]

Range Queries on Spatial Data

• data organization:
– a chunk is the smallest data unit
– create chunks by grouping data objects together
– assign chunks to workers in round-robin fashion


[Figure: data objects in 2D space grouped into chunks 1-4, which are assigned to the workers.]

* actual number of chunks depends on chunk size parameter.
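Once the objects are in sorted order (next slide), chunk creation and round-robin assignment reduce to index arithmetic; a minimal sketch, with `chunk_size` standing in for the tunable parameter from the footnote:

```c
/* The chunk holding the i-th object in sorted order, when every chunk
   packs `chunk_size` consecutive objects. */
static int chunk_of(int sorted_rank, int chunk_size)
{
    return sorted_rank / chunk_size;
}

/* Round-robin assignment: chunk c lives on worker c mod W. */
static int owner_of(int chunk_id, int num_workers)
{
    return chunk_id % num_workers;
}
```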

Range Queries on Spatial Data

• ensuring load balance:
– enumerate & sort data objects according to the Hilbert space-filling curve, then pack sorted data objects into chunks

• spatial index support:
– a Hilbert R-Tree is deployed on the master node
– leaf nodes correspond to data chunks
– initial filtering at the master tells workers which chunks to examine


[Figure: 2D space traversed by a Hilbert curve through quadrants 1-4, ordering objects o1-o8.]

sorted objects: o1, o3, o8, o6, o2, o7, o4, o5 → packed into chunks 1-4
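The sort key is each object's position along the Hilbert curve. The standard iterative coordinate-to-distance conversion looks like the sketch below; the paper does not show its own implementation, so this is only an illustration:

```c
#include <stdint.h>

/* Distance along the Hilbert curve of point (x, y) on an n x n grid,
   where n is a power of two. Objects close in 2D space get nearby
   distances, which is what makes packing consecutive sorted objects
   into chunks preserve spatial locality. */
static uint64_t hilbert_d(uint64_t n, uint64_t x, uint64_t y)
{
    uint64_t d = 0;
    for (uint64_t s = n / 2; s > 0; s /= 2) {
        uint64_t rx = (x & s) > 0;
        uint64_t ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        if (ry == 0) {                    /* rotate the quadrant */
            if (rx == 1) {
                x = s - 1 - x;
                y = s - 1 - y;
            }
            uint64_t t = x; x = y; y = t; /* swap x and y */
        }
    }
    return d;
}
```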

Range Queries on Spatial Data

• Fault-Tolerance Support – Sub-chunk Replication:
step 1: divide data chunks into k sub-chunks
step 2: distribute sub-chunks in round-robin fashion


[Figure: with k = 2, each of chunks 1-4 on Workers 1-4 is divided into two sub-chunks (step 1), which are then distributed across the workers (step 2).]

* rack failure: same approach, but distribute sub-chunks to nodes in a different rack
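One way to realize step 2 — a hypothetical placement function, as the slides do not give the exact rule — is to walk the workers round-robin starting just after the chunk's owner, so no replica lands on the node it is meant to protect:

```c
/* Worker receiving the j-th sub-chunk (0 <= j < k) of a chunk owned by
   `owner`; skips the owner so losing that node never loses both the
   chunk and its replicas. Assumes k < num_workers. */
static int subchunk_target(int owner, int j, int num_workers)
{
    int w = (owner + 1 + j) % num_workers;
    if (w == owner)
        w = (w + 1) % num_workers;   /* wrap past the owner */
    return w;
}
```

For a rack failure the same idea applies, with the walk restricted to nodes in a different rack.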

Range Queries on Spatial Data

• Fault-Tolerance Support – Bookkeeping:
– add a sub-leaf level to the bottom of the Hilbert R-Tree
– the Hilbert R-Tree serves both as a filtering structure and as a failure-management tool


Aggregation Queries on Point Data

• nature of the task:
– each data object is a point in 2D space
– each query is defined by a dimension (X or Y) and an aggregation function (SUM, AVG, …)

• computational model:
– master/worker model
– divide space into M partitions
– no indexing support
– standard 2-phase algorithm: local and global aggregation


[Figure: with M = 4, the 2D space is divided among workers 1-4; a partial result for a query resides in worker 2.]
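The 2-phase algorithm can be sketched in plain C. In the actual system the combine step would run over MPI (e.g. a reduction across workers); the integer-coordinate binning here is a hypothetical simplification:

```c
typedef struct { double x, y; } point_t;

/* Phase 1 (local): each worker sums its own points into per-X-coordinate
   bins. Aggregation along Y is symmetric. Points whose coordinate falls
   outside [0, nbins) are ignored in this sketch. */
static void local_sum_along_x(const point_t *pts, int n,
                              double *bins, int nbins)
{
    for (int i = 0; i < n; i++) {
        int b = (int)pts[i].x;
        if (b >= 0 && b < nbins)
            bins[b] += pts[i].y;
    }
}

/* Phase 2 (global): partial bins from all workers are combined
   element-wise into the final answer. */
static void global_combine(const double *partial, double *result, int nbins)
{
    for (int b = 0; b < nbins; b++)
        result[b] += partial[b];
}
```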

Aggregation Queries on Point Data

• reducing communication volume
– the initial partitioning scheme has a direct impact
– we have insights about the data and query workload:
P(X) and P(Y) = probability of aggregation along the X- and Y-axis
|rx| and |ry| = range of the X and Y coordinates

• expected communication volume Vcomm defined as: [equation on slide]

• Goal: choose the partitioning scheme (cv and ch) that minimizes Vcomm
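The Vcomm formula itself is a slide image and is not reproduced in this transcript, but the minimization is just a search over factorizations of M; a sketch with a caller-supplied cost function standing in for Vcomm(cv, ch):

```c
/* Pick the grid shape (cv columns x ch rows, cv * ch = M) minimizing a
   cost function; the slide's Vcomm would be plugged in as `cost`. */
static void best_partitioning(int M, double (*cost)(int cv, int ch),
                              int *best_cv, int *best_ch)
{
    double best = -1.0;
    for (int cv = 1; cv <= M; cv++) {
        if (M % cv != 0)
            continue;                 /* cv must divide M */
        int ch = M / cv;
        double c = cost(cv, ch);
        if (best < 0.0 || c < best) {
            best = c;
            *best_cv = cv;
            *best_ch = ch;
        }
    }
}

/* Example cost (purely illustrative, NOT the paper's Vcomm): prefer
   square grids. */
static double square_cost(int cv, int ch)
{
    double d = (double)cv - (double)ch;
    return d * d;
}
```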


Aggregation Queries on Point Data

• Fault-Tolerance Support – Sub-partition Replication:
step 1: divide each partition evenly into M’ sub-partitions
step 2: send each of the M’ sub-partitions to a different worker node

• Important questions:
1) how many sub-partitions (M’)?
2) how to divide a partition (cv’ and ch’)?
3) where to send each sub-partition? (random vs. rule-based)


[Figure: with M’ = 4 (ch’ = 2, cv’ = 2), each partition is split into four sub-partitions; a better distribution reduces communication overhead.]

rule-based selection: assign sub-partitions to nodes that share the same coordinate range
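The rule-based selection can be sketched as follows, assuming workers sit on a ch-row by cv-column grid with row-major ids — a hypothetical layout, since the slides do not pin down the exact mapping. Sending a sub-partition to another worker in the same column keeps it on a node that owns the same X-coordinate range:

```c
/* Row-major worker id on a ch x cv grid (hypothetical layout). */
static int worker_id(int row, int col, int cv)
{
    return row * cv + col;
}

/* Target for the j-th sub-partition cut from the partition at
   (row, col): another row in the same column, so the receiver shares
   the sender's X-coordinate range and aggregation along X needs no
   extra redistribution after a failure. Skips the owner's own row. */
static int rule_based_target(int row, int col, int j, int cv, int ch)
{
    int r = (row + 1 + j) % ch;
    if (r == row)
        r = (r + 1) % ch;
    return worker_id(r, col, cv);
}
```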

Experiments

• local cluster:
– each node has two quad-core 2.53 GHz Intel Xeon processors with 12 GB RAM
• entire system implemented in C using the MPI library
• range queries:
– comparison with the chunk replication scheme
– 32 GB spatial data
– 1000 queries are run, and the aggregate time is reported
• aggregation queries:
– comparison with the partition replication scheme
– 24 GB point data
• 64 nodes used, unless noted otherwise


Experiments: Range Queries

[Figures: optimal chunk size selection; scalability — execution times with no replication and no failures (chunk size = 10000)]

Experiments: Range Queries

[Figures: single-machine failure; rack failure — execution times under failure scenarios (64 workers in total; k is the number of sub-chunks per chunk)]

Experiments: Aggregation Queries

[Figures: effect of partitioning scheme on normal execution; single-machine failure. Parameters: P(X) = P(Y) = 0.5, with |rx| = |ry| = 10000 and |rx| = |ry| = 100000.]

Conclusion

• a fault-tolerant environment that can process
– range queries on spatial data and aggregation queries on point data
– the proposed approaches can be extended to other types of queries and analysis tasks

• high efficiency under normal execution

• sub-chunk and sub-partition replication
– preserve load balance in the presence of failures, and hence
– outperform traditional replication schemes


Thank you for listening …

Questions
