query processing of massive trajectory data based on mapreduce qiang ma, bin yang (fudan university)...

Query Processing of Massive Trajectory Data based on MapReduce

Qiang Ma, Bin Yang (Fudan University)Weining Qian, Aoying Zhou (ECNU)

Presented By: Xin Cao (Aalborg University)

Outline

• Introduction • Preliminary• Trajectory Processing– Execution Overview– Storage– Indexing Methods– Query Processing

• Experimental Study• Future Works

Introduction

• Location-based services are playing important roles.

• Large volumes of diverse formats of trajectory data have been accumulated.

• Traditional centralized technologies may not deal with the large amount of trajectories.

• Cloud computing, such as GFS and MapReduce, provides a promising paradigm to conquer the explosion of trajectory data.

Challenge

• Huge volume, updates frequently, rapidly increasing.

• Trajectory data is “continuous”, i.e. ordered sequentially.

• Highly skewed.• MapReduce is good at offline data

analysis, but not efficient for online query.

Our Contributions

• Extend the MapReduce framework to manage massive sequential data, such as trajectories of moving objects.

• Study what kind of query processing methods are appropriate for large clusters.

• Provide two scalable indexing methods to facilitate query processing efficiently.

Preliminary

• Data Model - line segments model–A polyline in three-dimensional space.

• Query Types– Spatio-temporal Range Query:

– Q(Es, Et) → {Sk}– Trajectory-based Query:

– Q(O, Et) → {Sk}

Trajectory Processing

• Execution Overviews

Storage• Data are grouped with key and organized in data chunks in GFS-style

storage.• The whole data set is divided into several parts, and each part is called a

partition and assigned to one data chunk to store.• Each trajectory data is assigned to at least one partition according to

spatio-temporal information

Storage

• A good spatio-temporal partitioning makes the size of data per chunk is fairly uniform.

• Static partitioning strategies are easy to control and suitable for distributed scheduling, but may lead to load imbalance.

• Dynamic strategies can resolve load imbalance, but re-split data can cause distantly migration of large volume of data in clusters.

• Appropriate strategies should be trained

PMI (Partition based Multilevel Index)

• Aim to speed up spatio-temporal range queries.• Generate all candidate partitions by invoking space

partition strategy.• Store together as key/value.– <PartitionID, Sk>

• Each data chunk only contains trajectory segments that belong to the same partition.

• Multilevel index for each node can be built local. (using traditional centralized methods)

OII (Object Inverted Index)

• Aim to speed up trajectory based queries.• Collect each object's all historical trajectories.• Store together as key/value.–<OID, { PartitionID, T}>–Access according to key(object identifier).

Data Insertion

Query Processing• Query Processing

• Trajectory based Queries– Given any object ID, the system can locate the object's trajectory

according to OII.• Range Queries

Experimental Study

• Settings–Hadoop version 0.19.0–8 PC nodes• Ubuntu Linux version 8.04• Pentium IV 1.7GHz CPU• 512M memory

– Java SDK 1.42– Experiment data: Network-based Generator

Experiments – Load Balance

Standard Deviation of Partitioning Load Balance of PRADASE

Experiments – Data Importing and Index Creating

Data Importing with PMI Data Importing with OII

Experiments – Query Processing

Spatio-temporal Range Query Processing with PMI Trajectory Base Query Processing with OII

Future Works

• More heuristic partitioning methods.• Reducing data migration between nodes.• Efficient real-time query processing on

Cloud infrastructure.

Thanks!

query processing of massive trajectory data based on mapreduce qiang ma, bin yang (fudan university)...

trajectorybased query

explosion of trajectory

online query

massive sequential data

offline data analysis

mapreduce framework

scalable indexing methods

mapreduce qiang ma

Documents

yuanbin wu cs@ecnu -...

ecnu program candidate handbook

basics of endocrinology, diabetes & metabolism hussam h....

teacher education innovation for sustainable development —...

experiments on random packings of ellipsoids - nyu...

open access review article sulfur in human health › ecnu...

hyperbolic geometry - welcome to math dept. in ecnu

dyslipidemia (med-3) dr anwar a jammah, md, frcpc, facp,...

peerdb : a p2p-based system for distributed data sharing db...

chapter 11 rotational dynamics and static...

sino-us agriculture trade and wto negotiations (draft)...

ciee global institute shanghai ecnu off-campus housing...

modis products 2011 imapp training workshop: satellite...

performance evaluation of isoprene in ozone modeling of...

discrete mathematics and its...

teacher education innovation for sustainable development —...

adaptive bayesian density regression for high-dimensional...

cronicon › ecnu › pdf › ecnu-05-0000164.pdf · 1192...

ampcamp introduction to berkeley data analytics systems...

aoying zhou - ipsj dbs · daaa a adquta stream and queries...