Dempsy - Stream Based “Big Data” Applied to Traffic

Upload: penney

Post on 22-Feb-2016

TRANSCRIPT

Page 1: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy - Stream Based “Big Data” Applied to Traffic

Page 2: Dempsy - Stream Based “Big Data” Applied to Traffic

Traffic End-to-End

[Diagram: traffic data flows through three stages]

• Collection: Incident and Event Data, DOT Sensor / Flow Data, Traffic.com Sensor Network, Probe Data
• Fusion: Data Fusion, Historic Data
• Dissemination: In-Vehicle, Wireless, Internet, Television, Radio

Page 3: Dempsy - Stream Based “Big Data” Applied to Traffic

Sensor Data Collection

Page 4: Dempsy - Stream Based “Big Data” Applied to Traffic

Probe Data Collection

[Diagram: probe data collection flow. Labels recovered from the slide:]

• Third party probe data provider
• 3rd Party ProbeCollector (with Handlers, Algorithms, Archive)
  o Output record: Veh. Identifier, Speed, Heading, Location (lat, lon), Metro ID
• Metro Mapping
• Metro Mapmatchers: Metro 1 Mapmatch, Metro 2 Mapmatch, Metro 3 Mapmatch, ... (using the Road network)
  o Output record: Veh. Identifier, Speed, Heading, Location (lat, lon), Metro ID, Navteq Edge ID, Location along edge

Page 5: Dempsy - Stream Based “Big Data” Applied to Traffic

Overview of Arterial Model

[Pipeline diagram. Labels in order: Probe Data, Map Matcher, Path Analysis, Travel Time Allocation, Arterial Travel Times, Arterial Traffic Data]

• Map Matcher: matches the probe data to the road network in real time, with associated probabilities.
• Path Analysis: routes between pairs of probable matches across chains and applies a Hidden Markov Model to the results to determine the most likely path through a set of points.
• Travel Time Allocation: assigns the path travel times to the appropriate arterial segments.
• Arterial Model: combines expected values with the allocated travel times and previous estimates into the current estimate.
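
To make the Path Analysis step concrete, here is a minimal sketch of the Viterbi algorithm, the standard way to find the most likely state sequence in a Hidden Markov Model. It is not the presentation's implementation. The class name, the array layout, and the idea that emission values come from map-match probabilities while transition values come from routing plausibility are assumptions made for illustration.

```java
/** Minimal Viterbi decoder: picks the most likely sequence of candidate road matches. */
final class PathViterbi {

    /**
     * @param emission   emission[t][s]: probability that probe point t lies on candidate s
     *                   (assumed to come from the map matcher)
     * @param transition transition[t][p][s]: plausibility of driving from candidate p at
     *                   point t to candidate s at point t + 1 (assumed to come from routing)
     * @return the index of the chosen candidate for every probe point
     */
    static int[] mostLikelyPath(double[][] emission, double[][][] transition) {
        int n = emission.length;
        double[][] score = new double[n][];
        int[][] back = new int[n][];

        // Work in log space so that long chains of small probabilities do not underflow.
        score[0] = new double[emission[0].length];
        back[0] = new int[emission[0].length];
        for (int s = 0; s < emission[0].length; s++) {
            score[0][s] = Math.log(emission[0][s]);
        }

        for (int t = 1; t < n; t++) {
            int k = emission[t].length;
            score[t] = new double[k];
            back[t] = new int[k];
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < emission[t - 1].length; p++) {
                    double candidate = score[t - 1][p] + Math.log(transition[t - 1][p][s]);
                    if (candidate > best) {
                        best = candidate;
                        arg = p;
                    }
                }
                score[t][s] = best + Math.log(emission[t][s]);
                back[t][s] = arg; // remember which predecessor gave the best score
            }
        }

        // Pick the best final candidate, then follow the back pointers to the start.
        int last = 0;
        for (int s = 1; s < score[n - 1].length; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

With three probe points and two candidate roads each, a first point that slightly favours road 0 can still be assigned to road 1 when the later points and the transitions make a path that stays on road 1 more likely overall.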

Page 6: Dempsy - Stream Based “Big Data” Applied to Traffic

Width of the Road
• Center the normal distribution over the probe reported location
• Compute the distance from the peak of distribution to the edges of the road.
  o It is possible to estimate road width from the number of lanes
• Integral of the normal distribution gives the probability of the probe being on that road.

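$$P = \frac{1}{\sqrt{2\pi\sigma^{2}}} \int_{a}^{b} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\,dx$$

where $\mu$ is the probe reported location, $\sigma$ is the standard deviation of the position error, and $a$ and $b$ are the edges of the road.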
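
The integral of the normal distribution between the road edges can be evaluated as a difference of two cumulative distribution values. The sketch below assumes a one-dimensional error model measured perpendicular to the road. The class and method names, the lane width parameter, and the use of the Abramowitz and Stegun 7.1.26 approximation for erf (Java's standard library has none) are assumptions for illustration only.

```java
/** Probability that a probe lies on a road, using a 1-D normal position-error model. */
final class RoadMatchProbability {

    /** erf approximation, Abramowitz and Stegun 7.1.26 (absolute error about 1.5e-7). */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Cumulative distribution function of a normal distribution. */
    static double normalCdf(double x, double mean, double sigma) {
        return 0.5 * (1.0 + erf((x - mean) / (sigma * Math.sqrt(2.0))));
    }

    /** Road width estimated from the lane count; the lane width is a caller-supplied assumption. */
    static double widthFromLanes(int lanes, double laneWidthMeters) {
        return lanes * laneWidthMeters;
    }

    /**
     * @param offsetMeters    distance from the probe reported location (the peak of the
     *                        distribution) to the road center line
     * @param roadWidthMeters full width of the road
     * @param sigmaMeters     standard deviation of the position error
     * @return integral of the normal density between the two road edges
     */
    static double probabilityOnRoad(double offsetMeters, double roadWidthMeters,
                                    double sigmaMeters) {
        double a = offsetMeters - roadWidthMeters / 2.0; // near edge, relative to the probe
        double b = offsetMeters + roadWidthMeters / 2.0; // far edge, relative to the probe
        return normalCdf(b, 0.0, sigmaMeters) - normalCdf(a, 0.0, sigmaMeters);
    }
}
```

A probe reported exactly on the center line of a road whose width is twice the position error gets a probability of roughly 0.68, and the value falls off quickly as the offset grows.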

Page 7: Dempsy - Stream Based “Big Data” Applied to Traffic

Real Life Examples

Page 8: Dempsy - Stream Based “Big Data” Applied to Traffic

Technology Survey

• Streams Processing Engines
• Hadoop / Map Reduce
• “Distributed Actors Model”

Page 9: Dempsy - Stream Based “Big Data” Applied to Traffic

Technology Survey

• Streams processing engines
  o Oracle, IBM, SQLStream
  o Not a good fit. More for relational data processing.
• Hadoop Map Reduce
  o Not a good fit for low latency computations (15 to 30 minutes per batch)
  o HBase Co-processors are a possibility but more of a hack
• Actors Model
  o S4, Akka, Storm
  o Just what we need

Page 10: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy – Distributed Elastic Message Processing System
• POJO based Actors programming abstraction eliminates synchronization bugs
• Framework handles messaging and distribution
• Fine grained partitioning of work
• Elastic
• Fault tolerant

Page 11: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy – Distributed Elastic Message Processing System
• Separation of concerns – scale agnostic apps versus scale aware platform
• Support code quality goals (guidelines, reuse, design patterns, etc)
• Functional programming (-like)
• Map Reduce (-like)
• Distributed Actors Model (-like)

Page 12: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy

[Architecture diagram. Labels recovered from the slide: Distributor; MP Container Clusters, each made up of several MP Containers; ZooKeeper alongside each cluster.]

Page 13: Dempsy - Stream Based “Big Data” Applied to Traffic

System Characteristics - DevOps
• Manage every node and every process in exactly the same way. E.g. arterial, path analyzer, map matcher look the same to an operations person.
• Everything runs on exactly the same hardware
• Scale elastically. To increase throughput, just add a machine to the cluster – no extra work required. The system can even be automatically scaled as load increases.
• Robust failure handling – no real-time manual intervention required when nodes fail.
• Development, QA and Integration teams can use a pool of resources rather than dedicated resources. The pool can grow elastically as required by overlapping project schedules

Page 14: Dempsy - Stream Based “Big Data” Applied to Traffic

Example – Traffic Processing

Page 15: Dempsy - Stream Based “Big Data” Applied to Traffic

Map Matching and Path Analysis as an Example

• Algorithm decomposition
  o Discrete Business Logic Components
    1. Map Matching
    2. Vehicle Accumulation
    3. Path Analysis (currently A* routing)
  o MP Addressing
    1. Tile based addressing
    2. Addressing by vehicle id
    3. Tile based addressing
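
The two addressing schemes listed above can be sketched as key functions. This is an illustration only: the tile size, the key string formats, and the hash-based node selection are assumptions, since the slides do not say how tiles are defined or how keys are mapped to nodes.

```java
/** Hypothetical message-key helpers for tile based and vehicle based addressing. */
final class MpAddressing {

    /** Assumed tile size in degrees; the presentation does not state one. */
    static final double TILE_DEGREES = 0.05;

    /** Tile based key: every probe inside the same grid cell gets the same key. */
    static String tileKey(double lat, double lon) {
        long row = (long) Math.floor(lat / TILE_DEGREES);
        long col = (long) Math.floor(lon / TILE_DEGREES);
        return row + ":" + col;
    }

    /** Vehicle based key: every probe from the same vehicle gets the same key. */
    static String vehicleKey(String vehicleId) {
        return "veh:" + vehicleId;
    }

    /**
     * Simplistic stand-in for key-to-node routing: the same key always lands on
     * the same node. A real framework would use its own routing strategy.
     */
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }
}
```

Map matching and path analysis work on road geometry, so keying by tile keeps the data each processor needs local. Vehicle accumulation needs all points from one vehicle in one place, so it is keyed by vehicle id.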

Page 16: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy – Arterial Model Example

[Diagram: message processor pipeline fed by an Adaptor. Labels recovered from the slide:]

• MapMatch MP (MapMatcher Singleton). Key: tile, x 40k
• Vehicle Accumulator MP. Key: probeId, x 10M
• PathAnalyzer MP (PathAnalyzer Singleton). Key: tile, x 40k
• TravelTime MP (TravelTime Singleton). Key: tile, x 40k
• TrafficState MP (TrafficState Singleton). Key: segment Id, x 2M
• Node counts shown under the stages: x 50, x 50, x 50, x 50
• Supporting data: Linkset, Astar Graph, Traffic History, Segment Table
• Outputs: TrafficReporter, OLTP (x 9, every 60 seconds); Analytics Extract (x 1)
• Logging: Quality & Audit Logs, App Logs, Distributed Log Collection, Distributed File Storage

Page 17: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy Proof Of Concept Results

Page 18: Dempsy - Stream Based “Big Data” Applied to Traffic

Dempsy Testing and Analysis
• Decomposed Arterial (MegaVM) into Dempsy Message processors
• Implemented first two stages of Arterial, Map Match and Path Analysis
• Implemented Message Processors as trivial POJOs around existing mapmatch and path analysis libraries
• Wrapped into a Dempsy Application
• Front ended with Dempsy Adaptor to read probe data from files and inject them into Dempsy
• Deployed to Amazon EC2 to prove out scaling, collect performance data, and analyze behavior of system under load
• Three main rounds of testing
  1. Original HornetQ Transport (Sprint 6.2)
  2. Lighter weight TCP/Socket Based Transport (Sprint 6.3)
  3. More finely grained Message Keys (Sprint 6.3)

Page 19: Dempsy - Stream Based “Big Data” Applied to Traffic

Distributed Map Match / Path Analyzer Testing
• Ran multiple tests on EC2 with increasing number of Dempsy Nodes
• Scaled Map Match in Parallel
• Used a constant number of Probe Readers, empirically set at 3

Page 20: Dempsy - Stream Based “Big Data” Applied to Traffic

Test 1: HornetQ Transport

Stack Width (# Map/Path Nodes)    Throughput (Probes per Second)
1                                  6,281
2                                 12,499
3                                 18,840
4                                 25,716
5                                 28,123

[Chart: Probes/Sec by stack width]

Page 21: Dempsy - Stream Based “Big Data” Applied to Traffic

Test 2: TCP Transport

Stack Width (# Map/Path Nodes)    Throughput (Probes per Second)
1                                 14,498
2                                 27,742
3                                 32,211
4                                 53,169
5                                 49,207

[Chart: Probes/Sec by stack width]

Page 22: Dempsy - Stream Based “Big Data” Applied to Traffic

Test 3: TCP w/ Small Tiles Transport

Stack Width (# Map/Path Nodes)    Throughput (Probes per Second)
1                                 14,400
2                                 27,581
3                                 41,839
4                                 55,725
5                                 68,285
6                                 81,967

[Chart: Probes/Sec by stack width]
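
One way to read the three test results is to compare each throughput with perfect linear scaling from the single-node figure. The sketch below does that arithmetic on the numbers reported in Tests 1 to 3. The efficiency metric and the class name are additions for illustration and do not appear in the slides.

```java
/** Scaling efficiency: measured throughput divided by (nodes x single-node throughput). */
final class ScalingEfficiency {

    // Probes per second for stack widths 1, 2, 3, ... as reported in the three tests.
    static final double[] TEST1_HORNETQ = {6281, 12499, 18840, 25716, 28123};
    static final double[] TEST2_TCP = {14498, 27742, 32211, 53169, 49207};
    static final double[] TEST3_TCP_SMALL_TILES = {14400, 27581, 41839, 55725, 68285, 81967};

    /** Returns one efficiency value per stack width; 1.0 means perfectly linear scaling. */
    static double[] efficiency(double[] probesPerSecond) {
        double[] result = new double[probesPerSecond.length];
        for (int i = 0; i < result.length; i++) {
            int nodes = i + 1;
            result[i] = probesPerSecond[i] / (nodes * probesPerSecond[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] tests = {TEST1_HORNETQ, TEST2_TCP, TEST3_TCP_SMALL_TILES};
        for (int t = 0; t < tests.length; t++) {
            double[] eff = efficiency(tests[t]);
            System.out.printf("Test %d, %d nodes: %.2f of linear%n",
                    t + 1, eff.length, eff[eff.length - 1]);
        }
    }
}
```

By this measure the finer grained message keys of Test 3 stay close to linear at six nodes, while Test 2 falls well short of linear at five nodes.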

Page 23: Dempsy - Stream Based “Big Data” Applied to Traffic

Development Life Cycle
• Write Message Processor (MP) prototypes
• Configuration using the Dependency Injection container of your choice (currently supports Spring).
• Develop using one node or pseudo distributed mode
• No messaging code to write
• No queues
• No synchronization
• Narrow scope of concern – each processing element deals with only a limited set of data. There may be millions of processing elements.
• Simple debugging and unit testing
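
The prototype idea can be sketched in plain Java without the framework. The message processor below is an ordinary cloneable POJO that holds state for exactly one key. The small container is a single-node stand-in for what the framework would do: clone the prototype the first time a key is seen, then hand each message to the instance for its key. All class and method names here are hypothetical, and the sketch deliberately does not use Dempsy's own annotations or API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical message: one probe reading addressed by a tile key. */
final class ProbeMessage {
    final String tileKey;
    final double speedKph;

    ProbeMessage(String tileKey, double speedKph) {
        this.tileKey = tileKey;
        this.speedKph = speedKph;
    }
}

/** Prototype POJO: state for a single key, with no locks, queues, or messaging code. */
final class TileSpeedProcessor implements Cloneable {
    private long count;
    private double sum;

    void handle(ProbeMessage message) {
        count++;
        sum += message.speedKph;
    }

    double meanSpeedKph() {
        return count == 0 ? Double.NaN : sum / count;
    }

    @Override
    public TileSpeedProcessor clone() {
        try {
            return (TileSpeedProcessor) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

/** Single-node stand-in for the framework: one processor instance per message key. */
final class SingleNodeContainer {
    private final TileSpeedProcessor prototype;
    private final Map<String, TileSpeedProcessor> instances = new HashMap<>();

    SingleNodeContainer(TileSpeedProcessor prototype) {
        this.prototype = prototype;
    }

    void dispatch(ProbeMessage message) {
        // Clone the prototype on the first message for a key, then reuse that instance.
        instances.computeIfAbsent(message.tileKey, key -> prototype.clone()).handle(message);
    }

    TileSpeedProcessor instanceFor(String key) {
        return instances.get(key);
    }

    int instanceCount() {
        return instances.size();
    }
}
```

Because each instance only ever sees the messages for its own key, the processor can be unit tested by constructing it directly and calling its handler, with no cluster involved.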

Page 24: Dempsy - Stream Based “Big Data” Applied to Traffic

Trade-offs
• There’s no free lunch
  o Sacrifice guaranteed delivery, message ordering, message uniqueness
  o Gain response time
  o Gain simple clustering
  o Gain memory efficiency (no queuing)
  o Gain lower latency under load
• Where does this work
  o Statistically based analytics
  o Techniques where sacrificing input data quantity results only in lower output quality
• Where doesn’t this work
  o Transaction based systems
  o Techniques where a single lost, duplicated or out-of-order message results in ‘false’ results (e.g. bank transactions)
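
The trade between guaranteed delivery and queuing can be illustrated with a hand-off that never waits: if the processor is busy, the message is dropped and counted instead of being queued. This is a sketch of the general idea only. The slides do not describe how Dempsy decides to discard a message, and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Non-blocking, lossy hand-off: work is either run immediately or dropped, never queued. */
final class LossyHandOff {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final AtomicLong processed = new AtomicLong();
    private final AtomicLong dropped = new AtomicLong();

    /** Runs the work if the processor is idle; otherwise drops it. Returns true if it ran. */
    boolean offer(Runnable work) {
        if (!busy.compareAndSet(false, true)) {
            dropped.incrementAndGet(); // no queue: the caller is never blocked
            return false;
        }
        try {
            work.run();
            processed.incrementAndGet();
            return true;
        } finally {
            busy.set(false);
        }
    }

    long processed() {
        return processed.get();
    }

    long dropped() {
        return dropped.get();
    }
}
```

A statistical estimate such as a mean speed tolerates the dropped messages, whereas an account balance updated this way would simply be wrong.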

Page 25: Dempsy - Stream Based “Big Data” Applied to Traffic

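DEMPSY – MP LIFECYCLE DIAGRAM

[State diagram. Labels recovered from the slide:]

• Message Processor Prototype: Start, Construct, Startup (@Start), Prototype Ready
• Message Processor creation: clone() on message, or explicit instantiation
• Activation: @Activate (branches: Activate, No Activate), then Ready
• Message handling: message, @MessageHandler
• Output: scheduled output, @Output, output complete
• Eviction: scheduled evict check, @Evictable (branches: no eviction, eviction)
• Passivation: @Passivate, Passivate, finalize, jvm gc
• Legend: Message Processor Prototype, Message Processor, Proposed Addition, Future Addition
• Additional label: Elasticity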