
January 11, 2012

Scaling the Mobile Millennium System in the Cloud

Timothy Hunter,Teodor Moldovan, Matei Zaharia, Samy Merzgui, Justin Ma,Michael J. Franklin, Pieter Abbeel, Alexandre M. Bayen

UC Berkeley

January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 2/26

Machine learning at scale

● One goal of the AMPLab software: enable ML researchers to work at scale

● Issues with more traditional frameworks:
  ● Algorithms are often iterative in nature
  ● This is not taken into account in traditional MapReduce
  ● A number of alternatives: Pregel, Twister, HaLoop, Spark

● We report lessons from applying one of these frameworks (Spark) to a real-world application (car traffic estimation)

● We identified 3 challenges not widely studied before:
  ● Framework-level memory management
  ● Sharing large parameters
  ● Access to storage system


Plan

● Need for traffic estimation
● Overview of Mobile Millennium
● Presentation of the algorithm
● Programming with the Spark framework
● Challenges, current solutions


Need for good traffic estimation

● Traffic congestion affects everyone
● Up-to-date estimation is critical
● Well-studied in the case of highways
● More complex for urban streets (arterial roads)
● Most promising source of data: cellphone GPS


Real-time processing of fleet data

● Input: sampled position of taxicabs

● Observed every minute

● Covers the whole SF Bay

● 0.5 million points / day (60M / day total)

● 0.1 Million road links


Estimating the travel times



Filtering of fleet data

● Trajectories need to be recovered

● Done using a Conditional Random Field

● Output: segments of most likely trajectories between GPS points


Mobile Millennium

● A cyberphysical system for participatory sensing


Today's talk: batch ML jobs outsourced to a cluster


Estimation of arterial traffic

● Input:
  ● Description of road network
  ● Observations: start time, end time, route followed

● Output: probability distributions of travel time
  ● For each link
  ● At different time intervals
  ● Parameter vector θ (for example: mean and variance of link travel time)

[Figure © Google, Inc.]


Estimation of traffic: Graphical Model

[Figure: graphical model relating links and observations. Legend: hidden random variable, observed random variable.]

● Link states (50k multidimensional variables)
● Partial travel time distributions for each link (about 400k-200M variables)
● Travel time observations: pairs of travel time and path (about 100k-50M observations)


System workflow

[Figure: system architecture with a database, a master node, and worker nodes.]


Start link parameters (on master node)

Observations (distributed, persisted across nodes)


Network parameters (distributed over the nodes)


Travel time samples for each observation link


Travel time samples aggregated on a link basis


New parameters are generated: they maximize the likelihood of the sampled travel times for each link.

The master collects the vector of new parameters.


Using the Spark programming model

● Spark: Open-source cluster computing system
  ● Can persist datasets in memory across cluster
  ● In a fault-tolerant manner
  ● Written in Scala (emphasizes functional programming)

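Main loop of the program:

    val observations = spark.textFile("hdfs:...")
      .map(parseObservation _)
      .cache()                      // keep the parsed observations in memory
    var params = ???                // initialize parameters
    while (!converged) {
      // Step 1 (E step): sample travel times for every observation.
      // flatMap, since each observation yields many (linkId, sample) pairs.
      val samples = observations.flatMap(obs => generateSamples(obs, params))
      // Step 2 (M step): per link, pick the most likely parameters,
      // then collect the new parameter vector on the master.
      params = samples.groupByKey().map {
        case (linkId, vals) => mostLikelyParam(linkId, vals)
      }.collect()
    }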
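As a rough illustration of Step 2, the sketch below runs the grouping and the per-link fit on plain Scala collections instead of a Spark dataset. It assumes that each sample is a weighted travel time tagged with a link id, and that a link is summarized by a mean and a variance (the example parameters mentioned earlier in the talk). `Sample`, `LinkParam`, `mStep` and the body of `mostLikelyParam` are hypothetical stand-ins, not the Mobile Millennium code.

```scala
object MStepSketch {
  // Hypothetical types: one weighted travel-time sample on a link,
  // and the (mean, variance) parameters of that link.
  final case class Sample(travelTime: Double, weight: Double)
  final case class LinkParam(mean: Double, variance: Double)

  // Weighted maximum-likelihood fit of a Gaussian to the samples of one link.
  def mostLikelyParam(linkId: Int, vals: Seq[Sample]): (Int, LinkParam) = {
    val totalWeight = vals.map(_.weight).sum
    val mean = vals.map(s => s.weight * s.travelTime).sum / totalWeight
    val variance =
      vals.map(s => s.weight * math.pow(s.travelTime - mean, 2)).sum / totalWeight
    (linkId, LinkParam(mean, variance))
  }

  // Local stand-in for samples.groupByKey().map(...).collect().
  def mStep(samples: Seq[(Int, Sample)]): Map[Int, LinkParam] =
    samples
      .groupBy { case (linkId, _) => linkId }
      .map { case (linkId, pairs) => mostLikelyParam(linkId, pairs.map(_._2)) }
}
```

Calling `MStepSketch.mStep` on a handful of `(linkId, Sample)` pairs returns one `LinkParam` per link, which mirrors the parameter vector that the master collects at the end of each iteration.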


Challenges


Efficient utilization of memory

● The observation data is stored in memory:
  ● Be careful with the memory footprint
  ● Diagnose when the cluster runs out of memory

● We cache pointer-based structures
  ● Significant overhead in the JVM

● Solution: keep serialized data in memory
● Lesson: need for more tools for understanding memory bottlenecks
● Lesson: provide more memory-efficient primitives
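To make the "keep serialized data in memory" idea concrete, here is a minimal sketch that uses only the JDK: a batch of observations is held as a single byte array and decoded on access, trading some CPU time for far fewer live objects on the heap. The `Obs` record, its fields, and the `pack`/`unpack` helpers are hypothetical, and this is not a description of how Spark implements its cache.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object SerializedCacheSketch {
  // Hypothetical observation record: start time, end time, and route (link ids).
  final case class Obs(start: Long, end: Long, route: Array[Int])

  // Pack a batch of observations into a single byte array.
  def pack(batch: Seq[Obs]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(batch.size)
    batch.foreach { o =>
      out.writeLong(o.start)
      out.writeLong(o.end)
      out.writeInt(o.route.length)
      o.route.foreach(link => out.writeInt(link))
    }
    out.flush()
    bytes.toByteArray
  }

  // Decode the batch again when a task needs it.
  def unpack(data: Array[Byte]): Seq[Obs] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val n = in.readInt()
    Seq.fill(n) {
      val start = in.readLong()
      val end = in.readLong()
      val len = in.readInt()
      Obs(start, end, Array.fill(len)(in.readInt()))
    }
  }
}
```

In this layout an observation with a three-link route takes 32 bytes of payload (two longs, a length, three ints), with no per-object headers or pointers; the cost is that every access has to decode the batch.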


Broadcast of large parameters

● Need to broadcast data to all workers:
  ● At the start of the job (network description)
  ● Between iterations (updated parameters θ)
  ● A problem common to many ML applications

● Network description larger than 40MB
● Solution: "broadcast variables" before starting computations
  ● Using Cornet (BitTorrent-like protocol)

With a broadcast variable:

    val network = ???                  // load network
    val bv = spark.broadcast(network)  // shipped to the workers once
    val observations = spark.textFile("...")
      .map(parseObservation(_, bv.value))

Without broadcast:

    val network = ???                  // load network
    val observations = spark.textFile("...")
      .map(parseObservation(_, network))  // network is captured in the task closure

[Figure: Data loading time. Bar chart of loading time (sec), axis from 0 to 2500, comparing "No broadcast" with "BT broadcast".]


Access to storage system

● Mobile Millennium uses PostgreSQL
  ● Reliable, previous experience, PostGIS extensions

● We moved the DB to the cloud:
  ● Still 75% of running time spent on data loading
  ● Bursty access pattern
  ● Small dataset overall (1GB)

● Solution: export data to HDFS
  ● Stale snapshot
  ● Distributed, much faster

● Ideal solution: same storage system for on-site and cloud applications

[Figure: Data loading throughput. Average throughput (records / sec), log scale from 1 to 1,000,000, for on-site DB, cloud DB, and HDFS.]
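Once the data is exported as text files, each line has to be turned back into an observation before it can be cached. The sketch below shows one way such a parser could look. The line format (`startTime,endTime,link1;link2;...`) and the `Observation` case class are assumptions made for illustration; the talk only states that an observation carries a start time, an end time, and the route followed.

```scala
object SnapshotParsingSketch {
  // Observation fields as listed in the talk: start time, end time, route followed.
  final case class Observation(startTime: Long, endTime: Long, route: Vector[Int])

  // Hypothetical export format, one observation per line:
  //   startTime,endTime,link1;link2;...
  def parseObservation(line: String): Option[Observation] =
    line.trim.split(",") match {
      case Array(start, end, links) =>
        scala.util.Try {
          Observation(start.toLong, end.toLong, links.split(";").map(_.toInt).toVector)
        }.toOption
      case _ => None // skip malformed lines instead of failing the whole job
    }
}
```

Returning an `Option` is a design choice for this sketch: a stale snapshot may contain partial or malformed rows, and dropping them is usually preferable to aborting a long batch job.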


Conclusion

● We presented a first application of Spark to a real-world ML problem

● This test exposed some strengths and weaknesses:

● Improved memory management

● Distributing common large parameters

● Difficulty of integration with common storage solutions

● Implementation now faster than real time

● Used to learn more sophisticated travel time distributions

● Evaluating the quality of the output

[Figure: plot with axes "cores" and "runtime".]


Thank you


System workflow

Observations (1M-100M), persisted in memory

Travel time samples: 1k per observation

Link distributions (100k)

Link distribution parameters


Estimation of arterial traffic

● We lack travel times on individual links:
  ● One way around it: Expectation Maximization

● Randomly partition the total travel time amongst links
● Weight each partition by likelihood according to model
● Aggregate weighted samples for each link
● For each link, update parameters to maximize likelihood of link samples
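The first two steps above (random partition, then weighting by likelihood) can be sketched for a single observation as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian travel time per link, draws the random partition from normalized exponential variables, and normalizes the weights per observation. `LinkParam`, `logPdf`, `numSamples` and the body of `generateSamples` are hypothetical.

```scala
import scala.util.Random

object EStepSketch {
  // Hypothetical per-link model: Gaussian travel time with mean and variance.
  final case class LinkParam(mean: Double, variance: Double)

  // Log-density of a Gaussian, used to weight a candidate partition.
  def logPdf(x: Double, p: LinkParam): Double = {
    val d = x - p.mean
    -0.5 * math.log(2 * math.Pi * p.variance) - d * d / (2 * p.variance)
  }

  // Draw `numSamples` random partitions of `totalTime` over the links of `route`.
  // Returns (linkId, (travelTime, weight)) pairs, ready to be grouped by link.
  def generateSamples(
      route: Seq[Int],
      totalTime: Double,
      params: Map[Int, LinkParam],
      numSamples: Int,
      rng: Random): Seq[(Int, (Double, Double))] = {
    require(route.nonEmpty && numSamples > 0, "need at least one link and one sample")
    val partitions = Seq.fill(numSamples) {
      // Random positive fractions that sum to 1, scaled to the total travel time.
      val raw = route.map(_ => -math.log(1.0 - rng.nextDouble()))
      val sum = raw.sum
      raw.map(r => totalTime * r / sum)
    }
    // Weight of a partition: likelihood of its link travel times under the model.
    val logWeights = partitions.map { times =>
      route.zip(times).map { case (link, t) => logPdf(t, params(link)) }.sum
    }
    // Normalize in a numerically stable way so the weights sum to 1.
    val maxLog = logWeights.max
    val unnorm = logWeights.map(lw => math.exp(lw - maxLog))
    val norm = unnorm.sum
    for {
      (times, w) <- partitions.zip(unnorm)
      (link, t) <- route.zip(times)
    } yield (link, (t, w / norm))
  }
}
```

Every sampled partition adds up to the observed total travel time, and partitions that the current model finds implausible receive a small weight, so they contribute little when samples are aggregated per link in the following steps.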