daniel lamblin week4_demo
MTA Delay & Monitoring
Daily delay data on lines 1-6, L & S
Motivation
The MTA’s subway delays are rarely announced in actionable timeframes.
Most riders are only interested in knowing about issues on their daily commute at their regular travel time.
When Switzerland completed its Swiss Rail 2000 initiative (late, in 2004), it enabled users to subscribe to their regular train and notified them early of issues.
Web UI Demo
The demo is linked from dlamblin.github.io/mta-delay-monitoring
Processing pipeline
GTFS
Challenges: Resolved
GTFS is in Protocol Buffers; while a solid technology, it’s not conducive to rapid prototyping, or to cluster configuration and reconfiguration.
Format is ‘good’ for a current snapshot, but poor for a yearly overview.
Original format is compact; only 50GB for 2 years of GTFS
Then 1.5GB for 2 years of turnstile data, and a few MB for the network GeoJSON, station, route and schedule information.
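The snapshot-versus-overview point above can be illustrated with a minimal Python sketch. It assumes a feed snapshot has already been decoded from Protocol Buffers into a plain dict that mirrors GTFS-realtime field names (header.timestamp, entity, trip_update, stop_time_update). The flatten_snapshot helper and all sample values are hypothetical, not the pipeline's actual code.

```python
from typing import Any, Dict, Iterator, Tuple

# (snapshot_ts, route_id, trip_id, stop_id, arrival_ts)
Row = Tuple[int, str, str, str, int]


def flatten_snapshot(feed: Dict[str, Any]) -> Iterator[Row]:
    """Turn one nested feed snapshot into flat rows suited to long-range queries."""
    snapshot_ts = feed["header"]["timestamp"]
    for entity in feed.get("entity", []):
        update = entity.get("trip_update")
        if update is None:
            # Entities without a trip update (e.g. alerts) are skipped in this sketch.
            continue
        trip = update["trip"]
        for stop_update in update.get("stop_time_update", []):
            arrival = stop_update.get("arrival")
            if arrival is None:
                continue
            yield (
                snapshot_ts,
                trip["route_id"],
                trip["trip_id"],
                stop_update["stop_id"],
                arrival["time"],
            )


# Made-up sample snapshot; IDs and timestamps are illustrative only.
sample = {
    "header": {"timestamp": 1441098060},
    "entity": [
        {
            "id": "1",
            "trip_update": {
                "trip": {"trip_id": "T1", "route_id": "L"},
                "stop_time_update": [
                    {"stop_id": "A", "arrival": {"time": 1441098120}},
                    {"stop_id": "B", "arrival": {"time": 1441098300}},
                ],
            },
        },
        {"id": "2", "alert": {}},
    ],
}

rows = list(flatten_snapshot(sample))
```

Appending such rows from every snapshot gives one long table that can be grouped by route, stop or day, which the per-snapshot format does not allow directly.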
Challenges: Remaining
Originally aimed for a repeatable, containerized, deployable pipeline with Kubernetes, with replication controllers and monitoring, but the timeframe gave:
➢ A manually configured cluster of 4 EC2 m4.large nodes.
➢ Running shared HDFS, Kafka, Spark, HBase, web-api.
➢ Would have output additional views for manual analysis, e.g. with Drill.
➢ Generating sample user loads.
Clusters, sizes, throughput etc.
Plan to show some measurements of data sizes, possibly # of user notifications supported.
Daniel Pascal Lamblin
BS-CS 1999; years at:
Google
DoubleClick Studio componentized ad authoring web app with campaign reporting, and internal reporting on reporting
Travelocity & Sabre GDS
Recommend Travel Experiences, with shared and reviewed trip journals
EMC Corp
PetaSite support: PB of tape
Raised in Dubai
A 4 year old took this →
Sources
Realtime:
➔ The MTA has a realtime feed of subway train locations for the 1, 2, 3, 4, 5, 6, S and L trains.
➔ This includes service alerts.
Engineered:
➔ User data for subscriptions to trains and notifications of personal delays would need to be generated for demonstration use.
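One way such demonstration data might be generated is sketched below in Python. It assumes each synthetic user subscribes to one route at one regular travel hour, and that travel hours are drawn in proportion to hourly ridership weights. The Subscription type, the generate_subscriptions helper and the weights are all hypothetical; real weights could come from ridership counts.

```python
import random
from typing import Dict, List, NamedTuple

# Lines covered by the realtime feed, as listed in the sources.
ROUTES = ["1", "2", "3", "4", "5", "6", "L", "S"]


class Subscription(NamedTuple):
    user_id: int
    route_id: str
    hour: int  # regular travel hour, 0-23


def generate_subscriptions(
    n_users: int, hourly_weights: Dict[int, float], seed: int = 0
) -> List[Subscription]:
    """Create synthetic subscriptions; hours are sampled by the given weights."""
    rng = random.Random(seed)  # seeded so the demo load is repeatable
    hours = sorted(hourly_weights)
    weights = [hourly_weights[h] for h in hours]
    return [
        Subscription(
            user_id=uid,
            route_id=rng.choice(ROUTES),
            hour=rng.choices(hours, weights=weights, k=1)[0],
        )
        for uid in range(n_users)
    ]


# Made-up weights: heavier interest at the morning and evening peaks.
demo_weights = {8: 5.0, 12: 1.0, 18: 4.0}
subs = generate_subscriptions(1000, demo_weights)
```

Seeding the generator keeps the sample load identical between runs, which makes throughput measurements comparable.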
Static:
➔ The MTA provides batches of historical data for these too: datamine-history.s3.amazonaws.com/2015-09-01-09-{01,06,11,16,26,31,36,41,46,51,56}
➔ Weekly turnstile data can be used to generate realistic interest in transit throughout the day.
➔ Station IDs and geolocations
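The historical data location above uses shell-style brace notation for the minute part of each batch name. A minimal Python sketch of expanding it into individual names follows; the expand_braces helper is hypothetical, and it expands only the values actually listed, without guessing at any others.

```python
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand shell-style {a,b,c} groups into every concrete string."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    head, tail = pattern[: match.start()], pattern[match.end():]
    results: List[str] = []
    for option in match.group(1).split(","):
        # Recurse so that any further groups in the pattern are expanded too.
        results.extend(expand_braces(head + option + tail))
    return results


PATTERN = (
    "datamine-history.s3.amazonaws.com/"
    "2015-09-01-09-{01,06,11,16,26,31,36,41,46,51,56}"
)
batch_names = expand_braces(PATTERN)
```

Each resulting name corresponds to one snapshot batch for 2015-09-01 in the 09:00 hour.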