+Portal: Applications of New Technology to Transportation Data Archiving
Kristin Tufte & the Portal Team
NATMEC, July 1, 2014, Chicago, IL
+Who is Kristin?
! 20 years Data Management System Design and Implementation
! Paradise (1990-1996) - sold to NCR/Teradata
! NiagaraST Data Stream System (1996-2014)
! S-Store – Streams + OLTP (2013-…)
! 10 years Transportation Data Management
! Portal Transportation Data Archive (2004-…)
! Portland Observatory (2013-…)
+New Technologies: Big Data, Cloud, Data Streams…
+Big Data…Why all the interest?
! An increased number and variety of data sources that generate large quantities of data
! Sensors (e.g. location, acoustical, …); Web 2.0
! Realization that data was “too valuable” to delete
! ADUS - ITS Program Plan addendum 1998
! Dramatic decline in the cost of hardware, especially storage
! If storage was still $100/GB there would be no big data revolution underway
Slide credit: David DeWitt, Microsoft/UW-Madison, SQL Server PASS Talk 2011
+Is Transportation Data Big Data?
[Figure: bar chart, “Amount of Stored Data By Sector (in Petabytes, 2009)”; vertical axis from 0 to 1000 petabytes; bar values 966, 848, 715, 619, 434, 364, 269, 227; sector labels not recoverable]
1 zettabyte = 1 million petabytes = 1 billion terabytes = 1 trillion gigabytes
Sources: "Big Data: The Next Frontier for Innovation, Competition and Productivity." US Bureau of Labor Statistics | McKinsey Global Institute Analysis
Figures credit: David DeWitt, Microsoft/UW-Madison, SQL Server PASS Talk 2011
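The zettabyte equivalences quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes; the constant and function names are illustrative and not from the talk.

```python
# Decimal (SI) storage units expressed in bytes (assumption: SI, not binary, prefixes).
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
ZETTABYTE = 10**21


def zettabyte_in(unit_bytes: int) -> int:
    """Return how many of the given unit fit in one zettabyte."""
    return ZETTABYTE // unit_bytes


print(f"petabytes per zettabyte: {zettabyte_in(PETABYTE):,}")
print(f"terabytes per zettabyte: {zettabyte_in(TERABYTE):,}")
print(f"gigabytes per zettabyte: {zettabyte_in(GIGABYTE):,}")
```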
+Volume, Velocity and Variety
! Variety is the most difficult and potentially most critical axis of Big Data
! Transportation data has Variety in spades
! Big Data’s Implications for Transportation Operations: An Exploration
! White Paper – March 2014; Prepared by Volpe Center; Prepared for US DOT ITS JPO
[Figure labels: Ace Sensor – Radar – Bluetooth – Signal loops; Bus AVL, boardings; SCATS; AQ Sensor]
Other:
• Incident reports
• Crossing demands
• Bus priority requests
Slide credit: Adam Moore, Miguel Figliozzi, PSU
+The Cloud: NoSQL vs. NewSQL vs. RDBMS
! Cloud Computing: Many definitions, many promises…
! Reduced time to insight…
! Saving resources…
SQL (RDBMS), sometimes termed “Schema First”:
1. Data Arrives
2. Derive a schema
3. Cleanse the data
4. Transform the data
5. Load the data
6. SQL Queries
NoSQL (NOSQL System), sometimes termed “Schema Later”:
1. Data Arrives
2. Application Program
NoSQL:
! No cleansing!
! No ETL!
! No load!
! Analyze data where it lands!
Slide credit: David DeWitt, Microsoft/UW-Madison, SQL Server PASS Talk 2011
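The “Schema First” and “Schema Later” paths on this slide can be contrasted in a short sketch. It uses Python's built-in `sqlite3` as a stand-in for an RDBMS and plain JSON lines as a stand-in for data "where it lands". The detector readings, field names and table name are hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Hypothetical raw readings as they might arrive; names and values are made up.
raw_lines = [
    '{"detector_id": 101, "time": "2014-07-01T08:00:00", "volume": 7, "speed": 54.0}',
    '{"detector_id": 102, "time": "2014-07-01T08:00:00", "volume": 3}',
]

# Schema first: derive a schema, transform each record to fit it, load, then query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading ("
    "detector_id INTEGER NOT NULL, time TEXT NOT NULL, "
    "volume INTEGER NOT NULL, speed REAL)"
)
for line in raw_lines:
    rec = json.loads(line)
    conn.execute(
        "INSERT INTO reading VALUES (?, ?, ?, ?)",
        (rec["detector_id"], rec["time"], rec["volume"], rec.get("speed")),
    )
sql_total = conn.execute("SELECT SUM(volume) FROM reading").fetchone()[0]

# Schema later: the application program interprets each record as it reads it.
app_total = sum(json.loads(line).get("volume", 0) for line in raw_lines)

print(sql_total, app_total)
```

In the first path a missing or malformed field is caught (or explicitly mapped to NULL) at load time; in the second, every application that reads the data has to handle it on its own.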
+Save Resources…
[Figure: two plots of Resources (vertical axis) against Time (horizontal axis), each with Demand and Capacity curves; other labels: “Unused resources”, “Data center in the cloud”]
Slide Credit: RAD Lab, UC Berkeley
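The Demand versus Capacity comparison on this slide can be illustrated numerically. The sketch below assumes a made-up demand series and two provisioning policies: fixed capacity sized for the peak, and capacity that follows demand with a small headroom. All numbers and names are hypothetical.

```python
# Hypothetical demand per time period, in arbitrary resource units.
demand = [20, 35, 80, 100, 60, 30]


def unused_resources(demand, capacity):
    """Sum of capacity left idle across all periods."""
    return sum(c - d for d, c in zip(demand, capacity))


# Policy 1 (assumption): provision once for the peak and keep it fixed.
fixed_capacity = [max(demand)] * len(demand)

# Policy 2 (assumption): capacity tracks demand with 10 units of headroom.
HEADROOM = 10
elastic_capacity = [d + HEADROOM for d in demand]

fixed_unused = unused_resources(demand, fixed_capacity)
elastic_unused = unused_resources(demand, elastic_capacity)
print(fixed_unused, elastic_unused)
```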
+But, Don’t be Fooled…
! NoSQL systems make lots of promises…
! But they don’t work for everything…
+Portal: An Update…
+Portal – Where is it at today?
! Happy 10th Birthday Portal! (April 2004 - …)
! Publicly funded (Thanks to NSF, FHWA, OTREC, Metro & Transport, RTC)
! Focus on open-source software (PostgreSQL, PostGIS, OpenLayers, HighCharts)
! Focus on open data (Thanks to all our wonderful collaborators)
! New Transit Load and Performance Map (GTFS + PAX data)
! Lots of new (local) data feeds
+Freeway Data Flow
Portal OR-WA Archive
ODOT
- Loop Detectors
- Wavetronix
- Bluetooth Travel Time
Lane County
- Wavetronix
ODOT DAQ
# XML Feed
# 20 second granularity
# automated station inventory file
WSDOT
- Loop Detectors
- Wavetronix
# XML Feed
# 20 second granularity
# XML Feed
# 20 second granularity
# being phased out
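Feeds at 20 second granularity are commonly rolled up into coarser bins before analysis. The following is a minimal sketch of one such rollup, assuming each reading has already been parsed into a `(timestamp, volume, speed)` tuple. The tuple layout, the 5-minute bin size and the volume-weighted mean speed are illustrative assumptions, not a description of Portal's actual processing.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_start(ts, bin_seconds):
    """Floor a timestamp to the start of its bin (bins are aligned to midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % bin_seconds)


def aggregate(readings, bin_seconds=300):
    """Return {bin_start: (total_volume, volume_weighted_mean_speed_or_None)}."""
    totals = defaultdict(lambda: [0, 0.0])  # bin -> [volume, sum of volume * speed]
    for ts, volume, speed in readings:
        entry = totals[bin_start(ts, bin_seconds)]
        entry[0] += volume
        entry[1] += volume * speed
    return {
        start: (vol, weighted / vol if vol else None)
        for start, (vol, weighted) in sorted(totals.items())
    }


# Hypothetical readings: twenty 20-second samples starting at 08:00:00.
start = datetime(2014, 7, 1, 8, 0, 0)
readings = [
    (start + timedelta(seconds=20 * i), 2, 60.0 if i < 15 else 30.0)
    for i in range(20)
]
bins = aggregate(readings)
for when, (volume, speed) in bins.items():
    print(when.time(), volume, speed)
```

Weighting speed by volume keeps low-traffic samples from dominating the bin average; a bin with zero volume reports `None` rather than a misleading speed.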
+Transit Data Flow
Portal OR-WA Archive
TriMet
- AVL/APC (Init)
- GTFS Data
# GTFS data published publicly
TriMet Enterprise Database
# PAX data inserted in Enterprise Database
# Data is cleaned and aggregated
# Quarterly PAX data exported
C-Tran
- AVL/APC (Init)
- GTFS
# No enterprise database (yet)
# Process to be determined
# Portal Archive import processing combines PAX and GTFS data
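Combining PAX and GTFS data amounts, at its simplest, to a join on stop identifiers. The sketch below assumes the GTFS `stops.txt` columns `stop_id`, `stop_name`, `stop_lat` and `stop_lon` (which are part of the GTFS specification) and a hypothetical PAX export with `stop_id`, `ons` and `offs` columns. The sample rows are made up, and the real import processing is certainly more involved.

```python
import csv
import io

# Made-up sample of a GTFS stops.txt file.
stops_txt = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Stop A,45.5000,-122.6000
S2,Example Stop B,45.5100,-122.6100
"""

# Hypothetical quarterly PAX export; the column names are an assumption.
pax_csv = """stop_id,ons,offs
S1,120,15
S2,40,95
S3,5,5
"""


def combine(stops_file, pax_file):
    """Join PAX rows to GTFS stops; return (combined_rows, unmatched_stop_ids)."""
    stops = {row["stop_id"]: row for row in csv.DictReader(stops_file)}
    combined, unmatched = [], []
    for row in csv.DictReader(pax_file):
        stop = stops.get(row["stop_id"])
        if stop is None:
            unmatched.append(row["stop_id"])  # PAX stop missing from this GTFS feed
            continue
        combined.append({
            "stop_id": row["stop_id"],
            "stop_name": stop["stop_name"],
            "lat": float(stop["stop_lat"]),
            "lon": float(stop["stop_lon"]),
            "ons": int(row["ons"]),
            "offs": int(row["offs"]),
        })
    return combined, unmatched


combined, unmatched = combine(io.StringIO(stops_txt), io.StringIO(pax_csv))
print(combined)
print(unmatched)
```

Reporting unmatched stop IDs separately, rather than dropping them silently, makes it easier to spot a PAX export and a GTFS feed that come from different service periods.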
+Arterial Data Flow - Current
Portal OR-WA Archive
City of Portland
- Bluetooth
# Bluetooth data gathered from devices by scripts on CoP servers
# Data uploaded to Portal weekly
# Processing scripts calculate travel times
Washington & Clackamas County
- Signal System (TransSuite)
# Central Signal Server is Shared
City of Portland
- Signal System, including MOE Logs (TransSuite)
- Bike/Ped Counts
# Hourly data feed created by TransSuite
# Data uploaded to PSU hourly (sftp)
Clark County
- Wavetronix
# Data generated using Wavetronix report-generation system
# Data uploaded to PSU nightly
Near Future:
City of Vancouver
- Bluetooth
Clark County
- Bluetooth
City of Vancouver
- Wavetronix
- Signal System (ATMS.Now)
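Travel times from Bluetooth data are typically derived by matching the same device at two readers. This is a minimal sketch of that idea, not the actual Portal processing scripts: the device identifiers, the one-detection-per-device input format and the 30-minute cutoff are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import median


def travel_times(upstream, downstream, max_seconds=1800):
    """Match devices seen at both readers and return travel times in seconds.

    upstream / downstream: dict mapping an anonymized device id to the
    datetime it was detected at that reader (hypothetical input format).
    Matches that are non-positive or longer than max_seconds are discarded,
    since they likely reflect a stop or detour rather than a through trip.
    """
    times = []
    for device, t_up in upstream.items():
        t_down = downstream.get(device)
        if t_down is None:
            continue
        delta = (t_down - t_up).total_seconds()
        if 0 < delta <= max_seconds:
            times.append(delta)
    return times


# Made-up detections at two readers on the same arterial.
upstream = {
    "dev-a": datetime(2014, 7, 1, 8, 0, 0),
    "dev-b": datetime(2014, 7, 1, 8, 0, 30),
    "dev-c": datetime(2014, 7, 1, 8, 1, 0),   # never seen downstream
    "dev-d": datetime(2014, 7, 1, 8, 0, 0),   # reappears 90 minutes later
}
downstream = {
    "dev-a": datetime(2014, 7, 1, 8, 2, 0),
    "dev-b": datetime(2014, 7, 1, 8, 3, 0),
    "dev-d": datetime(2014, 7, 1, 9, 30, 0),
}

matched = travel_times(upstream, downstream)
print(matched, median(matched))
```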
+Transit Map!
! Combines GTFS (General Transit Feed Specification) with AVL/APC (Automatic Vehicle Location/Automatic Passenger Counter) data
! Lots of GIS processing involved
! GIS process is complex – freeway data too!!
! Metrics include:
! Stop activity
! Ons/offs; On-time performance
! Segment activity
! Load and Utilized capacity
Segment Load
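Segment load can be derived from stop-level ons and offs by keeping a running passenger count along a trip. The sketch below assumes that approach and takes utilized capacity to mean load divided by vehicle capacity; both are assumptions about the metric definitions, and the stop sequence and counts are made up.

```python
def segment_loads(stop_events, capacity):
    """Compute load and utilized capacity for each segment of one trip.

    stop_events: list of (stop_id, ons, offs) in stop-sequence order.
    Returns a list of (from_stop, to_stop, load, utilization) tuples, where
    load is the number of passengers on board between the two stops.
    """
    load = 0
    segments = []
    for (stop, ons, offs), (next_stop, _, _) in zip(stop_events, stop_events[1:]):
        load += ons - offs  # passengers on board when leaving this stop
        segments.append((stop, next_stop, load, load / capacity))
    return segments


# Hypothetical trip with four stops and a 40-passenger vehicle.
trip = [("A", 10, 0), ("B", 5, 3), ("C", 2, 6), ("D", 0, 8)]
segments = segment_loads(trip, capacity=40)
for from_stop, to_stop, load, utilization in segments:
    print(from_stop, to_stop, load, round(utilization, 2))
```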
+Stop-Level performance
+S-Store Big Data Bikeshare Demo
! Demo applying Data Stream technology to a Bikeshare scenario
! Collaboration between PSU, Intel, MIT, Brown Univ.
! To be shown at VLDB 2014 (Hangzhou, China)
+Portland Observatory: Urban Informatics Variety Testbed
! Challenges the third “V” of Big Data: variety
! Observations and context
! For researchers, planners, managers, public
! Building on experience from Portal transportation archive
! Increasing data touches
+THANK YOU!!
http://portal.its.pdx.edu
[email protected]