TRANSPORTATION RESEARCH BOARD
@NASEMTRB #TRBwebinar
Framework for Managing Data from Emerging Technologies
September 10, 2020
The Transportation Research Board
has met the standards and
requirements of the Registered
Continuing Education Providers
Program. Credit earned on completion
of this program will be reported to
RCEP. A certificate of completion will
be issued to participants that have
registered and attended the entire
session. As such, it does not include
content that may be deemed or
construed to be an approval or
endorsement by RCEP.
PDH Certification Information:
• 1.5 Professional Development Hours (PDH) – see follow-up email for instructions
• You must attend the entire webinar to be eligible to receive PDH credits
• Questions? Contact Reggie Gillum at [email protected]
Learning Objectives
1. List the fundamental differences between traditional and modern data management approaches and practices
2. Discuss the tools in the guidebook
Guidebook for Managing Data from Emerging Technologies
Kelley Klaver Pecheux, Ph.D., Senior Director, Transportation Data
Benjamin Pecheux, Director of Information Services
NCHRP 08-116 Webinar
September 10, 2020
Overview of Presentation
Project background and objectives
Challenges to managing data from emerging technologies
What is big data?
Why should agencies move toward the modern approach to data management?
How can agencies make the move toward this approach?
Roadmap to Managing Data from Emerging Technologies for Transportation
Supporting tools
Guidebook for Managing Data from Emerging Technologies for Transportation
Project Background and Objectives
Background:
• New, big, and varied datasets are available to transportation agencies at an increasing pace.
• These data have tremendous potential to offer new insights to transportation agencies.
• The volume, speed, and granularity of these data are unprecedented and will fundamentally alter the transportation sector.
Research Objectives:
• Develop a framework for managing data from emerging technologies, including data from connected and automated vehicles and data linked to new mobility initiatives.
• Outline a process for applying this framework to help agencies incorporate these data into the decision-making process.
Transportation Agency Challenges to Managing Data from Emerging Technologies
• Reliance on traditional database management systems. Data from emerging technologies are too large, too varied in nature, and will change too quickly to be handled by these traditional data systems.
• Struggle to break down business unit and data silos.
• Do not fully recognize the value of big data or the imminent need to prepare for it.
• Do not fully understand the uses and benefits of cloud-based architecture conducive to handling data from emerging technologies.
• Have difficulty hiring and retaining modern data management professionals.
“Our big data issues are straightforward, we don’t have the technology, money, or skills.” – CITY DOT
• Experience a loss of control to vendors over data, technology, and service agreements.
• Are unequipped to handle this level of big data at an organizational level.
What is Big Data?
Big Data is More Than a Buzzword or Simply “Lots of Data”
• Big data may refer to data sets that are so vast and complex that they require new and powerful computational resources to process. [1]
• Big data may encompass all the non-traditional strategies and technologies needed to gather, organize, process, and generate insights from large datasets. [2]
• Big data is an approach to generating knowledge in which advanced techniques are applied to the capture, management, and analysis of very large and diverse volumes of data – data so large, so varied, and analyzed at such speed that it exceeds the capabilities of traditional data management and analysis tools. [3]
• Big data is a term that describes the large volume of structured and unstructured data that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It is what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves. [4]
• Big data is a new attitude by businesses, non-profits, government agencies, and individuals that combining data from multiple sources could lead to better decisions. [5]
[1] Big Data. Dictionary.com, 2019.
[2] Ellingwood, J. An Introduction to Big Data Concepts and Terminology, Digital Ocean, 2016.
[3] Burt, Cuddy, Razo. Big Data’s Implications for Transportation Operations: An Exploration, USDOT, 2014.
[4] What is Big Data? What is Big Data, SAS, 2019.
[5] Press, G. 10 Big Data Definitions - What's Yours? Forbes, 2014.
Traditional vs Modern Big Data Management
Managing data from emerging technologies requires a complete paradigm shift. These data cannot be handled simply by adding more hardware or processing power. The nature of the data demands an updated approach.
Why Should Agencies Move Toward the Modern Approach to Data Management?
• With increased connectivity between vehicles, sensors, systems, shared-use transportation, and mobile devices, unexpected and unprecedented amounts of data are being added to the transportation domain at a rapid rate.
• These new data offer the potential to uncover insights to drive better decision-making at all levels of transportation agencies in a way that is simply not happening now.
• The potential value of these new data cannot be easily or efficiently extracted by traditional methods; the complexity of the task requires new big data tools and techniques.
• As data sources become more varied and change more and more rapidly, the traditional approach cannot cope with the complexity and cannot be re-designed quickly or cost-effectively enough to handle frequent data and business requirements changes.
• Modern big data methods to collect, transmit, store, aggregate, analyze, apply, and share these data need to be adopted by transportation agencies if they are to be utilized to facilitate better decision-making.
This Guidebook Can Help Agencies Shift Toward the Modern Data Management Approach
• Provides a modern big data management framework that introduces new concepts and methodologies, best practices, and 100+ recommendations for managing data in a modern, flexible, scalable, and sustainable way.
• Lays out a roadmap on how to begin to shift – technically, institutionally, and culturally – toward effectively managing data from emerging technologies.
• Provides examples and references of transportation agencies currently exploring or already navigating the implementation of big data, including their challenges and successes.
• Discusses common misconceptions within the transportation industry regarding big data management.
Laying the Foundation
• Contrasts traditional vs. modern approach for 11 characteristics of data systems
• Presents modern big data architecture
Modern Big Data Management Framework
• 100+ recommendations across the data lifecycle
Roadmap to Managing Data from Emerging Technologies
• 8-step process toward organizational change
Supporting Resources & Tools
• NCHRP 08-116 Research Report
• Data Management Capability Maturity Self-Assessment (DM CMSA)
• Data Sources Catalog Tool
• Big Data Governance Roles & Responsibilities
• Frequently Asked Questions (FAQ)
Big Data Management Lifecycle and Framework
• The lifecycle defines the four major components of data management: the creation, storage, use, and sharing of data.
• The framework builds from these data management components to include big data industry best practices and over 100 associated recommendations for managing big data across the lifecycle.
• The framework should be applied throughout each step in the roadmap.
Big Data Lifecycle: Create → Store → Use → Share
Roadmap to Managing Data from Emerging Technologies for Transportation
• Step 1 – Develop an understanding of big data
• Step 2 – Identify a use case and an associated pilot project
• Step 3 – Secure buy-in from at least one person from leadership for the pilot project
• Step 4 – Establish an embryotic big data test environment/ playground
• Step 5 – Develop the big data project within the playground
• Step 6 – Demonstrate the value of the data to other business units
• Step 7 – Demonstrate the value of the data to executive leadership
• Step 8 – Establish a formal data storage and management environment
Supporting Resources & Tools
• NCHRP 08-116 Final Research Report – Framework for Managing Data from Emerging Technologies to Support Transportation Decision-Making, provided under separate cover, documents the research activities and provides supplemental information for reference to support implementation of the guidebook.
• Data Management Capability Maturity Self-Assessment (DM CMSA) – offers over 100 questions to allow agencies to gauge their data management practices, as well as identify areas for improvement.
• Data Sources Catalog Tool – a tool to catalog existing and potential data sources.
• Big Data Governance Plan Template – provides a list of recommendations to consider when developing a modern data governance approach, a description and frameworks for big data governance, and a tool for tracking the big data governance roles and responsibilities within an agency.
• Frequently Asked Questions (FAQ) – responses to frequently asked questions regarding big data implementation, management, governance, use, and security.
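To make the idea of cataloging existing and potential data sources concrete, here is a minimal sketch of what one catalog entry might look like. It is not the guidebook's Data Sources Catalog Tool: the field names, the `Status` values, and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    EXISTING = "existing"    # the agency already receives this data
    POTENTIAL = "potential"  # a candidate source not yet acquired


@dataclass(frozen=True)
class DataSource:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    provider: str
    status: Status
    data_format: str
    update_interval_seconds: Optional[int] = None  # None when unknown


def filter_by_status(catalog: List[DataSource], status: Status) -> List[DataSource]:
    """Return the catalog entries that match the given status."""
    return [source for source in catalog if source.status is status]


# Placeholder entries, not real agency data.
catalog = [
    DataSource("Example crowdsourced incident feed", "Vendor A", Status.EXISTING, "JSON", 120),
    DataSource("Example fleet location feed", "Agency fleet", Status.EXISTING, "CSV", 10),
    DataSource("Example connected vehicle feed", "Vendor B", Status.POTENTIAL, "JSON"),
]

existing_names = [source.name for source in filter_by_status(catalog, Status.EXISTING)]
print(existing_names)
```

Keeping "potential" sources in the same structure as "existing" ones lets an agency compare candidates against what it already holds before committing to a pilot.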
In Closing
Whether an agency:
• Is starting from scratch with a new technology data set
• Is trying to make the business case for emerging technology data
• Is already working on a big data project
• Has an issue or problem that might be solved with emerging technology data
• Is looking for a new enterprise data management solution
The steps and guidance outlined in this document are designed to walk the agency through the necessary data management policies, procedures, and practices to fully meet the needs of data from emerging technologies.
Contact Information
For further information about the project, please contact:
Kelley Klaver Pecheux, [email protected]
Benjamin B. Pecheux, [email protected]
Step 1 – Develop an Understanding of Big Data
Step 1 includes information on the following:
• What is big data?
• Big data characteristics
• Big data concepts
• When to pursue big data
• Common misconceptions regarding big data
• Case study – The Importance of Understanding Big Data
• Additional resources
Step 2 – Identify a Use Case and an Associated Pilot Project
Step 2 includes guidance on the following:
• Selecting a use case and pilot project that align with business unit, leadership, and organizational goals, including examples of drivers for change, example big data sources of interest, and associated example use cases and pilot projects
• Engaging others in the cause, including those internal to the business unit, cross-business unit, junior and mid-level staff, and external partners
• A case study on the Portland Urban Data Lake Pilot Project (PUDL)
Step 3 – Secure Buy-In from at Least One Person from Leadership for the Pilot Project
Step 3 includes guidance on the following:
• Establishing and communicating the value proposition for the pilot project, including example projects, value propositions, and questions to assist in developing the “pitch”
• Ways to create a sense of urgency and a fear of missing out (FOMO)
• De-risking the decision by identifying and communicating risks and other potential barriers up front
• Knowing how and when to make the pitch
Step 4 – Establish an Embryotic Big Data Test Environment or Playground
Step 4 includes guidance on the following:
• Establishing buy-in from IT, including understanding potential challenges and barriers, as well as the pros and cons of on-premise versus cloud storage
• Establishing the playground, including both the data storage layer and the data processing layer
• Taking ownership and responsibility for analytical projects
• Common misconceptions regarding big data storage
• A case study on storing data on-premise vs in the cloud
Step 5 – Develop the Pilot Project Within the Big Data Test Environment/Playground
Step 5 includes guidance on the following:
• Developing/ensuring the availability of the right expertise, including the pros and cons of various options (e.g., training/hiring in-house staff, trusted contractors and university partners, and big data experts/consultants)
• Developing the project by applying a data science perspective (e.g., collecting raw data, processing and cleaning the data, performing exploratory data analyses, building data science pipelines)
• Iteratively developing and improving the project and the associated outputs/data products
• Case studies on negotiating technical contracts for data services and building data knowledge
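The "data science perspective" in Step 5 (collect raw data, process and clean it, then explore it) can be sketched as a tiny pipeline. This is only an illustration using the Python standard library: the feed contents, the field names `segment_id` and `speed_mph`, and the 0 to 120 mph plausibility range are all assumptions, not anything taken from the guidebook.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw feed (made-up values): one speed report per row.
RAW_CSV = """segment_id,speed_mph
SEG-A,61.5
SEG-A,
SEG-A,58.0
SEG-B,-4
SEG-B,47.5
SEG-B,52.5
"""


def collect(raw_text):
    """Collect: read the raw records exactly as delivered."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(records):
    """Clean: drop rows with missing or implausible speeds (assumed range 0-120 mph)."""
    cleaned = []
    for row in records:
        try:
            speed = float(row["speed_mph"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        if 0 <= speed <= 120:
            cleaned.append({"segment_id": row["segment_id"], "speed_mph": speed})
    return cleaned


def explore(records):
    """Explore: count and average the cleaned speeds per segment."""
    by_segment = defaultdict(list)
    for row in records:
        by_segment[row["segment_id"]].append(row["speed_mph"])
    return {
        segment: {"n": len(speeds), "mean_mph": statistics.mean(speeds)}
        for segment, speeds in by_segment.items()
    }


def run_pipeline(raw_text):
    """Chain the stages; the raw input itself is never modified."""
    return explore(clean(collect(raw_text)))


summary = run_pipeline(RAW_CSV)
print(summary)
```

Keeping each stage as a separate function makes it easy to iterate on one stage (for example, a stricter cleaning rule) without touching the others, which fits the iterative development described in this step.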
Step 6 – Demonstrate Value of Data to Other Business Units
Step 6 includes guidance on the following:
• Building support for the data and project across the organization, including other mid-level/branch managers that may have an interest in the data, project, and data products (or similar products) for their own business areas
• Using the data to tell the story of success by crafting a compelling story using understandable and persuasive visualizations that tie the insights uncovered in the data to the ability to address an issue or solve a problem of the business unit
• Getting others involved in sharing and using their data within the test environment, including iteratively expanding the use of the data to improved, enhanced, and new use cases
• A case study on iterative success and growth of big data within a transportation agency
Step 7 – Demonstrate Value of Data to Executive Leadership
Step 7 includes guidance on the following:
• Presenting the success stories/business case to executives
• Continuing to build support, foster data sharing, and grow iteratively and incrementally
• Pushing for organization change/adoption of a formal big data environment
• A case study on buy-in from executive leadership
Step 8 – Establish a Formal Data Storage and Management Environment
Step 8 includes guidance on the following:
• Establishing a clear vision and goals
• Making data accessible yet secure
• Integrating at the data level
• Using data to make decisions
• Merging existing projects into the same data infrastructure
• Continuing to seek input from other stakeholders and to iterate on evolving data governance plans and procedures
• Seeking continuous improvement by periodically reviewing and revising datasets, technology, processes and procedures, documentation, security and privacy protection, metadata catalog, etc.
• A case study on one transportation agency’s continued room for growth
The Road to Modern Data Management
Kentucky Transportation Cabinet
The Catalyst for Change
• 2012-2013 and 2013-2014 winters
– Record snowfalls
– Record costs
– Salt shortages
– Interstate incidents
• Spring of 2014 – KYTC began research for a Snow and Ice Decision Support System.
• September of 2014 – KYTC signed with the Waze Connected Citizen Program.
• November 2014 – Title 23, CFR 511.301-315 requirements for ITS.
The First Attempt
• November 2014 – KYTC rolled out an out-of-the-box real-time GIS solution.
• The system:
– Provided a map of the data
– Provided simple dashboards with statistics
– Tracked 200 snow plows every 10 seconds
– Displayed 200-300 Waze reports every 2 minutes
– Displayed Doppler radar images every 5 minutes
The result: The system crumbled.
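For a rough sense of the load behind that result, the refresh intervals above can be converted to a common per-minute rate. This is back-of-the-envelope arithmetic on the slide's figures only; it assumes one position record per plow per update and one radar image per refresh, which the slide does not state.

```python
SECONDS_PER_MINUTE = 60


def per_minute(count, interval_seconds):
    """Convert 'count items every interval_seconds' to items per minute."""
    return count * SECONDS_PER_MINUTE / interval_seconds


# Figures from the slide; one record per plow per update is an assumption.
plow_updates = per_minute(200, 10)     # 200 plows every 10 seconds
waze_low = per_minute(200, 2 * 60)     # 200 reports every 2 minutes
waze_high = per_minute(300, 2 * 60)    # 300 reports every 2 minutes
radar_images = per_minute(1, 5 * 60)   # one image every 5 minutes (assumed)

print(f"Snow plow updates per minute: {plow_updates:.0f}")
print(f"Waze reports per minute: {waze_low:.0f} to {waze_high:.0f}")
print(f"Radar images per minute: {radar_images:.1f}")
```

Under these assumptions the plow feed dominates, at roughly 1,200 updates per minute against 100 to 150 Waze reports per minute.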
Communication & Collaboration
Business Needs
• Snow and Ice (2012-2013, 2013-2014): $68-$70 million costs (~$20 million over average)
• September 2014: New Waze data
• November 2014: Federal ITS requirements
GIS
• Online Mapping
• LRS Snapping
• Geofencing
Enterprise Data
• Distributed Computing
• Python Scripts
• Data Lake
• Big Data
Brainstorming Use Cases
On-Premise vs. Cloud
• In 2014, cloud computing was restricted, forcing an on-premise solution.
– Requested 20 servers and received 7
– Developed in-house expertise
• In 2017, cloud computing was still restricted by the centralized IT.
– Benefits of big data were not fully understood
– Cloud computing was still considered new for state DOTs
– Scaling and development continued, adding to the complexity of the architecture
– System administration required an outside contractor for server support
• In 2019, KYTC received approval to proceed with a proof-of-concept to move the real-time data pipeline to the cloud.
The Journey Continues
• Over the years, the system has grown by steadily meeting the needs of additional use cases (e.g., new data sources or repurposing existing data for different groups).
• The system has matured through the phases of proof-of-concept to being enterprise-ready, far outliving the original snow and ice use case.
• The system has been recognized and is being adopted (as of fall 2019) as an integral part of the enterprise architecture plans for integrating, processing, storing, analyzing, reporting, and republishing data.
[Chart: Number of Servers, Incoming Data Sources, Shared Data, Business Use Cases]
Iterative Success & Growth: Data
125 million records per week | 12.4K records per minute | 206 records per second
• Waze Incidents
• Waze Traffic Speeds
• Snow Plows (AVL)
• HERE Traffic Speeds
• Roadway Weather Stations
• County Activity Reports
• KYMesonet
• CoCoRaHS
• Doppler Radar
• Twitter
• Statewide TMC Reports
• Metro TMC Reports
• Dynamic Message Signs
• iCone Traffic Speeds
• Truck Parking
• NWS Forecasts: Rain, Snow, Ice
Iterative Success & Growth: Use Cases
• Title 23, CFR 511.301-315 (2014)
• Snow and Ice Management (2014)
• Situational Awareness (2014)
• Traveler Information System (2015)
• Incident Detection (2015)
• Incident Recovery Times (2016)
• Traffic Control Plan Training (2016)
• Environmental (2017)
• Work Zone Monitoring (2018)
• Secondary Crash Analysis (2019)
• Department of Motor Carriers (2019)
• COVID-19 Traffic Analysis (2020)
• Work Zone Performance Committee (2020)
• Predictive Analytics*
• Secondary Crash Detection*
• Congestion Mitigation*
• Signal Timing*
• Automated DMS Messages*
• Automated Bookkeeping*
• Research, BI, & Data Science*
*in development
7 billion records in hot storage
Growth Rate:
• 125 million records per week
• 17.86 million records per day
• 744K records per hour
• 12.4K records per minute
• 206 records per second
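The growth-rate figures above all follow from the weekly total. The sketch below reproduces the conversion; the function name `growth_rates` is ours, and the "weeks of intake" figure at the end is only a ratio of the two quoted numbers, since the deck does not state a retention window and intake grew over time.

```python
RECORDS_PER_WEEK = 125_000_000         # from the slide
HOT_STORAGE_RECORDS = 7_000_000_000    # from the slide


def growth_rates(records_per_week):
    """Break a weekly record count down into average per-day through per-second rates."""
    per_day = records_per_week / 7
    per_hour = per_day / 24
    per_minute = per_hour / 60
    per_second = per_minute / 60
    return {
        "per_day": per_day,
        "per_hour": per_hour,
        "per_minute": per_minute,
        "per_second": per_second,
    }


rates = growth_rates(RECORDS_PER_WEEK)
for label, value in rates.items():
    print(f"{label}: {value:,.2f}")

# Ratio of the two quoted figures, not a stated retention period.
weeks_of_intake = HOT_STORAGE_RECORDS / RECORDS_PER_WEEK
print(f"Hot storage equals about {weeks_of_intake:.0f} weeks of intake at the current rate")
```

The per-second value works out to about 206.7, so the slide's "206" appears to be truncated rather than rounded.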
Buy-in from Executive Leadership
• With the success of the snow and ice pilot project, the architecture and data gained additional attention and support from executive leadership.
• Executive leadership started referring other divisions to consult with the big data team, which has become the single point of contact within the agency for all things related to real-time data.
• Over time, the big data group has become something of an internal consulting service to the other departments, thereby growing the influence and exposure of big data.
Sample Use Cases
Snow and Ice: Decision Support (2019)
Roadway Weather (2019)
Roadway Weather Analytics (2019)
Traveler Information (2019)
Incident Detection (2019)
Work Zone Safety Review (2019)
Work Zone Monitoring (2019)
Work Zone Performance (2019)
Some Lessons Learned
• Communication is key!
– GIS had been evaluating real-time GIS for nearly a year before this use case.
– Enterprise Data had been trying to justify Hadoop two years before the snow and ice use case.
• Hardware Procurement
– Utilizing on-premise architecture requires a certain level of understanding about properly scaling central processing units (CPU), random access memory (RAM), and storage ratios.
• Software
– Sometimes standardizing around a single solution isn’t feasible, for example, using an IoT tool to read data that is updated only once per day.
• Misunderstood Concepts
– The idea of a data lake, which stores “duplicate” raw data.
– The idea of designing a multiuse “platform” as opposed to a standalone “application.”
PORTLAND’S DATA JOURNEY: APPROACHES ATTEMPTED & LESSONS LEARNED
PORTLAND’S DATA JOURNEY
• 2018 – IDEATION: Survey the market, assess our options, consider pathways, make our selection
• 2019 – ITERATION: Develop and test our solution, run pilots and integrate into City infrastructure
• 2020 – SUBSTANTIATION: Document results, assess real costs, demonstrate value and potential
IT ALL STARTED WITH…
…A SENSOR
AT&T CITYIQ TRAFFIC SENSORS
IDEATION ITERATION SUBSTANTIATION
OVERNIGHT THE OLD WAY WAS RENDERED UNWORKABLE
1,440 x 24 x 7 x 365
AN UPGRADE (IN BOTH TECH & THINKING) WAS REQUIRED
The solution had to be…
• Dynamic
• Scalable
• Modular
• Cloud-based
• Based on today’s best practice
• Cost Efficient
The solution could not…
• Build off or mirror past efforts
• Leverage existing technical infrastructure
• Rely on the City’s current thinking re: data management
OUR SOLUTION
A data agnostic, scalable solution that would allow us to experiment, test, & innovate
Total investment (so far): $150K, which was tied to our sensor project budget
OUR PARTNER COMPOSITION
Cloud Infrastructure (Azure) Partner
Azure Commissioning/Support Partner
Data Management Platform Partner
Data Architecture/Implementation Partner
Data Visualization Partner
PUDL STACK
OUR APPROACH TO PROVING OUT OUR SOLUTION
PILOTS: 3 TOTAL | UNIQUELY SCOPED | 1 YEAR EACH
PILOTS IN DEPTH
A RESULT IN BRIEF
TODAY: Bus arrival times are based on Ideal Traffic Conditions
• The screen says the bus will arrive in 7 mins
• The bus actually arrives in 20 mins (screen still says 7 mins)
• Input: Ideal Traffic Conditions
TOMORROW: Bus arrival times will factor in regular traffic delays, accidents, closures, and unforeseen events
• The screen says the bus will arrive in 13 mins, updates as traffic conditions change
• Inputs: Ideal Traffic Conditions + Real Time Accidents, Closures, Events
NOW WE NEED TO SUSTAIN OUR PROGRESS
Yet the landscape remains challenging…
On the one hand
• Significant progress made to date
• Cost & efficiency gains realized since project inception
• Technical staff are supportive
• Vision for the future is crystalizing
On the other
• Leaders and staff still don’t completely understand what PUDL is
• Cost savings are in areas that rarely received attention prior and are too unpredictable to instill confidence
• Other priorities vs. PUDL
THE HYPE CYCLE IS REAL
We are here
THE STEPS WE’RE TAKING
The keys to City-wide adoption (as informed by our lessons learned)
• Tangibly Demonstrating PUDL’s Value & Potential
• Persistently Searching for Executive Buy-In
• Incrementally Requesting Budget (vs. Everything at Once) & Seeking Partners
• Continuously Improving Our Technical Capacity
• Actively Governing Data Intake, Management, and Use
QUESTIONS OR FOR MORE INFORMATION
Kevin Martin (he/him/his) | Smart City PDX/Tech Services Manager Planning & Sustainability [email protected] | 503.823.7710 https://www.smartcitypdx.com | @smartcitypdx
Michael Kerr | Strategy & Innovation Manager | Office of the Director Portland Bureau of Transportation 1120 SW 5th Avenue, Suite 800 Portland, OR 97204 503-823-5808 [email protected]
Today’s Panelists
Moderator: Vaishali Shah
Kelley Pecheux
Chris Lambert
Ben Pecheux
Michael Kerr
Upcoming webinars
• October 1: Governing Data to Improve Transportation Asset Management
• For all TRB Webinars, visit our website.
Get Involved with TRB
Receive emails about upcoming TRB webinars: https://bit.ly/TRBemails
Find upcoming conferences: http://www.trb.org/Calendar
Get Involved with TRB
Be a Friend of a Committee bit.ly/TRBcommittees
– Networking opportunities
– May provide a path to Standing Committee membership
Join a Standing Committee bit.ly/TRBstandingcommittee
Work with CRP https://bit.ly/TRB-crp
Update your information www.mytrb.org
Getting involved is free!
#TRBAM is going virtual!
• 100th TRB Annual Meeting is fully virtual in January 2021
• Continue to promote with hashtag #TRBAM
• Check our website for more information
#TRB100