Freight Data Dictionary

Linking freight data sources across transportation modes, subjects, and geography

Mary Moulton, Leighton Christiansen, Xin Wang

The National Transportation Library

Bureau of Transportation Statistics

US Department of Transportation

FCSM Metadata Workshop

September 14, 2018

Overview

• About the National Transportation Library

• Project background

• Functions and features

• Information architecture

• Implementation

• Future development

About NTL

Bureau of Transportation Statistics, Office of Information and Library Sciences (OILS)

Established in 1998, we provide:

• Digital collections

• Data services

• Reference and research services

• Networking

We are an open access digital repository.

All items are in the public domain and available for reuse without restriction.

NTL Mandates

• Transportation Equity Act for the 21st Century (1998)

– “establish and maintain a National Transportation Library, which shall contain a collection of statistical and other information needed for transportation decision making at the Federal, State, and local levels.”

• MAP-21 (2012)

– Acquire, preserve and manage transportation information and information products and services for use by DOT, other Federal agencies, and the public

– Central repository for DOT research results and technical publications

– Central clearinghouse for transportation data and information of the Federal Government

– Coordinate among and cooperate with multiple external parties to develop a “comprehensive transportation information and knowledge network”

• White House Office of Science and Technology Policy memo (2013) requiring all Executive Departments and Agencies spending more than $100 million/year on R&D to ensure public access to peer-reviewed publications and digital datasets arising from federally-funded scientific research

Repository and Open Science Access Portal (ROSA P)

Project Background

Project Background

A national Freight Data Dictionary is proposed that offers a centralized, controlled, authoritative vocabulary capable of supporting:

• Enhanced data inputs

• Improved accuracy, efficiency, and flexibility in freight data interchange

• Freight data analysis and interoperability across the transportation sector

• Improved analysis and decision making at all levels of government

National Cooperative Freight Research Program (NCFRP) Report 35

Implementing the Freight Transportation Data Architecture: Data Element Dictionary

http://www.trb.org/Main/Blurbs/173083.aspx

• Identifies “readily available” data sources associated with freight.

• Provides examples of freight data uses and applications.

• Organizes an inventory of data elements and glossary terms found in the selected sources into a uniform typology.

• Identifies differences in data element definitions.

• Provides metadata tools and resources to guide data users on the appropriate steps and procedures for combining data from multiple freight data sources.

Result: a searchable and sustainable web-based application containing the study findings, an inventory of freight data dictionaries, and a discussion feature to be used by practitioners to exchange ideas and information.

Why is BTS Interested in the FDD?

• BTS is a freight and transportation statistics aggregator and publisher.

– Several BTS products are represented in the FDD.

• BTS identified as logical host for FDD.

• FDD can provide a model architecture and platform for BTS metadata harmonization efforts.

– In 2016 BTS launched a Data Management and Data Curation project.

• FDD provides potential model for other transportation modal data dictionaries.

Functions & Features

FDD Home Page

https://fdd.bts.gov/freight-data-dictionary/

FDD Simple Search: Origin Airport

FDD Simple Search: Origin Airport

There are two distinct main tables in the system: Data Dictionaries and Glossary Terms

(Don’t squint: we will zoom in on each of these.)

FDD Simple Search: Origin Airport 1

Data Dictionary Number and Sources

FDD Simple Search: Origin Airport 2

Data Dictionary Sources

FDD Simple Search: Origin Airport 3

Data Dictionary Source Table

FDD Simple Search: Origin Airport 4

Added Elements

FDD Simple Search: Origin Airport 5

Similar Elements

FDD Simple Search: Origin Airport 6

Complete Table Profile

FDD Simple Search Example

Glossary Terms

Information Architecture

Sources

• 28 sources compiled (2 commercial, the rest public)

• 6,322 total number of data elements selected

• For each data source, the minimum required entities are the data source name, a table containing the elements, and the elements themselves.
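The minimum entities named above (a source name, a table, and the table's elements) can be sketched as a small Python data model. The class names, field names, and the completeness rule are assumptions made for illustration, not the FDD's actual schema, and the table name in the sample record is invented.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One element (field) in a source table. Field names are illustrative."""
    name: str
    definition: str = ""


@dataclass
class SourceTable:
    """A table within a data source, holding its elements."""
    name: str
    elements: list[DataElement] = field(default_factory=list)


@dataclass
class DataSource:
    """A data source with its tables."""
    name: str
    tables: list[SourceTable] = field(default_factory=list)

    def is_minimally_complete(self) -> bool:
        # Assumed rule: a source name, at least one table,
        # and at least one element in every table.
        return (
            bool(self.name)
            and bool(self.tables)
            and all(table.elements for table in self.tables)
        )


# Hypothetical record: "example_table" is a placeholder, not a real FDD table name.
source = DataSource(
    name="Air Carrier Statistics",
    tables=[SourceTable("example_table", [DataElement("Origin Airport")])],
)
print(source.is_minimally_complete())
```

A source with a name but no tables, or with an empty table, fails the check under this assumed rule.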

Sources

1. Air Carrier Statistics
2. Air Carrier Financial Reports
3. Annual Survey of Manufacturers
4. Border Crossing/Entry
5. CTA Intermodal Terminals Database
6. Carload Waybill Sample
7. Commodity Flow Survey
8. County Business Patterns
9. Fatality Analysis Reporting System
10. Federal Railroad Administration Safety Database
11. Foreign Trade
12. Freight Analysis Framework
13. Highway Performance Monitoring System
14. IHS Transearch
15. Motor Carrier Management Information System
16. Motor Carrier Safety Measurement System
17. National Agricultural Statistics Service
18. National Ballast Information Clearinghouse Database
19. National Corridors Analysis and Speed Tool Database
20. North American Transborder Freight Database
21. Pipeline and Hazardous Material Safety Administration
22. Service Annual Survey
23. Survey of Business Owners
24. Topologically Integrated Geographic Encoding and Referencing
25. U.S. Waterway Data
26. Vehicle Inventory and Use Survey
27. Vehicle Travel Information System
28. Woods and Poole Economics, Inc.

Classification schema

• Role-Based Classification Schema (RBCS) organizes and categorizes data elements across multiple data sources

• Top level groups derived from analyzing freight data classification schema

• Secondary level groups differentiate data elements that identify objects from data elements that describe the features of an object

Classification schema

Primary, top level groups

– Commodity

– Event

– Humans

– Industry

– Link

– Mode

– Place

– Time

– Unclassified (elements that do not fit in other roles)

Commodities (C) generated by the industry (I) are moved by various transport modes (M) from one place (P) to another (P) along the transportation network (L) within a time period (T). During the transport process, a chain of possible events (E) may occur that involve various stakeholders or individuals (H).

Classification schema

Secondary classification groups

• Time elements: time period for reporting or freight movement

• Place elements: O-D freight movement or event location

– Place identifier (e.g. city name, county, state, country ... or geo point)

– Place feature (e.g. population, area)

• Commodity elements

– Commodity identifier (standard commodity codes)

– Commodity feature (e.g. liquid, bulk, value)

• Link elements

– Link identifier (e.g. roadway name, waterway name)

– Link feature (e.g. width, length)

Classification schema

Secondary classification groups, cont’d.

• Mode elements

– Mode identifier (e.g. truck, rail, air, pipeline)

– Mode feature (e.g. unit train, vehicle class)

• Industry elements

– Industry identifier (NAICS, SIC)

– Industry feature (e.g. number of employees, sales)

• Event elements

– Event identifier (e.g., an accident report number, a dredging operation)

– Event feature (e.g., number of fatalities, number of port calls)

• Human elements

– Human identifier (e.g., investigating officer, reporting agent)

– Human feature (e.g., drunk driver)
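The two classification levels can be sketched in Python as a role plus an identifier/feature flag. This is an illustrative model, not the FDD's internal representation: the enum and class names are assumptions, the letter "U" for the Unclassified group is assumed because the deck gives no code for it, and the sample elements are the examples listed on the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Primary RBCS groups, keyed by the letter codes used in the deck."""
    COMMODITY = "C"
    EVENT = "E"
    HUMAN = "H"
    INDUSTRY = "I"
    LINK = "L"
    MODE = "M"
    PLACE = "P"
    TIME = "T"
    UNCLASSIFIED = "U"  # assumed code; the deck assigns no letter to this group


class Kind(Enum):
    """Secondary level: does the element identify an object or describe it?"""
    IDENTIFIER = "identifier"
    FEATURE = "feature"


@dataclass(frozen=True)
class Classification:
    role: Role
    # The slides show no identifier/feature split for Time, so kind is optional.
    kind: Optional[Kind] = None

    def label(self) -> str:
        role_name = self.role.name.title()
        return role_name if self.kind is None else f"{role_name} {self.kind.value}"


# Example assignments, using the examples given on the slides.
examples = {
    "roadway name": Classification(Role.LINK, Kind.IDENTIFIER),
    "number of employees": Classification(Role.INDUSTRY, Kind.FEATURE),
    "accident report number": Classification(Role.EVENT, Kind.IDENTIFIER),
    "population": Classification(Role.PLACE, Kind.FEATURE),
}

for element, tag in examples.items():
    print(f"{element}: {tag.label()}")
```

Tagging every element with the same small set of roles is what lets elements from different sources be grouped and compared.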

Glossaries

• 13,554 terms from 13 glossaries compiled into a single glossary

• Entries include glossary terms, their definitions, and a link to the glossary term source
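Compiling several glossaries into one while keeping each term's source could look like the following Python sketch. It assumes, for illustration only, that a term appearing in more than one glossary is kept once per source rather than deduplicated; the function name, record fields, and sample glossaries are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    term: str
    definition: str
    source: str  # name of, or link to, the originating glossary


def merge_glossaries(glossaries: dict[str, dict[str, str]]) -> list[GlossaryEntry]:
    """Flatten {source: {term: definition}} into one list sorted by term.

    Each source's definition is kept as its own entry (assumed behavior).
    """
    merged = [
        GlossaryEntry(term, definition, source)
        for source, terms in glossaries.items()
        for term, definition in terms.items()
    ]
    # Sort case-insensitively by term, then by source for a stable order.
    return sorted(merged, key=lambda entry: (entry.term.lower(), entry.source))


# Placeholder glossaries; names and definitions are not real FDD content.
combined = merge_glossaries(
    {
        "Glossary B": {"Ton-mile": "definition from glossary B"},
        "Glossary A": {
            "ton-mile": "definition from glossary A",
            "Berth": "definition from glossary A",
        },
    }
)
for entry in combined:
    print(entry.term, "|", entry.source)
```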

Recommended Data Types

Data Type     Description
Nominal       Values exist in name only; can be counted, not measured
Binary        Values involve 2 things (e.g. yes or no, true or false)
Date/Time     Time of day, day of week or month, year, time period
Real Number   Values can be measured; can be expressed in non-whole numbers (miles, tonnage, …)
Integer       Values expressed only in whole numbers (number of trucks)
Currency      Monetary values
Ratio         Relation between 2 numbers (e.g. passenger miles per available seat miles)
Percentage    Values expressed as fraction of 100 (e.g. percentage truck traffic)
Geometry      Representation of GIS data (e.g. point, line, polygon)
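The recommended data types can be written as a Python enumeration, with a rough conformance check for a few of them. The check rules are assumptions made for illustration (for instance, treating a percentage as a share between 0 and 100); the deck does not specify validation rules.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "Nominal"
    BINARY = "Binary"
    DATE_TIME = "Date/Time"
    REAL_NUMBER = "Real Number"
    INTEGER = "Integer"
    CURRENCY = "Currency"
    RATIO = "Ratio"
    PERCENTAGE = "Percentage"
    GEOMETRY = "Geometry"


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def conforms(value: object, data_type: DataType) -> bool:
    """Rough check for a handful of types; rules are illustrative assumptions."""
    if data_type is DataType.NOMINAL:
        return isinstance(value, str)
    if data_type is DataType.BINARY:
        return isinstance(value, bool)
    if data_type is DataType.INTEGER:
        return isinstance(value, int) and not isinstance(value, bool)
    if data_type is DataType.REAL_NUMBER:
        return _is_number(value)
    if data_type is DataType.PERCENTAGE:
        # Assumes the value is a share of a whole, so it must lie in 0..100.
        return _is_number(value) and 0 <= value <= 100
    raise NotImplementedError(f"no check sketched for {data_type.value}")


print(conforms(12, DataType.INTEGER))
print(conforms(120, DataType.PERCENTAGE))
```

Date/Time, Currency, Ratio, and Geometry are left unchecked here because reasonable rules for them depend on conventions the deck does not state.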

Implementation

Objective

The primary goal of this acquisition was to provide the solution and services necessary for BTS to offer freight vocabulary control to the transportation industry and community in a manner that promotes efficiency, agility, innovation, and potential cost savings.

Timeline

• BTS received the finished project from the University of Texas at Austin in 2015.

• In 2016, BTS allocated funding to migrate the system to the Microsoft Azure cloud.

• The National Transportation Library migrated the system to the DOT internal testing environment in mid-2017.

• NTL created the Project Charter, Performance Based Statement of Work (PBSW), and other related documents in late summer 2017.

• The Project was approved and funded in October 2017 for FY 2018.

Milestones

• Set up the Freight Data Dictionary on the DOT network (early 2017)

• Convert Unix-based Oracle to Windows-based Oracle (early 2017)

• Convert Oracle to Microsoft SQL Server in the Azure cloud (mid 2017)

• Implement Azure Index in the application (mid 2017)

• Recode the application as PaaS in Azure (early 2018)

• System testing (April 2018)

• Develop web-based public access API (May 2018)

• Move to staging and production environments (May 2018)

Future

Challenges

• FDD lacks export feature

• Units of measurement are US customary

• Data elements cannot be displayed individually

• Source code not available

• Search is very simple – no Boolean operators

• No user documentation

• Point of contact for each data source not identified

• Planned as a collaborative platform

Future opportunities

• Collaborate with freight data community on governance and submission of new terms

• Forthcoming project for Federal Aviation Administration data dictionary using the same technology

• Planned enhancements: Boolean search, individual source search, download function, term suggestion form

• Outreach
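As a rough idea of what the planned Boolean search might involve, the Python sketch below evaluates flat queries using AND, OR, and NOT against element names. It is not the FDD's implementation: the function name, the query syntax (uppercase operators, no parentheses, OR binding more loosely than AND), and the substring matching are all assumptions made for the example.

```python
def matches(query: str, text: str) -> bool:
    """Evaluate a flat Boolean query against text.

    Assumed syntax: terms joined by ' AND ' or ' OR ', each term optionally
    prefixed with 'NOT '. No parentheses; OR binds more loosely than AND.
    Matching is case-insensitive substring search.
    """
    haystack = text.lower()
    for or_group in query.split(" OR "):
        group_ok = True
        for token in or_group.split(" AND "):
            token = token.strip()
            negated = token.startswith("NOT ")
            term = (token[4:] if negated else token).strip().lower()
            found = term in haystack
            # A plain term must be found; a negated term must be absent.
            if found == negated:
                group_ok = False
                break
        if group_ok:
            return True
    return False


# Sample element names; only "Origin Airport" appears in the deck itself.
elements = ["Origin Airport", "Destination Airport", "Origin State", "Carrier Name"]
hits = [name for name in elements if matches("origin AND NOT state", name)]
print(hits)
```

A production search would more likely rely on the query syntax of the underlying search index than on hand-written parsing like this.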

Questions or comments?

ROSA P: https://rosap.ntl.bts.gov/

Freight Data Dictionary: https://fdd.bts.gov/freight-data-dictionary/

Contact us

Mary Moulton, Digital librarian
https://orcid.org/0000-0002-1791-068X
mary.moulton@dot.gov
202-366-0303

Leighton Christiansen, Data curator
https://orcid.org/0000-0002-0543-4268
leighton.christiansen@dot.gov
202-366-2759

Xin Wang, Systems librarian
xin.wang@dot.gov
202-366-9014

https://transportation.libanswers.com/

References

• Walton, C Michael; Seedah, Dan P K; Choubassi, Carine; Wu, Hui; Ehlert, Andy; Harrison, Robert; Loftus-Otway, Lisa; Harvey, Jim; Meyer, Joel; Calhoun, Jacob; Maloney, Lucia; Cropley, Stephen; Annett, Ford. Implementing the Freight Transportation Data Architecture: Data Element Dictionary. NCFRP Report, Issue 35, 2015, 161p. http://trid.trb.org/view/1367451

• Cambridge Systematics; North River Consulting Group; University of Washington, Seattle. Freight Data Sharing Guidebook. NCFRP Report, Issue 25, 2013, 68p. http://trid.trb.org/view/1251804

• Quiroga, Cesar; Koncz, Nicholas; Kraus, Edgar; Villa, Juan; Warner, Jeffery; Li, Yingfeng; Winterich, David; Trego, Todd; Short, Jeffrey; Ogard, Elizabeth. Guidance for Developing a Freight Transportation Data Architecture. NCFRP Report, Issue 9, 2011, 105p. http://trid.trb.org/view/1085296
