

1

FDA Data Innovation Lab and Predictive Analytics Meetup

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

October 6, 2014

2

Agenda

• 6:30 p.m. Welcome and Introduction – Report on Recent Meeting with Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer (CHIO), and FDA Data Science Data Publication Tutorial:
– Interest in our Meetup on OpenFDA, July 7th
– Keynote at AFCEA Bethesda’s Health IT Day, December 2nd
• 7:00 p.m. Brooke Aker, Big Data Lens, Predictive Analytics for OpenFDA and Other Examples
• 7:45 p.m. Brief Member Introductions and Inter-American Development Bank Open Data Portal and FDA Examples
• 8:30 p.m. Open Discussion
• 8:45 p.m. Networking
• 9:00 p.m. Depart

3

Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer

Dr. Taha Kass-Hout is the Chief Health Informatics Officer of FDA [email protected] | @DrTaha_FDA

Dr. Jeffrey Shuren is Director of FDA's Center for Devices and Radiological Health [email protected]

4

OpenFDA

• OpenFDA is a new initiative to provide unprecedented access to FDA data and to highlight projects in the public and private sectors that use these data to further scientific research, educate the public, and save lives.
• OpenFDA is an initiative of FDA’s Office of Informatics and Technology Innovation to provide a new level of access to a number of public high-value FDA datasets via RESTful APIs and structured raw file download. Currently, the project is in an early-development stage, with an alpha release of two datasets planned for spring 2014 and a larger public release later in the year. Additionally, openFDA will provide a platform for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data and creating new partnerships and opportunities between the public and private sector (BOLDING BY ME).
– Presidential Innovation Fellow: Sean Herron is a Presidential Innovation Fellow serving at FDA [email protected] | @seanherron

http://www.hhs.gov/idealab/innovate/openfda/

5

OpenFDA History

• OpenFDA is the first innovation created by Taha Kass-Hout, MD, MS, upon joining FDA as the first Chief Health Informatics Officer in March 2013.
• Dr. Kass-Hout launched the project by obtaining a Presidential Innovation Fellow to focus on policy and programmatic issues in July 2013.
• In August 2013, a research and development contract was awarded to Iodine, Inc. to build the site.
• The public cloud environment was determined in September 2013, and Dr. Kass-Hout’s team solicited agency and user input into policies, first priority datasets, and desirable technical characteristics of openFDA.
• In December 2013, FDA established the Office of Informatics and Technology Innovation (OITI) under Dr. Kass-Hout’s leadership.
• OpenFDA launched in Beta mode on June 2, 2014.
• By September 2014, medical device reports, enforcement reports, and drug adverse event reports were available.
• There were over 4.5 million data calls, over 40,000 visitors to openFDA from all over the world, dozens of press articles, and several websites that use openFDA in their own public offerings.
• During Fiscal Year 2015, additional datasets and harmonization will be added.

https://open.fda.gov/about/

6

Making public FDA datasets more accessible

• Caution:
– We're in beta! openFDA is a beta research project and not for clinical use. We may limit or otherwise restrict your access to the API in line with our Terms of Service. Need help? Try StackExchange
• Purposes:
– Open data for easier and better access to FDA datasets. APIs, raw data, and documentation for high value public datasets.
– Open source code and documentation. Shared on GitHub for community contribution.
– Open community to share examples, apps, and ideas. Developers, researchers, and FDA on GitHub, StackExchange and Twitter.

https://open.fda.gov/

7

OpenFDA Updates

• Introducing openFDA
– Taha Kass-Hout | 04 Mar 2014
• FDA's Path Forward for Open Data and Next Generation Sequencing *
– Taha Kass-Hout | 06 Mar 2014 (See next slides)
• Ten Things to Know About Drug Adverse Events *
– Sean Herron | 02 Jun 2014 (See next slides)
• OpenFDA: Innovative Initiative Opens Door to Wealth of FDA’s Publicly Available Data
– Taha Kass-Hout | 02 Jun 2014
• OpenFDA Provides Ready Access to Recall Data
– Taha Kass-Hout | 08 Aug 2014
• Providing Easy Public Access to Prescription Drug, Over-the-Counter Drug, and Biological Product Labeling
– Taha Kass-Hout | 18 Aug 2014
• Providing Easy Access to Medical Device Reports Submitted to FDA since the Early 1990s
– Taha Kass-Hout | Jeffrey Shuren | 19 Aug 2014

https://open.fda.gov/updates/

8

FDA's Path Forward for Open Data and Next Generation Sequencing

• Utility NGS (Next Generation Sequencing) in the Internet cloud: FDA is facing growing NGS needs for processing internal genome sequencing data as well as the NGS data from industry submissions. The NGS initiative is planning and developing a cloud-based Big Data platform and analytics for robust, secure, and controlled data storage, analysis, and collaboration, and potentially for sharing public-access genome sequencing information.
• NGS is a Big Data Initiative.

https://open.fda.gov/update/fda-path-forward-for-open-data-and-next-generation-sequencing/

9

Ten Things to Know About Drug Adverse Events

• 1. Start with the examples
• 2. Know the limitations
• 3. Know why the data is sometimes messy
• 4. Make sure you check out the reference
• 5. Learn the Lucene query syntax
• 6. Don’t forget about count
• 7. Use the openfda fields!
• 8. Use .exact to count for phrases
• 9. Beware of null values
• 10. Watch for changes
– We’ll be adding additional data to this endpoint whenever a new Quarterly Data File is posted.
• My Note: I used the bulk data downloads!

https://open.fda.gov/update/ten-things-to-know-about-adverse-events/
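Tips 5 through 8 can be illustrated with a small sketch that only builds query URLs and sends no request. It assumes the public API host https://api.fda.gov and the drug adverse event endpoint drug/event.json; the helper name build_query_url and the example search values are hypothetical, and the field names should be checked against the openFDA reference before use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.fda.gov"  # assumed public openFDA API host


def build_query_url(endpoint, search=None, count=None, limit=None):
    """Build an openFDA-style query URL (hypothetical helper; no request is sent)."""
    params = {}
    if search:
        params["search"] = search  # Lucene-style expression, e.g. field:"phrase"
    if count:
        params["count"] = count    # field whose distinct values are tallied
    if limit is not None:
        params["limit"] = limit    # maximum number of records or terms returned
    query = urlencode(params)
    url = f"{BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url


# Tips 5 and 7: a Lucene-style search on one of the harmonized openfda fields.
# The brand name "aspirin" is only an illustrative value.
search_expr = 'patient.drug.openfda.brand_name:"aspirin"'
search_url = build_query_url("drug/event.json", search=search_expr, limit=5)

# Tips 6 and 8: count reaction terms, using .exact so phrases are counted whole.
count_url = build_query_url(
    "drug/event.json",
    search=search_expr,
    count="patient.reaction.reactionmeddrapt.exact",
)

# Decode the query string again to confirm what would be sent.
decoded = parse_qs(urlsplit(count_url).query)
print(search_url)
print(decoded)
```

Keeping URL construction separate from the HTTP call makes it easy to inspect a query before spending any of the API's rate-limited calls on it.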

10

Data Science Data Publications for Big Data Analytics

• New Government Data Science Best Practices:
– Digital Government Strategy
– Open Research Data Policy
– Agency: HHS IdeaLab, NIH Data Commons, FDA Innovation Lab
– White House NITRD Big Data Initiative and NSF Agency Strategic Plan: Data Science, Data Infrastructure, and Data Publications
• New Government Data Science Publication Examples:
– Federal Data Center Consolidation 2014
– Performance.gov
– FDA Data and FDA Data Innovation Lab
– National Science Board Science & Engineering Indicators

11

Data Science Data Publication for Federal Data Center Consolidation 2014: Data Journalism

• In 2011 and 2012, I published three stories on the Federal Data Center Consolidation Initiative because of the poor quality and incompleteness of the data. It was one of the first non-federal applications of analytics I did after leaving government service. I decided to revisit the data for this meetup and was pleased to find that the quality and completeness had improved considerably, so I imported the new spreadsheet into Spotfire and explored the results in multiple dynamically linked adjacent visualizations.

• Of the 3,665 data centers in the data set now, only 976 have been closed since the beginning of the program and 2,689 are yet to be closed in 2014-2015! The vast majority of these (2,254) belong to the Department of Agriculture.
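The counts quoted above can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the four figures from the slide; it assumes that "these" in the last sentence refers to the 2,689 centers not yet closed, which the slide does not state explicitly.

```python
# Figures quoted on the slide.
total_centers = 3665
closed = 976
remaining = 2689
usda = 2254  # Department of Agriculture centers

# The closed and remaining counts should add up to the total.
assert closed + remaining == total_centers

# Share of all centers closed so far.
closed_share = round(100 * closed / total_centers, 1)

# ASSUMPTION: the USDA figure is a subset of the remaining (not yet closed) centers.
usda_share_of_remaining = round(100 * usda / remaining, 1)

print(f"Closed so far: {closed_share}% of all data centers")
print(f"USDA share of remaining centers: {usda_share_of_remaining}%")
```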

Spreadsheet

13

Data Science Data Publications for FDA: Data Science Data Mining Process

• Recall OpenFDA Knowledge Base for previous visualization and analytics:
– Brooke Aker, Biplab Pal, and Brand Niemann.
• Mined HealthData.gov for FDA data and built linked data spreadsheets (17) for Spotfire:
– See next slides.
• Mined FDA Site Map for data:
– Found Two: Data Standards and FDA Drug Approvals & Databases.
– Downloaded and inventoried files (41) (ZIP, CSV & XLS) for Spotfire.
– Used for FDA Data Innovation Lab Visualization Gallery.
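The inventory step can be sketched in a few lines of Python. This is not the workflow used for the slides, which relied on a spreadsheet and Spotfire; it is an illustrative alternative in which the function name inventory_files and the demo file names are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory_files(root, extensions=(".zip", ".csv", ".xls")):
    """Count downloaded files under root by extension (case-insensitive)."""
    counts = Counter()
    for path in Path(root).rglob("*"):  # walk all subfolders
        suffix = path.suffix.lower()
        if path.is_file() and suffix in extensions:
            counts[suffix] += 1
    return counts


# Demo on a throwaway folder with made-up file names (not the real FDA files).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.CSV", "c.zip", "notes.txt"):
        (Path(tmp) / name).touch()
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "d.xls").touch()
    demo_counts = inventory_files(tmp)

print(dict(demo_counts))
```

A per-extension count like this is a quick way to confirm that a download folder matches the spreadsheet inventory before loading the files into a visualization tool.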

14

Data Science for OpenFDA MindTouch Knowledge Base

http://semanticommunity.info/Data_Science/Data_Science_for_OpenFDA

15

Data Science for FDA Data: Excel Spreadsheet Data Ecosystem

• FDA @ HealthData.gov
• Summary FDA
• FDA Site Map
• FDA-TRACK
• FDA Glossary
• FDA-TRACK Research Glossary
• FDA Drug Approvals & Databases
• Summary All
• Holdren Memo Agencies
• HealthData.gov Subject 09172014
• HealthData.gov Agency 09172014
• HealthData.gov Date 09172014
• HealthData.gov Year 09172014
• HealthData.gov Period 09172014
• HealthData.gov Spatial 09172014
• HealthData.gov Start 09172014
• HealthData.gov Media 09172014

http://semanticommunity.info/@api/deki/files/30746/HHSFDA.gov.xlsx?origin=mt-web

16

Data Science Data Publication: FDA Data in Spotfire

• Cover Page-Performance Analytics: FDA TRACK
• Content Analytics: Summary Statistics
• Content Analytics: HealthData.gov Statistics 09172014
• Content Analytics: FDA @ HealthData.gov
• Network Analytics: FDA Glossary & Site Map
• Data Analytics: FDA Drug Approvals & Databases

22

Data Analytics: FDA Drug Approvals & Databases

My Note: Inventory to prioritize further data science data publication work!

Web Player

23

FDA Data Innovation Lab Visualization Gallery: Spreadsheet Inventory

http://semanticommunity.info/@api/deki/files/30746/HHSFDA.gov.xlsx?origin=mt-web

My Note: This inventory is updated as one drills down into the data sets!

24

FDA Data Innovation Lab Visualization Gallery: File Folder

My Note: Some folders contain multiple files!

25

Suggestions

• Help the FDA Data Innovation Lab with data publication gallery and wall posters.

• Help the FDA Data Innovation Lab with their Open Data Lab Day.

• Organize Joint Meetups and promote use of the FDA Data Innovation Lab.

• Help form Data Science Teams to work on FDA big data problems.

26

Open Data Portal for the Inter-American Development Bank: Comments

• Another good meeting last night.
• Thank you for organizing this meetup, very helpful! Special thanks to Brand for all the info you shared. I'm looking forward to future ones!

• Terrific and innovative data visualizations can make a big impact indeed.

• This week was very good - exposure to interesting beta products (Semantic Insights, this week) as well as new approaches to visualization techniques are always things to which I look forward. When I get to see an illustration of the concept of "cognitive load" in visualizations the way it was shown in this session (with Sankey diagrams), it makes it an even better session. Great stuff! And I get to play around with a new data set - even better!

http://www.meetup.com/Federal-Big-Data-Working-Group/events/206366842/

27

Open Data Portal for the Inter-American Development Bank: Annette Hester

• Thanks for hosting me last night. It was a pleasure to share ideas with such a knowledgeable group.

• We would be delighted if you or any in your group took time to understand the database and compare it to traditional graphs and other visualizations. As I mentioned, the easiest way to do so would be using the first data graph, Energy Flows (http://www.iadb.org/eic/database). It is a Sankey Graph with a twist. You can find similar products at:
– http://www.iea.org/Sankey/
– https://flowcharts.llnl.gov/
– www.energyliteracy.com
– http://www.sankey-diagrams.com/tag/ghg/
• And if you google energy flow charts you will find quite a variety.
• The more I look at energy data and what we have published, the better I feel about our database. I look forward to the results of your investigation. Please do keep in touch… and do feel free to post this note on the meetup website.

28

Open Data Portal for the Inter-American Development Bank: Energy Flows Visualization

http://www.iadb.org/en/topics/energy/energy-innovation-center/flow-institutional-data,8879.html?view=v11

29

Data Science for IDB Data: MindTouch Knowledge Base

http://semanticommunity.info/Data_Science/Data_Science_for_IDB_Data

My Initial Data Science Data Publication:
• How was the data collected?
• Where is the data stored?
• What are the results?
• Why should we believe them?

The broader context and constructive critique

32

Inter-American Development Bank Open Data Portal Examples, Etc.

• Please post your interest in providing visualization examples and explanations to our Meetup site.
• Also feel free to use the FDA data or any other data you are working with in visualizations and explanations.
– NSB Science & Engineering Indicators
– FDA Data Innovation Lab Visualization Gallery

• This is your time to shine!