Post on 19-Jan-2016

Bringing Together the Social and Technical in Big Data Analytics: Why You Can't Predict the Flu from Twitter, and Here's How

David A. Broniatowski, Asst. Prof., EMSE

http://www.seas.gwu.edu/~broniatowski

PUBLIC HEALTH CYCLE

Population Doctors

Surveillance

Intervention

• Traditional mechanisms

• Surveys

• Clinical visits

REQUIRES: DATA ON THE POPULATION

The need for population data has limited research

TWITTER

• Short messages (140 chars) posted to the public internet

• Content: news, conversation, pointless babble

• Huge volume

• 500 million a day

WHY TWITTER?

• Huge volumes of data

• A constant stream of small updates

• Nothing like waiting in line to buy cigarettes behind a guy in a business suit buying gasoline with ten dollars in dimes

• I eat pizza too much

• I'm at Cvs Pharmacy (117th and kendall, Miami)

INFLUENZA SURVEILLANCE

• CDC has nationwide surveillance network with 2700 outpatient centers reporting

• ILI: influenza-like illness

• Cons:

• Slow (2 weeks)

• Varying levels of geographic granularity

TWITTER SURVEILLANCE

• Twitter influenza surveillance must be

• 1) Accurately track ground truth

• Identify infection tweets

• 2) Effective at both municipal and national level

• Expand tweet geolocation and evaluate municipal accuracy

• 3) Predictive in real time

• Deploy previously trained system on this flu season

PIPELINE CLASSIFIERS

• Three steps using supervised machine learning + NLP

• Step 1: Identify health tweets

• Step 2: Identify flu-related tweets

• Step 3: Awareness vs. infection
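The three steps above can be read as a cascade in which each stage only sees the tweets that the previous stage kept. The following is a minimal sketch of that structure, not the classifiers used in the talk: the keyword sets and the functions `is_health`, `is_flu_related`, and `is_infection` are hypothetical stand-ins for trained supervised models, used here only so the example runs on its own.

```python
from typing import Callable, Iterable, List, Sequence, Set

# Hypothetical stand-ins for the three trained classifiers. In the talk each
# stage is a supervised model; keyword rules are an assumption made here
# purely to keep the sketch self-contained.
HEALTH_TERMS = {"flu", "sick", "fever", "cough", "doctor", "vaccine"}
FLU_TERMS = {"flu", "influenza"}
AWARENESS_TERMS = {"shot", "vaccine", "news", "season", "worried"}

Stage = Callable[[str], bool]


def tokens(text: str) -> Set[str]:
    """Lowercase the words of a tweet and strip simple punctuation."""
    return {word.strip(".,!?#@").lower() for word in text.split()}


def is_health(text: str) -> bool:
    """Step 1: does the tweet talk about health at all?"""
    return bool(tokens(text) & HEALTH_TERMS)


def is_flu_related(text: str) -> bool:
    """Step 2: among health tweets, is this one about flu?"""
    return bool(tokens(text) & FLU_TERMS)


def is_infection(text: str) -> bool:
    """Step 3: treat a flu tweet as infection unless it looks like awareness."""
    return not (tokens(text) & AWARENESS_TERMS)


def infection_tweets(
    tweets: Iterable[str],
    stages: Sequence[Stage] = (is_health, is_flu_related, is_infection),
) -> List[str]:
    """Apply the stages in order, keeping only tweets every stage accepts."""
    kept = list(tweets)
    for stage in stages:
        kept = [tweet for tweet in kept if stage(tweet)]
    return kept
```

Swapping a keyword rule for a trained model only requires a function with the same `str -> bool` shape, so the cascade itself does not change.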

LOCAL EFFECTIVENESS

• Current work focuses on US national flu rates

• Useful surveillance needed by region/state/city

• How can Twitter track local trends?

• Is it accurate?

• Is there enough data?

• Only about 1% of Twitter is geocoded

CARMEN (Dredze et al., 2013)

• Over 4000 known locations (countries, states, counties, cities)

• Geocoordinates only: ~1%

• Expanded locations: ~22%

• Available in Python and Java
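One way to picture how coverage can grow from ~1% to ~22% is a resolver that falls back from exact coordinates to less precise location fields. The sketch below illustrates that fallback idea only; it is not Carmen's API or its actual resolution logic. The tweet field names (`coordinates`, `place`, `user_location`) and the two-entry `KNOWN_LOCATIONS` table are assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical lookup table for illustration. Carmen itself ships a database
# of several thousand known locations; these two entries are made up here.
KNOWN_LOCATIONS = {
    "baltimore, md": ("United States", "Maryland", "Baltimore"),
    "miami": ("United States", "Florida", "Miami"),
}


def resolve_location(tweet: dict) -> Optional[Tuple[str, object]]:
    """Return (source, location) from the most precise field available.

    The field names are simplified assumptions, not the real Twitter schema.
    """
    # 1) Exact geocoordinates: present on only a small share of tweets.
    coordinates = tweet.get("coordinates")
    if coordinates:
        return ("coordinates", coordinates)

    # 2) A named place attached to the tweet.
    place = (tweet.get("place") or "").strip().lower()
    if place in KNOWN_LOCATIONS:
        return ("place", KNOWN_LOCATIONS[place])

    # 3) The free-text location in the user's profile.
    profile = (tweet.get("user_location") or "").strip().lower()
    if profile in KNOWN_LOCATIONS:
        return ("profile", KNOWN_LOCATIONS[profile])

    # Nothing matched: the tweet stays unlocated.
    return None
```

For example, `resolve_location({"user_location": "Baltimore, MD"})` resolves through the profile string even though the tweet carries no coordinates.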

SURVEILLANCE RESULTS

Pearson correlation

Method              2009    2011
Keywords            0.97    0.646
Flu Classifier      0.97    0.519
Google Flu Trends   0.97    0.897
Infection           0.972   0.7832
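The figures reported here are Pearson correlations between a surveillance signal and a reference series. As a rough sketch of how such a number is computed, the code below defines the coefficient from first principles. The helper `weekly_rate`, which divides infection-tweet counts by total tweet volume, is an assumption about preprocessing and is not stated on the slides; any numbers passed to these functions in practice would come from the real weekly series.

```python
import math
from typing import List, Sequence


def weekly_rate(infection_counts: Sequence[int], total_counts: Sequence[int]) -> List[float]:
    """Normalize weekly infection-tweet counts by total tweet volume.

    This normalization is an assumption made for the sketch.
    """
    if len(infection_counts) != len(total_counts):
        raise ValueError("series must have the same length")
    return [hits / total for hits, total in zip(infection_counts, total_counts)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two weekly series rise and fall together; it says nothing about whether their magnitudes agree.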

GOOGLE FLU TRENDS GETS IT WRONG?

Lohr, S. (2014). Google flu trends: the limits of big data. New York Times.

• Pearson correlation:

• Keywords: 0.75

• Infection: 0.93

• ILI counts:

• Infection: 0.88

• Keywords: 0.72

BLIND EVALUATION

2013-2014: 0.95 correlation

MOST RECENT DATA

Broniatowski, D. A., Dredze, M., Paul, M. J., & Dugas, A. (2015). Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health and Surveillance, 1(1), e5.

PREDICTING ACTUAL FLU IN BALTIMORE

Broniatowski, D. A., Dredze, M., Paul, M. J., & Dugas, A. (2015). Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health and Surveillance, 1(1), e5.

HEALTHTWEETS.ORG

HEALTHTWEETS WORLDWIDE

Some Other Projects

BIG DATA FOR GROUP DECISION MAKING: EXTRACTING SOCIAL NETWORKS FROM FDA ADVISORY PANEL MEETING TRANSCRIPTS

(Broniatowski & Magee, 2013 American Journal of Therapeutics; Broniatowski & Magee, 2012 IEEE Signal Processing Magazine; Broniatowski & Magee, in preparation)

“GERMS ARE GERMS” AND “WHY NOT TAKE A RISK?”

MODELS AND DATA FOR RISKY DECISION MAKING IN THE ED

(Broniatowski, Klein, & Reyna, in press, Medical Decision Making; Broniatowski & Reyna, in preparation)

HOW DO WE DESIGN SYSTEMS TO USE INFORMATION FLOW TO OUR ADVANTAGE?

We would like to deepen our intuition regarding system architectures

(Broniatowski & Moses, in preparation)

QUESTIONS?

• Big data

• Influenza tracking and coupled contagion

• Group decision-making

• Individual decision-making

• Formal models

• Medical and engineering applications

• Formal and mathematical models

• Systems architecture

• Design for flexibility

broniatowski@gwu.edu
