Tracking the Flu Pandemic by Monitoring the Social Web

Tracking the Flu Pandemic by Monitoring the Social Web Vasileios Lampos and Nello Cristianini Jedsada Chartree 04/11/11

Page 1: Tracking the Flu Pandemic by Monitoring the Social Web

Tracking the Flu Pandemic by Monitoring the Social Web

Vasileios Lampos and Nello Cristianini

Jedsada Chartree 04/11/11

Page 2: Tracking the Flu Pandemic by Monitoring the Social Web

Introduction
• Growing interest in monitoring disease outbreaks.
• Growing number of Twitter users
- February 2010: 50 million tweets/day
- June 2010: 65 million tweets/day (750 tweets/s)
- 190 million users (Source: http://en.wikipedia.org/wiki/Twitter)
- 5.5 million users in the UK (2009)

Page 3: Tracking the Flu Pandemic by Monitoring the Social Web

Introduction
• Official national statistics on flu are reported with a delay of 1 to 2 weeks.
• Twitter can reveal the up-to-date situation.

Page 4: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology
• Data
1. Official health reports from the Health Protection Agency (HPA), UK.
2. Twitter, UK
- Daily average of 160,000 tweets (24 weeks, from 06/22/2009 to 12/06/2009)
- Twitter geolocation (geographical coordinates)

Page 5: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology
• Data
Region A = Central England & Wales
Region B = South England
Region C = North England
Region D = England & Wales
Region E = Wales & Northern Ireland

RCGP = Royal College of General Practitioners
Qsur = QSurveillance, University of Nottingham and Egton Medical Information Systems

Page 6: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology

[Diagram: Twitter data are turned into a flu-score, which is compared with the HPA flu rates through a correlation coefficient.]

Page 7: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology
• Flu-Score

k = total number of markers
n = total number of tweets for one day
i ∈ [1, k], j ∈ [1, n]
M = set of textual markers = {m_i}
T = daily set of tweets = {t_j}
s(t_j) = the flu-score of a tweet

$$s(t_j) = \frac{\sum_{i} m_i(t_j)}{k}$$

$$f(T, M) = \frac{\sum_{j} s(t_j)}{n} = \frac{\sum_{j} \sum_{i} m_i(t_j)}{k \cdot n}$$
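The flu-score definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes m_i(t_j) is 1 when marker m_i occurs in tweet t_j and 0 otherwise, uses case-insensitive substring matching as a stand-in for whatever matching the study used, and the markers and tweets are made up.

```python
def marker_hits(tweet, markers):
    """Return m_i(t_j) for every marker: 1 if the marker occurs in the tweet, else 0.

    Assumption: a marker "occurs" if it is a case-insensitive substring of the tweet.
    """
    text = tweet.lower()
    return [1 if marker in text else 0 for marker in markers]


def tweet_flu_score(tweet, markers):
    """s(t_j): fraction of the k markers that occur in one tweet."""
    return sum(marker_hits(tweet, markers)) / len(markers)


def daily_flu_score(tweets, markers):
    """f(T, M): mean of s(t_j) over the n tweets of one day."""
    return sum(tweet_flu_score(t, markers) for t in tweets) / len(tweets)


# Hypothetical markers and tweets, for illustration only.
markers = ["fever", "cough", "headache", "sore throat"]
tweets = [
    "Stuck in bed with a fever and a cough",
    "Lovely weather in Bristol today",
]
score = daily_flu_score(tweets, markers)
```

Here the first tweet matches two of the four markers and the second matches none, so the daily score is the average of 0.5 and 0.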

Page 8: Tracking the Flu Pandemic by Monitoring the Social Web

Results

Flu rates from the Health Protection Agency (HPA)

Page 9: Tracking the Flu Pandemic by Monitoring the Social Web

Results

Twitter’s flu-scores for regions A-E (weeks 26 to 49, 2009)

Page 10: Tracking the Flu Pandemic by Monitoring the Social Web

Results

Correlation coefficients between Twitter’s flu-score and HPA’s rates
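As a rough sketch of how such a coefficient can be computed, the snippet below implements the Pearson (linear) correlation between two equally long series. The function name and the numbers are hypothetical and are not taken from the study; they only show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up series: Twitter flu-scores and official flu rates for the same periods.
flu_scores = [0.010, 0.014, 0.021, 0.030, 0.024, 0.015]
hpa_rates = [12.0, 18.0, 30.0, 45.0, 33.0, 20.0]
r = pearson(flu_scores, hpa_rates)
```

A value of `r` close to 1 means the two series rise and fall together, which is the property the table on this slide reports per region.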

Page 11: Tracking the Flu Pandemic by Monitoring the Social Web

Results

Twitter’s flu-score and HPA rates for region D (England & Wales)

Page 12: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology
• Learning HPA’s flu rates from Twitter flu-score

k = total number of markers, n = total number of tweets for one day
i ∈ [1, k], j ∈ [1, n], M = set of textual markers = {m_i}
T = daily set of tweets = {t_j}, w_i = weight of marker m_i

$$s_w(t_j) = \frac{\sum_{i} w_i \, m_i(t_j)}{k}$$

$$f_w(T, M) = \frac{\sum_{j} s_w(t_j)}{n} = \frac{\sum_{j} \sum_{i} w_i \, m_i(t_j)}{k \cdot n}$$

$$f_{w_i}(T, M) = w_i \cdot \frac{\sum_{j} m_i(t_j)}{k \cdot n}$$
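The weighted score decomposes into one sub-score per marker, which the following Python sketch makes explicit. It assumes m_i(t_j) is a 0/1 indicator based on case-insensitive substring matching; the markers, tweets, and weights are invented for illustration (in the study the weights are learned from the HPA rates, which this sketch does not attempt).

```python
def marker_subscores(tweets, markers):
    """Unweighted part of f_{w_i}: sum_j m_i(t_j) / (k * n), one value per marker."""
    k, n = len(markers), len(tweets)
    lowered = [t.lower() for t in tweets]
    return [sum(1 for t in lowered if m in t) / (k * n) for m in markers]


def weighted_flu_score(tweets, markers, weights):
    """f_w(T, M): sum over markers of w_i times the marker's sub-score."""
    if len(weights) != len(markers):
        raise ValueError("need exactly one weight per marker")
    subscores = marker_subscores(tweets, markers)
    return sum(w * s for w, s in zip(weights, subscores))


# Hypothetical markers, weights, and tweets, for illustration only.
markers = ["fever", "cough", "headache"]
weights = [2.0, 1.0, 0.5]
tweets = [
    "Fever and cough all night",
    "Bad headache, mild fever",
    "Off to the cinema",
]
subscores = marker_subscores(tweets, markers)
weighted = weighted_flu_score(tweets, markers, weights)
```

With all weights set to 1 the weighted score reduces to the plain flu-score f(T, M), which is a convenient sanity check.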

Page 13: Tracking the Flu Pandemic by Monitoring the Social Web

Results

Linear regression using the markers

Page 14: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology

• Automatic extraction of influenza-like illness (ILI) textual markers
1. Creating candidate markers from:
- Encyclopedic references
- Informal references
2. Forming a time series of flu-subscores for each candidate marker.
- Ranking the weights by applying the LASSO method.

Page 15: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology

LASSO

t = shrinkage parameter
Vector w = the sparse solution
w^(ls) = the least squares estimates for the regression problem
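The formula itself did not survive in this transcript. For orientation, the standard constrained form of LASSO that matches the symbols above is shown below. The symbols X (one row per time point, one column of flu-subscores per candidate marker), y (the HPA flu rates), and the fraction α are assumptions introduced here for illustration and may differ from the slide's notation.

```latex
\hat{w} = \arg\min_{w} \; \lVert X w - y \rVert_2^2
\quad \text{subject to} \quad \lVert w \rVert_1 \le t,
\qquad t = \alpha \, \lVert w^{(ls)} \rVert_1, \quad \alpha \in (0, 1)
```

Because the L1 constraint pushes many weights to exactly zero, the markers with nonzero weight in the solution can be read as the selected markers.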

Page 16: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology

Stemmed markers extracted by applying LASSO regionally

Page 17: Tracking the Flu Pandemic by Monitoring the Social Web

Results

Linear regression using the markers on the test sets after performing LASSO

Page 18: Tracking the Flu Pandemic by Monitoring the Social Web

Methodology

Stemmed markers extracted by applying LASSO on the aggregated data

Page 19: Tracking the Flu Pandemic by Monitoring the Social Web
Page 20: Tracking the Flu Pandemic by Monitoring the Social Web

Conclusion

• Tracking the flu outbreak in the UK using Twitter messages.

• High correlation between the flu-score and the HPA flu rates, greater than 95%.

Page 21: Tracking the Flu Pandemic by Monitoring the Social Web

Reference
• V. Lampos and N. Cristianini. 2010. Tracking the flu pandemic by monitoring the Social Web. International Workshop on Cognitive Information Processing. 6 pp.