

Tracking the Flu Pandemic by Monitoring the Social Web

Vasileios Lampos and Nello Cristianini

Jedsada Chartree 04/11/11

Introduction
• Growing interest in monitoring disease outbreaks.
• Growing number of Twitter users:
  - February 2010: 50 million tweets/day
  - June 2010: 65 million tweets/day (about 750 tweets/s)
  - 190 million users (Source: http://en.wikipedia.org/wiki/Twitter)
  - 5.5 million users in the UK (2009)
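The per-second figure in brackets comes from dividing the daily volume by the 86,400 seconds in a day. Below is a minimal arithmetic check. The function name `tweets_per_second` is hypothetical, introduced only for illustration.

```python
# Seconds in a day: 24 h * 60 min * 60 s = 86,400 s.
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_second(tweets_per_day: float) -> float:
    """Convert a daily tweet volume into an average per-second rate."""
    return tweets_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Daily volumes quoted on the slide.
    for month, per_day in [("February 2010", 50_000_000), ("June 2010", 65_000_000)]:
        print(f"{month}: about {tweets_per_second(per_day):.0f} tweets/s")
```

This gives about 579 tweets/s for February 2010 and about 752 tweets/s for June 2010, consistent with the rounded 750 tweets/s above.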

Introduction
• Official (National Statistics) flu reports are published with a delay of 1 to 2 weeks.
• Twitter can reveal the up-to-date situation.

Methodology
• Data
  1. Official health reports from the Health Protection Agency (HPA), UK.
  2. Twitter, UK
     - Daily average of 160,000 tweets (24 weeks, from 06/22/2009 to 12/06/2009)
     - Twitter geolocation (geographical coordinates)
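The collection window can be checked with a few lines of date arithmetic. The sketch assumes the slide's dates are month/day/year and that both endpoints are included. The names `START`, `END`, `DAILY_AVERAGE` and `study_period_summary` are hypothetical. The total tweet count it prints is only an estimate derived from the stated daily average.

```python
from datetime import date

START = date(2009, 6, 22)   # first day of Twitter collection (from the slide)
END = date(2009, 12, 6)     # last day of collection, inclusive
DAILY_AVERAGE = 160_000     # average number of UK tweets per day


def study_period_summary(start: date, end: date, daily_average: int) -> dict:
    """Length of the collection window and a rough total tweet count."""
    days = (end - start).days + 1  # count both endpoints
    return {
        "days": days,
        "weeks": days / 7,
        "first_iso_week": start.isocalendar()[1],
        "last_iso_week": end.isocalendar()[1],
        # Approximation only: daily average multiplied by the number of days.
        "approx_total_tweets": days * daily_average,
    }


if __name__ == "__main__":
    print(study_period_summary(START, END, DAILY_AVERAGE))
```

The window is 168 days, exactly 24 weeks, and it spans ISO weeks 26 to 49 of 2009. At 160,000 tweets per day that is roughly 26.9 million tweets.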

Methodology
• Data
  - Region A = Central England & Wales
  - Region B = South England
  - Region C = North England
  - Region D = England & Wales
  - Region E = Wales & Northern Ireland
• Data sources: RCGP, Qsur
  - RCGP = Royal College of General Practitioners
  - Qsur = QSurveillance, University of Nottingham and Egton Medical Information Systems

Methodology
[Diagram: HPA flu rates and Twitter data. A flu-score is computed from the Twitter data and compared with the HPA flu rates using the correlation coefficient.]
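The slides do not say which correlation coefficient is used. The sketch below assumes Pearson's linear correlation between the daily Twitter flu-score series and the HPA flu-rate series. It also assumes the two series have already been aligned to the same time points. The function name `pearson` and the sample values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two aligned time series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    flu_scores = [0.010, 0.014, 0.021, 0.018, 0.012]
    hpa_rates = [20.0, 31.0, 45.0, 40.0, 26.0]
    print(f"r = {pearson(flu_scores, hpa_rates):.3f}")
```

A value close to 1 means the flu-score rises and falls together with the official rates. Because Pearson's r is insensitive to the scale of either series, the flu-score does not need to be in the same units as the HPA rates.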

Methodology
• Flu-score
  - $k$ = total number of markers
  - $n$ = total number of tweets for one day
  - $i \in [1, k]$, $j \in [1, n]$
  - $M$ = set of textual markers = $\{m_i\}$
  - $T$ = daily set of tweets, $t_j \in T$
  - $s(t_j)$ = flu-score of a tweet

$$s(t_j) = \frac{\sum_{i=1}^{k} m_i(t_j)}{k}$$

$$f(T,M) = \frac{\sum_{j=1}^{n} s(t_j)}{n} = \frac{\sum_{j=1}^{n} \sum_{i=1}^{k} m_i(t_j)}{k \cdot n}$$
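The two formulas translate directly into code. The slides do not define $m_i(t_j)$, so this sketch assumes it is 1 when marker $m_i$ occurs in tweet $t_j$ and 0 otherwise. It also assumes markers are single lowercase words and that tweets are split on whitespace. The function names are hypothetical.

```python
from typing import List, Set


def marker_indicator(marker: str, tokens: Set[str]) -> int:
    # Assumption: m_i(t_j) = 1 if marker m_i occurs in tweet t_j, otherwise 0.
    return 1 if marker in tokens else 0


def tweet_flu_score(tweet: str, markers: List[str]) -> float:
    """s(t_j) = (1/k) * sum_i m_i(t_j), where k = len(markers)."""
    tokens = set(tweet.lower().split())  # naive whitespace tokenisation (assumption)
    return sum(marker_indicator(m, tokens) for m in markers) / len(markers)


def daily_flu_score(tweets: List[str], markers: List[str]) -> float:
    """f(T, M) = (1/n) * sum_j s(t_j), where n = number of tweets in the day."""
    if not tweets:
        return 0.0  # no tweets collected that day
    return sum(tweet_flu_score(t, markers) for t in tweets) / len(tweets)


if __name__ == "__main__":
    markers = ["flu", "fever", "cough"]  # hypothetical marker set
    tweets = ["I have the flu and a fever", "lovely sunny day", "cough cough"]
    print(daily_flu_score(tweets, markers))
```

With these example tweets the per-tweet scores are 2/3, 0 and 1/3, so the day's flu-score is 1/3. Dividing by $k$ and $n$ keeps the score comparable across days with different tweet volumes.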

Results

Flu rates from the Health Protection Agency (HPA)

Results

Twitter’s flu-scores for regions A-E (weeks 26 to 49, 2009)

Results

Correlation coefficients between Twitter’s flu-score and HPA’s rates

Results

Twitter’s flu-score and HPA rates for region D (England & Wales)

Methodology
• Learning HPA’s flu rates from Twitter flu-scores
  - $k$ = total number of markers, $n$ = total number of tweets for one day
  - $i \in [1, k]$, $j \in [1, n]$
  - $M$ = set of textual markers = $\{m_i\}$
  - $T$ = daily set of tweets
  - $w_i$ = weight of marker $m_i$

$$s_w(t_j) = \frac{\sum_{i=1}^{k} w_i \, m_i(t_j)}{k}$$

$$f_w(T,M) = \frac{\sum_{j=1}^{n} s_w(t_j)}{n} = \frac{\sum_{j=1}^{n} \sum_{i=1}^{k} w_i \, m_i(t_j)}{k \cdot n}$$

$$f_{w_i}(T,M) = \frac{w_i \sum_{j=1}^{n} m_i(t_j)}{k \cdot n}$$
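The weighted flu-score is linear in the weights. It can be written as $f_w(T,M) = \sum_i w_i x_i$, where $x_i = \sum_j m_i(t_j)/(k \cdot n)$ is the unweighted flu-subscore of marker $m_i$. Learning the weights therefore amounts to a linear regression of the HPA rates on the per-marker subscores.

The sketch below makes several assumptions that the slides do not confirm:
- ordinary least squares with no intercept;
- one HPA rate per day as the target;
- binary marker occurrence, as in the previous flu-score definition.

All function names are hypothetical. The solver is a plain Gaussian elimination on the normal equations, used to keep the example dependency-free.

```python
from typing import List


def flu_subscores(tweets: List[str], markers: List[str]) -> List[float]:
    """x_i = sum_j m_i(t_j) / (k * n), so that f_w(T, M) = sum_i w_i * x_i."""
    k, n = len(markers), len(tweets)
    token_sets = [set(t.lower().split()) for t in tweets]  # naive tokenisation (assumption)
    return [sum(1 for toks in token_sets if m in toks) / (k * n) for m in markers]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: marker subscores are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def least_squares_weights(X: List[List[float]], y: List[float]) -> List[float]:
    """Weights w minimising ||X w - y||^2 via the normal equations (X^T X) w = X^T y.

    X has one row per day (flu-subscores of each marker), y holds that day's HPA rate.
    """
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * target for row, target in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)
```

The normal equations become ill-conditioned when many markers co-occur, and they break down entirely when there are more markers than days. That limitation is one motivation for the regularised LASSO step that follows in the slides.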

Results

Linear regression using the markers

Methodology
• Automatic extraction of ILI (influenza-like illness) textual markers
  1. Creating candidate markers from:
     - encyclopedic references
     - informal references
  2. Forming the flu-subscores as time series.
     - Ranking the weights by applying the LASSO method.
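The ranking step can be sketched with cyclic coordinate descent. This is one common LASSO solver, not necessarily the one the authors used. The slides state LASSO in its constrained form with a shrinkage parameter. This sketch swaps in the equivalent penalised form, minimising $\tfrac{1}{2}\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$, where a larger $\lambda$ corresponds to a tighter constraint.

Here `X` is assumed to hold the flu-subscore time series of the candidate markers (one row per day), and `y` the HPA rates. Markers whose weight is driven to exactly zero are discarded, and the rest are ranked by absolute weight. All names are hypothetical.

```python
from typing import List, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma; values inside [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n_rows, k = len(X), len(X[0])
    w = [0.0] * k
    residual = list(y)  # r = y - X w, with w = 0 initially
    col_sq = [sum(X[r][i] ** 2 for r in range(n_rows)) for i in range(k)]
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(k):
            if col_sq[i] == 0.0:
                continue  # marker never observed: its weight stays 0
            # rho = X_i . (partial residual that excludes marker i's contribution)
            rho = sum(X[r][i] * (residual[r] + X[r][i] * w[i]) for r in range(n_rows))
            new_wi = soft_threshold(rho, lam) / col_sq[i]
            delta = new_wi - w[i]
            if delta != 0.0:
                for r in range(n_rows):
                    residual[r] -= X[r][i] * delta
                w[i] = new_wi
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def rank_markers(markers: List[str], weights: List[float]) -> List[Tuple[str, float]]:
    """Markers with non-zero weight, sorted by absolute weight (largest first)."""
    kept = [(m, wt) for m, wt in zip(markers, weights) if wt != 0.0]
    return sorted(kept, key=lambda p: abs(p[1]), reverse=True)
```

With `lam=0` the procedure reduces to ordinary least squares. Increasing `lam` sets more weights to exactly zero, leaving a shorter list of selected markers.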

Methodology
• LASSO
  - $t$ = shrinkage parameter
  - $w$ = the sparse solution vector
  - $w^{(ls)}$ = the least squares estimates for the regression problem
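The LASSO objective itself did not survive in this transcript. The standard constrained formulation below matches the symbols listed on the slide. The shrinkage parameter is written $t$. $X$ (the days × markers matrix of flu-subscores) and $y$ (the HPA flu rates) are assumptions about how the regression is set up.

$$\hat{w} = \arg\min_{w} \; \lVert y - X w \rVert_2^2 \quad \text{subject to} \quad \lVert w \rVert_1 \le t, \qquad 0 \le t < \lVert w^{(ls)} \rVert_1$$

If $t$ is at least $\lVert w^{(ls)} \rVert_1$, the constraint is inactive and (for a unique least squares solution) $\hat{w} = w^{(ls)}$. Smaller values of $t$ shrink the weights and set some of them exactly to zero. This is what allows LASSO to select a sparse set of markers.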

Methodology

Stemmed markers extracted by applying LASSO regionally

Results

Linear regression using the markers on the test sets after performing LASSO

Methodology

Stemmed markers extracted by applying LASSO on the aggregated data

Conclusion

• Tracking the flu outbreak in the UK using Twitter messages.

• High correlation between the flu-score and the HPA flu rates, greater than 95%.

Reference
• V. Lampos and N. Cristianini. 2010. Tracking the flu pandemic by monitoring the Social Web. International Workshop on Cognitive Information Processing. 6 pp.
