Tracking the Flu Pandemic by Monitoring the Social Web
Vasileios Lampos and Nello Cristianini
Jedsada Chartree 04/11/11
Introduction
• Growing interest in monitoring disease outbreaks.
• Growth of Twitter usage:
- February 2010: 50 million tweets/day
- June 2010: 65 million tweets/day (750 tweets/s)
- 190 million users (Source: http://en.wikipedia.org/wiki/Twitter)
- 5.5 million users in the UK (2009)
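As a quick check of the June 2010 figure, spreading 65 million tweets evenly over the 86,400 seconds in a day gives roughly 750 tweets per second (an average only; the actual rate varies through the day):

```latex
\frac{65 \times 10^{6}\ \text{tweets/day}}{86{,}400\ \text{s/day}} \approx 752\ \text{tweets/s}
```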
Introduction
• Official national statistics report flu rates with a delay of 1 to 2 weeks.
• Twitter can reveal the current situation as it develops.
Methodology
• Data
1. Official health reports from the Health Protection Agency (HPA), UK.
2. Twitter, UK
- Daily average of 160,000 tweets (24 weeks, from 06/22/2009 to 12/06/2009)
- Twitter geolocation (geographical coordinates)
Methodology
• Data
Region A = Central England & Wales
Region B = South England
Region C = North England
Region D = England & Wales
Region E = Wales & Northern Ireland
RCGP = Royal College of General Practitioners
Qsur = Qsurveillance, University of Nottingham and Egton Medical Information Systems
Methodology
[Diagram: Twitter data → flu-score; the flu-score is compared with the HPA flu rates using the correlation coefficient]
Methodology
• Flu-Score
k = total number of markers, n = total number of tweets for one day
i ∈ [1, k], j ∈ [1, n]
M = {m_i}, a set of textual markers
T = {t_j}, the daily set of tweets
s(t_j) = the flu-score of tweet t_j

$$s(t_j) = \frac{\sum_{i=1}^{k} m_i(t_j)}{k}$$

$$f(T, M) = \frac{\sum_{j=1}^{n} s(t_j)}{n} = \frac{\sum_{j=1}^{n} \sum_{i=1}^{k} m_i(t_j)}{k \cdot n}$$
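The two formulas can be computed directly from a day's tweets. A minimal sketch, assuming m_i(t_j) is 1 when marker m_i occurs as a whole word in tweet t_j and 0 otherwise (the slides do not define m_i precisely); the tokenizer and the function names are hypothetical, and the markers used with it should be treated as illustrative rather than the authors' list.

```python
import re


def tokenize(tweet: str) -> set[str]:
    # Lower-case and keep runs of letters only (a simplifying assumption).
    return set(re.findall(r"[a-z]+", tweet.lower()))


def tweet_flu_score(tweet: str, markers: list[str]) -> float:
    """s(t_j): the fraction of the k markers that occur in tweet t_j."""
    tokens = tokenize(tweet)
    k = len(markers)
    # Assumed indicator: m_i(t_j) = 1 if marker m_i is a token of t_j, else 0.
    return sum(1 for m in markers if m in tokens) / k


def daily_flu_score(tweets: list[str], markers: list[str]) -> float:
    """f(T, M): the mean of s(t_j) over the n tweets of one day."""
    n = len(tweets)
    return sum(tweet_flu_score(t, markers) for t in tweets) / n
```

For example, with the hypothetical markers ["flu", "fever", "cough", "headache"], the tweet "Got the flu and a fever" scores 2/4, and averaging over a day's tweets gives f(T, M).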
Results
Flu rates from the Health Protection Agency (HPA)
Results
Twitter’s flu-scores for regions A to E (weeks 26 to 49, 2009)
Results
Correlation coefficients between Twitter’s flu-score and HPA’s rates
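The slides do not name the correlation coefficient used; a common choice is Pearson's r, sketched below. This assumes the Twitter flu-scores have already been aggregated to the same time points as the HPA rates (how daily scores were matched to the official rates is not stated here). Python 3.10+ also provides `statistics.correlation` for the same quantity.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long, aligned series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, all without the 1/(N-1) factor (it cancels).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```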
Results
Twitter’s flu-score and HPA rates for region D (England & Wales)
Methodology
• Learning HPA’s flu rates from the Twitter flu-score
k = total number of markers, n = total number of tweets for one day
i ∈ [1, k], j ∈ [1, n]
M = {m_i}, a set of textual markers
T = {t_j}, the daily set of tweets
w = {w_i}, the weights of the markers

$$s_w(t_j) = \frac{\sum_{i=1}^{k} w_i \, m_i(t_j)}{k}$$

$$f_w(T, M) = \frac{\sum_{j=1}^{n} s_w(t_j)}{n} = \frac{\sum_{j=1}^{n} \sum_{i=1}^{k} w_i \, m_i(t_j)}{k \cdot n}$$

$$f_{w_i}(T, M) = \frac{w_i \sum_{j=1}^{n} m_i(t_j)}{k \cdot n}$$
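Swapping the order of the sums shows that the weighted daily score is the sum of the per-marker terms, f_w(T, M) = Σ_i f_{w_i}(T, M). A minimal sketch that checks this numerically, assuming m_i(t_j) is a 0/1 whole-word match; the tokenizer and function names are hypothetical.

```python
import re


def tokenize(tweet: str) -> set[str]:
    # Lower-case and keep runs of letters only (a simplifying assumption).
    return set(re.findall(r"[a-z]+", tweet.lower()))


def weighted_flu_score(tweets: list[str], markers: list[str],
                       weights: list[float]) -> float:
    """f_w(T, M): sum over tweets and markers of w_i * m_i(t_j), divided by k*n."""
    k, n = len(markers), len(tweets)
    total = 0.0
    for t in tweets:
        tokens = tokenize(t)
        total += sum(w for m, w in zip(markers, weights) if m in tokens)
    return total / (k * n)


def flu_subscores(tweets: list[str], markers: list[str],
                  weights: list[float]) -> list[float]:
    """f_{w_i}(T, M) for each marker: w_i times its tweet count, divided by k*n."""
    k, n = len(markers), len(tweets)
    token_sets = [tokenize(t) for t in tweets]
    return [w * sum(1 for toks in token_sets if m in toks) / (k * n)
            for m, w in zip(markers, weights)]
```

Because each subscore is linear in its own weight, the per-marker counts can be computed once per day and reused for any choice of w.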
Results
Linear regression using the markers
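Since f_w is a weighted sum of per-marker subscores, fitting the weights to official rates is a linear regression. A minimal sketch, assuming X holds the unweighted daily subscores (f_{w_i} with w_i = 1) as columns and y the HPA rates on the same time points; NumPy's `lstsq` is a swapped-in solver, not necessarily what the authors used, and no intercept is fitted so that predictions keep the form of f_w. The function names are hypothetical.

```python
import numpy as np


def learn_marker_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares weights w minimising ||X w - y||^2.

    X: (num_days, k); X[d, i] is the unweighted flu-subscore of marker i on day d.
    y: (num_days,); official flu rates aligned with the rows of X.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict_flu_rates(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Each day's prediction is the weighted sum of its subscores, i.e. f_w.
    return X @ w
```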
Methodology
• Automatic extraction of ILI (influenza-like illness) textual markers
1. Creating candidate markers from:
- Encyclopedic references
- Informal references
2. Forming time series of the flu-subscores.
- Ranking the weights by applying the LASSO method.
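Step 2 can be sketched as building a days × k matrix whose row d holds each candidate marker's unweighted flu-subscore for day d; each column is then one marker's time series. Step 1 (collecting candidates from references) is not shown. Assumptions: tweets are pre-grouped under ISO-format date strings, so sorting the keys is chronological; m_i(t_j) is a 0/1 whole-word match; `subscore_matrix` is a hypothetical name.

```python
import re


def tokenize(tweet: str) -> set[str]:
    # Lower-case and keep runs of letters only (a simplifying assumption).
    return set(re.findall(r"[a-z]+", tweet.lower()))


def subscore_matrix(tweets_by_day: dict[str, list[str]],
                    markers: list[str]) -> list[list[float]]:
    """Row d: unweighted subscore (count / (k * n_d)) of every marker on day d."""
    k = len(markers)
    rows = []
    for day in sorted(tweets_by_day):        # ISO dates sort chronologically
        token_sets = [tokenize(t) for t in tweets_by_day[day]]
        n = len(token_sets)
        rows.append([sum(m in toks for toks in token_sets) / (k * n)
                     for m in markers])
    return rows
```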
Methodology
LASSO
t = shrinkage parameter
Vector w = the sparse solution
w^(ls) = the least squares estimates for the regression problem
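The LASSO objective itself did not survive extraction. In its standard constrained form (Tibshirani, 1996) it minimises ‖y − Xw‖₂² subject to ‖w‖₁ ≤ t; when t is smaller than ‖w^(ls)‖₁ the coefficients shrink and some become exactly zero, which is what makes w sparse. The sketch below instead solves the equivalent penalised form, minimise ½‖y − Xw‖₂² + λ‖w‖₁ (each t corresponds to some λ ≥ 0), by cyclic coordinate descent; this solver is a swapped-in choice, not necessarily the authors'. `rank_markers` is a hypothetical helper that orders the surviving markers by |w_i|.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    return float(np.sign(z)) * max(abs(z) - gamma, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             n_iter: int = 1000, tol: float = 1e-10) -> np.ndarray:
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    _, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)          # ||x_i||^2 for each column
    residual = y.astype(float).copy()      # y - X @ w, with w = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(p):
            if col_sq[i] == 0.0:
                continue                    # an all-zero column keeps w_i = 0
            # Correlate column i with the residual that excludes w_i's own term.
            rho = X[:, i] @ residual + col_sq[i] * w[i]
            new_wi = soft_threshold(rho, lam) / col_sq[i]
            delta = new_wi - w[i]
            if delta != 0.0:
                residual -= delta * X[:, i]
                w[i] = new_wi
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def rank_markers(markers: list[str], w: np.ndarray) -> list[str]:
    """Markers with non-zero weight, largest |w_i| first."""
    ranked = sorted(zip(markers, w), key=lambda pair: abs(pair[1]), reverse=True)
    return [m for m, wi in ranked if wi != 0.0]
```

Larger λ (smaller t) drives more weights to zero, so sweeping λ gives a ranking of markers by how long they stay in the model.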
Methodology
Stemmed markers extracted by applying LASSO regionally
Results
Linear regression using the markers on the test sets after performing LASSO
Methodology
Stemmed markers extracted by applying LASSO on the aggregated data
Conclusion
• Tracking the flu outbreak in the UK using Twitter messages.
• High correlation between the flu-score and the HPA flu rates, greater than 95%.
Reference
• V. Lampos and N. Cristianini. 2010. Tracking the flu pandemic by monitoring the Social Web. International Workshop on Cognitive Information Processing. 6 pp.