phase i summary report (for distribution)
RICE II Report: Phase 1 Data Analysis Summary
By: Alex Yerukhimov
This report summarizes the findings from data collected during the first phase of the RICE II
project, which uses basic SMS technology to achieve real-time surveillance of infectious
diseases. The findings not only clarify how disease patterns in the country differ between
urban and rural communes and across geographic regions, but also shed light on the behavioral
patterns of both medical workers and patients.
Analysis of the delay between seeing patients and sending SMS – Done to check the hypothesis
that communes would initially have a greater delay in sending SMS messages and would report more
promptly as the trial proceeded.
As can be seen here, the delay between the date a patient was seen and the date the SMS was sent
shows no pattern over time. The data do not support the above hypothesis.
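The check described above can be sketched roughly as follows: compute each report's delay in days, then fit a straight line of delay against date and look at the slope and R². This is only an illustration. The function names and the sample records are hypothetical and are not taken from the project data, and the report does not state how its own trend line was computed.

```python
from datetime import date

def delay_days(seen: date, sent: date) -> int:
    """Days between the patient visit and the SMS report."""
    return (sent - seen).days

def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of a simple linear fit equals the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared

# Hypothetical (date seen, date sent) pairs, NOT project data.
records = [
    (date(2012, 7, 3), date(2012, 7, 5)),
    (date(2012, 8, 1), date(2012, 8, 1)),
    (date(2012, 9, 10), date(2012, 9, 16)),
    (date(2012, 10, 2), date(2012, 10, 3)),
]
xs = [sent.toordinal() for _, sent in records]       # date as a day number
ys = [delay_days(seen, sent) for seen, sent in records]
slope, intercept, r2 = linear_trend(xs, ys)
```

A slope near zero together with an R² near zero would be consistent with the delay not improving over the course of the trial.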
Analysis of number of cases reported over time – A timeline of case reports was produced, but it
showed only noise and yielded no helpful or workable metrics, so the timeline was removed. Timelines
stratified by region and/or demography would be useful; however, significantly more data points would
be needed to discern patterns. With this few communes, all timeline graphs look like white noise.
[Figure: Average number of days since patient was seen, for all communes. Horizontal axis: date, 7/3/2012 to 12/30/2012. Linear trend line: y = 0.0079x - 321.13, R² = 0.0095]
Analysis of days on which no communication was received from any commune – While analyzing
the average delay in sending SMS messages, it rapidly became apparent that there were many days on
which no communication was received from any commune.
While there was one suspicious week in which no data was received at all (11/24/2012-11/30/2012),
85% of days with no communication fell on a weekend, with Sunday the most common day for no
communication. If we ever need to take the system down for maintenance or updates, Sunday is
clearly the day to do it.
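As a rough illustration of how such a tally could be produced, the sketch below lists every calendar day in a range on which no SMS arrived and then computes the share of those days falling on each weekday. The function names and the sample dates are hypothetical; the report does not describe how its own count was made.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def silent_days(received_dates, start, end):
    """Return every date in [start, end] on which no SMS was received."""
    received = set(received_dates)
    span = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(span))
    return [d for d in all_days if d not in received]

def weekday_share(days):
    """Fraction of the given dates that fall on each weekday."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in days)
    total = len(days)
    return {name: n / total for name, n in counts.items()}

# Hypothetical example: messages arrive Monday to Friday only.
received = [date(2012, 11, 19) + timedelta(days=i) for i in range(5)]
silent = silent_days(received, date(2012, 11, 19), date(2012, 11, 25))
share = weekday_share(silent)
```

In this made-up week the two silent days are the Saturday and the Sunday, so each weekday named in `share` accounts for half of the silent days.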
Analysis of the Average Reporting Delay – A metric devised to quickly and easily assess which
communes were not keeping up with their reporting. Delay was also analyzed with stratification by
geography and demography.
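The report does not define the metric precisely. One plausible reading, assumed here, is the mean of the per-report delays for each commune. The sketch below uses hypothetical commune names and delays purely for illustration.

```python
from collections import defaultdict

def average_delay_by_commune(reports):
    """Mean delay in days per commune, from (commune, delay_days) pairs."""
    totals = defaultdict(lambda: [0, 0])  # commune -> [sum of delays, count]
    for commune, delay in reports:
        totals[commune][0] += delay
        totals[commune][1] += 1
    return {commune: s / n for commune, (s, n) in totals.items()}

# Hypothetical reports, NOT project data.
reports = [
    ("Commune A", 2), ("Commune A", 4),
    ("Commune B", 10), ("Commune B", 14),
]
averages = average_delay_by_commune(reports)
# Communes sorted from slowest to fastest reporter.
slowest_first = sorted(averages, key=averages.get, reverse=True)
```

Sorting by the average puts the communes that are falling behind at the top of the list, which is the quick assessment the metric was meant to support.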
A further analysis looked at the delay in reporting by disease. No significant difference was found
between the reporting delays for ILI (av. 4.9 days) and Diarrhea (av. 3.8 days). P=.47 (2-tailed t-test, α=.05)
Analysis stratifying the data by demography likewise showed no significant difference: Urban (av. 3.4 days),
Rural (av. 3.34 days). P=.89 (2-tailed t-test, α=.05)
Analysis stratifying the data by geography likewise showed no significant difference: Mountains (av. 3.3 days),
Red River Delta (av. 3.5 days). P=.55 (2-tailed t-test, α=.05)
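For readers who want to reproduce this kind of comparison, the sketch below computes a two-sample t statistic and its degrees of freedom. The report does not say which t-test variant was used; this example assumes Welch's unequal-variance form, and the sample delays are hypothetical. Converting the statistic into a two-tailed P value requires the t distribution, which a statistics package supplies (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se2 = va / na + vb / nb                          # squared standard error
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-report delays in days, NOT project data.
delays_a = [1, 2, 3, 4, 5]
delays_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(delays_a, delays_b)
```

The difference is declared significant at α=.05 only if the two-tailed P value for `t_stat` with `dof` degrees of freedom falls below .05.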
[Figure: Average Reporting Delay by Commune. Vertical axis: days]
Days of the week during which no communication was received:
Sunday 57%
Monday 3%
Tuesday 3%
Wednesday 3%
Thursday 6%
Friday 0%
Saturday 28%
Analysis by Day of the Week – A detailed comparison of the days of the week on which SMS messages
were received with the reported days on which patients were seen yielded few remarkable observations,
so the data was stratified by geography and demography.
Whole Data Set Analysis.
Other than the notable lack of weekend data and a dip in the middle of the week, not much can be seen here.
It would seem reasonable to conclude from these graphs that Mondays and Fridays are the busiest days, with
a lull in the middle of the week. Additionally, it can be seen that Sunday’s patients are often reported on
Monday, inflating the Monday number. These graphs are misleading, however; see the stratified analyses below.
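The comparison behind these graphs can be sketched as two weekday tallies over the same records: one keyed on the date the patient was seen (the entryDate) and one keyed on the date the SMS was sent. The helper name and the sample records below are hypothetical and only illustrate how a Sunday patient ends up in Monday's SMS count.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_counts(dates):
    """Number of dates falling on each weekday, Monday first."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    return {name: counts.get(name, 0) for name in WEEKDAYS}

# Hypothetical (entry date, sent date) pairs, NOT project data.
records = [
    (date(2012, 11, 18), date(2012, 11, 19)),  # seen Sunday, reported Monday
    (date(2012, 11, 19), date(2012, 11, 19)),  # seen and reported Monday
    (date(2012, 11, 22), date(2012, 11, 23)),  # seen Thursday, reported Friday
]
by_entry = weekday_counts(entry for entry, _ in records)
by_sent = weekday_counts(sent for _, sent in records)
```

Here Monday shows two messages sent but only one patient seen, which is the kind of inflation the SMS-sent graphs are subject to.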
[Figure: Number of SMS messages sent per day of the week]
[Figure: Number of entryDates per day of the week]
Analysis Stratified by Demography: Urban and Rural Communes
Looking at just the Urban data, and specifically comparing the dates SMS messages were sent with the
dates patients were seen, there is nothing inherently special about Monday or Friday in terms of patient
volume. These graphs clearly show that patients seen on Sunday are reported on Monday, greatly
inflating Monday’s data. This may reflect that the commune workers trained to send SMS messages
only work during the week, or that the communes in general take it easy on Sunday and catch up on Monday.
The midweek dip in patients is clearly visible in the graphs showing the number of entryDates per day of the week.
[Figure: Number of SMS messages sent per day of the week (Urban)]
[Figure: Number of entryDates per day of the week (Urban)]
Though less clear-cut than the Urban data, the Rural data nonetheless also shows that patients seen on
Sunday are reported on Monday. As in the Urban data, the dip in patients in the middle of the week can be seen.
Unlike the Urban data, these graphs show an additional busy day on Thursday, which is reported on Friday.
This most likely accounts for the Friday spike seen in the whole data set graphs.
[Figure: Number of SMS messages sent per day of the week (Rural)]
[Figure: Number of entryDates per day of the week (Rural)]
Analysis Stratified by Geography: Mountainous and Red River Delta Communes.
The mountain region departs most from the pattern of the other strata: it does not show any remarkable dip
in patient volume on Wednesdays. Tuesdays have the most SMS messages sent, but Mondays are the busiest in
terms of patients seen. Patients seen on Sunday are reported on Monday. Thursdays are also fairly busy, and
their patients are reported on Friday.
[Figure: Number of SMS messages sent per day of the week (Mountains)]
[Figure: Number of entryDates per day of the week (Mountains)]
The Red River Delta stratum shows no unique patterns. The reporting of weekend patients on Monday is clearly
visible.
Looking at all the strata, some generalities emerge: Mondays and Thursdays are the busiest days of the week.
Some of these patients are reported on Tuesday and Friday, inflating the SMS volumes on those days
out of proportion to the patient volume. Patients seen on the weekend tend to be reported on Monday.
[Figure: Number of SMS Sent per day of the week (Red River Delta)]
[Figure: Number of entryDates per day of the week (Red River Delta)]
Analysis of Disease distribution by geography and demography – A look at the fraction of ILI and
Diarrhea cases found in Urban vs. Rural and Mountains vs. Red River Delta communes.
Analysis by geography showed a statistically significantly higher proportion of ILI in the mountains (.76 of
cases vs. .68 in the Red River Delta), P<0.01 (Z test for proportions). Conversely, diarrhea was significantly
more prevalent in the Red River Delta (.31 vs. .24 in the mountains) by the same test and at the same P threshold.
Analysis by demography showed a statistically significantly higher proportion of ILI in urban communes (.76 of
cases vs. .66 in the rural communes), P<0.01 (Z test for proportions). Conversely, diarrhea was significantly more
prevalent in the rural areas (.34 vs. .24 urban) by the same test and at the same P threshold.
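A two-sided Z test for the difference between two proportions can be sketched as below, using the pooled-proportion standard error. The report gives only the proportions, not the underlying case counts, so the counts in the example (1000 cases per stratum) are hypothetical placeholders chosen to match the .76 and .68 figures; the actual Z and P values depend on the real sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided Z test for the difference between two proportions.

    x1 of n1 cases in group 1 and x2 of n2 cases in group 2 have the
    condition. Uses the pooled proportion for the standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 760 of 1000 ILI cases (mountains)
# versus 680 of 1000 (Red River Delta). NOT project data.
z, p = two_proportion_z(760, 1000, 680, 1000)
```

With much smaller samples the same proportions could fail to reach P<0.01, which is why the case counts matter when interpreting the result.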
[Figure: Proportion of Disease by Geography. Case proportion of Diarrhea and ILI, for Mountains and Red River Delta]
[Figure: Proportion of Disease by Demography. Case proportion of Diarrhea and ILI, for Urban and Rural]
Analysis of times of the day that SMS messages were sent – A metric not included in the original
analysis of the data, it gives valuable insight into the behaviors of patients and commune workers, raises
many questions, and provides a clear avenue for further investigation.
Histograms of the various strata (urban, rural, mountains and Red River Delta) do not vary significantly from the
histogram of the whole data set, which shows a fairly regular bimodal distribution.
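A histogram of this kind can be sketched by bucketing each SMS timestamp by its hour of day. The bin width used in the report's histograms is not stated, so hourly bins are an assumption here, and the peak finder is only a crude check for two modes. The function names and timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_histogram(timestamps):
    """Count of messages sent in each hour of the day (index 0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def local_peaks(hist):
    """Hours whose count is strictly higher than both neighbouring hours.

    A crude mode finder: it ignores hours 0 and 23 and does not smooth
    the counts, so noisy data may produce spurious peaks.
    """
    return [h for h in range(1, len(hist) - 1)
            if hist[h] > hist[h - 1] and hist[h] > hist[h + 1]]

# Hypothetical sent timestamps, NOT project data.
sent = [
    datetime(2012, 9, 3, 8, 10), datetime(2012, 9, 3, 9, 5),
    datetime(2012, 9, 3, 9, 40), datetime(2012, 9, 3, 10, 20),
    datetime(2012, 9, 3, 14, 30), datetime(2012, 9, 3, 15, 15),
    datetime(2012, 9, 3, 15, 50), datetime(2012, 9, 3, 16, 20),
]
hist = hourly_histogram(sent)
peaks = local_peaks(hist)
```

Running the same two functions on each stratum separately would allow the per-stratum histograms to be compared with the whole data set.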
Are the commune workers doing as we asked and reporting as patients come in, making these reflective of the
[Figure: Sent Timestamp Histogram. Vertical axis: frequency]
[Figure: Rural Timestamp Histogram. Vertical axis: frequency]
[Figure: Urban Timestamp Histogram. Vertical axis: frequency]
[Figure: Mountain Timestamp Histogram. Vertical axis: frequency]
[Figure: Red River Delta Timestamp Histogram. Vertical axis: frequency]