phase i summary report (for distribution)


Upload: alex-yerukhimov

Post on 16-Apr-2017


RICE II Report: Phase 1 Data Analysis Summary

By: Alex Yerukhimov

This report summarizes the findings from data collected during the first phase of the RICE II project, which leverages basic SMS technology to achieve real-time surveillance of infectious conditions. The findings not only clarify the nature of the disease state in the country, urban vs. rural and by geography, but also shed light on the behavioral patterns of both medical workers and patients.

Analysis of the delay between seeing patients and sending SMS – Done to test the hypothesis that communes would initially take longer to send SMS messages and would report more promptly as the trial proceeded.

The chart of average delay shows no pattern in the delay between the date the patient was seen and the date the SMS was sent, and the fitted trend line explains almost none of the variation (R² = 0.0095). The data do not support the hypothesis.

Analysis of number of cases reported over time – A timeline of case reports was produced, but it showed only noise and no useful or workable metrics, so it was removed. Timelines stratified by region and/or demographic group would be useful, but far more data points would be needed to discern patterns. With so few communes, all timeline graphs look like white noise.

[Figure: Average number of days since pt was seen for all communes, 7/3/2012 to 12/30/2012. Linear trend line: y = 0.0079x - 321.13, R² = 0.0095]
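The trend line on this chart is the kind of result an ordinary least-squares fit of average delay against date produces. The sketch below is a minimal illustration of that calculation, not the original analysis: the function name and the sample values are hypothetical, and it assumes dates are converted to day ordinals for the x axis. A different date numbering (for example spreadsheet serial dates) would change the intercept but not the slope or R².

```python
from datetime import date


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable fit, R squared is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Hypothetical daily averages (days between visit and SMS), not project data.
daily_average_delay = {
    date(2012, 7, 3): 3.0,
    date(2012, 7, 4): 5.5,
    date(2012, 7, 5): 2.0,
    date(2012, 7, 6): 4.5,
}
xs = [d.toordinal() for d in daily_average_delay]
ys = list(daily_average_delay.values())
slope, intercept, r_squared = linear_fit(xs, ys)
print(f"slope={slope:.4f} days per day, R^2={r_squared:.4f}")
```

An R² close to zero, as reported for this chart, means the date explains almost none of the day-to-day variation in delay.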

Analysis of days on which no communication was received from any commune – While analyzing the average delay in sending SMS messages, it quickly became apparent that there were many days on which no communication was received from any commune.

While there was one suspicious week in which no data was received at all (11/24/2012-11/30/2012), 85% of days with no communication fell on a weekend, with Sunday the most common. If the system ever needs to be taken down for maintenance or updates, Sunday is clearly the day to do it.
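The weekday breakdown of silent days can be derived from nothing more than the set of dates on which at least one SMS arrived. The following is a minimal sketch under that assumption; the function names and the one-week example are hypothetical and are not taken from the project data.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def silent_days(first_day, last_day, days_with_sms):
    """Return every date in [first_day, last_day] on which no SMS arrived."""
    received = set(days_with_sms)
    span = (last_day - first_day).days + 1
    all_days = (first_day + timedelta(days=i) for i in range(span))
    return [d for d in all_days if d not in received]


def weekday_shares(days):
    """Share of the given dates that fall on each weekday."""
    counts = Counter(DAY_NAMES[d.weekday()] for d in days)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical week in which SMS messages arrived Monday to Friday only.
received = [date(2012, 7, 2) + timedelta(days=i) for i in range(5)]
missing = silent_days(date(2012, 7, 2), date(2012, 7, 8), received)
print(weekday_shares(missing))
```

Run over the full trial period, the same tally would also surface a run of consecutive silent days such as the week of 11/24/2012 to 11/30/2012.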

Analysis of the Average Reporting Delay – A metric devised to assess quickly and easily which communes were not keeping up with their reporting. Delay was also analyzed by disease and stratified by geography and demography.

By disease, no significant difference was found between the reporting delays for ILI (avg. 4.9 days) and Diarrhea (avg. 3.8 days): P = .47 (two-tailed t-test, α = .05).

By demography, there was likewise no significant difference: Urban (avg. 3.4 days) vs. Rural (avg. 3.34 days), P = .89 (two-tailed t-test, α = .05).

By geography, there was likewise no significant difference: Mountains (avg. 3.3 days) vs. Red River Delta (avg. 3.5 days), P = .55 (two-tailed t-test, α = .05).
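The comparisons above rest on a two-tailed t-test of the mean delay in two groups. The report does not say which variant was used, so the sketch below assumes Welch's unequal-variance form and computes the p-value by numerically integrating the t density. The function names and the sample delays are hypothetical and are not project data.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def welch_t_test(a, b, steps=2000):
    """Return (t, df, two-tailed p) for two independent samples."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Simpson's rule for the area under the density between 0 and |t|.
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return t, df, max(0.0, 1 - 2 * area)


# Hypothetical per-report delays in days for two strata.
urban_delays = [2, 3, 5, 1, 4, 6, 3]
rural_delays = [3, 2, 4, 5, 2, 4, 3]
t, df, p = welch_t_test(urban_delays, rural_delays)
print(f"t={t:.2f}, df={df:.1f}, p={p:.2f}")
```

A p-value above the chosen α of .05, as in all three comparisons reported here, means the difference in mean delay cannot be distinguished from chance.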

[Figure: Average Reporting Delay by Commune. Vertical axis: Days, 0 to 20]

Days of the week during which no communication was received:

Sunday      57%
Monday       3%
Tuesday      3%
Wednesday    3%
Thursday     6%
Friday       0%
Saturday    28%

Analysis by Day of the Week – A detailed comparison of the days of the week on which SMS messages were received with the reported days on which patients were seen yielded few remarkable observations, so the data was stratified by geography and demography.

Whole Data Set Analysis

Other than the notable lack of weekend data and a dip in the middle of the week, not much can be seen here. It would be reasonable to conclude from these graphs that Mondays and Fridays are the busiest days, with a lull in the middle of the week. It can also be seen that Sunday's patients are often reported on Monday, inflating the Monday number. These graphs are misleading, however, as the stratified analyses below show.

[Figure: Number of SMS messages sent per day of the week. Vertical axis: 0 to 400]

[Figure: Number of entryDates per day of the week. Vertical axis: 0 to 400]

Analysis Stratified by Demography: Urban and Rural Communes

Looking at just the Urban data, and specifically comparing the days SMS messages were sent with the days patients were seen, there is nothing inherently special about Monday or Friday in terms of patient volume. These graphs clearly show that patients seen on Sunday are reported on Monday, greatly inflating Monday's data. This may reflect that the commune workers trained to send SMS messages only work during the week, or that the communes in general take it easy on Sunday and catch up on Monday. The midweek dip in patients is clearly visible in the graphs of the number of entryDates per day of the week.
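The shift from the day a patient was seen to the day the SMS was sent can be made explicit by counting each pair of weekdays. The sketch below is a hedged illustration: it assumes each record carries an entryDate (taken here to be the date the patient was seen) and a sent date, and both the function name and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_lag_table(records):
    """Count (weekday seen, weekday sent) pairs.

    records: iterable of (entry_date, sent_date) pairs.
    """
    table = Counter()
    for entry_date, sent_date in records:
        seen = DAY_NAMES[entry_date.weekday()]
        sent = DAY_NAMES[sent_date.weekday()]
        table[(seen, sent)] += 1
    return table


# Hypothetical records, not project data.
records = [
    (date(2012, 7, 8), date(2012, 7, 9)),    # seen Sunday, sent Monday
    (date(2012, 7, 8), date(2012, 7, 9)),    # seen Sunday, sent Monday
    (date(2012, 7, 9), date(2012, 7, 9)),    # seen Monday, sent Monday
    (date(2012, 7, 12), date(2012, 7, 13)),  # seen Thursday, sent Friday
]
table = weekday_lag_table(records)
monday_sms = sum(n for (seen, sent), n in table.items() if sent == "Monday")
monday_patients = sum(n for (seen, sent), n in table.items() if seen == "Monday")
print(f"SMS sent on Monday: {monday_sms}, patients seen on Monday: {monday_patients}")
```

Comparing the row totals (patients seen) with the column totals (SMS sent) separates true patient volume from reporting backlog, which the whole-data-set graphs conflate.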

[Figure: Number of SMS messages sent per day of the week (Urban). Vertical axis: 0 to 250]

[Figure: Number of entryDates per day of the week (Urban). Vertical axis: 0 to 200]

Though less informative than the Urban data, the Rural data nonetheless also shows that patients seen on Sunday are reported on Monday. As in the Urban data, the dip in patients in the middle of the week can be seen. Unlike the Urban data, these graphs show an additional busy day on Thursday, which is reported on Friday. This most likely accounts for the Friday spike seen in the whole data set graphs.

[Figure: Number of SMS messages sent per day of the week (Rural). Vertical axis: 0 to 140]

[Figure: Number of entryDates per day of the week (Rural). Vertical axis: 0 to 140]

Analysis Stratified by Geography: Mountainous and Red River Delta Communes

The most atypical of all the strata, the mountain region does not show any remarkable dip in patient volume on Wednesdays. Tuesdays have the most SMS messages sent, but Mondays have the most patients seen. Patients seen on Sunday are reported on Monday. Thursdays are also fairly busy, and their patients are reported on Friday.

[Figure: Number of SMS messages sent per day of the week (Mountains). Vertical axis: 0 to 160]

[Figure: Number of entryDates per day of the week (Mountains). Vertical axis: 0 to 160]

The Red River Delta stratum shows no unique patterns. The reporting of weekend patients on Monday is clearly visible.

Looking at all the strata, some generalities emerge: Mondays and Thursdays are the busiest days of the week. Some of these patients are reported on Tuesday and Friday, inflating the SMS volumes on those days out of proportion to the patient volume. Patients seen on the weekend tend to be reported on Monday.

[Figure: Number of SMS Sent per day of the week (Red River Delta). Vertical axis: 0 to 350]

[Figure: Number of entryDates per day of the week (Red River Delta). Vertical axis: 0 to 300]

Analysis of Disease distribution by geography and demography – A look at the fraction of ILI and Diarrhea cases in Urban vs. Rural and Mountains vs. Red River Delta communes.

Analysis by geography showed a statistically significantly higher proportion of ILI in the mountains (.76 of cases vs. .68 in the Red River Delta), P<0.01 (Z test for proportions). Conversely, diarrhea was significantly more prevalent in the Red River Delta (.31 vs. .24 in the mountains) by the same test and at the same significance level.

Analysis by demography showed a statistically significantly higher proportion of ILI in urban communes (.76 of cases vs. .66 in rural communes), P<0.01 (Z test for proportions). Conversely, diarrhea was significantly more prevalent in the rural areas (.34 vs. .24 urban) by the same test and at the same significance level.
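A Z test for proportions compares the share of cases in two groups against the pooled share. The sketch below shows the standard pooled two-proportion form. The report gives only proportions, not case counts, so the counts used here are hypothetical values chosen to match the reported ILI proportions of .76 and .68. Whether a given difference reaches P<0.01 depends on the real counts.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test; returns (z, two-tailed p-value)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 760 ILI cases of 1000 (mountains) vs. 680 of 1000 (delta).
z, p = two_proportion_z_test(760, 1000, 680, 1000)
print(f"z={z:.2f}, p={p:.5f}")
```

Because each stratum contains only ILI and diarrhea cases, the diarrhea comparison is the mirror image of the ILI comparison, which is why the report finds the same probability for both.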

[Figure: Proportion of Disease by Geography. Case proportion (0 to 0.8) of Diarrhea and ILI, Mountains vs. Red River Delta]

[Figure: Proportion of Disease by Demography. Case proportion (0 to 0.8) of Diarrhea and ILI, Urban vs. Rural]

Analysis of times of the day that SMS messages were sent – A metric not included in the original analysis of the data, this gives valuable insight into the behaviors of patients and commune workers, raises many questions, and provides a clear avenue for further investigation.

Histograms of the various strata (Urban, Rural, Mountains and Red River Delta) do not vary significantly from the histogram of the whole data set, which shows a fairly regular bimodal distribution.

[Figure: Sent Timestamp Histogram (whole data set). Vertical axis: Frequency, 0 to 350]

[Figure: Rural Timestamp Histogram. Vertical axis: Frequency, 0 to 120]

[Figure: Urban Timestamp Histogram. Vertical axis: Frequency, 0 to 250]

[Figure: Mountain Timestamp Histogram. Vertical axis: Frequency, 0 to 200]

[Figure: Red River Delta Timestamp Histogram. Vertical axis: Frequency, 0 to 250]

Are the commune workers doing as we asked and reporting as patients come in, making these histograms reflective of patient volume over the day? Or do patients come in early in the morning and early in the afternoon and get reported later? Follow-up investigation with the communes is needed to answer this question.
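The time-of-day histograms discussed above amount to counting sent timestamps by hour and looking for the peaks. The sketch below assumes each SMS carries a sent timestamp. The function names and the sample timestamps are hypothetical, and the hours in the example are illustrative rather than the peaks observed in the project data.

```python
from collections import Counter
from datetime import datetime


def hourly_histogram(sent_timestamps):
    """Count messages per hour of day (0-23), including empty hours."""
    counts = Counter(ts.hour for ts in sent_timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def local_peaks(histogram):
    """Hours whose count is strictly higher than both neighbouring hours."""
    return [
        hour for hour in range(1, 23)
        if histogram[hour] > histogram[hour - 1]
        and histogram[hour] > histogram[hour + 1]
    ]


# Hypothetical timestamps, not project data.
sent = [datetime(2012, 7, 3, hour, 15)
        for hour in (8, 9, 9, 9, 10, 14, 15, 15, 16)]
histogram = hourly_histogram(sent)
print(local_peaks(histogram))
```

Two separated peaks indicate a bimodal distribution. The histogram alone cannot say whether the peaks track patient arrivals or batch reporting, which is the question left for follow-up with the communes.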