

A Visual Analytics System to Study the Spread of Pandemics VAST 2010 Mini Challenge 2

Ramkrishna Chakrabarty1 Xiaolong (Luke) Zhang2

The Pennsylvania State University, University Park, PA

ABSTRACT

A visual analytics system named PandemicVisualizer was developed for the VAST Mini Challenge 2 to analyze the pandemic data. It is designed to analyze and investigate the spread of a pandemic over space and time. Using this system, analysts can examine how, and how quickly, a pandemic spreads from one geographical area to another, and learn how demographic parameters such as age and gender vary. It provides visual tools to compare locations along multiple parameters, which help in finding the starting point of the pandemic and examining how quickly different locations recovered from it.

KEYWORDS: Visual analytics, pandemic visualization

INDEX TERMS: H.5.2 [Information Systems]: Information Interfaces and Presentation—User Interfaces

1 INTRODUCTION

The IEEE VAST 2010 Symposium’s Mini Challenge 2 involves the analysis of worldwide pandemic data consisting of hospital admittance and mortality records from eleven separate cities in different countries. The data gives the number of patients admitted and the number who died on each day of a three-month period, and also includes demographic information such as age, gender and symptoms for each individual. The challenge consists of two tasks. The first asks the participants to analyze the pandemic data to characterize the spread of the disease. The second is to compare the outbreak across the cities.

The PandemicVisualizer visual analytics system was developed to solve these tasks. For it to be an effective analysis tool, fast data access and interactivity, along with an intuitive analytic process, were made the primary design goals. The data preprocessor module of the system loads the entire dataset into a backend database. The preprocessor scans all the raw data files provided and creates database tables for the cumulative data and for each location, so that database query time is kept to a minimum and the interface stays interactive. When the system starts, it preloads the bulk of the global data required to create the timeline and the location coordinates for the map. For simplicity, the system displays ten age groups instead of the number of patients of every individual age. Consequently, tables are created that store the number of patients belonging to each age group, gender, and syndrome for each location. PostgreSQL is used as the database in PandemicVisualizer. The preprocessor module is written in the Python programming language. The front end of the system is built using the Adobe Flex framework.
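The kind of aggregation the preprocessor performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual code: the decade-wide age-group boundaries, the record field names (location, date, age, gender) and the sample records are all hypothetical, and in-memory counters stand in for the PostgreSQL tables.

```python
from collections import Counter

# Assumption: ten decade-wide age groups (0-9, 10-19, ..., 90+).
# The paper does not state the actual group boundaries.
AGE_GROUP_WIDTH = 10
NUM_AGE_GROUPS = 10


def age_group(age):
    """Return the index (0-9) of the age group; ages of 90 and above share the last group."""
    return min(age // AGE_GROUP_WIDTH, NUM_AGE_GROUPS - 1)


def age_group_label(index):
    """Human-readable label such as '30-39' or '90+'."""
    low = index * AGE_GROUP_WIDTH
    if index == NUM_AGE_GROUPS - 1:
        return f"{low}+"
    return f"{low}-{low + AGE_GROUP_WIDTH - 1}"


def aggregate(records):
    """Count patients per (location, date, age group) and per (location, date, gender).

    Each record is a hypothetical dict with the keys location, date, age and gender.
    """
    by_age = Counter()
    by_gender = Counter()
    for r in records:
        by_age[(r["location"], r["date"], age_group(r["age"]))] += 1
        by_gender[(r["location"], r["date"], r["gender"])] += 1
    return by_age, by_gender


# Made-up records, for illustration only.
records = [
    {"location": "City A", "date": "2009-05-02", "age": 34, "gender": "F"},
    {"location": "City A", "date": "2009-05-02", "age": 38, "gender": "M"},
    {"location": "City A", "date": "2009-05-02", "age": 95, "gender": "F"},
    {"location": "City B", "date": "2009-05-02", "age": 7, "gender": "M"},
]
by_age, by_gender = aggregate(records)
```

Precomputing such counts once means that a hover on the timeline only needs a lookup by location and date, which is consistent with the stated goal of keeping query time minimal.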

Another key design decision was to develop a web-based solution to make it scalable and platform independent. The nature of the problem dictates that the solution should explore both spatial and temporal aspects of the disease spread. The system’s user interface is divided into three parts: an interactive map, a timeline and a charts panel. As shown in Figure 1, the map highlights all the locations for which data is provided and can be zoomed to drill down to a particular location. There are two timelines displayed below the map: the number of patients admitted and the number of patients who died. The view also has an interactive location compare tool, which dynamically generates charts that show the number of hospital admits and deaths over time for each location, as shown in Figure 2. On the right, the chart panel contains location-wise demographic data for the day selected on the timeline.

The PandemicVisualizer system uses all these views interactively to help a user get useful insights from the data. The next section explains how these views and their interactions can be used to complete the tasks in this challenge.

Figure 1: PandemicVisualizer showing the hospital admits timeline below the map with highlighted locations

2 ANALYTICAL PROCESS

2.1 Spread of the Pandemic

The timeline at the bottom shows the temporal spread of the disease. It can toggle between hospital admittance and deaths. The spatial spread of the pandemic can be observed by hovering the mouse pointer over the timeline. As the mouse pointer moves along the timeline, the highlighted cities on the map change color intensity. When the timeline is displaying the hospital admits, the circles on the map highlighting the affected locations change color from light grey to black. Detailed analysis of the timeline shows that the number of patients admitted rises in the first three weeks of May to around 250,000 to 300,000 patients per day. Similarly, the mortality rate (the red line) also rises during the same three-week period, peaking on May 18 at 72 deaths per 1,000 patients admitted on that day. Figure 3 shows the timeline of the number of deaths. The number of deaths per day starts to increase around the beginning of May, peaks at 17,202 deaths on May 24, and then starts decreasing rapidly. Notice that recovery from the disease is marginally faster than its onset, as the recovery slope is steeper.

1 email: [email protected] 2 email: [email protected]
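The mortality rate quoted in Section 2.1 is expressed as deaths per 1,000 patients admitted on the same day. A minimal Python sketch of that arithmetic, and of locating the day on which the rate peaks, is given here. The function names and the sample counts are hypothetical; the counts are chosen only so that the peak works out to 72 per 1,000 and are not taken from the challenge dataset.

```python
def mortality_per_thousand(admitted, deaths):
    """Deaths per 1,000 patients admitted on the same day (0.0 if nobody was admitted)."""
    if admitted == 0:
        return 0.0
    return 1000.0 * deaths / admitted


def peak_day(daily):
    """Return (date, rate) for the day with the highest mortality rate.

    `daily` maps a date string to an (admitted, deaths) pair.
    """
    rates = {day: mortality_per_thousand(a, d) for day, (a, d) in daily.items()}
    best = max(rates, key=rates.get)
    return best, rates[best]


# Illustrative numbers only; they are not taken from the challenge dataset.
daily = {
    "2009-05-17": (2000, 100),
    "2009-05-18": (2000, 144),
    "2009-05-19": (2000, 120),
}
peak = peak_day(daily)
```

With these sample counts, the peak falls on the middle day at a rate of 72 deaths per 1,000 admissions.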

2.2 Outbreak comparison across locations

Two separate methods were used to compare the pandemic outbreak across locations. The location compare tool provides a high-level comparison among the locations with respect to the total number of patients hospitalized and the total number who died. We used this interactive tool to create line charts of all the locations. Hospital admissions in some locations decrease slightly during the last couple of weeks of June. It is also clear from the chart that hospital admissions were highest in Karachi, Pakistan (red) and lowest in Nonthaburi, Thailand (light blue). However, when we switch the chart to show the number of deaths (Figure 2), a couple of interesting anomalies emerge. Significantly, there seems to be no increase in fatalities in Nonthaburi, Thailand (sky blue) and Mersin, Turkey (grey). In contrast, the rest of the locations show a noticeable rise in the number of deaths that starts in the first week of May and peaks around May 27, 2009. All the locations recover by the end of June, as shown by the decrease in the number of patients who die in the pandemic.
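In the paper these anomalies were found visually, by reading the line chart. A simple programmatic check in the same spirit could flag locations whose daily death counts never rise well above their early baseline. The sketch below is an assumption-laden illustration, not part of PandemicVisualizer: the function names, the seven-day baseline window, the threshold factor of 3 and the sample series are all hypothetical.

```python
from statistics import median


def has_outbreak(deaths, baseline_days=7, factor=3.0):
    """True if the peak daily death count exceeds `factor` times the early baseline.

    The baseline is the median of the first `baseline_days` values, floored at 1
    so that an all-zero start does not make every later count look like a spike.
    """
    baseline = median(deaths[:baseline_days])
    return max(deaths) > factor * max(baseline, 1)


def flat_locations(series_by_location, **kwargs):
    """Names of locations whose death counts never rise well above their baseline."""
    return sorted(
        name
        for name, deaths in series_by_location.items()
        if not has_outbreak(deaths, **kwargs)
    )


# Made-up daily death counts, for illustration only.
series = {
    "City A": [2, 3, 2, 3, 2, 2, 3, 10, 40, 90, 60, 20, 5],
    "City B": [1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2],
}
flat = flat_locations(series)
```

A check of this kind only narrows down where to look; the interactive charts remain the means of judging whether a flat series is a genuine anomaly.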

Once high-level insight has been obtained with the location compare tool, PandemicVisualizer lets the user drill down into the details of each location. When the Daily Statistics view is enabled, it displays three charts (Figure 3), one each for gender, age group and syndrome. These charts show the number of patients in each category who were admitted or who died on a particular day. The number of syndromes is too high to display in a single chart, so the user may select up to five syndromes to compare at a time. When the mouse pointer hovers over the timeline, two things happen that provide clear visual cues for understanding the outbreak. First, the circles representing the locations on the map change color. Second, the three charts on the right show the number of patients belonging to each age group, gender and selected syndrome. In Figure 3 (top image), the mouse pointer has highlighted May 2, 2009 on the timeline chart below the map. Notice that Karachi, Pakistan has turned slightly red, representing fewer than 100 deaths on that day. The charts in the right panel show the gender, age group and syndrome distribution of the patients who died in each location on that day. The age group chart shows that the most deaths occurred in the age group of 31 to 60 years. As we move the mouse pointer across the timeline, the number of deaths increases at other locations as well, starting with Beirut, Lebanon. Figure 3 shows the progress of the pandemic in space and time. In the middle and bottom images of Figure 3, we see the anomaly detected earlier: in spite of its proximity to two major pandemic centers (Aleppo, Syria and Beirut, Lebanon), Mersin, Turkey has an insignificant number of deaths.
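The color cue on the map amounts to mapping a daily count onto a color ramp. One common way to do this is linear interpolation between two endpoint colors, sketched below for the grey-to-black ramp used for hospital admits. This is an assumed technique rather than the documented behavior of the Flex front end: the RGB endpoint values, the linear scaling and the function names are all hypothetical.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Assumed endpoint colors; the paper only says "light grey to black".
LIGHT_GREY = (211, 211, 211)
BLACK = (0, 0, 0)


def admits_color(count, max_count):
    """Color for a location circle given its daily admits and the global maximum."""
    t = count / max_count if max_count > 0 else 0.0
    return to_hex(lerp_color(LIGHT_GREY, BLACK, t))


lowest = admits_color(0, 100)     # lightest end of the ramp
highest = admits_color(100, 100)  # darkest end of the ramp
```

The same function would serve for a deaths view by swapping the endpoints for a white-to-red ramp; a logarithmic scale may suit counts that span several orders of magnitude.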

3 CONCLUSION AND FUTURE WORK

The PandemicVisualizer system successfully analyzed the pandemic data provided in the challenge and found a couple of anomalies in the data. The system clearly shows the spatial and temporal spread of the disease interactively using various common graphical methods.

One noteworthy improvement to the system would be better preprocessing of the syndrome data. This would eliminate duplicate mentions of syndromes and present a more streamlined list of syndromes. Another improvement would be to engineer an efficient method to add new datasets using online data preprocessing. This would significantly enhance the effectiveness of PandemicVisualizer and also make it highly scalable.
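As a rough indication of what such syndrome preprocessing might involve, the following Python sketch merges syndrome strings that differ only in case, punctuation or spacing. It is a hypothetical starting point, not a description of planned work: the normalization rules, the function names and the sample strings are assumptions, and real syndrome text would probably also need spelling correction and synonym handling.

```python
import re
from collections import Counter


def normalize(name):
    """Lower-case the text, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def merge_syndromes(counts):
    """Merge the counts of syndrome strings that normalize to the same key."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalize(name)] += n
    return merged


# Made-up syndrome strings and counts, for illustration only.
raw = {"Fever": 10, "fever ": 4, "FEVER.": 1, "Back pain": 3, "back  pain": 2}
merged = merge_syndromes(raw)
```

Here the five raw entries collapse into two, which would shorten the list from which the user picks up to five syndromes to compare.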

ACKNOWLEDGEMENTS

We would like to thank Tulika Biswas for her input and suggestions on the location compare tool and its evaluation.

Figure 3: These three screenshots show the spread of the pandemic across the locations.

Figure 2: Number of deaths on each day for each location. There is no increase in fatalities in Nonthaburi, Thailand (sky blue) and Mersin, Turkey (grey)