Exploratory Data Analysis of Bahmni with R

EDA (Exploratory Data Analysis) of Bahmni (EMR) data
Karrtik Iyer
Mail: @karrtik
Tweets: @karrtikiyer
YouTube playlist

Upload: karrtik-iyer

Post on 15-Apr-2017


Purpose

Explore EMR data collected over a period of time to:

1. Derive insights.
2. Observe trends.
3. Establish probable correlations.
4. Help the community get started exploring their own EMR data.

Objectives/Agenda

1. Look at patient trends across various regions.
2. List the top 10 diagnoses reported.
3. Pick the top diagnosis for further analysis:
   a. Male/female ratio
   b. Top regions/villages
   c. Age distribution
   d. Year-wise trend
   e. Explore observations/results and chief complaints reported for these patients
4. Insights from the data, and challenges.
5. Quick peek into other insights that can be derived from this EMR data.

Pre-requisites

1. Basic knowledge of:
   a. Bahmni/OpenMRS data model and concept dictionary
   b. SQL
   c. R (RStudio IDE)
2. PC/Mac/Linux machine set up with:
   a. MySQL client to connect to the MySQL server on which the Bahmni anonymous DB is set up (either a local or a remote server)
   b. R and RStudio installed

Why R?

1. Open source with great community support.
2. Lots of built-in packages for descriptive and predictive analytics that can be used out of the box.
   a. Very good mix of packages for querying and plotting the data
3. Easy to learn and use.

Let's get going

All hands-on exercises are performed on anonymized data!

Part 1

Fundamentals

1. Exploring tables and columns of interest
2. Using R/RStudio
   a. Connect to the MySQL DB
   b. Load required R packages
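
The two steps under "Using R/RStudio" could look roughly like the sketch below. It assumes the DBI and RMariaDB packages (the deck does not say which MySQL driver was used), and the host, user, password variable, and database name are placeholders. The query reads `person` and `person_address`, which are standard OpenMRS tables, but the columns of interest may differ in a given installation.

```r
# Sketch only: DBI + RMariaDB are an assumed driver choice, and every
# connection setting below is a placeholder, not a value from the talk.
connect_bahmni <- function(host = "localhost", user = "openmrs-user",
                           password = Sys.getenv("BAHMNI_DB_PASSWORD"),
                           dbname = "openmrs") {
  DBI::dbConnect(RMariaDB::MariaDB(),
                 host = host, user = user,
                 password = password, dbname = dbname)
}

# One row per non-voided person, with gender, birthdate and city/village.
patient_region_sql <- "
  SELECT p.person_id, p.gender, p.birthdate, pa.city_village
  FROM person p
  JOIN person_address pa ON pa.person_id = p.person_id
  WHERE p.voided = 0 AND pa.voided = 0"

# Run the query on an open connection and return a data frame.
fetch_patients <- function(con) {
  DBI::dbGetQuery(con, patient_region_sql)
}

# Usage (requires a reachable database):
# con <- connect_bahmni()
# patients <- fetch_patients(con)
# DBI::dbDisconnect(con)
```

Namespacing the calls with `DBI::` avoids attaching packages globally; calling `library(DBI)` at the top of a script works equally well.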

Patients across Regions

1. Number of patients reported across various cities/villages.
2. Percentage of male and female patients.
3. Percentage of patients from each region in the top 10 cities/villages.
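
The three summaries above can be sketched in base R. The data frame here is a small made-up sample standing in for the query result, and the column names `city_village` and `gender` are assumptions modeled on the OpenMRS tables.

```r
# Hypothetical sample standing in for the query result; the real data
# would come from the Bahmni database.
patients <- data.frame(
  city_village = c("Village A", "Village A", "Village B", "Village C",
                   "Village A", "Village B", "Village C", "Village C"),
  gender = c("F", "M", "F", "F", "M", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# 1. Number of patients per city/village, largest first.
region_counts <- sort(table(patients$city_village), decreasing = TRUE)

# 2. Percentage of male and female patients overall.
gender_pct <- round(100 * prop.table(table(patients$gender)), 1)

# 3. Share of each region within the top 10 cities/villages.
top_regions <- head(region_counts, 10)
top_region_pct <- round(100 * prop.table(top_regions), 1)

print(region_counts)
print(gender_pct)
print(top_region_pct)
```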

Patients Across Regions

Part 2

Top 10 diagnoses

1. Explore the distribution of the various diagnoses reported across males and females.
2. Pick the top 10 diagnoses and look at the male/female ratio.
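
One way to rank diagnoses and split them by gender is shown below. This is a minimal base R sketch on made-up records; the `diagnosis` and `gender` columns are hypothetical names, since in Bahmni the diagnosis would be resolved from the concept dictionary.

```r
# Hypothetical diagnosis records (one row per diagnosed patient).
diagnoses <- data.frame(
  diagnosis = c("Gastritis", "Gastritis", "Diabetes", "Gastritis",
                "Hypertension", "Diabetes", "Gastritis", "Anemia"),
  gender = c("F", "F", "M", "M", "F", "F", "F", "F"),
  stringsAsFactors = FALSE
)

# Rank diagnoses by frequency and keep the top 10 names.
diag_counts <- sort(table(diagnoses$diagnosis), decreasing = TRUE)
top_names <- names(head(diag_counts, 10))

# Cross-tabulate the top diagnoses by gender, keeping the ranking order.
top_diag <- diagnoses[diagnoses$diagnosis %in% top_names, ]
top_diag$diagnosis <- factor(top_diag$diagnosis, levels = top_names)
by_gender <- table(top_diag$diagnosis, top_diag$gender)

# Row-wise percentages give the male/female split per diagnosis.
gender_split <- round(100 * prop.table(by_gender, margin = 1), 1)
print(gender_split)
```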

Top Diagnosis - Gastritis

Look at:

1. Top 5 regions
   a. With male/female distribution
2. Age distribution for males and females in the top 5 regions
   a. Boxplot
   b. Histogram
3. Year-wise trend
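
The age distribution and year-wise trend could be explored along these lines. The sketch uses base R graphics and a made-up data frame; the column names and values are hypothetical, and the talk may well have used a different plotting package.

```r
# Hypothetical gastritis cases; ages and dates are made up for illustration.
gastritis <- data.frame(
  gender = c("F", "F", "M", "F", "M", "F", "M", "F"),
  age = c(34, 45, 29, 52, 61, 38, 47, 25),
  diagnosis_date = as.Date(c("2014-03-10", "2014-11-02", "2015-01-15",
                             "2015-06-30", "2015-09-12", "2016-02-20",
                             "2016-05-05", "2016-08-18"))
)

# Five-number summary of age per gender (the statistics behind a boxplot).
age_summary <- tapply(gastritis$age, gastritis$gender, fivenum)

# Year-wise trend: number of cases per calendar year.
cases_per_year <- table(format(gastritis$diagnosis_date, "%Y"))
print(age_summary)
print(cases_per_year)

# Draw the plots only in an interactive session such as RStudio.
if (interactive()) {
  boxplot(age ~ gender, data = gastritis, main = "Age by gender")
  hist(gastritis$age, breaks = seq(20, 70, by = 10),
       main = "Age distribution")
  barplot(cases_per_year, main = "Cases per year")
}
```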

Top 10 Diagnoses

Gastritis - Deep Dive

Part 3

Explore results for top diagnosis - Gastritis

1. Gather all results for patients with gastritis.
2. Look at important results for female patients to identify any trends.

Top Chronic Diagnosis - Diabetes

1. Gather all the lab results.
2. Explore HbA1c results.
   a. Lack of consistent data
3. Analyze hemoglobin levels.
   a. Outliers
   b. Flooring and capping
   c. Check for gender bias in the 12 to 18 age group
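
The hemoglobin steps can be illustrated with a short base R sketch. The values are made up, the 5th/95th percentile limits for flooring and capping are an assumption (the slides do not state the cut-offs), and a Welch t-test is only one possible way to check for a difference between genders.

```r
# Hypothetical hemoglobin values (g/dL) for patients aged 12 to 18;
# 1.2 and 25.0 mimic data-entry errors.
hb <- data.frame(
  gender = rep(c("F", "M"), each = 8),
  hemoglobin = c(10.8, 11.5, 12.1, 11.0, 10.2, 12.4, 11.7, 1.2,
                 13.2, 13.9, 12.8, 14.1, 13.5, 12.9, 14.4, 25.0)
)

# Flag outliers with the 1.5 * IQR rule used by boxplots.
outliers <- boxplot.stats(hb$hemoglobin)$out

# Flooring and capping: clamp values to the 5th and 95th percentiles.
# The percentile cut-offs are an assumption; the talk does not state them.
limits <- quantile(hb$hemoglobin, probs = c(0.05, 0.95))
hb$hemoglobin_capped <- pmin(pmax(hb$hemoglobin, limits[1]), limits[2])

# Compare genders with a Welch two-sample t-test on the capped values.
gender_test <- t.test(hemoglobin_capped ~ gender, data = hb)
print(outliers)
print(gender_test$p.value)
```

Capping keeps every record in the data set while limiting the pull of extreme values; dropping the flagged rows instead is a reasonable alternative when they are clearly entry errors.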

Exploring Results

Part 4

What’s next?

1. Better understanding of the data
2. Data cleaning and preparation
   a. Misspelled city/village names
   b. Outlier detection and replacement strategy
   c. Descriptive statistics, measures of central tendency, skewness, hypothesis testing
3. Feature transformation
   a. Extract new features
      i. For example, average sugar levels from fasting and postprandial blood sugar levels
      ii. Binning of variables such as age into infant, youth, adult, etc.
   b. Natural language processing (NLP)
      i. Chief complaints

4. Clustering of patients
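
The feature transformation ideas in item 3 (an averaged sugar level and age binning) might be sketched as follows. The column names, age cut points, and group labels are all assumptions chosen for illustration; the slides only name "infant, youth, adult" as example bins.

```r
# Hypothetical patient records; column names are illustrative.
patients <- data.frame(
  age = c(0.5, 9, 16, 35, 70),
  fasting_sugar = c(NA, 88, 95, 130, 150),
  postprandial_sugar = c(NA, 120, 135, 190, 230)
)

# New feature: mean of fasting and postprandial blood sugar.
# Rows with a missing reading stay NA rather than being imputed.
patients$avg_sugar <- rowMeans(
  patients[, c("fasting_sugar", "postprandial_sugar")]
)

# Bin age into groups; the cut points and labels are assumptions.
patients$age_group <- cut(
  patients$age,
  breaks = c(0, 1, 13, 25, 60, Inf),
  labels = c("infant", "child", "youth", "adult", "senior"),
  right = FALSE
)
print(patients)
```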

References and Links

1. R & RStudio: https://www.rstudio.com/products/rstudio/
2. MySQL: https://dev.mysql.com/doc/refman/5.6/en/osx-installation-pkg.html
3. R-bloggers: https://www.r-bloggers.com/
4. Source code: https://github.com/karrtikiyer-tw/bahmni-eda
5. YouTube playlist

Thank you!

Please leave your feedback and suggestions via comments.