Mining and Visualization of Flow Cytometry Data
Angela Chin
University of Houston Research Experience for Undergraduates
July 3, 2013



Contents
1. Introduction to Flow Cytometry
2. The Problem
3. Current Approaches & Results
4. Future Work


Flow Cytometry
A medical technique used for cell counting and cell sorting


How it Works

Picture from: Abcam http://www.abcam.com/index.html?pageconfig=resource&rid=11446


Flow Cytometry Application
Determine whether a person has B-cell lymphoma based on the number of clusters that result from flow cytometry:
• Two clusters: cancer patient
• Three clusters: healthy individual


Example: Flow Cytometry Results

[Figure panels: Cancer Patient | Healthy Patient]


Problems with Current Methods
• The process for determining whether there are two or three clusters is manual
• Doctors' time could be better spent on other tasks


The Problem
Creating an automated method for determining the number of clusters


Past Approaches
There are many ways to find clusters in data
• Most need to know the number of clusters ahead of time

The most popular is k-means, but it has some problems:
• The number of clusters must be given to the algorithm beforehand
• It has difficulty when clusters are close together, differ greatly in size, etc.
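To make the first k-means limitation concrete, here is a minimal sketch of the standard Lloyd's iteration on 2D points. The function name `kmeans` and its parameters are illustrative, not taken from this project; note that `k` is a required argument, so the caller must already know how many clusters to expect.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of (x, y) tuples.

    k must be supplied by the caller; the algorithm cannot infer it.
    Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k data points drawn without replacement
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with the index of its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep the old center if a cluster empties
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers, labels
```

Applied to two well-separated, equally sized groups, it returns one center near each group; the `k` argument is exactly the information the slide says must be supplied beforehand.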


Further Defining the Problem
We want to be able to determine the number of clusters when:
• The distance between clusters is very small
• The ratio of cluster sizes is large (100:1 to 1000:1)

We decided to further constrain the problem to distinguishing 1 cluster from 2 clusters when the size ratio is up to 1000:1.
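The experiments on the later slides rely on synthetic data with controlled size ratios and separations. A minimal generator for that setting might look like the sketch below; `make_clusters`, its parameters, and the choice of isotropic Gaussians with a shared standard deviation placed along the x-axis are assumptions for illustration, not the author's code.

```python
import random


def make_clusters(n, ratio=None, distance_sd=0.0, sd=1.0, seed=0):
    """Hypothetical synthetic-data generator for the 1-vs-2 cluster problem.

    ratio=None -> one isotropic Gaussian cluster of n points at the origin.
    ratio=r    -> two Gaussian clusters with sizes about r:1, both with standard
                  deviation sd, the small one centered distance_sd * sd away
                  along the x-axis.
    Returns (points, labels) with label 0 for the large cluster, 1 for the small."""
    rng = random.Random(seed)
    n_small = 0 if ratio is None else max(1, round(n / (ratio + 1)))
    n_big = n - n_small
    big = [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n_big)]
    cx = distance_sd * sd  # separation expressed in standard deviations
    small = [(rng.gauss(cx, sd), rng.gauss(0.0, sd)) for _ in range(n_small)]
    return big + small, [0] * n_big + [1] * n_small
```

For example, `make_clusters(100_000, ratio=999, distance_sd=2.0)` yields 99,900 points in the large cluster and 100 in the small one.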


Current Approaches & Results


Two Approaches
Approach #1: Transformation
• Find the center of the data
• For each point, find its angle from the horizontal line through the center (new x-value) and its distance from the center (new y-value)
• Use the transformed data to determine the number of clusters
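Approach #1 is essentially a change to polar coordinates about the center of the data. The sketch below assumes the center is the mean of the points (the slide does not say how the center is found) and wraps angles into [0, 2𝜋); `polar_transform` is a hypothetical name.

```python
import math


def polar_transform(points):
    """Map each (x, y) point to (angle, distance) about the data's center.

    Assumption: the center is the arithmetic mean of the points. Angles are
    measured counter-clockwise from the horizontal line through the center
    and wrapped into [0, 2*pi)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        angle = math.atan2(dy, dx) % (2 * math.pi)  # new x-value
        dist = math.hypot(dx, dy)                   # new y-value
        out.append((angle, dist))
    return out
```

On the unit-circle example suggested in the notes, the points (1, 0), (0, 1), (-1, 0), (0, -1) map to angles 0, 𝜋/2, 𝜋, 3𝜋/2, each at distance 1.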

Approach #2: Testing Normal Fit
• Project the 2D data onto a line to create 1D data
• Fit a normal distribution to the 1D data
• Compare the Bayesian Information Criterion (BIC) of the fit to a cut-off limit
• If the BIC is above the limit, there are two clusters; otherwise, there is one
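The normal-fit test can be sketched as below. For a maximum-likelihood normal fit to n values with fitted variance σ̂², the log-likelihood is -(n/2)(ln(2πσ̂²) + 1), so with k = 2 parameters BIC = 2 ln n + n(ln(2πσ̂²) + 1). Which line the data are projected onto and the value of the cut-off are not given on the slide, so `direction` and `cutoff` are caller-supplied assumptions, and the function names are hypothetical.

```python
import math


def project(points, direction=(1.0, 0.0)):
    """Project (x, y) points onto the line through the origin along `direction`.
    Assumption: the caller chooses the line; the slides do not specify it."""
    ux, uy = direction
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm  # normalize to a unit vector
    return [x * ux + y * uy for x, y in points]


def normal_fit_bic(values):
    """Fit one normal by maximum likelihood and return BIC = k*ln(n) - 2*ln(L), k = 2."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # MLE variance (divides by n)
    if var == 0:
        raise ValueError("all projected values are identical")
    log_lik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * math.log(n) - 2 * log_lik


def count_clusters(points, cutoff, direction=(1.0, 0.0)):
    """Return 2 if the single-normal BIC exceeds the (hypothetical) cutoff, else 1."""
    bic = normal_fit_bic(project(points, direction))
    return 2 if bic > cutoff else 1
```

Because a single normal fit's BIC depends only on n and the fitted variance, the cut-off is tied to the sample size and the scale of the projected data.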


Approach #1: Transformation


Approach #1: Transformation

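[Figure: transformed data, angle from the center (x-axis, marked at 𝜋/2, 3𝜋/2, 2𝜋) versus distance from the center (y-axis)]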

Add a slide to show how the transformation takes place ("transformation process")
Use a unit circle example
Express the angles in terms of 𝜋 instead of decimal numbers
Use ratios from 500:1 to 200:1

Approach #1: Transformation Process

[Figure: transformation process; angle axis marked at 𝜋/2, 3𝜋/2, 2𝜋]


Approach #1: Transformation

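[Figure: transformed data, angle from the center (x-axis, marked at 𝜋/2, 3𝜋/2, 2𝜋) versus distance from the center (y-axis)]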


Approach #2: Testing Normal Fit


Approach #2: Testing Normal Fit
3 standard deviations apart, ratio 1:99

[Figure panels: One-cluster best fits | Two-cluster best fits]
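The slide does not say how the two-cluster best fit was obtained. One common option, swapped in here purely for illustration, is expectation-maximization (EM) for a two-component 1D Gaussian mixture, followed by a BIC comparison: the mixture has k = 5 free parameters (two means, two variances, one mixing weight) against k = 2 for a single normal, and the model with the lower BIC is preferred. All function names and the percentile-based initialization are assumptions.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def one_normal_loglik(xs):
    """Maximized log-likelihood of a single normal fit (MLE mean and variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def mixture_loglik(xs, w, mu, var):
    """Log-likelihood of a two-component Gaussian mixture (floored to avoid log(0))."""
    return sum(math.log(max(w[0] * normal_pdf(x, mu[0], var[0])
                            + w[1] * normal_pdf(x, mu[1], var[1]), 1e-300)) for x in xs)


def fit_two_normals(xs, max_iter=500, tol=1e-8):
    """EM for a two-component 1D Gaussian mixture.
    Returns (weights, means, variances, log_likelihood)."""
    n = len(xs)
    s = sorted(xs)
    mu = [s[n // 20], s[(19 * n) // 20]]  # start near the 5th and 95th percentiles
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    w, var = [0.5, 0.5], [v, v]
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], var[0])
            p1 = w[1] * normal_pdf(x, mu[1], var[1])
            resp.append(p1 / max(p0 + p1, 1e-300))  # guard against underflow
        # M-step: re-estimate weights, means and variances from the responsibilities.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - n1, 1e-12)
        w = [n0 / n, n1 / n]
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        var = [max(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0, 1e-6),
               max(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1, 1e-6)]
        ll = mixture_loglik(xs, w, mu, var)
        if ll - prev < tol:  # stop once the likelihood improvement stalls
            break
        prev = ll
    return w, mu, var, ll


def bic(log_lik, k, n):
    """Bayesian Information Criterion; lower is better."""
    return k * math.log(n) - 2 * log_lik
```

On a well-separated, equally sized mixture the two-normal model has the lower BIC, as expected.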


Approach #2: Testing Normal Fit
Comparing the BIC of one cluster versus two clusters
• All data were generated using 100,000 points and the same standard deviations
• The ratio between the clusters and the distance between the two clusters (if applicable) were varied:
  • Ratios: 199:1 to 63:1
  • Distance: 1.5 to 5 standard deviations apart
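The parameter sweep described above can be sketched as follows: generate one-cluster baseline data and two-cluster data over a grid of size ratios and separations, fit a single normal to each, and record the BIC. The grid values (ratios 199, 99, 63 and separations of 1.5, 3, 5 standard deviations) are assumed points within the stated ranges, the data are generated directly in 1D (as if already projected onto the line joining the cluster centers), and `sweep` and `sample_1d` are hypothetical names.

```python
import math
import random


def single_normal_bic(xs):
    """BIC of a maximum-likelihood normal fit: k*ln(n) - 2*ln(L) with k = 2."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return 2 * math.log(n) + n * (math.log(2 * math.pi * var) + 1)


def sample_1d(n, rng, ratio=None, distance_sd=0.0, sd=1.0):
    """Hypothetical 1D generator: one N(0, sd^2) cluster (ratio=None), or a
    ratio:1 mixture whose small cluster is distance_sd standard deviations away."""
    n_small = 0 if ratio is None else max(1, round(n / (ratio + 1)))
    xs = [rng.gauss(0.0, sd) for _ in range(n - n_small)]
    xs += [rng.gauss(distance_sd * sd, sd) for _ in range(n_small)]
    return xs


def sweep(n=100_000, ratios=(199, 99, 63), distances=(1.5, 3.0, 5.0), seed=0):
    """Return (one-cluster baseline BIC, {(ratio, distance): two-cluster BIC})."""
    rng = random.Random(seed)
    baseline = single_normal_bic(sample_1d(n, rng))
    table = {(r, d): single_normal_bic(sample_1d(n, rng, r, d))
             for r in ratios for d in distances}
    return baseline, table


if __name__ == "__main__":
    baseline, table = sweep()
    print(f"one cluster: {baseline:.1f}")
    for (r, d), b in sorted(table.items()):
        print(f"{r}:1, {d} SD apart: {b:.1f}")
```

A cut-off would need to lie between the one-cluster baseline and the two-cluster BIC values; choosing it is the step the slides leave to future work with real data.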


Use log scale instead?


Future Work


Future Work
• Approach #1: Determine whether there is a way to detect the second cluster in the transformed data
• Approach #2: Use real data to see whether a cut-off can be determined
• Overall: After figuring out how to distinguish one cluster from two, extend the method to two versus three clusters


Limitations
• Assumes the data have a Gaussian distribution
• The number of clusters is limited to two or three


Acknowledgements
I would like to thank my research advisor, Dr. Stephen Huang, and Mitch Shih for their guidance on this project. I would also like to thank the University of Houston Computer Science Department and the National Science Foundation for providing me with the opportunity to participate in the REU.
