machine learning and data mining: a case study with enterotypes

Machine Learning and Data Mining: A Case Study with Enterotypes. Gabe Al-Ghalith, Jimmy Reeve. Chapter 28, data mining.


Page 1: Machine Learning and Data Mining: A Case Study with Enterotypes

Machine Learning and Data Mining: A Case Study with Enterotypes

Gabe Al-Ghalith, Jimmy Reeve

Chapter 28, data mining

Page 2: Machine Learning and Data Mining: A Case Study with Enterotypes

http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-047-computational-biology-genomes-networks-evolution-fall-2008/lecture-notes/MIT6_047f08_lec04_slide04.pdf

Page 3: Machine Learning and Data Mining: A Case Study with Enterotypes

Choosing Between Clustering and Classification

– How would you categorize people based on their:

● Blood type?

● Gut bacteria?

– Blood type calls for classification

● Consensus on blood groups: A, B, AB, O

– Gut bacteria call for clustering

● No consensus on the types, or even on the number of categories

– Clustering: summarize big data without a priori hypotheses

http://www.nytimes.com/2011/04/21/science/21gut.html?_r=2&scp=2&sq=bacteria&st=cse&
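The contrast on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not anything from the original deck: the function names (`fit_centroids`, `classify`, `cluster_1d`) and the one-dimensional toy measurements are assumptions made up for the example. The classifier is a nearest-centroid rule that needs labels up front; the clustering routine is a plain k-means that is given no labels at all.

```python
from statistics import mean


def fit_centroids(values, labels):
    """Classification: labels are given, so learn one centroid per known class."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def classify(centroids, value):
    """Assign a new value to the known class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def cluster_1d(values, k, iterations=20):
    """Clustering: no labels are given; k-means discovers k groups from the data."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - value))
            clusters[nearest].append(value)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical measurements with agreed-upon labels (the "blood type" situation).
labeled_values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
known_labels = ["A", "A", "A", "B", "B", "B"]
centroids = fit_centroids(labeled_values, known_labels)
predicted = classify(centroids, 1.1)

# Hypothetical measurements with no labels (the "gut bacteria" situation).
unlabeled_values = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
centers, clusters = cluster_1d(unlabeled_values, k=2)
```

Note that `cluster_1d` still needs `k` as an input; choosing the number of groups is itself part of the clustering problem when no consensus categories exist.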

Page 4: Machine Learning and Data Mining: A Case Study with Enterotypes

Reasons to Consider Gut Bacteria

● Contribute to diseases and response to treatments

● Protective role, digestive role

● We have hundreds of genes involved in handling these bacteria

● NPR.org: “Gut bacteria might guide the workings of our minds”

● Characterizing these bacteria can help us tease out these associations:

● Personalized medicine and treatment

http://www.gutmicrobiotawatch.org/gut-microbiota-info/

http://www.npr.org/blogs/health/2013/11/18/244526773/gut-bacteria-might-guide-the-workings-of-our-minds

Page 5: Machine Learning and Data Mining: A Case Study with Enterotypes

3 Distinct “Enterotypes” Revealed from Clustering Approach

● Bacterial populations fell into 3 groups based on population composition

● Each of the three “enterotypes” is characterized by one representative genus of gut bacteria (chief/first principal component)

– Enterotype 1: Bacteroides, enriched in biosynthesis of vitamins B5, B7, C

– Enterotype 2: Prevotella, enriched in biosynthesis of vitamins B1, B9

– Enterotype 3: Blautia (Ruminococcus): H2/CO2 to acetate

● ~1500 known sequences used as a filter for raw metagenomic reads. These are the “features.” A “sample” is the population composition in a subject's gut.

● 85 metagenomes from one source, 154 from another, 33 from a third. The same 3 clusters emerged upon clustering each.

Enterotypes of the human gut microbiome. Nature 473: 174–180.
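To make the “features” and “samples” vocabulary concrete, here is a minimal sketch of how one sample might be represented before clustering. Everything in it is an assumption for illustration: the read counts are made up, only three genera are used as features instead of roughly 1500 reference sequences, and the helper names `relative_abundance` and `dominant_genus` are hypothetical.

```python
def relative_abundance(counts):
    """Turn raw per-feature read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {genus: n / total for genus, n in counts.items()}


def dominant_genus(profile):
    """Return the feature with the largest relative abundance in a sample."""
    return max(profile, key=profile.get)


# Made-up read counts: one dictionary per sample, one key per feature.
samples = {
    "subject_1": {"Bacteroides": 600, "Prevotella": 50, "Ruminococcus": 350},
    "subject_2": {"Bacteroides": 100, "Prevotella": 700, "Ruminococcus": 200},
}

# Each sample becomes a composition vector over the same set of features.
profiles = {name: relative_abundance(counts) for name, counts in samples.items()}
dominant = {name: dominant_genus(profile) for name, profile in profiles.items()}
```

Normalizing to proportions keeps samples with different sequencing depths comparable, which matters once distances between samples are computed.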

Page 6: Machine Learning and Data Mining: A Case Study with Enterotypes

Clustering Methodology Used in the Original Paper

● Karhunen–Loève transform (KLT) – PCA

● Dimensionality reduction technique

– Parallels with SQL3: “pivot” along axis with most variance, then final “roll up” based on distance metric

– Some metrics: Euclidean, Manhattan, vector angle, Pearson's correlation, Jensen-Shannon...

● The “pam” algorithm (partitioning around medoids, i.e. “k-medoids”) comes from the R cluster package, used alongside ade4
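Three of the metrics listed above can be written out directly. The sketch below is an illustration in plain Python rather than the R code used for the original analysis; the function names are assumptions. The Jensen-Shannon version returns the square root of the divergence (with base-2 logarithms), since the square root is what behaves as a proper distance, and it expects each input to be a probability vector such as a relative-abundance profile.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two distributions."""
    # The midpoint m is positive wherever p or q is, so kl_divergence never divides by 0.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding error


# Two hypothetical composition vectors over the same three features.
sample_a = [0.6, 0.05, 0.35]
sample_b = [0.1, 0.7, 0.2]
distance_ab = jensen_shannon_distance(sample_a, sample_b)
```

With base-2 logarithms the Jensen-Shannon distance is bounded between 0 (identical compositions) and 1 (no shared features), which makes it convenient for comparing abundance profiles.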

Enterotypes of the human gut microbiome. Nature 473: 174–180.
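The k-medoids idea can be sketched as follows. This is a simplified alternating scheme (assign each sample to its nearest medoid, then re-pick each cluster's medoid), not the full BUILD/SWAP procedure of the “pam” routine in R, and it is not the code used in the paper. The function name `k_medoids`, the hand-picked starting medoids, the toy abundance profiles, and the use of Manhattan distance are all assumptions made for this example.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    dist[i][j] is the distance between samples i and j.
    initial_medoids is a list of sample indices used as starting medoids.
    Returns (medoid indices, cluster label per sample).
    """
    n = len(dist)
    medoids = list(initial_medoids)
    k = len(medoids)

    def assign(current):
        # Each sample joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The medoid is the member with the smallest total distance to the others.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy relative-abundance profiles over three features (made-up values).
profiles = [
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10], [0.20, 0.70, 0.10], [0.15, 0.75, 0.10],
    [0.10, 0.10, 0.80], [0.10, 0.20, 0.70], [0.10, 0.15, 0.75],
]
distance_matrix = [[manhattan(p, q) for q in profiles] for p in profiles]
medoids, labels = k_medoids(distance_matrix, initial_medoids=[0, 3, 6])
```

Unlike k-means, every cluster center here is an actual sample, and the routine only needs pairwise distances, so any of the metrics from the slide can be plugged in by changing how `distance_matrix` is built. The result depends on the starting medoids, which is one reason a more careful initialization is normally used in practice.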

Page 7: Machine Learning and Data Mining: A Case Study with Enterotypes

References

● The cluster package in R (provides pam; used alongside ade4): http://cran.r-project.org/web/packages/cluster/cluster.pdf

● ade4 primer on dimensionality reduction: http://cran.r-project.org/web/packages/ade4/index.html

● “The human gut microbiome: are we our enterotypes?” Microbial Biotechnology (2011) 4(5), 550–553

● “Bacteria Divide People Into 3 Types, Scientists Say.” New York Times, April 20th, 2011.

● Dan Knights. Seminar: “Diet and microbiome: Which came first, the chicken nuggets or the Eggerthella?” Sep 26, 2013