Machine Learning and Data Mining: A Case Study with Enterotypes
Gabe Al-Ghalith, Jimmy Reeve
Chapter 28, data mining
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-047-computational-biology-genomes-networks-evolution-fall-2008/lecture-notes/MIT6_047f08_lec04_slide04.pdf
Choosing Between Clustering and Classification
– How would you categorize people based on their:
– Blood type?
– Gut bacteria?
– Blood type calls for classification
● Consensus on blood groups: A, B, AB, O
– Gut bacteria call for clustering
● No consensus on the types, or even on the number of categories
– Clustering: summarize big data without a priori hypotheses
http://www.nytimes.com/2011/04/21/science/21gut.html?_r=2&scp=2&sq=bacteria&st=cse&
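To make the distinction concrete, here is a minimal Python sketch (not from the original slides). `classify_blood_type` is a classifier: every person lands in one of four agreed-upon labels. `cluster_by_threshold` is a toy clustering routine, assumed here for illustration (single-linkage grouping under a hypothetical distance threshold, not the method used in the enterotype paper): it assigns no predefined labels, and the number of groups comes out of the data. All names, the threshold, and the two-number abundance profiles below are hypothetical.

```python
import math

def classify_blood_type(has_a_antigen: bool, has_b_antigen: bool) -> str:
    """Classification: each sample maps onto one of a fixed, agreed-upon set of labels."""
    if has_a_antigen and has_b_antigen:
        return "AB"
    if has_a_antigen:
        return "A"
    if has_b_antigen:
        return "B"
    return "O"

def cluster_by_threshold(samples, threshold):
    """Clustering (toy single-linkage): samples closer than `threshold` share a group.

    No labels are given in advance; group IDs 0, 1, 2, ... are discovered."""
    labels = [-1] * len(samples)  # -1 = not yet assigned
    next_label = 0
    for i in range(len(samples)):
        if labels[i] != -1:
            continue
        # flood-fill a new group outward from sample i
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(samples)):
                if labels[k] == -1 and math.dist(samples[j], samples[k]) < threshold:
                    labels[k] = next_label
                    stack.append(k)
        next_label += 1
    return labels
```

For example, four hypothetical profiles `[(0.8, 0.1), (0.75, 0.15), (0.1, 0.85), (0.15, 0.8)]` with `threshold=0.2` give the labels `[0, 0, 1, 1]`: two groups, discovered rather than declared.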
Reasons to Consider Gut Bacteria
● Contribute to disease and to responses to treatment
● Play protective and digestive roles
● We have hundreds of genes involved in handling these bacteria
● NPR.org: “Gut bacteria might guide the workings of our minds”
● Characterizing these bacteria can help us tease out these associations, supporting:
– Personalized medicine and treatment
http://www.gutmicrobiotawatch.org/gut-microbiota-info/
http://www.npr.org/blogs/health/2013/11/18/244526773/gut-bacteria-might-guide-the-workings-of-our-minds
Three Distinct “Enterotypes” Revealed by Clustering
● Bacterial populations fell into 3 groups based on their composition
● Each of these three “enterotypes” is characterized by one representative genus of gut bacteria (its chief or first principal component)
– Enterotype 1: Bacteroides; enriched in biosynthesis of vitamins B5, B7, C
– Enterotype 2: Prevotella; enriched in biosynthesis of vitamins B1, B9
– Enterotype 3: Blautia (Ruminococcus); conversion of H2/CO2 to acetate
● ~1,500 known sequences were used as a filter for the raw metagenomic reads; these are the “features.” A “sample” is the population composition of one subject's gut.
● 85 metagenomes came from one source, 154 from another, and 33 from a third. The same 3 classes emerged when each dataset was clustered separately.
Enterotypes of the human gut microbiome. Nature 473: 174–180.
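A minimal sketch of how one subject's reads might become a feature vector, assuming the hard part (matching each read against the reference sequences) has already been done upstream. The function name `abundance_profile`, the genus-level reference names, and the read assignments in the example are hypothetical; a real pipeline would use the full set of roughly 1,500 reference sequences as features.

```python
from collections import Counter

def abundance_profile(read_hits, reference_ids):
    """Turn one subject's read-to-reference assignments into a feature vector.

    read_hits: the reference ID each read matched, or None if the read matched
      no reference (such reads are filtered out).
    reference_ids: fixed, ordered list of reference sequences (the "features").
    Returns relative abundances in the order of reference_ids.
    """
    counts = Counter(hit for hit in read_hits if hit is not None)
    total = sum(counts[ref] for ref in reference_ids)
    if total == 0:
        return [0.0] * len(reference_ids)  # nothing matched any reference
    # normalize so subjects sequenced to different depths are comparable
    return [counts[ref] / total for ref in reference_ids]
```

For example, `abundance_profile(["Bacteroides", "Bacteroides", None, "Prevotella", "Bacteroides"], ["Bacteroides", "Prevotella", "Ruminococcus"])` returns `[0.75, 0.25, 0.0]`; the unmatched read is dropped before normalizing. Each subject's vector is one “sample” in the sense used above.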
Clustering Methodology Used in the Original Paper
● Karhunen–Loève transform (KLT), i.e., principal component analysis (PCA)
● Dimensionality reduction technique
– Parallels with SQL3: “pivot” along the axis with the most variance, then a final “roll-up” based on a distance metric
– Some metrics: Euclidean, Manhattan, vector angle, Pearson correlation, Jensen–Shannon, ...
● R: the “pam” (partitioning around medoids, or “k-medoids”) algorithm from the cluster package, used together with ade4
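The KLT/PCA step can be sketched in plain Python: center the feature vectors, build the covariance matrix, and find its top eigenvector (the axis of greatest variance) by power iteration. This is a generic illustration of PCA, not a reproduction of the original paper's pipeline; `first_principal_component` and the iteration count are assumptions, and in practice one would call a library routine such as R's `prcomp` or ade4's `dudi.pca`.

```python
import math

def first_principal_component(rows, iterations=200):
    """PCA / Karhunen-Loeve transform, first component only.

    rows: samples as equal-length lists of feature values (at least 2 samples).
    Returns (component, scores): the unit-length axis of greatest variance and
    each sample's coordinate (projection) along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # sample covariance matrix, d x d
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # power iteration: repeated multiplication by cov converges to the eigenvector
    # with the largest eigenvalue (assuming the start vector is not orthogonal to it)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v
        v = [x / norm for x in w]
    # project each centered sample onto the component
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

On the collinear points `[[0, 0], [1, 1], [2, 2], [3, 3]]` the component comes out as approximately `[0.7071, 0.7071]`, and the scores are the samples' positions along that diagonal. Keeping only the scores on the first few components is the dimensionality reduction; the sign of a component is arbitrary.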
Enterotypes of the human gut microbiome. Nature 473: 174–180.
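Below is a sketch of the distance metrics listed above and of the partitioning-around-medoids idea behind “pam”. The swap loop is a simplification assumed for illustration: it starts from the first k samples instead of using PAM's BUILD initialization, and it is not the cluster package's implementation. Jensen–Shannon is computed here as the square root of the base-2 Jensen–Shannon divergence, a common choice for comparing compositions. All function names are hypothetical.

```python
import math

# --- Distance metrics (smaller = more similar) ---

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def vector_angle(p, q):
    # angle in radians between the vectors; 0 means same direction
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def pearson_distance(p, q):
    # 1 - Pearson correlation: 0 for perfectly correlated profiles
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return 1.0 - cov / (sp * sq)

def jensen_shannon(p, q):
    # square root of the Jensen-Shannon divergence (base-2 logs); p and q are
    # probability vectors, e.g. relative abundances summing to 1
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute nothing
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

# --- k-medoids (simplified PAM swap loop) ---

def pam(points, k, dist):
    """Return (medoid indices, cluster label per point).

    Medoids are actual samples. Starts from the first k points and accepts any
    medoid/non-medoid swap that lowers the total distance of points to their
    nearest medoid, until no swap helps."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    def total_cost(meds):
        return sum(min(d[i][m] for m in meds) for i in range(n))

    medoids = list(range(k))
    best = total_cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:slot] + [h] + medoids[slot + 1:]
                cost = total_cost(trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # label each point with the index of its nearest medoid
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels
```

For example, `pam(samples, 2, jensen_shannon)` on three Bacteroides-heavy and three Prevotella-heavy hypothetical profiles puts each trio in its own cluster. Because medoids are real samples, any of the metrics above can be passed as `dist`, which is one reason k-medoids pairs naturally with non-Euclidean measures such as Jensen–Shannon.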
References
● cluster package for R (ade4 hooks into it): http://cran.r-project.org/web/packages/cluster/cluster.pdf
● ade4 package for R (primer on dimensionality reduction): http://cran.r-project.org/web/packages/ade4/index.html
● “The human gut microbiome: are we our enterotypes?” Microbial Biotechnology (2011) 4(5), 550–553
● “Bacteria Divide People Into 3 Types, Scientists Say.” New York Times, April 20th, 2011.
● Dan Knights. Seminar: “Diet and microbiome: Which came first, the chicken nuggets or the Eggerthella?” Sep 26, 2013