Biomedical Master
Introduction to genome-wide association studies
Metabolic diseases (B. Thorens)
Biomedical Master: Metabolic diseases Lausanne, November 8, 2010
Sven Bergmann
University of Lausanne &
Swiss Institute of Bioinformatics
http://serverdgm.unil.ch/bergmann
A Systems Biology approach
Large (genomic) systems
• many uncharacterized elements
• relationships unknown
• computational analysis should: improve annotation, reveal relations, reduce complexity
Small systems
• elements well-known
• many relationships established
• quantitative modeling of systems properties like: dynamics, robustness, logics
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
Genetic variation in SNPs (Single Nucleotide Polymorphisms)
6,189 individuals
Phenotypes: 159 measurements, 144 questions
Genotypes: 500,000 SNPs
CoLaus = Cohort Lausanne
Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gérard Waeber (CHUV)
Analysis of Genotypes only
Principal Component Analysis reveals SNP-vectors explaining the largest variation in the data
PC-Analysis of genotypic profile
• Is surprisingly accurate!
• Is useful for forensic purposes or for individuals interested in their ancestry
• Is useful for population stratification in Genome-wide Association studies
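To make the PCA idea concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the two populations, their deliberately exaggerated allele-frequency differences, and the function names `simulate_genotypes` and `first_principal_component` are hypothetical and not taken from the CoLaus analysis. It finds the first principal component by power iteration on the individual-by-individual Gram matrix of the centered genotypes.

```python
import math
import random


def simulate_genotypes(n_per_pop, n_snps, rng):
    """Simulate two hypothetical populations with shifted allele frequencies."""
    freqs_a = [rng.uniform(0.35, 0.65) for _ in range(n_snps)]
    # Assumed, exaggerated shift of +/-0.3 per SNP so the toy example is clear-cut.
    freqs_b = [f + rng.choice((-0.3, 0.3)) for f in freqs_a]
    genotypes, labels = [], []
    for label, freqs in (("A", freqs_a), ("B", freqs_b)):
        for _ in range(n_per_pop):
            # Genotype coded as the number of copies of one allele: 0, 1 or 2.
            genotypes.append([(rng.random() < f) + (rng.random() < f) for f in freqs])
            labels.append(label)
    return genotypes, labels


def first_principal_component(genotypes, n_iter=100):
    """Return each individual's PC1 score (up to scale and sign)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center every SNP column on its mean across all individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix of individuals; its leading eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered] for r1 in centered]
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary deterministic start
    for _ in range(n_iter):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


rng = random.Random(0)
genotypes, labels = simulate_genotypes(n_per_pop=20, n_snps=200, rng=rng)
pc1 = first_principal_component(genotypes)
for label in ("A", "B"):
    scores = [s for s, l in zip(pc1, labels) if l == label]
    print(label, "mean PC1 score:", round(sum(scores) / len(scores), 3))
```

In this toy setting the two populations land on opposite sides of zero along PC1. Real cohorts show far subtler frequency differences, which is why many SNPs are needed.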
What is association?
[Figure: chromosome with SNPs and a trait variant]
Genetic variation yields phenotypic variation
Population with ‘ ’ allele Population with ‘ ’ allele
Distributions of “trait”
Regression formalism
[Equation figure; its labelled terms:]
• (monotonic) transformation
• phenotype (response variable) of individual i
• effect size (regression coefficient) β
• coded genotype (feature) of individual i
• error (residual)
• p(β=0)
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
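The goal stated above can be sketched for a single SNP as an ordinary least-squares fit. This is a minimal illustration under stated assumptions, not the lecture's actual pipeline: the phenotype is taken as already transformed, an intercept is included, the genotype is coded as an allele count (0, 1, 2), and the p-value uses a normal approximation to the t statistic, which is only reasonable for large samples. The names `snp_association` and `simulate_cohort` are hypothetical.

```python
import math
import random


def snp_association(genotypes, phenotypes):
    """Fit phenotype = intercept + beta * genotype + error by least squares.

    Returns (beta, standard error, two-sided p-value for beta = 0).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                      # effect size (regression coefficient)
    intercept = mean_y - beta * mean_x
    # Residual sum of squares: the part of the phenotype the SNP does not explain.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    z = beta / se
    # Assumption: normal approximation, two-sided p = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p_value


def simulate_cohort(n, allele_freq, true_beta, rng):
    """Hypothetical cohort: additive SNP effect plus unit-variance noise."""
    genotypes = [(rng.random() < allele_freq) + (rng.random() < allele_freq)
                 for _ in range(n)]
    phenotypes = [true_beta * g + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, phenotypes


rng = random.Random(1)
genotypes, phenotypes = simulate_cohort(n=2000, allele_freq=0.3, true_beta=0.5, rng=rng)
beta, se, p_value = snp_association(genotypes, phenotypes)
print("beta:", round(beta, 3), "se:", round(se, 3), "p-value:", p_value)
```

A genome-wide scan repeats this fit once per SNP, which is what makes the multiple-testing issue unavoidable.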
Whole Genome Association
Current microarrays probe ~1M SNPs!
Standard approach: Evaluate significance for association of each SNP independently
[Figure: Manhattan plot; x-axis: chromosome & position, y-axis: significance]
[Figure: Quantile-quantile plot; x-axis: expected significance, y-axis: observed significance]
GWA screens include a large number of statistical tests!
• Huge burden of correcting for multiple testing!
• Can detect only highly significant associations (p < α / #(tests) ~ 10⁻⁷)
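The threshold p < α / #(tests) is a Bonferroni correction. The sketch below works out its value and also computes the point pairs drawn in a quantile-quantile plot. The choice α = 0.05 is an assumption (the slide does not state it), and the function names are hypothetical. For the Q-Q plot, the k-th smallest of n p-values is compared with its expectation under the null hypothesis, k / (n + 1).

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error near alpha."""
    return alpha / n_tests


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p) for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(p_values)
    # Under the null, the k-th smallest of n uniform p-values averages k / (n + 1).
    return [(-math.log10((k + 1) / (n + 1)), -math.log10(p))
            for k, p in enumerate(observed)]


# Assumed alpha = 0.05 with ~1M SNPs gives a threshold of the order of 10^-7.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print("genome-wide threshold:", threshold)

# Toy p-values, invented for illustration only.
for expected, observed in qq_points([0.5, 0.01, 0.2]):
    print(round(expected, 3), round(observed, 3))
```

Points lying well above the diagonal of the Q-Q plot indicate more small p-values than chance alone would produce.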
Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene
Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka, Zoltán Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F. Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth, Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik, James Scott, Unnur Thorsteinsdottir, Stefania Bandinelli, John Chambers, Kari Stefansson, Gérard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann
Current insights from GWAS:
• Well-powered (meta-)studies with (tens of) thousands of samples have identified a few (dozen) candidate loci with highly significant associations
• Many of these associations have been replicated in independent studies
• Each locus explains only a tiny fraction (<1%) of the phenotypic variance
• All significant loci together explain only a small fraction (<10%) of the variance
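To see why single loci explain so little, here is a minimal sketch of the variance explained by one SNP under an additive model. It assumes Hardy-Weinberg proportions, so that an allele-count genotype with allele frequency p has variance 2p(1 − p), and it assumes independent loci when summing. The effect size, allele frequency and number of loci are invented for illustration, and `variance_explained` is a hypothetical helper name.

```python
import math


def variance_explained(beta, allele_freq, phenotype_variance):
    """Fraction of phenotypic variance explained by one additive SNP."""
    # Assumption: Hardy-Weinberg proportions, so Var(genotype) = 2 p (1 - p).
    genotype_variance = 2 * allele_freq * (1 - allele_freq)
    return beta ** 2 * genotype_variance / phenotype_variance


# Hypothetical locus: 0.1 phenotype standard deviations per allele, frequency 0.3.
single = variance_explained(beta=0.1, allele_freq=0.3, phenotype_variance=1.0)
print("one locus:", f"{single:.2%}")

# Assumption: 20 independent loci of the same size simply add up.
print("twenty loci:", f"{20 * single:.2%}")
assert math.isclose(single, 0.0042)
```

With these made-up numbers one locus explains well under 1% of the variance, and twenty such loci together stay below 10%.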
The “Missing variance” (Non-)Problem
Why should a simplistic (additive) model using incomplete or approximate features possibly explain anything close to the genetic variance of a complex trait?
… and it doesn’t have to, as long as Genome-wide Association Studies are meant as an undirected approach to elucidate new candidate loci that impact the trait!
How could our models become more predictive?
1. Improve measurements:
- measure more variants (e.g. by UHS)
- measure other variants (e.g. CNVs)
- measure “molecular phenotypes”
2. Improve models:
- proper integration of uncertainties
- include interactions
- multi-layer models
Towards a layered Systems Model
We need intermediate (molecular) phenotypes to better understand organismal phenotypes
Network Approaches for Integrative Association Analysis
Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions
Transcription Modules reduce Complexity
SB, J Ihmels & N Barkai Physical Review E (2003)
http://maya.unil.ch:7575/ExpressionView
Take-home Messages:
• Analysis of genome-wide SNP data reveals that population structure mirrors geography
• Genome-wide association studies elucidate candidate loci for a multitude of traits, but have little predictive power so far
• Future improvement will require
– better genotyping (CGH, UHS, …)
– new analysis approaches (interactions, networks, data integration)