
Biomedical Master

Introduction to genome-wide association studies

Metabolic diseases (B. Thorens)

Biomedical Master: Metabolic diseases, Lausanne, November 8, 2010

Sven Bergmann

University of Lausanne & Swiss Institute of Bioinformatics

http://serverdgm.unil.ch/bergmann

A Systems Biology approach

Large (genomic) systems

• many uncharacterized elements

• relationships unknown

• computational analysis should: improve annotation, reveal relations, reduce complexity

Small systems

• elements well-known

• many relationships established

• quantitative modeling of systems properties like: dynamics, robustness, logics

Overview

• Population stratification

• Our whole genome associations

• New Methods and Approaches

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…


ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…


Genetic variation in SNPs (Single Nucleotide Polymorphisms)

6’189 individuals

Phenotypes

159 measurements

144 questions

Genotypes

500.000 SNPs

CoLaus = Cohort Lausanne

Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gérard Waeber (CHUV)

Analysis of Genotypes only

Principal Component Analysis reveals SNP-vectors explaining the largest variation in the data

Ethnic groups cluster according to geographic distances
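The genotype-only analysis can be sketched in a few lines. This is a minimal illustration, assuming genotypes coded as allele counts (0/1/2) and using simulated data rather than the POPRES cohort; the function name `genotype_pca` and all parameter values are hypothetical.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the leading principal components.

    genotypes: (individuals x SNPs) array of allele counts coded 0/1/2.
    Returns the PC scores and the fraction of variance per component.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)              # center each SNP
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]       # drop monomorphic SNPs, scale the rest
    # SVD of the standardized matrix: u * s gives the PC scores,
    # the rows of vt are the SNP loading vectors.
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Simulated toy data (an assumption, not real cohort data):
# two groups whose allele frequencies differ slightly at every SNP.
rng = np.random.default_rng(0)
n_per_group, n_snps = 100, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_group, n_snps)),
])
scores, explained = genotype_pca(geno)
```

In this toy setting the first column of `scores` (PC1) should separate the two simulated groups, which is the same effect the slides show for geographic origin.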

(Figure: scatter plots of PC2 versus PC1)

PCA of POPRES cohort

Predicting location according to SNP-profile ...

… is pretty accurate!

The Swiss segregate according to language

PC-Analysis of genotypic profile

• Is surprisingly accurate!

• Is useful for forensic purposes or for individuals interested in their ancestry

• Is useful for population stratification in Genome-wide Association studies

Phenotypic variation:


What is association?

(Diagram: chromosome with SNPs and a trait variant)

Genetic variation yields phenotypic variation

Population with ‘ ’ allele Population with ‘ ’ allele

Distributions of “trait”

Association using regression

(Plot: phenotype versus genotype and versus coded genotype)

Regression formalism

• (monotonic) transformation

• phenotype (response variable) of individual i

• effect size (regression coefficient)

• coded genotype (feature) of individual i

• error (residual)

• p(β=0)

Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
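The regression goal can be illustrated for a single SNP. This is a sketch only, assuming an additive 0/1/2 genotype coding, an intercept term, a phenotype that has already been transformed, and a large-sample normal approximation for the p-value; the function name `snp_association` and the simulated effect size are hypothetical.

```python
import math
import numpy as np

def snp_association(genotype, phenotype):
    """Least-squares fit of phenotype = alpha + beta * genotype + residual.

    Returns (beta, standard error, two-sided p-value for beta = 0).
    The p-value uses a normal approximation to the t-statistic, an
    assumption that is reasonable only for large samples.
    """
    x = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = x.size
    xc = x - x.mean()
    sxx = np.sum(xc**2)
    beta = np.sum(xc * (y - y.mean())) / sxx      # effect size
    alpha = y.mean() - beta * x.mean()            # intercept
    resid = y - alpha - beta * x                  # errors (residuals)
    sigma2 = np.sum(resid**2) / (n - 2)           # residual variance
    se = math.sqrt(sigma2 / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail
    return beta, se, p

# Simulated example (assumed values): one SNP with a true effect of 0.2.
rng = np.random.default_rng(1)
n = 5000
x = rng.binomial(2, 0.3, size=n)
y = 0.2 * x + rng.normal(0, 1, size=n)
beta, se, p = snp_association(x, y)
```

A whole-genome scan repeats this fit once per SNP, which is why the number of tests becomes so large.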

Whole Genome Association

http://www.biotech.umb.edu/Affymetrix.htm

Whole Genome Association

Current microarrays probe ~1M SNPs!

Standard approach: Evaluate significance for association of each SNP independently:

(Plot: significance of each SNP)

Whole Genome Association

Manhattan plot: significance versus chromosome & position

Quantile-quantile plot: observed significance versus expected significance
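The two axes of a quantile-quantile plot can be computed as follows. This is a minimal sketch, assuming that "significance" means -log10(p) and that the expected quantiles under the null hypothesis are taken at (rank - 0.5) / n, which is one common convention among several; the function name `qq_points` is hypothetical and the p-values are simulated.

```python
import numpy as np

def qq_points(p_values):
    """Return (expected, observed) significance as -log10(p), sorted
    from the most significant to the least significant test."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    # Under the null hypothesis p-values are uniform on (0, 1), so the
    # k-th smallest of n is expected near (k - 0.5) / n.
    expected = (np.arange(1, n + 1) - 0.5) / n
    return -np.log10(expected), -np.log10(p)

# Simulated null p-values (no true associations): the points should
# lie close to the diagonal.
rng = np.random.default_rng(2)
null_p = rng.uniform(size=10000)
exp_sig, obs_sig = qq_points(null_p)
```

Points that rise above the diagonal only in the upper tail suggest true associations, while a systematic departure along the whole curve hints at confounding such as population stratification.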

GWA screens include a large number of statistical tests!

• Huge burden of correcting for multiple testing!

• Can detect only highly significant associations (p < α / #(tests) ~ 10^-7)
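The threshold above follows from a simple division. The sketch below assumes the conventional α = 0.05 (the slides do not state a value) and uses the SNP counts mentioned earlier; `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    error rate at alpha across n_tests independent tests."""
    return alpha / n_tests

# alpha = 0.05 is an assumed convention, not a value from the slides.
threshold_500k = bonferroni_threshold(0.05, 500_000)    # about 1e-7
threshold_1m = bonferroni_threshold(0.05, 1_000_000)    # about 5e-8
```

Because neighbouring SNPs are correlated, the tests are not truly independent, so this correction is conservative.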

Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene

Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka, Zoltán Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F. Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth, Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik, James Scott, Unnur Thorsteinsdottir, Stefania Bandinelli, John Chambers, Kari Stefansson, Gérard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann

Current insights from GWAS:

• Well-powered (meta-)studies with (tens of) thousands of samples have identified a few (dozen) candidate loci with highly significant associations

• Many of these associations have been replicated in independent studies

Current insights from GWAS:

• Each locus explains but a tiny (<1%) fraction of the phenotypic variance

• All significant loci together explain only a small fraction (<10%) of the variance
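The variance explained by a single locus can be estimated from its effect size and allele frequency. This is a sketch under stated assumptions: additive 0/1/2 coding and Hardy-Weinberg equilibrium, so that the genotype variance is 2p(1-p). The function name and the example numbers are hypothetical, not results from the study.

```python
def variance_explained(beta, allele_freq, phenotype_variance):
    """Fraction of phenotypic variance explained by one SNP.

    Assumes additive coding (0/1/2 copies of the allele) and
    Hardy-Weinberg equilibrium, so Var(genotype) = 2 p (1 - p).
    """
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return beta**2 * genotype_variance / phenotype_variance

# Hypothetical locus: effect of 0.1 phenotype standard deviations per
# allele at 30% allele frequency, phenotype scaled to unit variance.
frac = variance_explained(beta=0.1, allele_freq=0.3, phenotype_variance=1.0)
```

For these assumed numbers the fraction is 0.0042, i.e. below 1%, which matches the order of magnitude quoted above.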

The “Missing variance” (Non-)Problem

Why should a simplistic (additive) model using incomplete or approximate features possibly explain anything close to the genetic variance of a complex trait?

… and it doesn’t have to, as long as Genome-wide Association Studies are meant as an undirected approach to elucidate new candidate loci that impact the trait!

How could our models become more predictive?

1. Improve measurements:

- measure more variants (e.g. by UHS)

- measure other variants (e.g. CNVs)

- measure “molecular phenotypes”

2. Improve models:

- proper integration of uncertainties

- include interactions

- multi-layer models

Towards a layered Systems Model

We need intermediate (molecular) phenotypes to better understand organismal phenotypes

Network Approaches for Integrative Association Analysis

Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

Transcription Modules reduce Complexity

SB, J Ihmels & N Barkai Physical Review E (2003)

http://maya.unil.ch:7575/ExpressionView

http://maya.unil.ch/

Association of (average) module expression is often stronger than for any of its constituent genes
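One way to see why averaging can help is a toy simulation. It assumes that every gene in a module carries the same genotype-driven signal plus independent gene-specific noise; that noise model, the effect size and all variable names are illustrative assumptions, not the analysis behind the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
n_individuals, n_genes = 2000, 20

# Simulated genotype (0/1/2) driving a shared module activity.
genotype = rng.binomial(2, 0.4, size=n_individuals).astype(float)
module_activity = 0.5 * genotype + rng.normal(0, 1, n_individuals)

# Each gene reports the module activity plus its own independent noise.
expression = module_activity[:, None] + rng.normal(0, 2, (n_individuals, n_genes))

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Association of each single gene versus the module average.
gene_r = np.array([corr(genotype, expression[:, j]) for j in range(n_genes)])
module_r = corr(genotype, expression.mean(axis=1))
```

Averaging cancels part of the gene-specific noise, so in this setting `module_r` exceeds every entry of `gene_r`; with real data the gain depends on how coherent the module actually is.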

Take-home Messages:

• Analysis of genome-wide SNP data reveals that population structure mirrors geography

• Genome-wide association studies elucidate candidate loci for a multitude of traits, but have little predictive power so far

• Future improvement will require:

– better genotyping (CGH, UHS, …)

– new analysis approaches (interactions, networks, data integration)
