Sequencing Sardinian genomes at CRS4, Francesco Cucca (University of Sassari and INN-CNR)
DESCRIPTION
Francesco Cucca (University of Sassari and INN-CNR) at CRS4, presenting the Sardinian genome sequencing program (24 March 2010).
TRANSCRIPT
Sequencing Sardinian genomes at CRS4
Francesco Cucca INN-CNR
• Humans and all other living organisms carry a digital blueprint: a linear sequence of combinations of 4 small chemical compounds, called nucleotides, which together make up their DNA.
• Particular combinations of nucleotides encode the key qualitative and quantitative instructions for synthesizing the essential structural and functional components of the cell, which are built from different combinations of 20 molecules called amino acids.
• In turn, amino acids are linked to each other to form more complex molecules called proteins.
UUU Phe   UCU Ser   UAU Tyr    UGU Cys
UUC Phe   UCC Ser   UAC Tyr    UGC Cys
UUA Leu   UCA Ser   UAA STOP   UGA STOP
UUG Leu   UCG Ser   UAG STOP   UGG Trp

CUU Leu   CCU Pro   CAU His    CGU Arg
CUC Leu   CCC Pro   CAC His    CGC Arg
CUA Leu   CCA Pro   CAA Gln    CGA Arg
CUG Leu   CCG Pro   CAG Gln    CGG Arg

AUU Ile   ACU Thr   AAU Asn    AGU Ser
AUC Ile   ACC Thr   AAC Asn    AGC Ser
AUA Ile   ACA Thr   AAA Lys    AGA Arg
AUG Met   ACG Thr   AAG Lys    AGG Arg

GUU Val   GCU Ala   GAU Asp    GGU Gly
GUC Val   GCC Ala   GAC Asp    GGC Gly
GUA Val   GCA Ala   GAA Glu    GGA Gly
GUG Val   GCG Ala   GAG Glu    GGG Gly
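The codon table can also be read as a lookup from nucleotide triplets to amino acids. The following is a minimal Python sketch, assuming the standard genetic code listed in the table; the names `CODON_TABLE` and `translate` and the example sequence are illustrative and do not come from the talk.

```python
from itertools import product

BASES = "UCAG"

# One-letter amino-acid codes in table order: first base varies slowest,
# third base fastest, each over U, C, A, G. '*' marks a STOP codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# Map every codon (e.g. "AUG") to its amino acid (e.g. "M").
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA (or DNA) coding sequence until the first STOP codon."""
    rna = rna.upper().replace("T", "U")  # accept DNA input as well
    protein = []
    # Read non-overlapping triplets; an incomplete trailing codon is ignored.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical example: Met, Phe, Gly, then STOP.
    print(translate("AUGUUUGGCUAA"))
```

A single nucleotide change in the input can alter the amino acid at that position or leave it unchanged, because several codons encode the same amino acid.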
• While the basic building blocks of DNA and proteins, and the system that translates one chemical language into the other, are conserved, the order of these building blocks varies widely among organisms and individuals.
• This is because DNA and the proteins derived from it are not static entities. Instead, DNA is subject to various types of heritable change, known as mutations.
• Mutations often arise as copying errors during DNA replication. Although the fidelity of DNA replication is strikingly high, misincorporation occurs at a certain frequency, known as the mutation rate.
• Modern humans originated ~100,000 years ago from pre-modern humans and represent a relatively homogeneous species that has experienced a dramatic expansion during its recent evolutionary history.
• Any two unrelated humans are identical across about 99.9% of their DNA and thus differ in about 0.1% of it.
• This means that, when comparing the DNA of two unrelated individuals, there is approximately one difference every 1,000 nucleotides (our genome consists of two copies of about 3.3 billion nucleotides each).
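As a rough worked example of these figures, the expected number of differences between two unrelated individuals can be computed directly. This is only back-of-the-envelope arithmetic using the round numbers quoted above; the variable names are illustrative.

```python
# Round figures quoted in the talk.
GENOME_LENGTH = 3_300_000_000      # nucleotides in one copy of the genome
NUCLEOTIDES_PER_DIFFERENCE = 1_000  # about one difference every 1,000 nucleotides

# Expected differences when comparing one genome copy between two people.
differences_per_copy = GENOME_LENGTH // NUCLEOTIDES_PER_DIFFERENCE

if __name__ == "__main__":
    print(f"{differences_per_copy:,} expected differences per genome copy")
```

So a difference of only 0.1% still corresponds to about 3.3 million variable positions per genome copy.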
This genetic variation has important medical consequences:
In simple Mendelian traits, the relationship between the causal genetic variant (genotype) and the disease state is deterministic.
In a complex trait such as multiple sclerosis (MS), the disease state results from interactions between multiple genotypes and the environment. The influence of any individual causal allele tends to be modest, and the relationship between the causal variant and the disease state is probabilistic.
Quantitative trait / Qualitative trait
R. A. Fisher, 1890-1962
theoretical framework
Possible meanings of an association
• Primary association with the causal variant
• Secondary association due to proximity
• Spurious association due to population substructure
Why a sequencing project?
The imperfect genome-wide search
All gene chip arrays contain SNPs chosen on the basis of the linkage disequilibrium (LD) observed in the HapMap populations; HapMap is a catalogue of ~3 million SNPs genotyped in Europeans, Asians, and Africans.
Studying a subset of 500,000 or 1 million SNPs is limiting.
Tested variant
Causative variant
Power to detect a disease association at a locus increases with the r² between the typed (tested) SNP and the untyped (causative) SNP: the lower the r², the larger the sample needed to reach the same power.
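The role of r² can be made concrete with a small calculation. The sketch below assumes phased haplotypes with alleles coded 0/1 and uses the standard approximation that testing a tag SNP instead of the causal SNP requires about 1/r² times as many samples for the same power. The function names and the example haplotypes are illustrative, not taken from the talk.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    `haplotypes` is a list of (tested_allele, causal_allele) pairs,
    one pair per chromosome, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_tested = sum(a for a, _ in haplotypes) / n
    p_causal = sum(b for _, b in haplotypes) / n
    p_both = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_both - p_tested * p_causal  # disequilibrium coefficient D
    denom = p_tested * (1 - p_tested) * p_causal * (1 - p_causal)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


def samples_needed_with_tag(n_if_causal_typed, r2):
    """Approximate sample size for equal power when only the tag SNP is typed."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_if_causal_typed / r2


if __name__ == "__main__":
    perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]   # tag always matches causal
    partial = [(1, 1), (1, 1), (1, 0), (0, 0)]   # one recombinant haplotype
    for name, haps in [("perfect", perfect), ("partial", partial)]:
        r2 = ld_r2(haps)
        print(name, round(r2, 3), round(samples_needed_with_tag(1000, r2)))
```

A causal variant that is poorly tagged by every SNP on the chip (low r² with all of them) can therefore be missed even in a large study, which is one motivation for sequencing.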
Why a sequencing project in Sardinia?
[Figure labels, populations compared: Croatia, Ukraine, Hungary, Poland, Georgia, Sicily, Calabria, Turkey, Lebanon, Greece, Albania, North-Central Italy, Corsica, Andalusia, Basque Country, Catalonia, Sardinia]
Why a sequencing project in Sardinia?
[Figure labels, populations compared: Morocco, Andalusian, Spanish Basques, French, Czech and Slovakian, Central-Northern Italian, Calabrian, Croatian, Greek, Macedonian, Polish, Ukrainian, Georgian, Turkish, Lebanese, Syrian, Saami, Mari, Udmurt, Dutch, Hungarian, Albanian]
What samples to sequence in Sardinia?
• ProgeNIA study
• Case-Control studies
• Future work
ProgeNIA
6,148 volunteers
[Map labels: Arzana, Ilbono, Elini, Lanusei]
ProgeNIA/SardiNIA project
6,148 individuals - aged 14-102 y.
95% are known to have all grandparents born in Sardinia
711 pedigrees up to 5 generations deep
Largest family: 625 phenotyped individuals
>34,000 relative pairs
Pilia et al. PLoS Genet. 2006
> 150 quantitative traits
• Anthropometric measurements: Height, Weight, Hip, Waist, BMI
• Blood chemistry components: LDL, HDL, TG, Insulin, RBC, MCH, MCV, Bilirubin, hsCRP, MCP-1, IL-6, etc.
• Cardiovascular traits: HR, SBP, DBP, PP, PWV, IMT, QT, etc.
• Personality facets: Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness, etc.
• Cytokines: Adiponectin, Leptin, MCP-1, hsCRP, IL-6, V-CAM, AGE
New traits will be added soon (immunological traits).
Case-control samples
• The special case of autoimmune diseases
[Figure: numeric data labels, not recoverable from the transcript. *Adapted from EURODIAB]
[Figure: numeric data labels, not recoverable from the transcript. Pugliatti et al. (EBC), Eur J Neurol 2006]
[Figure: three bar-chart panels comparing patients and controls, axis 0 to 70; plotted values not recoverable from the transcript]
How many samples to sequence?
• Is it necessary to sequence everyone who has been analysed?
• Observed genotypes
• Inferred stretches of DNA shared along the chromosome
• Missing genotypes inferred from chromosome sharing
Burdick et al. Nat. Genet. 2006
Chen and Abecasis AJHG 2008
1) Identify match among reference
Individuals in study sample:
. . A A . . . . . . . . A . . . . A . . .
. . G A . . . . . . . . C . . . . A . . .
Observed HapMap chromosomes:
C G A G A T C T C C T T C T T C T G T G C
C G A A A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G A A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T A T A C
C G A G A C T C T C C G A C C T C G T G C
C G G A G C T C T T T T C T T C T G T G C
2) Phase chromosome
Individuals in study sample:
. . A A . . . . . . . . A . . . . A . . .
. . G A . . . . . . . . C . . . . A . . .
Observed HapMap chromosomes:
C G A G A T C T C C T T C T T C T G T G C
C G A A A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G A A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T A T A C
C G A G A C T C T C C G A C C T C G T G C
C G G A G C T C T T T T C T T C T G T G C
3) Impute missing genotypes
Individuals in study sample:
C G A A A T C T C C C G A C C T C A T G G
C G G A G C T C T T T T C T T T T A T G C
Observed HapMap chromosomes:
C G A G A T C T C C T T C T T C T G T G C
C G A A A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G A A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T A T A C
C G A G A C T C T C C G A C C T C G T G C
C G G A G C T C T T T T C T T C T G T G C
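The three steps can be illustrated with a deliberately simplified sketch. It assumes the study chromosome is already phased, picks the single reference chromosome that agrees with it at the most observed positions, and copies the missing alleles from that match. Real imputation software is more sophisticated (for example, it can model a chromosome as a mosaic of several reference chromosomes), so this is not the method used in the talk. The function names are hypothetical; the sequences are the first study chromosome and the ten reference chromosomes from the slides.

```python
MISSING = "."

# The ten observed HapMap chromosomes from the slides (spaces removed).
REFERENCE_PANEL = [
    "CGAGATCTCCTTCTTCTGTGC",
    "CGAAATCTCCCGACCTCATGG",
    "CCAAGCTCTTTTCTTCTGTGC",
    "CGAAGCTCTTTTCTTCTGTGC",
    "CGAGACTCTCCGACCTTATGC",
    "TGGAATCTCCCGACCTCATGG",
    "CGAGATCTCCCGACCTTGTGC",
    "CGAGACTCTTTTCTTTTATAC",
    "CGAGACTCTCCGACCTCGTGC",
    "CGGAGCTCTTTTCTTCTGTGC",
]


def agreement(study, reference):
    """Count observed (non-missing) positions where the two sequences agree."""
    return sum(1 for s, r in zip(study, reference) if s != MISSING and s == r)


def impute_from_best_match(study, panel):
    """Fill missing alleles from the best-matching reference chromosome.

    Ties are broken in favour of the first best match in the panel.
    Observed alleles in the study chromosome are always kept.
    """
    best = max(panel, key=lambda ref: agreement(study, ref))
    return "".join(r if s == MISSING else s for s, r in zip(study, best))


if __name__ == "__main__":
    # First study chromosome from the slides: typed at 4 of 21 positions.
    study_chromosome = "..AA........A....A..."
    print(impute_from_best_match(study_chromosome, REFERENCE_PANEL))
    # prints CGAAATCTCCCGACCTCATGG
```

For this chromosome the result coincides with the first imputed chromosome on the slide. The second study chromosome matches two reference chromosomes equally well at its typed positions, and no single reference chromosome equals the imputed sequence shown on the slide, so a single-best-match rule like this one is not enough there.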
Recent updates
We used the whole-genome sequences of 52 Europeans available from the 1000 Genomes Project to infer ~6.6 million markers in the individuals typed with the higher-density chip...
... then, using imputation, we extended the 6.6 million markers to all individuals and performed a GWAS.
This:
• Provides fine mapping of previously discovered loci
• May reveal new loci that were poorly tagged by the previous set of SNPs
GWAS findings
Almost all of the loci detected by GWAS explain only a small fraction of the heritability.
The smaller the effect size, the larger the sample size required to maintain adequate power.
Trait    Heritability   So far explained
HbF      ~60%           ~17%
Height   ~80%           ~4%
BMI      ~40%           ~1%
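The link between effect size and sample size can be sketched numerically. The code below uses a common large-sample approximation for a single-SNP test on a quantitative trait: if the SNP explains a fraction q² of the trait variance, the required sample size is roughly (z_alpha + z_power)² (1 - q²) / q². The genome-wide significance threshold of 5e-8, the 80% power target, and the function name are assumptions chosen for illustration, not values from the talk.

```python
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate N needed to detect a SNP explaining `variance_explained`.

    Large-sample approximation for a two-sided test; `alpha` and `power`
    defaults are illustrative assumptions.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be in (0, 1)")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = standard_normal.inv_cdf(power)          # power quantile
    q2 = variance_explained
    return (z_alpha + z_power) ** 2 * (1 - q2) / q2


if __name__ == "__main__":
    for q2 in (0.01, 0.001):
        n = required_sample_size(q2)
        print(f"variance explained {q2:.1%}: about {n:,.0f} individuals")
```

Under this approximation, a locus explaining ten times less variance needs roughly ten times as many individuals, which is why loci of small effect are hard to detect in a single cohort.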
Shankar Balasubramanian, David Klenerman
ProgeNIA Team Lanusei-Cagliari
Manuela Uda, Serena Sanna, Eleonora Porcu, Ilenia Zara, Carlo Sidore, Maristella Steri, Marco Masala, Gianmauro Cuccuru, Angelo Scuteri, Marco Orrù, Maria Grazia Pilia, Danilo Fois, Liana Ferreli, Francesco Loi
Monica Lai, Anna Cau, Barbara Deiana, Monica Balloi, Maria Grazia Piras, Gianluca Usala, Antonella Mulas, Andrea Maschio, Fabio Busonero, Sandra Lai, Mariano Dei
Laura Crisponi, Silvia Naitza, Caterina Flore, Simona Foddi
Giuseppe Pilia, creator and founder of the ProgeNIA project
Acknowledgements: Paolo Zanella, Chris Jones, Roman Tirler
Antonio Cao, Giuseppe Pilia
David Schlessinger, Goncalo Abecasis, John Todd