![Page 1: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/1.jpg)
Human Genome Polymorphism
Alma-Ata, 15.04.06
Vasily Ramensky, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow
![Page 2: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/2.jpg)
People are different…
![Page 3: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/3.jpg)
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgCggggaggccctgct…
…caccagctcctgtgCggggaggccctgct…
…and so are their genomes
![Page 4: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/4.jpg)
Definition
SNP (single nucleotide polymorphism): the existence in a population of two nucleotide variants at the same position of genomic DNA, with the frequency of the rarer variant (allele) ≥1%
5’---------------A---------------3’
|||||||||||||||||||||||||||||||
3’---------------T---------------5’
Na: number of copies carrying the A variant
5’---------------G---------------3’
|||||||||||||||||||||||||||||||
3’---------------C---------------5’
Ng: number of copies carrying the G variant
Na+Ng = N, Na/N ≥0.01, Ng/N ≥0.01
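The frequency criterion in the definition can be checked mechanically. The following is a minimal sketch, assuming allele counts Na and Ng are available as plain integers; the function names `minor_allele_frequency` and `is_snp` and the example counts are illustrative and do not come from the slides.

```python
def minor_allele_frequency(allele_counts):
    """Return the frequency of the rarer of two alleles.

    allele_counts maps each allele (e.g. "A", "G") to the number of
    sampled copies carrying it, i.e. Na and Ng in the definition.
    """
    if len(allele_counts) != 2:
        raise ValueError("the definition covers exactly two variants")
    total = sum(allele_counts.values())  # N = Na + Ng
    if total == 0:
        raise ValueError("no copies sampled")
    return min(allele_counts.values()) / total


def is_snp(allele_counts, threshold=0.01):
    """True if the rarer allele reaches the 1% frequency threshold."""
    return minor_allele_frequency(allele_counts) >= threshold


# Invented counts, for illustration only.
print(is_snp({"A": 970, "G": 30}))  # rarer allele at 3%
print(is_snp({"A": 998, "G": 2}))   # rarer allele at 0.2%
```

In practice the result is only as reliable as the sample behind the counts, which is the point of the last comment on the next slide.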
![Page 5: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/5.jpg)
Comments on the definition
• the definition concerns comparison of sequences within a single biological species
• the word «полиморфизм» ("polymorphism") has no plural form in Russian (N. Lyapunova, personal communication)
• in everyday speech, "polymorphism" most often refers to the nucleotide itself (i.e. the word is used as a synonym of "mutation")
• the definition implies reliable measurement of allele frequencies in the population(s), which is still rare in current practice
![Page 6: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/6.jpg)
Types of polymorphism in the genome
* single nucleotide (SNP)
* short insertion/deletion
* microsatellite repeat of variable length (VNTR, variable number tandem repeat)
* insertion of a sequence element
* multiple nucleotide (MNP)
![Page 7: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/7.jpg)
Some properties of SNPs
• Account for ~90% of human genetic variation
• Occur with an average density of ~1 per 600 bp
• The transition C↔T (G↔A) accounts for ~2/3 of all cases; the three transversions C↔A (G↔T), C↔G (G↔C), T↔A (A↔T) together account for the remaining ~1/3
• Most of them (~85%) are common to all populations (with differing allele frequencies)
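The transition/transversion distinction used above follows from base chemistry: a transition keeps the base within the purines (A, G) or within the pyrimidines (C, T), while a transversion swaps one class for the other. A minimal sketch, with the function name `substitution_class` chosen here for illustration:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    pair = {ref, alt}
    # Both bases in the same chemical class: transition.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("C", "T"), ("G", "A"), ("C", "A"), ("C", "G"), ("T", "A")]:
    print(f"{ref}<->{alt}: {substitution_class(ref, alt)}")
```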
![Page 8: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/8.jpg)
Why are SNPs important?
• Convenient genetic markers
• Responsible for the existence of various phenotypes, with disease phenotypes of primary interest
• Pharmacogenomics: individual response to drugs
• Clues to understanding human evolution
![Page 9: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/9.jpg)
SNPs in the human genome
![Page 10: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/10.jpg)
Classification of SNPs by position in the genome
1. genes
1.1 UTR
1.2 exons (cSNP)
1.2.1 synonymous (sSNP)
1.2.2 non-synonymous (nsSNP)
1.3 introns
1.4 splice sites
2. gene regulatory regions (rSNP)
3. intergenic regions
![Page 11: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/11.jpg)
Synonymous vs. non-synonymous SNPs
Example: Lysosomal alpha-glucosidase precursor (SwissProt P10253)
…CAC CAG CTC CTG TGG GGG GAG GCC CTG CT…
… H Q L L W G E A L …
…CAC CAG CTC CTG TGC GGG GAG GCT CTG CT…
… H Q L L C G E A L …
HGVBase ID SNP000003023: G→C, nsSNP Trp746Cys
Hypothetical SNP: C→T, sSNP Ala749Ala
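The example on this slide can be reproduced by translating both DNA fragments with the standard genetic code and comparing codon by codon. The sketch below is illustrative: the function names are invented here, and the starting residue number 742 is inferred from the slide's numbering (Trp746 is the fifth codon shown).

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_coding_snps(reference, variant, first_residue=1):
    """List (residue_number, ref_aa, alt_aa, kind) for every differing codon."""
    changes = []
    ref_protein, alt_protein = translate(reference), translate(variant)
    for i, (ref_aa, alt_aa) in enumerate(zip(ref_protein, alt_protein)):
        if reference[3 * i:3 * i + 3].upper() == variant[3 * i:3 * i + 3].upper():
            continue
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        changes.append((first_residue + i, ref_aa, alt_aa, kind))
    return changes


REFERENCE = "CAC CAG CTC CTG TGG GGG GAG GCC CTG".replace(" ", "")
VARIANT = "CAC CAG CTC CTG TGC GGG GAG GCT CTG".replace(" ", "")

print(translate(REFERENCE))
print(translate(VARIANT))
for change in classify_coding_snps(REFERENCE, VARIANT, first_residue=742):
    print(change)
```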
![Page 12: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/12.jpg)
Summary of annotation on human Genome Build 33, dbSNP Build 124:

| Function class code | SNP count | Gene count | Functional classification |
|---|---|---|---|
| 1 | 338787 | 26210 | Locus region |
| 3 | 39214 | 14342 | Allele synonymous to contig nucleotide |
| 4 | 50772 | 15710 | Allele nonsynonymous to contig nucleotide |
| 5 | 546965 | 17898 | Untranslated region |
| 6 | 2925773 | 19332 | Intron |
| 7 | 832 | 769 | Splice site |
| 8 | 89554 | 18655 | Allele is same as contig nucleotide |
| 9 | 7111 | 1006 | Coding: synonymy unknown |
![Page 13: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/13.jpg)
SNP life cycle (after Miller & Kwok, 2001)
I. Appearance of a new allelic variant by mutation (~100 mutations per individual)
II. "Survival" until homozygotes for this allele appear
III. Slow increase of the allele frequency in the population
IV. Fixation of the new allele (0 vs. 100%), turning it into a between-species difference
![Page 14: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/14.jpg)
Remark
The SNP life cycle described above takes ~0.3 million years. Assuming that human and chimpanzee diverged ~5 million years ago, and that the exit of H. sapiens from Africa and the separation of the various populations took place ~0.1-0.2 million years ago, this explains the absence of (a) SNPs shared between human and other species and (b) "private" SNPs, i.e. SNPs confined to a single human population
![Page 15: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/15.jpg)
Why are polymorphisms maintained in the population?
• Selectionists: because heterozygotes have higher fitness
• Neutralists: because all observed polymorphisms are selectively neutral
Reality is always somewhat more complicated
![Page 16: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/16.jpg)
Why are SNPs important?
• Convenient genetic markers
• Responsible for the existence of various phenotypes, with disease phenotypes of primary interest
• Pharmacogenomics: individual response to drugs
• Clues to understanding human evolution
![Page 17: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/17.jpg)
nsSNPs vs. disease mutations
Disease mutations are rare (<<1%) and usually cause monogenic diseases (e.g., cystic fibrosis)
nsSNPs are frequent (>1%) and can modify risks of major common (multigenic, complex) diseases (e.g., cancer, cardiovascular disease, mental illness, autoimmune states, diabetes)
In some cases, however, it is difficult to make a distinction
![Page 18: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/18.jpg)
Some common nsSNPs are known to affect critical structural features
The haemochromatosis allelic variant Cys260Tyr of the HLA-H protein, which destroys a disulphide bond, reaches a frequency of up to 6% in Northern Europe
![Page 19: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/19.jpg)
Application area for prediction methods
• Genetics of complex diseases
• Analysis of human birth defects
• Genetics of rare developmental phenotypes (analysis of de novo mutations that cannot be mapped by genetic techniques)
• Genetics of model organisms (identification of genes involved in diverse processes by mutagenesis screens)
• Genomics and evolutionary genetics (e.g., quantifying selective pressure)
![Page 20: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/20.jpg)
Identifying SNPs responsible for complex diseases: general strategies
• whole-genome scan: a hypothesis-free approach with an extraordinary number of candidate SNPs
• candidate gene studies: require a priori models; nevertheless, large numbers of candidate SNPs must be tested
![Page 21: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/21.jpg)
Identifying SNPs responsible for complex diseases: application
1. A SNP with established association need not be functional; therefore, in silico expertise is required for selection of potentially functional SNPs
2. Detection of enrichment of rare potentially functional alleles in the disease population (plasma levels of HDL-cholesterol, hypertension, colorectal cancer)
![Page 22: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/22.jpg)
Methods for predicting the effect of nsSNPs
* Sequence-based methods: analysis of multiple alignment with homologs. Ng, Henikoff [2002]
* Structure-based methods: analysis of various structural parameters. Wang, Moult [2001]; Chasman, Adams [2001]
* Combined methods: sequence and structure analysis. Sunyaev, Ramensky, Bork [2000, 2001, 2002]
![Page 23: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/23.jpg)
PolyPhen: prediction of amino acid substitution effect on protein function
Prediction: benign (neutral), damaging (deleterious)
![Page 24: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/24.jpg)
PolyPhen: prediction of amino acid substitution effect on protein function
Data sources:
1. Sequence annotation of the query protein
2. PSIC profile matrix values derived from multiple alignment with homologous proteins
3. Structural parameters and contacts of the query protein structure or its >50% homolog
Prediction: benign (neutral), damaging (deleterious)
![Page 25: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/25.jpg)
I. Sequence annotation
Hereditary hemochromatosis protein precursor (HLA-H, Q30201)
Features checked:
* bond: DISULFID, THIOLEST, THIOETH
* site: BINDING, ACT_SITE, LIPID, METAL, SITE, MOD_RES, SE_CYS
* region: TRANSMEM, SIGNAL, PROPEP
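The annotation check amounts to asking whether the substituted position coincides with any of the listed feature keys. The following is a rough sketch, not PolyPhen's actual code: the `Feature` record, the `features_hit` helper, and the example coordinates are all invented for illustration. It assumes that a bond feature stores its two bonded residues as `start` and `end`, so only those two positions count as hits.

```python
from dataclasses import dataclass

BOND_KEYS = {"DISULFID", "THIOLEST", "THIOETH"}
SITE_KEYS = {"BINDING", "ACT_SITE", "LIPID", "METAL", "SITE", "MOD_RES", "SE_CYS"}
REGION_KEYS = {"TRANSMEM", "SIGNAL", "PROPEP"}


@dataclass(frozen=True)
class Feature:
    key: str    # feature key as listed on the slide
    start: int  # first residue (or first bonded residue)
    end: int    # last residue (or second bonded residue)


def features_hit(position, features):
    """Return the checked features that involve the given residue position."""
    hits = []
    for feature in features:
        if feature.key in BOND_KEYS:
            # Only the two bonded residues are affected, not the span between.
            touched = position in (feature.start, feature.end)
        elif feature.key in SITE_KEYS or feature.key in REGION_KEYS:
            touched = feature.start <= position <= feature.end
        else:
            touched = False  # feature keys outside the checked set are ignored
        if touched:
            hits.append(feature)
    return hits


# Invented annotation for an imaginary protein.
ANNOTATION = [
    Feature("SIGNAL", 1, 22),
    Feature("DISULFID", 40, 95),
    Feature("TRANSMEM", 120, 140),
    Feature("CONFLICT", 60, 60),
]

print([f.key for f in features_hit(95, ANNOTATION)])
print([f.key for f in features_hit(60, ANNOTATION)])
print([f.key for f in features_hit(130, ANNOTATION)])
```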
![Page 26: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/26.jpg)
II. PSIC: profile analysis of homologous sequences
1. Align with homologous proteins having sequence identity of 30-94%
![Page 27: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/27.jpg)
II. PSIC: profile analysis of homologous sequences
2. Calculate the profile matrix with PSIC algorithm
Profile matrix: $S_{a,j} = \ln\left[ p_{a,j} / q_a \right]$, $a = 1,\dots,20$, $j = 1,\dots,N$, where $N$ is the alignment length
Example entries: $S_{\mathrm{Asn},4}$, $S_{\mathrm{Cys},4}$
![Page 28: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/28.jpg)
II. PSIC: profile analysis of homologous sequences
3. Analyse the difference between profile scores for the two amino acid variants:
Asn→Cys: $\Delta = |S_{\mathrm{Asn},4} - S_{\mathrm{Cys},4}| = 1.591$
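The arithmetic of steps 2 and 3 can be sketched as follows. The sketch takes the profile matrix formula as a log ratio ln(p/q) and treats p (the estimated likelihood of an amino acid at the alignment position) and q (its background frequency) as given inputs. Estimating p is the actual job of the PSIC algorithm and is not reproduced here. The function names and the numeric values are invented for illustration.

```python
import math


def profile_score(p_aj, q_a):
    """Log-odds profile score ln(p/q) for one amino acid at one position."""
    if p_aj <= 0 or q_a <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_aj / q_a)


def score_difference(p_wild, q_wild, p_mutant, q_mutant):
    """Absolute difference between the profile scores of the two variants."""
    return abs(profile_score(p_wild, q_wild) - profile_score(p_mutant, q_mutant))


# Invented values: the wild-type residue is four times enriched at this
# position, while the substituted residue occurs at its background rate.
delta = score_difference(p_wild=0.20, q_wild=0.05, p_mutant=0.02, q_mutant=0.02)
print(round(delta, 3))
```

A large difference means that the alignment strongly prefers one variant over the other at this position, which is what the prediction rules use later.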
![Page 29: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/29.jpg)
III. 3D structure analysis
1. Residues that are in spatial contact with a ligand or other “critical” residues
Bos taurus trypsin [PDB ID: 1ql7]
Figure labels: Zen 999; residues in 5 Å contact with Zen 999
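A spatial-contact test of this kind reduces to atom-to-atom distances. The sketch below is illustrative only: it assumes atom coordinates have already been read from a structure file into plain tuples, it treats "in 5 Å contact" as "at least one atom pair no farther apart than 5 Å", and the residue labels and coordinates are made up.

```python
import math


def residues_in_contact(residue_atoms, ligand_atoms, cutoff=5.0):
    """Return labels of residues with any atom within `cutoff` Å of the ligand.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples for the ligand
    """
    in_contact = []
    for label, atoms in residue_atoms.items():
        if any(math.dist(atom, lig) <= cutoff
               for atom in atoms for lig in ligand_atoms):
            in_contact.append(label)
    return in_contact


# Invented coordinates in Å.
LIGAND = [(0.0, 0.0, 0.0)]
RESIDUES = {
    "Ser 1": [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    "Gly 2": [(6.0, 0.0, 0.0)],
    "Asp 3": [(0.0, 4.0, 3.0)],
}

print(residues_in_contact(RESIDUES, LIGAND))
```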
![Page 30: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/30.jpg)
III. 3D structure analysis
2. Residues that form the hydrophobic core of the protein (buried residues)
Bos taurus trypsin [PDB ID: 1ql7]
Figure labels: surface residues; buried residues
![Page 31: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/31.jpg)
Structural parameters and contacts
• Secondary structure
• Phi-psi dihedral angles
• Solvent accessible surface area, normed s.a.s.a
• Change in accessible surface propensity
• Change in residue side chain volume
• Contacts with heteroatoms
• Interchain contacts
• Contacts with functional sites (BINDING, ACT_SITE, LIPID, and METAL)
• Region of the phi-psi map (Ramachandran map)
• Normalised B-factor (temperature factor)
![Page 32: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/32.jpg)
Rules (conditions within a row are connected with logical AND) and the resulting prediction:

| PSIC score difference | Substitution site properties | Substitution type properties | Prediction |
|---|---|---|---|
| arbitrary | annotated as a functional* or bond formation** site | arbitrary | probably damaging |
| not considered | in a region annotated or predicted as transmembrane | PHAT matrix difference resulting from substitution is negative | possibly damaging |
| ≤0.5 | arbitrary | arbitrary | benign |
| >1.0 | atoms are closer than 3.0 Å to atoms of a ligand or residue annotated as BINDING, ACT_SITE, LIPID, METAL | arbitrary | probably damaging |
| >0.5 and ≤1.5 | normed accessibility ACC ≤15% | absolute change of accessible surface propensity is ≥0.75 or absolute change of side chain volume is ≥60 | possibly damaging |
| >0.5 and ≤1.5 | normed accessibility ACC ≤5% | absolute change of accessible surface propensity is ≥1.0 or absolute change of side chain volume is ≥80 | probably damaging |
| >1.5 and ≤2.0 | arbitrary | arbitrary | possibly damaging |
| >2.0 | arbitrary | arbitrary | probably damaging |
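The rule table can be read as a small decision procedure. The sketch below is one possible reading, not the PolyPhen implementation, and it rests on several assumptions not stated on the slide: thresholds are read as "at most" for accessibility and the upper score bounds and "at least" for the propensity and volume changes; when several rules match, the more severe prediction wins; a substitution matching no rule is called benign; and a missing PSIC score yields "unknown". All field and function names are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Substitution:
    psic_diff: Optional[float] = None      # PSIC score difference, None if unavailable
    functional_or_bond_site: bool = False  # annotated functional or bond formation site
    transmembrane: bool = False            # annotated or predicted transmembrane region
    phat_diff: Optional[float] = None      # PHAT matrix difference for the substitution
    ligand_contact: bool = False           # atoms closer than 3.0 Å to ligand or annotated residue
    accessibility: Optional[float] = None  # normed accessibility, in percent
    propensity_change: float = 0.0         # absolute change of accessible surface propensity
    volume_change: float = 0.0             # absolute change of side chain volume


def predict(s):
    """Apply the slide's rules under the assumptions listed in the lead-in."""
    if s.functional_or_bond_site:
        return "probably damaging"
    if s.transmembrane and s.phat_diff is not None and s.phat_diff < 0:
        return "possibly damaging"
    d = s.psic_diff
    if d is None:
        return "unknown"  # assumption: no score, no prediction
    if d <= 0.5:
        return "benign"
    if d > 2.0 or (d > 1.0 and s.ligand_contact):
        return "probably damaging"
    if d > 1.5:
        return "possibly damaging"
    # Remaining case: 0.5 < d <= 1.5, decided by burial and size of the change.
    if s.accessibility is not None:
        if s.accessibility <= 5 and (s.propensity_change >= 1.0 or s.volume_change >= 80):
            return "probably damaging"
        if s.accessibility <= 15 and (s.propensity_change >= 0.75 or s.volume_change >= 60):
            return "possibly damaging"
    return "benign"  # assumption: fallback when no rule fires


# The Asn->Cys score difference from the PSIC slide, with no other evidence.
print(predict(Substitution(psic_diff=1.591)))
# A buried position with a large side chain volume change.
print(predict(Substitution(psic_diff=1.2, accessibility=3.0, volume_change=90)))
```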
![Page 33: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/33.jpg)
Validation: control sets

| Set | all | dam | unknown | dam/(dam+ben) |
|---|---|---|---|---|
| Disease mutations, strict set | 444 | 366 | 3 | 82.9% |
| Disease mutations, total | 2,782 | 2,047 | 70 | 75.4% |
| Between-species substitutions, total | 671 | 58 | 5 | 8.7% |
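The last column of the control-set table can be recomputed from the other three, assuming the benign count is what remains after removing damaging and unknown predictions (ben = all - dam - unknown). That relation is an assumption here, since the slide does not list benign counts. The names below are illustrative.

```python
def damaging_fraction(total, damaging, unknown):
    """dam / (dam + ben), with ben assumed to be total - damaging - unknown."""
    benign = total - damaging - unknown
    return damaging / (damaging + benign)


# (all, dam, unknown) as given in the control-set table.
CONTROL_SETS = {
    "Disease mutations, strict set": (444, 366, 3),
    "Disease mutations, total": (2782, 2047, 70),
    "Between-species substitutions, total": (671, 58, 5),
}

for name, (total, damaging, unknown) in CONTROL_SETS.items():
    print(f"{name}: {100 * damaging_fraction(total, damaging, unknown):.2f}%")
```

Under that assumption the recomputed values agree with the slide's percentages to within about 0.1 percentage point.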
![Page 34: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/34.jpg)
Validation: case studies
• APEX1 protein: 24 out of 26 substitutions predicted correctly (Xi et al.)
• Plasminogen activator inhibitor-2: 18 out of 20 (Di Giusto et al.)
• 3 HapMap populations and 10 primate species: analysis of ~27,000 nsSNPs with frequencies (Victoria Carlton, AFFYMETRIX, private communication)
![Page 35: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/35.jpg)
Validation: allele frequency
![Page 36: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/36.jpg)
Validation: nsSNPs vs. human-mouse interspecies variation
![Page 37: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/37.jpg)
PolyPhen predictions for dbSNP b.121
All:
• 9,502 unknown
• 27,991 benign (67.6%)
• 7,905 possibly damaging (19.1%)
• 5,521 probably damaging (13.3%)
• 50,919 total (44,005 unique rs’s)
With structure:
• 42 unknown
• 2,142 benign (57.1%)
• 531 possibly damaging (14.2%)
• 1,076 probably damaging (28.7%)
• 3,791 total (,167 unique rs’s)
[ Ivan Adzhubei, 2004 ]
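The percentages on this slide are shares among substitutions that received a prediction, i.e. with the "unknown" class excluded from the denominator. A short sketch that recomputes them from the counts; the function and variable names are illustrative:

```python
def prediction_shares(counts):
    """Percentage of each class among predicted (non-unknown) substitutions."""
    predicted = {name: n for name, n in counts.items() if name != "unknown"}
    total = sum(predicted.values())
    return {name: round(100 * n / total, 1) for name, n in predicted.items()}


ALL_NSSNPS = {
    "unknown": 9502,
    "benign": 27991,
    "possibly damaging": 7905,
    "probably damaging": 5521,
}
WITH_STRUCTURE = {
    "unknown": 42,
    "benign": 2142,
    "possibly damaging": 531,
    "probably damaging": 1076,
}

print(prediction_shares(ALL_NSSNPS))
print(prediction_shares(WITH_STRUCTURE))
```

With a known structure the share of "probably damaging" calls roughly doubles, which is consistent with the later slide stating that structural parameters are the best predictors.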
![Page 38: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/38.jpg)
PolyPhen predictions for dbSNP b.121
All, filtered: 5 seq. in multiple alignment
• 16,813 benign (64.2%)
• 5,195 possibly damaging (19.8%)
• 4,168 probably damaging (15.9%)
• 26,176 total (21,677 unique rs’s)
With structure, filtered: 5 seq. in multiple alignment
• 2,021 benign (56.6%)
• 499 possibly damaging (14.0%)
• 1,050 probably damaging (29.4%)
• 3,570 total (2,983 unique rs’s)
[ Ivan Adzhubei, 2004 ]
![Page 39: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/39.jpg)
Hydrophobic core stability parameters are the best predictors
Ramensky et al., Nucleic Acids Res. (2002) 30:3894-3900
![Page 40: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/40.jpg)
PolyPhen http://www.bork.embl.de/PolyPhen
PolyPhen input :
Protein identifier OR sequence
Substitution position
Substitution type
![Page 42: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/42.jpg)
PolyPhen: nsSNPs data collection
![Page 43: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/43.jpg)
DAMAGING nsSNPs
Transthyretin
(PDB: 1tyr, SNP000012365)
Thr118Asn occurs at the ligand (REA) binding site
Thr 118
REA 130
![Page 44: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/44.jpg)
DAMAGING nsSNPs
Trypsin
(PDB: 1trn, SNP000012965)
Ser142Phe results in a large side chain volume change at a buried position
Ser 142
![Page 45: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/45.jpg)
Damaging nsSNPs
• We estimate that ~20% of non-synonymous cSNPs from databases are damaging
• The average allele frequency of non-synonymous cSNPs predicted to be damaging is half that of benign non-synonymous cSNPs
• We propose to use these predictions for prioritisation of candidates for association studies
![Page 46: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/46.jpg)
Development directions
• Better multiple alignment pipeline
• Compensated nsSNPs
• Non-globular structural regions
• Non-coding SNPs
![Page 47: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/47.jpg)
An example of compensated pathogenic deviation
![Page 48: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/48.jpg)
Polyphenism: the ability of a single genome to produce two or more alternative morphologies within a single population in response to an environmental cue (such as temperature, photoperiod, or nutrition). [Dr. Ehab Abouheif, McGill University, Montréal Québec]
The seasonal morphs of the buckeye butterfly, Precis coenia (Nymphalidae). The ventral surfaces are shown. The Summer morph ("linea") is on the left; the Fall morph ("rosa") is on the right. [Scott F.Gilbert, A Companion to Developmental Biology. Chapter 22, Seasonal Polyphenism in Butterfly Wings]
![Page 49: Полиморфизм генома человека Алма-Ата, 15.04.06](https://reader035.vdocuments.us/reader035/viewer/2022062315/56816050550346895dcf7af0/html5/thumbnails/49.jpg)
People
Shamil Sunyaev(1), Vasily Ramensky(2), Steffen Schmidt(1), Ivan Adzhubei(1)
(1) Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
(2) Engelhardt Institute of Molecular Biology, Moscow, Russia
Peer Bork, Yan P. Yuan (European Molecular Biology Laboratory, Heidelberg, Germany)