Methods in genome wide association studies.
Norú Moreno
CS374: Algorithms in Biology
Professor: Serafim Batzoglou
Agenda
GWA
Polymorphisms
HapMap Project
Genotyping chip
Integrating CNVs and SNPs
Imputation
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
Genome-wide Association Study (GWA study or GWAS)
•Completion of the Human Genome Project in 2003
•Examination of genetic variation across a given genome.
•Objective: Identify genetic associations with observable traits
GWAS
•Scan SNPs across many individuals to associate alleles with a particular disease
•Use a detected association to detect, treat and prevent the disease
•Pharmacogenomics.
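The scan described above can be sketched for a single SNP as a test on allele counts. This is a minimal illustration, assuming a plain Pearson chi-square test on a 2x2 table of allele counts in cases and controls; the function name and the example counts are hypothetical, and real GWAS pipelines add quality control and multiple-testing correction.

```python
def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2
    table of allele counts: rows = cases/controls, columns = A/B."""
    table = [[case_a, case_b], [control_a, control_b]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][c] + table[1][c] for c in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            # Count expected if allele and disease status were independent
            expected = row_sums[r] * col_sums[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: allele A is enriched in cases at the first SNP only
snps = {"snp1": (80, 20, 50, 50), "snp2": (50, 50, 50, 50)}
ranked = sorted(snps, key=lambda s: allelic_chi_square(*snps[s]), reverse=True)
```

Repeating the test over every SNP on the chip and ranking by the statistic gives the basic shape of the scan.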
Polymorphisms
A specific sequence variation that some individuals possess
Some variations are common, others are rare
Examples:
◦ Blood types
◦ Height
◦ Skin Color
◦ Etc…
Types of polymorphisms
1. Copy Number Variation (CNV)
Segments of DNA that are found in different numbers of copies among individuals
Substantial regions, not single nucleotides
(Figure: the same region in three individuals: A B C; A C, with B missing; A B B B C, with B in three copies)
Types of polymorphisms
2. Single Nucleotide Polymorphism (SNP)
(Murray 2007)
HapMap
Two unrelated people share about 99.5% of their DNA sequence.
HapMap focuses only on common SNPs: those present in at least 1% of the population
269 individuals, ~4M SNPs
Genotyped the individuals for these SNPs, and published the results
Genotyping chip
ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT
(Figure: the sequence above split into short fragments, labeled "Probes")
Genotyping chip
(Liu 2007) (Affymetrix)
Genotyping chip
(Affymetrix)
Genotyping chip
(Figure: intensity of allele A plotted against intensity of allele B; samples fall into three clusters: BB (0), AB (0.5) and AA (1))
Genotyping chip
Affymetrix 100k chip set
◦Entire genome with 100 000 SNPs (low density)
Affymetrix 500k chip (SNP array 5.0)
◦Entire genome with 500 000 SNPs (high density)
Affymetrix 1M chip (SNP array 6.0)
◦Entire genome with 1 000 000 SNPs (very high density)
Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite)
Korn, et al.
Birdsuite
Takes both CNVs and SNPs into account.
Input: raw data from the genotyping chip.
Output: an integrated CNV and SNP genotype per locus.
CNVs and SNPs coexist.
Both common and rare variants are needed to understand the role of genetic variation in disease.
Birdsuite
(Figure: SNP genotypes (AA, AB, BB) combined with CNPs give new genotypes such as A-null, AAAB and BBBB)
Birdsuite – 4 Stages
Canary - ‘Genotypes’ common copy-number polymorphisms (CNPs)
Birdseed - Genotypes SNPs using the classical AA, AB, and BB genotypes.
Birdseye - Identifies rare CNVs via HMMs
Fawkes - Integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB)
Birdsuite - Canary
Determines the copy number of each individual at each predefined CNP locus.
CNP = Copy number polymorphism: a CNV with >1% frequency in the population
Locus   Number of copies
A       1
B       3
C       1
(Figure: A B B B C)
Canary
(Korn, p.1255)
Birdsuite - Birdseed
We expect only AA, AB or BB.
From Canary, only CNPs with 2 copies.
No fewer or extra copies.
(Figure: BB, AA and AB clusters)
(Korn, p.1257)
Birdsuite - Birdseye
Using Canary and Birdseed:
◦Identify rare and de novo CNVs
◦Small number of real CNVs at unknown sites.
Search for consistent evidence of copy number variation across multiple neighboring probes.
Implement an HMM-based algorithm to find strong, consistent evidence for altered copy number states
Birdsuite - Birdseye
HMM to find regions of variable copy number in a sample.
Hidden state: The true copy number of the individual’s genome.
Observed states: The normalized intensity measurements of each probe on the array.
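The HMM idea above can be sketched with Viterbi decoding over a run of neighboring probes. This is not Birdseye itself: the Gaussian emissions, the mean intensity per copy number, the standard deviation and the "stay" probability are all illustrative assumptions, and the function name is hypothetical.

```python
import math


def viterbi_copy_number(intensities, means, sd=0.15, stay=0.99):
    """Most likely copy-number path for a run of neighboring probes.

    means maps each copy-number state to its assumed mean intensity;
    at least two states are required.
    """
    states = sorted(means)
    switch = (1.0 - stay) / (len(states) - 1)

    def log_emit(state, x):
        # Gaussian log-density, dropping the constant shared by all states
        return -((x - means[state]) ** 2) / (2 * sd ** 2)

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    # Uniform prior over states at the first probe
    score = {s: log_emit(s, intensities[0]) for s in states}
    back = []
    for x in intensities[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans(p, s))
            score[s] = prev[best] + log_trans(best, s) + log_emit(s, x)
            pointers[s] = best
        back.append(pointers)

    # Trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical means: intensity proportional to copy number
means = {1: 0.5, 2: 1.0, 3: 1.5}
path = viterbi_copy_number([1.0, 1.02, 0.98, 0.5, 0.52, 0.48, 0.5, 1.0, 1.01], means)
```

With these settings a single low probe is treated as noise, while several consecutive low probes are called as a one-copy region, which is the "consistent evidence across neighboring probes" requirement.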
Birdsuite - Fawkes
Merge all the results.
Show the CNVs within each SNP.
Utilize the imputed locations (in A/B intensity space) of copy-variable clusters.
Assign an allele-specific copy number genotype at each SNP.
(e.g. AAB, ABBB, A or B)
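The last step can be illustrated with a much simpler rule than the one Fawkes uses. The sketch below assumes the total copy number at a SNP is already known and that the fraction of signal coming from allele A is available; it simply rounds to the nearest whole number of A copies. The function name and the rounding rule are assumptions for illustration, not the cluster-based method in Korn et al.

```python
def allele_specific_genotype(copy_number, a_fraction):
    """Combine a total copy number with the fraction of A-allele signal
    into a genotype string such as 'AAB', 'ABBB', 'A' or 'B'."""
    if copy_number == 0:
        return "null"  # both copies deleted
    a_copies = round(copy_number * a_fraction)
    a_copies = max(0, min(copy_number, a_copies))  # keep within 0..copy_number
    return "A" * a_copies + "B" * (copy_number - a_copies)


examples = [
    allele_specific_genotype(3, 0.67),  # three copies, mostly A
    allele_specific_genotype(4, 0.25),  # four copies, mostly B
    allele_specific_genotype(1, 0.95),  # one copy, allele A
]
```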
Fawkes
(Korn, p. 1254,1257)
(Affymetrix website screenshot)
Imputation
Dealing with missing data points by filling in values.
In SNPs:
T A G G T ? T G C C T A G C G T
Why?
- Cost-saving
- Avoid re-genotyping
- Keep effective sample size
- SNP comparisons between existing platforms.
Imputation
High rate of occurrence.
◦‘Direct’ imputation.
T A G G T ? T G C C T A G C G T
T A G G T A T G C C T A G C G T
Linkage disequilibrium
◦Non-random association of alleles at two or more loci.
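Both ideas above can be sketched in a few lines. The function below is a simplified illustration, assuming a small reference panel of fully typed haplotypes given as strings; the name `impute` and the `window` parameter are hypothetical. With `window=0` it performs 'direct' imputation (most frequent allele at that position); with `window>0` it only consults panel haplotypes that agree with the query at the flanking SNPs, a crude stand-in for using linkage disequilibrium.

```python
from collections import Counter


def impute(query, panel, window=0):
    """Fill each '?' in query using reference haplotypes in panel."""
    result = list(query)
    for i, allele in enumerate(query):
        if allele != "?":
            continue
        lo, hi = max(0, i - window), min(len(query), i + window + 1)

        def matches(hap):
            # A haplotype is a donor if it agrees with every typed
            # allele of the query inside the window
            return all(query[j] in ("?", hap[j]) for j in range(lo, hi))

        # Fall back to the whole panel when no haplotype matches
        donors = [hap for hap in panel if matches(hap)] or panel
        result[i] = Counter(hap[i] for hap in donors).most_common(1)[0][0]
    return "".join(result)


query = "TAGGT?TGCCTAGCGT"
panel = ["TAGCTGTGCCTAGCGT", "TAGCTGTGCCTAGCGT", "TAGGTATGCCTAGCGT"]
direct = impute(query, panel)              # majority allele at the site
ld_based = impute(query, panel, window=2)  # only haplotypes matching the flanks
```

In this made-up panel the two approaches disagree: the majority allele at the missing site is G, but the only haplotype consistent with the neighboring SNPs carries A.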
Imputation
(Figure: LD around the SNP of interest)
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
Homer, et al.
The DNA Detective
Is an individual genome present in a DNA mixture?
Query Mixed DNA // Population
DNA Detective
We have:
Different laboratories > different conclusions.
Usually not accurate at all.
Hard and cannot be automated.
DNA Detective - Methodology
Summary:
Cumulative sum of allele shifts over all available SNPs.
Sign of the shift > whether the individual of interest is closer to the reference sample or to the given mixture.
First genotype a single SNP for a single person, then adapt it to all mixtures and pooled data.
DNA Detective – Single SNP, Single person
Raw preprocessed data > allele intensity (how much of A and how much of B we have).
1. Transform normalized data into a ratio.
Yi is the estimate of allele frequency
BB AB AA
~0 ~0.5 ~1
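The ratio step can be sketched as follows, assuming the estimate is the A-allele intensity divided by the total intensity. The optional factor `k` stands in for a per-SNP correction for probe bias and is an assumption here, as are the function names and the 0.25/0.75 cut-offs used to turn the ratio into a genotype call.

```python
def allele_ratio(a_intensity, b_intensity, k=1.0):
    """Estimate of the A-allele frequency from normalized probe
    intensities: close to 0 for BB, 0.5 for AB and 1 for AA."""
    return a_intensity / (a_intensity + k * b_intensity)


def call_genotype(y, low=0.25, high=0.75):
    """Map a ratio to a genotype using hypothetical thresholds."""
    if y < low:
        return "BB"
    if y > high:
        return "AA"
    return "AB"


calls = [call_genotype(allele_ratio(a, b))
         for a, b in [(900, 100), (500, 500), (50, 950)]]
```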
DNA Detective - Methodology
Use relative probe intensity data.
Compare allele frequency estimates from the mixture (M).
Assume the reference population (Pop) has ancestral components similar to the mixture, so the two are interchangeable.
Distance measure for individual Yi
DNA Detective - Methodology
Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0
Alternative hypothesis: D(Yi,j) > 0, Yi,j is more similar to M than to Pop
D(Yi,j) < 0: Yi,j is more ancestrally similar to Pop than to M.
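The hypotheses above can be made concrete with a short sketch. It takes the distance at SNP j to be D(Yi,j) = |Yi,j - Popj| - |Yi,j - Mj|, which matches the signs listed above (positive when the individual is closer to M than to Pop), and summarizes D over all SNPs with a one-sample t-statistic against a mean of zero. The function names and the example frequencies are hypothetical.

```python
import math
import statistics


def distances(y, mixture, population):
    """Per-SNP distance: positive when the individual's allele frequency
    estimate is closer to the mixture than to the reference population."""
    return [abs(yj - pj) - abs(yj - mj)
            for yj, mj, pj in zip(y, mixture, population)]


def t_statistic(d):
    """One-sample t-statistic of the distances against a mean of 0.
    Needs at least two SNPs whose distances are not all identical."""
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Made-up example: the mixture is 90% reference population, 10% individual
pop = [0.5, 0.2, 0.8, 0.4, 0.6]
person = [1.0, 0.0, 0.5, 0.5, 1.0]
mix = [0.55, 0.18, 0.77, 0.41, 0.64]
d = distances(person, mix, pop)
t = t_statistic(d)
```

Here every distance is slightly positive, so the statistic is positive, pointing toward the individual being in the mixture. In practice the sum runs over hundreds of thousands of SNPs, which is what lets very small per-SNP shifts add up.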
DNA Detective - Methodology
(Homer, p.4)
DNA Detective - Results
Accurate findings.
Determined whether a trace amount (<1%) of an individual's DNA is present in a DNA mixture.
Tested with different kinds of mixtures from publicly available data.
DNA Detective - Implications
Forensics application.
Traceability
Leak of private information.
◦Public data from many studies. Summary statistics of allele frequency.
Political implications.
◦How to share the data now?
Thank You!
References
Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008 Oct;40(10):1253-60
Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167
Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward?. The Hematologist. 2007
Murray, E. IST 341 Issues in Human Genetics. http://www.science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html