TRANSCRIPT
Background
Methods
Abstract
(1) Long DNA molecules are labeled with Bionano reagents by (2) incorporating fluorophore-labeled nucleotides at a specific sequence motif throughout the genome. (3) The labeled genomic DNA is then linearized in the Saphyr Chip using NanoChannel arrays. (4) Single molecules are imaged and then digitized by the Saphyr instrument. (5) Each molecule carries a label pattern that serves as a unique signature and is used to assemble the molecules into genome maps. (6) Bionano maps may be used in a variety of downstream analyses using Bionano Access software.
1. Extraction of long DNA molecules
2. Label DNA at specific sequence motifs
3. Saphyr Chip linearizes DNA in NanoChannel arrays
4. Saphyr automates imaging of single molecules in NanoChannel arrays
5. Molecules and labels detected in images by instrument software
6. Bionano Access software assembles optical maps
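The motif-based labeling in step 2 can also be simulated on a known sequence, which is how an in silico map is derived from a reference genome. The following is a minimal sketch, not Bionano software. The function names are hypothetical, and the default motif GCTCTTC (the recognition sequence of the nickase Nt.BspQI) is an assumption, since the poster does not name the enzyme used.

```python
# Minimal sketch: find where a labeling motif occurs in a DNA sequence.
# The default motif is an assumption; the poster does not name the nickase.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def in_silico_labels(sequence, motif="GCTCTTC"):
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = sequence.upper()
    positions = set()
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            positions.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(positions)


def label_intervals(positions):
    """Return the distances between consecutive label positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "AAGCTCTTCAAAGAAGAGCTT"  # one forward and one reverse-strand site
    labels = in_silico_labels(toy)
    print(labels, label_intervals(labels))
```

The list of intervals between labels is the pattern that a genome map records. Real maps measure these intervals in kbp over molecules longer than 150 kbp, so the toy sequence here only shows the principle.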
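[Figure labels: sample sources: Blood, Cell, Tissue, Microbes]
[Figure labels: DNA conformation: Free DNA Solution (Gaussian Coil); DNA in a Microchannel (Partially Elongated); DNA in a Nanochannel (Linearized)]
[Figure labels: labeling diagram: Free DNA, Displaced Strand, Polymerase, Nick Site, Nickase Recognition Motif]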
©2017 Bionano Genomics. All rights reserved.
Structural Variation Landscape Across 26 Human Populations Reveals Population-Specific Variation Patterns in Complex Genomic Regions
Structural variation (SV) studies of different ethnic groups at the population level lead to greater insight into genomic and trait diversity and into differences in disease etiology. While structural variation (SV) maps based on short-read sequences and statistical phasing have been constructed for the samples comprising the 1000 Genomes Project [1], the sensitivity of detection and localization of some classes of SVs (such as long insertions, inversions, copy number variations, and duplications spanning tens of kbp or more) is suboptimal. We have constructed genome optical maps [2] using Bionano next-generation mapping (NGM) for 146 unrelated individuals from 26 human populations with long DNA molecules (>150 kbp) fluorescently labeled at specific sequence motifs (nickase recognition sites). These samples consist of 6 individuals (3 males and 3 females) from each of 26 human populations of the 1000 Genomes Collection. As the data are generated from native DNA without amplification and assembled without the use of the human reference genome, the genome maps are de novo assemblies of the 146 genomes. All SVs >1.5 kbp are visualized and analyzed by algorithms developed by Bionano and the team that participated in this study.

When the motif patterns from these genome optical maps were compared against the in silico maps digitally derived from the human reference genome and against each other, we found clear, specific SV patterns among different ethnic groups and among individuals in the population. These population SV patterns are most pronounced in complex regions of the genome where large (>50 kbp) inversions and tandem duplications are mixed together in the same loci. These regions include the loci for microdeletion syndromes (such as 7q11.23, 15q13.3, 16p11.2 and 22q11.2) and subtelomeric regions, where near-identical long repeats make them hotspots for SV formation and intractable for short-read sequences to assemble into unique contigs.
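The comparison of a sample map against an in silico reference map can be illustrated with a simplified sketch. It assumes the labels of the two maps have already been paired one to one, which a real pipeline must establish by alignment while tolerating missing labels, extra labels, and sizing error. The function name and tuple layout are hypothetical. This is not the algorithm used in the study. It only shows how a difference in label-to-label distance above the 1.5 kbp threshold points to an insertion or deletion.

```python
# Simplified sketch: flag insertions and deletions from paired label positions.
# Assumes ref_labels[i] and map_labels[i] are the same label in both maps.


def call_size_variants(ref_labels, map_labels, min_size_bp=1500):
    """Compare consecutive label intervals and report size differences.

    Returns a list of (sv_type, ref_start, ref_end, size_bp) tuples for
    intervals whose length differs by more than min_size_bp.
    """
    if len(ref_labels) != len(map_labels):
        raise ValueError("labels must be paired one to one")

    calls = []
    for i in range(len(ref_labels) - 1):
        ref_gap = ref_labels[i + 1] - ref_labels[i]
        map_gap = map_labels[i + 1] - map_labels[i]
        delta = map_gap - ref_gap
        if abs(delta) > min_size_bp:
            sv_type = "insertion" if delta > 0 else "deletion"
            calls.append((sv_type, ref_labels[i], ref_labels[i + 1], abs(delta)))
    return calls


if __name__ == "__main__":
    reference = [0, 10_000, 25_000, 40_000]   # in silico label positions (bp)
    sample = [0, 10_000, 30_000, 42_000]      # paired labels on a sample map
    for call in call_size_variants(reference, sample):
        print(call)
```

Interval comparison alone cannot detect inversions or balanced rearrangements, which change label order or orientation rather than spacing. Those classes need the pattern-level comparison described above.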
Generating high-quality finished genomes replete with accurate identification of structural variation and high completion (minimal gaps) remains challenging using short-read sequencing technologies alone. Bionano NGM provides direct visualization of long DNA molecules in their native state, bypassing the statistical inference needed to align paired-end reads with an uncertain insert size distribution. These long labeled molecules are de novo assembled into physical maps spanning the whole genome. The resulting order and orientation of sequence elements in the map can be used for anchoring NGS contigs and for structural variation detection.
H. R. Cao4, C. Chu1, A. Leung3, L. Li3, C. Lin1, J. McCaffrey2, Y. Mostovoy1, A. Naguib4, E. Lam4, A. Poon1, S. Pastor2, R. Rajagopalan2, J. Sibert2, M. Sakin1, W. Wang4, A. Hastie4, E. Young2, T. Chan3, K. Yip3, M. Xiao2, P. Kwok1
Conclusions
We have constructed genome optical maps using Bionano NGM for 146 unrelated individuals from 26 human populations with long DNA molecules (>150 kbp) fluorescently labeled at specific sequence motifs (nickase recognition sites). These samples consist of 6 individuals (3 males and 3 females) from each of 26 human populations of the 1000 Genomes Collection.

Here we demonstrate the ability of long single-molecule mapping to resolve complex long-range SVs, sometimes with multiple haplotypes, in the human genome and provide new "alternative" human population-based references for these regions, which are associated with important human diseases. The population-specific SV patterns have been shown to be present in relatively "well-behaved" regions as well as in variable complex regions, shedding light on the origins of the complex regions and on the patterns more closely associated with human disease. In conclusion, Bionano NGM may prove to be the one cost-effective, fast and comprehensive platform for population-level study of functionally relevant large structural variants, paving the way for the era of precision genomics and medicine.
References
1. Sudmant PH et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015; 526:75-81.
2. Mak AC et al. Genome-Wide Structural Variation Detection by Genome Mapping on NanoChannel Arrays. Genetics. 2016; 202:351-62.
3. Cao H et al. Rapid detection of structural variation in a human genome using NanoChannel-based genome mapping technology. Gigascience. 2014; 3(1):34.
4. Lam ET et al. Genome mapping on NanoChannel arrays for structural variation analysis and sequence assembly. Nature Biotechnology. 2012; 30:771-6.
1) University of California, San Francisco, San Francisco, CA; 2) Drexel University, Philadelphia, PA; 3) CUHK, Shatin, Hong Kong; 4) Bionano Genomics, Inc., La Jolla, CA
De novo Assembled Genome Maps of 146 Unrelated Individuals from 26 Human Populations Analyzed for SVs
Summary of SV Statistics
http://www.1000genomes.org/sites/1000genomes.org/files/documents/1000-genomes-map_11-6-12-2_750.jpg
• 5.6% of the reference genome not present in maps
• ~20 Mbp new genomic content not found in reference genome
• 5% of the reference genome is covered in <20% of the assemblies
• ~70% of the genome is "well-behaved" and covered by most individuals
• ~1800 SVs are common in all super-populations (black)
• ~1500 SVs are shared in at least 2 of the super-populations (grey)
• Large proportions of unique SVs in AFR (~2100) (yellow)
• Large proportions of unique SVs in AFR (42%)
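The sharing categories in the bullets above (common to all super-populations, shared by at least two, unique to one) amount to a set-membership tally. The sketch below shows one way such a tally could be computed, assuming each SV has already been matched across samples and annotated with the super-populations in which it was observed. The five codes are the standard 1000 Genomes super-populations. The function names, the input layout, and the choice to count "shared" as two or more but not all five are assumptions, and the toy data are invented for illustration only.

```python
# Sketch: tally SVs by how many super-populations they were observed in.
# Input layout and category boundaries are assumptions for illustration.
from collections import Counter

SUPER_POPULATIONS = frozenset({"AFR", "AMR", "EAS", "EUR", "SAS"})


def classify_sharing(observed_in):
    """Return 'common', 'shared', 'unique:<POP>', or 'absent' for one SV."""
    pops = set(observed_in) & SUPER_POPULATIONS
    if not pops:
        return "absent"
    if pops == SUPER_POPULATIONS:
        return "common"              # seen in all five super-populations
    if len(pops) >= 2:
        return "shared"              # seen in two to four super-populations
    return "unique:" + next(iter(pops))


def summarize(sv_to_pops):
    """Count SVs per sharing category."""
    return Counter(classify_sharing(pops) for pops in sv_to_pops.values())


if __name__ == "__main__":
    toy = {
        "sv1": {"AFR", "AMR", "EAS", "EUR", "SAS"},
        "sv2": {"AFR", "EUR"},
        "sv3": {"AFR"},
        "sv4": {"AFR"},
        "sv5": {"EAS"},
    }
    print(summarize(toy))
```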
Variable Complexity Observed in the MHC Region (chr6:28.5-33.5M)
• The whole region spans a long range (5 Mbp)
• An overview of contig-to-reference mapping shows different degrees of variation among sub-regions
[Figure: contig-to-reference mapping from 28 Mb to 33 Mb, with sub-regions labeled A to G. One yellow line represents one contig; each sample may have multiple contigs; unmapped regions are denoted in green. Other figure labels: High complexity, Reference, Contig.]
[Figure: Patterns 1 to 4, drawn with segments labeled A to G]
Pattern 1: A->B->C->E->F->G
Pattern 2: A->B<-D->G
Pattern 3: A<-C->F->G
Pattern 4: C<-G
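The pattern strings can be read mechanically as chains of segments joined by arrows. The sketch below splits a pattern into its links and lists which of the segments A to G do not appear in it. The surviving poster text does not define what the arrow direction means, so the code only records each arrow as written. The function names are hypothetical.

```python
# Sketch: split a pattern such as "A->B<-D->G" into links and list the
# segments it omits. Arrow semantics are not defined in the poster text,
# so arrows are recorded verbatim rather than interpreted.
import re

ALL_SEGMENTS = "ABCDEFG"  # segment labels used in the poster figure


def parse_pattern(pattern):
    """Return a list of (left_segment, arrow, right_segment) links."""
    parts = re.split(r"(->|<-)", pattern.replace(" ", ""))
    segments, arrows = parts[0::2], parts[1::2]
    if any(len(seg) != 1 or seg not in ALL_SEGMENTS for seg in segments):
        raise ValueError("unexpected segment in pattern: " + pattern)
    return [(segments[i], arrows[i], segments[i + 1]) for i in range(len(arrows))]


def missing_segments(pattern):
    """Return the segment labels from A to G that the pattern omits."""
    used = {seg for left, _, right in parse_pattern(pattern) for seg in (left, right)}
    return [seg for seg in ALL_SEGMENTS if seg not in used]


if __name__ == "__main__":
    for name, text in [
        ("Pattern 1", "A->B->C->E->F->G"),
        ("Pattern 2", "A->B<-D->G"),
        ("Pattern 3", "A<-C->F->G"),
        ("Pattern 4", "C<-G"),
    ]:
        print(name, parse_pattern(text), "omits", missing_segments(text))
```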
Segmental Duplication Region: 16p12
AFR is the deepest split among
Population structure study: Phylogenetic tree (Fst)