The Application of Next Generation Sequencing for HLA Genotyping in the Clinical Laboratory
Dianne De Santis
Department of Clinical Immunology, PathWest, Royal Perth Hospital, Perth, Australia
http://www.ebi.ac.uk/ipd/imgt/hla/intro.html
April 2013
HLA-A 2,244
HLA-B 2,934
HLA-C 1,788
HLA-DRB1 1,317
HLA-DQB1 323
HLA-DPB1 185
Why do we need Next Generation Sequencing in the HLA laboratory?
HLA ambiguity results from the amplification and Sanger sequencing based typing (SBT) of partial genes
Allele ambiguity: results when polymorphisms that distinguish alleles fall outside the regions examined by the typing system, or when the gene sequence in the database is incomplete
• A*01:01:01:01 vs A*01:01:01:02N (Intron 2)
• DRB1*12:01 vs DRB1*12:06 (Exon 3)
Current genotyping strategy at DCI includes sequencing of exons 2-4 for HLA-A, -B, -C, exons 2-3 for DQB1, and exon 2 for DPB1 and DRB genes
Genotype ambiguity: results from an inability to establish phase (cis/trans ambiguity) between closely linked polymorphisms identified by the typing system
HLA ambiguity results from heterozygous sequencing by Sanger SBT
A*01:01:01:01+24:02:01:01   ------------- Y-------Y----- -------------R- ------Y------ -------------MS
A*01:14 + 24:46             ------------- Y-------Y----- -------------R- ------Y------ -------------MS
A*24:02:01:01               -------------- C------T------ ------------A- ------C------ ------------AC
A*24:46                     -------------- C------T------ ------------A- ------C------ ------------CG
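The ambiguity in the alignment above can be reproduced with a minimal sketch. It assumes a heterozygous Sanger trace reports one IUPAC code per position and keeps no phase information. The two-base alleles and the function name `sanger_consensus` are hypothetical and chosen only for illustration; they are not real HLA sequence.

```python
# IUPAC codes for one or two bases observed at a single position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("CG"): "S", frozenset("AT"): "W",
}

def sanger_consensus(allele_1: str, allele_2: str) -> str:
    """Return the unphased consensus of two aligned alleles."""
    if len(allele_1) != len(allele_2):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(allele_1, allele_2))

# Toy alleles (hypothetical): the same bases per position, in opposite phase.
pair_1 = ("AC", "CG")
pair_2 = ("AG", "CC")

consensus_1 = sanger_consensus(*pair_1)
consensus_2 = sanger_consensus(*pair_2)
print(consensus_1, consensus_2, consensus_1 == consensus_2)
```

Both genotypes collapse to the same consensus string, so the trace alone cannot tell them apart.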
Year 2000, IMGT Release 1.5
Allele 1        Allele 2
A*02011         A*03011

Year 2005, IMGT Release 2.8
Allele 1        Allele 2
A*02010101      A*03010101
A*0226          A*0307
A*0234          A*0308
A*02010101      A*03010102N
A*02010102L     A*03010101
A*02010102L     A*03010102N
A*02010101      A*03010103
A*02010102L     A*03010103

Year 2010, IMGT Release 2.28
Allele 1        Allele 2
A*02010101      A*03010101
A*0226          A*0307
A*0234          A*0308
A*02010101      A*03010102N
A*02010102L     A*03010101
A*02010102L     A*03010102N
A*02010101      A*03010103
A*02010102L     A*03010103
A*0224          A*0317
A*0290          A*0309
A*02010103      A*03010101
A*02010103      A*03010102N
A*02010103      A*03010103
A*020102        A*030112
A*0323          A*9295

Year 2011, IMGT Release 3.3.0
Allele 1        Allele 2
A*02:01:01:01   A*03:01:01:01
A*02:26         A*03:07
A*02:34         A*03:08
A*02:01:01:01   A*03:01:01:02N
A*02:01:01:02L  A*03:01:01:01
A*02:01:01:02L  A*03:01:01:02N
A*02:01:01:01   A*03:01:01:03
A*02:01:01:02L  A*03:01:01:03
A*02:24:01      A*03:17
A*02:90         A*03:09
A*02:01:01:03   A*03:01:01:01
A*02:01:01:03   A*03:01:01:02N
A*02:01:01:03   A*03:01:01:03
A*02:01:02      A*03:01:12
A*03:23:01      A*02:195
A*02:01:52      A*03:01:03
A*02:35:01      A*03:108
A*02:237        A*03:05
The discovery of new HLA alleles has resulted in an increase in heterozygous allele combinations that are identical in the commonly sequenced regions
Year 2013, IMGT Release 3.9.01
• 550 combinations (ex2+3)
• 20 combinations (ex2-4)
Next-Generation Sequencing (NGS) is a method that can provide a complete solution to the limitations of current typing systems
Due to the limitations of existing methods and the increasing rate of new allele discovery, there is strong demand for a new method for HLA genotyping.
NGS Features Important for HLA Typing:
• Clonal amplification
  provides sequencing information for a single DNA molecule, ensuring the identification of phase
• Massively parallel
  large sequencing capacity enables an expansion of the HLA regions sequenced and the ability to include other genes, e.g. KIR, C4
  amplification and sequencing of multiple loci (HLA class I & II) from many individuals (barcoding) in a single pool
Template preparation: Short range PCR, LR-PCR, Exome capture, WGS
Sequencing: CE, MiSeq, PGM, GS-Jnr, 454/FLX, Pacific Biosciences
Allele-calling/Analysis: In-house, Commercial
The specific pathway will vary in different laboratories: Cost? Convenience? Flexibility?
Next Generation Sequencing Strategy
Next ‘Second’ Generation Sequencing
• 454 pyrosequencing (read length <1000bp)
• Illumina Sequencing (read length 2 x 250bp)
• SOLiD sequencing (read length 50-75bp)
Third Generation Sequencing
• Ion semiconductor sequencing (Ion Torrent) (read length <400bp)
• Pacific Biosciences RS: Single molecule Real-time sequencing (read length 3-6kb)
• Oxford Nanopore Single molecule
Next Generation Sequencing Technologies
Next Generation Sequencing Workflow: 454, Illumina, Ion Torrent PGM
[Figure: amplicon carrying 454 adapters A and B, each with a key and a MID, flanking the sequence of interest]
Locus‐specific PCR amplification
emPCR Amplification and sequencing
Primer design includes:
• GS sequencing Primer A or Primer B 454 sequencing adapter (which includes a four‐base library “key” sequence) at the 5‐prime portion of the oligonucleotide (25 nt)
• Target‐specific sequence at the 3‐prime end of the oligonucleotide
• Multiplex Identifier (MID) sequence to allow for automated software identification of samples after pooling/multiplexing and sequencing
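The primer layout listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the vendor's design: every sequence below (adapter, key, MIDs, target-specific portion) is a placeholder, and the helper names `fusion_primer` and `demultiplex` are hypothetical. The sketch also assumes each read begins with the key followed by the MID.

```python
# All sequences are placeholders for illustration, not real 454 reagents.
ADAPTER_A = "ACGTACGTACGTACGTACGTA"   # hypothetical 21-nt adapter portion
KEY = "TCAG"                          # four-base library key (assumed value)
MIDS = {                              # hypothetical MID per sample
    "sample_1": "AAGGTTCCAA",
    "sample_2": "TTCCAAGGTT",
}
TARGET_FORWARD = "GGATCCTTAGCAAGCT"   # hypothetical locus-specific 3' portion

def fusion_primer(mid: str) -> str:
    """Assemble 5'-adapter + key + MID + target-specific sequence-3'."""
    return ADAPTER_A + KEY + mid + TARGET_FORWARD

def demultiplex(read: str):
    """Return (sample, remaining sequence) if the read starts with key + a known MID."""
    if not read.startswith(KEY):
        return None
    after_key = read[len(KEY):]
    for sample, mid in MIDS.items():
        if after_key.startswith(mid):
            return sample, after_key[len(mid):]
    return None

primer = fusion_primer(MIDS["sample_2"])
read = KEY + MIDS["sample_2"] + TARGET_FORWARD + "AAAA"
assigned = demultiplex(read)
print(primer)
print(assigned)
```

Because the MID is read before the target sequence, amplicons from many samples can be pooled and later sorted in software.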
In recently introduced workflow improvements for HLA genotyping, amplicons are pooled immediately after this genomic PCR, which reduces reagent cost and hands-on time (see www.454.com)
454 Amplicon Library Generation: MIDS and adaptors are incorporated during genomic PCR
www.454.com
[Figure: 454 sequencing workflow]
• Amplicon pool (adapters A and B)
• Mix amplicons & capture beads
• Add PCR reagents & emulsion oil
• PCR in “water-in-oil” emulsion (micro-reactors)
• Isolate DNA-containing beads
• Load beads onto PicoTiter™Plate (PTP); load enzyme beads
• Load PTP on sequencer
• Pyrosequence: sulfurylase (APS, PPi, ATP) and luciferase (luciferin, light + oxyluciferin); signal captured by cooled 16 Mpixel CCD camera
• Read flowgram
454 Sequencing procedure
www.454.com
• 1,280 genotypes sequenced, 95% allele assignment
• Overall concordance 97.2%
• Median ambiguity string ~1-2
• Analysis time cut >25-fold, with little or no reflex testing required
HLA Genotyping International Study
• 454 GS Junior
• 17 exons of HLA-A, -B, -C, DQB1, DPB1, DRB1,3,4,5
• 173 samples, 18 GS Junior runs
• Protocol validated under routine conditions in accordance with established policies and procedures of the European Federation for Immunogenetics (EFI)
• Average read count = 66,078 + 16.8/run, median read length of 425bp + 24
• Conexio ATF for analysis
Rapid, scalable and highly automated HLA Genotyping using next-generation sequencing: A transition from research to diagnostics: Danzer et al. BMC Genomics 2013
• From a total of 1,273 loci analysed, 1,241 (97.3%) were initially successful
• DRB3: amplification of exon 2 in the groups DRB3*01 and *02 was inadequate, leading to uncertain results (n=20)
• 77.2% of genotypes were called reliably without editing
• 22.8% needed manual editing
  • Mostly DRB1, due to co-amplified DRB pseudogenes or PCR artefacts
  • HLA-C*07, as a result of a C homopolymer region in exon 4
• The mean ambiguity reduction for the analysed loci was 93.5%, with no significant improvement for DRB3, DRB4, DRB5
• In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a by-product.
• Each well holds a different DNA template, beneath the wells is an ion sensitive layer, and beneath that layer is an ion sensor.
PGM Ion Torrent Sequencing Technology
The chip is flooded with one nucleotide at a time in T, G, A, C order. If the nucleotide is incorporated, a hydrogen ion is released and the pH changes; the pH change is converted to a voltage and recorded by the semiconductor sensor.
If the next nucleotide that floods the chip is not incorporated, no hydrogen ion is released and therefore there is no pH change.
If there are two identical bases in a row on the DNA strand, the signal is doubled and two bases are recorded.
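The flow logic described above can be sketched as a toy simulation. It assumes an ideal, noise-free signal equal to the number of bases incorporated per flow, and it works on the strand being synthesized rather than the template. The function names `flow_signals` and `call_bases` and the example sequence are hypothetical.

```python
FLOW_ORDER = "TGAC"  # nucleotides flood the chip in T, G, A, C order

def flow_signals(synthesized: str, n_flows: int) -> list:
    """Ideal signal per flow: how many bases were incorporated (0, 1, 2, ...)."""
    signals = []
    pos = 0
    for flow in range(n_flows):
        base = FLOW_ORDER[flow % len(FLOW_ORDER)]
        run = 0
        # A homopolymer run is incorporated within a single flow.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals

def call_bases(signals: list) -> str:
    """Convert per-flow signals back into a base sequence."""
    return "".join(
        FLOW_ORDER[i % len(FLOW_ORDER)] * s for i, s in enumerate(signals)
    )

signals = flow_signals("TTGCA", 8)
called = call_bases(signals)
print(signals, called)
```

The first flow returns a doubled signal for the two consecutive T bases, and flows in which nothing is incorporated return zero. Real signals are noisy, which is one reason long homopolymers are harder to call.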
PGM Ion Torrent Sequencing Technology
Library Preparation for PGM NGS
1. PCR and pool (A, B, C, DRB, DQB1, DPB1)
2. Enzymatic fragmentation
3. Adapter and barcode ligation (Adapter A + barcode, target insert, P1 adapter): Barcode 001, 002, 003 give Library 1, 2, 3

Emulsion PCR and bead enrichment
1. Anneal ssDNA to ISPs
2. Emulsify beads and ssDNA into water-in-oil microreactors
3. Clonal amplification by PCR
4. Enrichment of templated ISPs
Sequencing of HLA fragment libraries with 400bp chemistry reveals a high proportion of 400bp reads
Uneven read depth distribution across HLA Class I genes remains a problem with longer read chemistry and is most likely due to GC content of these genes and bias in the emPCR
[Figure: read depth across exons 2 and 3 (Ex2, Ex3) of HLA-A, HLA-B, HLA-C, DRB1, DQB1 and DPB1 for Patient 1 and Patient 2, with maximum read depth marked. Generated from HLA plugin, Life Technologies]
However, the minimum read depths obtained following minor modifications to the Ion PGM library preparation protocol are sufficient for accurate allele calling
A*02:01:01:01, B*13:02:01, B*15:01:01:01, C*03:04:01:01, C*06:02:01:01, DRB1*04:04:01, DRB1*07:01:01
Data generated from HLA plugin, Life Technologies
Minimum read depths of less than 100 reads/base still enable accurate allele calling
A*11:01:01, A*25:01:01, B*07:02:01, B*35:01:01:01, C*04:01:01:01, C*07:02:01, DRB1*04:07:01, DRB1*15:01:01
Sequencing on the Ion PGM identifies DNA fragments representing each HLA allele
[Figure: individual reads assigned to 68:01:02 and 02:01:01:01. Adapted from ASSIGN-MPS, Conexio Genomics]
Good read depth ensures accurate base calling with an average base-call error rate of ~1-5%
[Figure: read depth for 68:01:02 and 02:01:01:01. Adapted from ASSIGN-MPS, Conexio Genomics]
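A rough calculation shows why depth compensates for a per-read error rate of about 1-5%. The sketch assumes a homozygous position, independent errors between reads, and a worst case in which every error is counted against the true base. These are simplifying assumptions, not the error model of any particular analysis software, and the function name `prob_majority_wrong` is hypothetical.

```python
from math import comb

def prob_majority_wrong(depth: int, error_rate: float) -> float:
    """Binomial probability that at least half of the reads at one position are erroneous."""
    threshold = (depth + 1) // 2  # smallest count that is at least half of depth
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(threshold, depth + 1)
    )

for depth in (1, 10, 30, 100):
    print(depth, prob_majority_wrong(depth, 0.05))
```

With a single read the chance of a wrong call equals the error rate itself. Under these assumptions it falls steeply as depth grows, which is consistent with accurate calls at depths below 100 reads per base.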
NGS-PGM sequencing resolves common allele ambiguities and in some cases identifies the less common allele combination
Exon 1
SSBT: B*08:01/08:19N, B*27:05/27:13
NGS: B*08:01:01, B*27:13
B*27:13 (A) vs B*27:05 (C)
[Figure: reads assigned to 08:01:01 and 27:13. Adapted from ASSIGN-MPS, Conexio Genomics]
Illumina Sequencing Technology: Sequencing by Synthesis
• Sequencing by synthesis uses 4 fluorescently labelled nucleotides to sequence millions of clusters attached to a flow cell
• Clonal amplification occurs on a flow cell by bridge amplification
• Flow cell contains a dense lawn of primers complementary to adaptors on target fragments
• Unlabelled nucleotides and enzyme are added to build double-stranded bridges on the flow cell
• Double-stranded DNA is then denatured to allow sequencing of the single-stranded template on the flow cell
• Sequencing occurs by adding four labelled reversible terminators, primers and polymerase
• After laser excitation, fluorescence is emitted from each cluster and captured
• The cycle is then repeated to capture subsequent bases
[Figure: paired-end layout. Read 1 (250 bases) and Read 2 (250 bases) are read from opposite ends of an insert of variable length]
Paired-End Sequencing on Illumina platform increases read length
[Figure: phasing of B*07:02:01 and B*41:01 using read pairs]
Insert sizes: 787 and 685 bases
Phased polymorphisms 609 bases apart
Using paired-end 250 base reads
Phasing Paired-End Sequences
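The geometry behind the example above (insert sizes of 787 and 685 bases, 250-base reads, polymorphisms 609 bases apart) can be checked with a small sketch. It assumes read 1 covers the first `read_length` bases of the insert and read 2 covers the last `read_length` bases, and it ignores base quality and alignment issues. The function name `pair_can_link` is hypothetical.

```python
def pair_can_link(insert_size: int, read_length: int, distance: int) -> bool:
    """True if one fragment can cover two sites `distance` bases apart,
    either within a single read or with one site in each read of the pair."""
    if distance >= insert_size:
        return False  # the fragment is too short to contain both sites
    if distance < read_length:
        return True   # both sites fit inside a single read
    # Let p be the 0-based offset of the first site within the insert.
    # Read 1 covers [0, read_length - 1]; read 2 covers
    # [insert_size - read_length, insert_size - 1]; the second site is at p + distance.
    lowest = max(0, insert_size - read_length - distance)
    highest = min(read_length - 1, insert_size - 1 - distance)
    return lowest <= highest

for insert in (787, 685, 1200):
    print(insert, pair_can_link(insert, 250, 609))
```

Both insert sizes from the slide can link the two polymorphisms, while a single 250-base read cannot. A much longer insert (1200 bases here, a hypothetical value) fails because the second site falls in the unsequenced gap between the reads.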
Phase-defined complete sequencing of the HLA genes by next-generation sequencing: Hosomichi et al. BMC Genomics 2013
• Long range PCR amplicons (3.4kb-13.6kb) including entire regions of HLA-A, -B, -C, -DRB1, -DQB1, -DPB1
• Paired-end reads of 2 x 250bp
• 33 homozygous cell lines, 11 HLA heterozygous samples, and 3 parent-child families
• 2 methods of sequencing:
  • Individual-tagging method: all of the PCR amplicons of 6 HLA genes from an individual were pooled before library preparation
  • Gene-tagging method: each PCR amplicon was barcoded before library preparation
• Individual-tagging method
  • 66.35% of reads mapped to reference hg19, with an average read depth of 157x
  • 33 homozygous cell lines x 6 amplicons = 198 amplicons
  • 32/198 amplicons: HLA sequences could not be generated due to low read depth
  • 152/166 completely homozygous sequences were identical to the reference sequence (IMGT/HLA)
  • 14 were found to be novel, with variants in intronic sequence
  • Unable to obtain phase-defined sequences for HLA heterozygous samples
• Gene-tagging method
  • 73.1% of all reads mapped to hg19 reference for 66 amplicons
  • Average depth ranged from 146x to 6,678x, mean 2,281x
  • 100 HLA gene haplotypes were defined; 32 HLA gene haplotypes recorded 1-5bp mismatches
  • 17 HLA gene haplotypes had mismatches in exonic regions = ?sequence error
  • 15 HLA gene haplotypes had mismatches in intronic regions = new alleles
• Study concludes that although able to define phase in HLA heterozygous samples using the described mapping algorithm (map to hg19), the gene-tagging method would be low throughput and costly compared with the individual-tagging method
Important considerations for HLA Typing by NGS
• Read length: important for linking the polymorphisms that resolve alleles, and therefore for establishing phase
[Figure: positions of nucleotide differences (A, G, C, T) along HLA-A, exons 1-8, on a 0-3500 bp scale, for four allele pairs]
• 145 differences between A*01:01:01:01 and A*02:01:01:01
• 32 differences between A*66:02 and A*68:02:01:01
• 2 differences between A*01:01:01:01 and A*01:03
• 21 differences between A*26:01:01 and A*34:02
Distances marked between differences: 211 bp, 549 bp, 374 bp, 893 bp, 829 bp, 1305 bp
Lind C et al., Hum Immunol. 10:1033-42 (2010)
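The effect of read length on phasing can be sketched as a grouping of heterozygous positions into blocks that single reads can bridge. The sketch assumes two sites can be linked only when they fit on one read, and that links between neighbouring sites chain together given enough coverage. The positions below are hypothetical; only the gaps between them (211, 549, 374 and 893 bp) are borrowed from the distances marked in the figure. The function name `phase_blocks` is also hypothetical.

```python
def phase_blocks(het_positions, read_length):
    """Group heterozygous positions into blocks whose neighbours fit on one read."""
    blocks = []
    for pos in sorted(het_positions):
        # Two sites fit on one read when their distance is below the read length.
        if blocks and pos - blocks[-1][-1] < read_length:
            blocks[-1].append(pos)
        else:
            blocks.append([pos])
    return blocks

positions = [100, 311, 860, 1234, 2127]  # hypothetical coordinates
for read_length in (250, 400, 1000):
    print(read_length, phase_blocks(positions, read_length))
```

Short reads leave several unlinked blocks, so phase between blocks stays unknown, while a read length above the largest gap joins every site into one phased block.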
‘Third’ Generation Sequencing: Pacific Biosciences RS-Single molecule real-time sequencing
Two sequencing modes of Pacific Biosciences RS
Sample Preparation
Standard (LS: long sequencing reads)
• Large insert sizes (2kb-10kb)
• Generates one pass on each molecule sequenced
Circular Consensus (CCS: high quality sequencing reads)
• Small insert sizes (500bp)
• Generates multiple passes on each molecule sequenced
PacBio Single Molecule Sequencing and HLA genotyping
• No PCR amplification of genomic DNA
• With the ability to sequence between 3-6kb, the technology will allow sequencing of complete HLA Class I genes and most of the HLA Class II genes, eliminating the problem of phase ambiguity
• However, the technology is still in development for the application of HLA genotyping, and the cost of sequencing a sample is currently too high for implementation into the HLA diagnostic laboratory
BUT WATCH THIS SPACE!