Considerations for Analyzing Targeted NGS Data: HLA
Tim Hague, CTO
Introduction
Human leukocyte antigen (HLA) is the major histocompatibility complex (MHC) in humans.
It is a group of genes (a 'superregion') on chromosome 6.
It essentially encodes cell-surface antigen-presenting proteins.
Functions
HLA genes have functions in combating infectious diseases, graft/transplant rejection, autoimmunity and cancer.
Alleles
There is a large number of alleles (and proteins). Many alleles are already known, and the number of known alleles is increasing.
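HLA Class I

| Gene | A | B | C |
|---|---|---|---|
| Alleles | 2013 | 2605 | 1551 |
| Proteins | 1448 | 1988 | 1119 |

HLA Class II

| Gene | DRA | DRB* | DQA1 | DQB1 | DPA1 | DPB1 |
|---|---|---|---|---|---|---|
| Alleles | 7 | 1260 | 47 | 176 | 34 | 155 |
| Proteins | 2 | 901 | 29 | 126 | 17 | 134 |

HLA Class II: DRB alleles

| Gene | DRB1 | DRB3 | DRB4 | DRB5 |
|---|---|---|---|---|
| Alleles | 1159 | 58 | 15 | 20 |
| Proteins | 860 | 46 | 8 | 17 |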
Analysis Challenges
HLA genes have specific analysis challenges regardless of the sequencing technology.
High Polymorphism
There is a high rate of polymorphism: up to 100 times the average human mutation rate.
The HLA-DRB1 and HLA-B loci have the highest sequence variation rates within the human genome.
There is a high degree of heterozygosity: homozygotes are the exception in this region.
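A quick way to see why many alleles make homozygotes rare is the standard Hardy-Weinberg expected heterozygosity, H = 1 − Σ pᵢ², the probability that two randomly drawn alleles differ. The Python sketch below compares a biallelic locus with a hypothetical locus of 50 equally frequent alleles; the frequencies are illustrative assumptions, not HLA population data.

```python
# Sketch: expected heterozygosity under Hardy-Weinberg equilibrium,
# H = 1 - sum(p_i^2). The allele frequencies below are hypothetical.


def expected_heterozygosity(freqs: list[float]) -> float:
    """Return 1 - sum(p^2) for allele frequencies that sum to 1."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies must sum to 1, got {total}")
    return 1.0 - sum(p * p for p in freqs)


biallelic = [0.5, 0.5]        # two equally common alleles
polymorphic = [1 / 50] * 50   # hypothetical locus: 50 equally common alleles

print(f"biallelic:   H = {expected_heterozygosity(biallelic):.2f}")
print(f"polymorphic: H = {expected_heterozygosity(polymorphic):.2f}")
```

With 50 equally frequent alleles H is 0.98, so only about 2% of individuals would be expected to be homozygous at such a locus.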
Duplications
There is a high level of segmental duplication, with many similar genes and many very similar pseudogenes.
Duplicated segments within an individual can be more similar to each other than to the corresponding segments of the reference genome.
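The effect of near-identical duplicated segments on read placement can be illustrated with a toy gapless aligner. In this Python sketch the "gene" and "pseudogene" sequences are made-up examples that differ at a single base, and `best_hits` is a hypothetical helper, not part of any real aligner. A read from the shared part has two equally good placements. Only a read covering the distinguishing base can be placed uniquely.

```python
# Toy illustration (hypothetical sequences, not a real aligner): place a read
# at every gapless offset of every segment and keep the placements with the
# fewest mismatches. More than one best placement means the read cannot be
# assigned to a single segment.


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hits(read: str, segments: dict[str, str]) -> tuple[int, list[tuple[str, int]]]:
    """Return the lowest mismatch count and every (segment, offset) achieving it."""
    hits = []
    for name, seq in segments.items():
        for offset in range(len(seq) - len(read) + 1):
            window = seq[offset:offset + len(read)]
            hits.append((mismatches(read, window), name, offset))
    best = min(score for score, _, _ in hits)
    return best, [(name, offset) for score, name, offset in hits if score == best]


segments = {
    "gene": "ACGTTGCAAGGCTTACGA",
    "pseudogene": "ACGTTGCAAGGCTTACTA",  # differs from "gene" at one base
}
print(best_hits("GCAAGGCT", segments))  # read from the shared part: two placements
print(best_hits("TTACGA", segments))    # read covering the differing base: one placement
```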
Complex Genetics
This applies particularly to HLA-DRB*: the DR β-chain is encoded by 4 loci, but no more than 3 functional loci are present in a single individual, and at most 2 per chromosome.
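The locus-count rules above can be written as a simple consistency check on a DRB genotype. This Python sketch is hypothetical. Besides the two limits stated on the slide, it assumes that every haplotype carries DRB1 and treats DRB1, DRB3, DRB4 and DRB5 (the loci in the DRB table) as the functional loci.

```python
# Hypothetical consistency check for a DRB genotype, using the limits stated
# on the slide (at most 2 functional DRB loci per chromosome, at most 3 per
# individual). Assumption: every haplotype carries DRB1.

FUNCTIONAL_DRB = {"DRB1", "DRB3", "DRB4", "DRB5"}


def check_drb_genotype(hap1: set[str], hap2: set[str]) -> list[str]:
    """Return rule violations for two haplotypes; an empty list means plausible."""
    problems = []
    for label, hap in (("haplotype 1", hap1), ("haplotype 2", hap2)):
        unknown = hap - FUNCTIONAL_DRB
        if unknown:
            problems.append(f"{label}: unknown loci {sorted(unknown)}")
        if "DRB1" not in hap:
            problems.append(f"{label}: missing DRB1")
        if len(hap) > 2:
            problems.append(f"{label}: {len(hap)} DRB loci (max 2)")
    distinct = hap1 | hap2
    if len(distinct) > 3:
        problems.append(f"{len(distinct)} distinct DRB loci (max 3)")
    return problems


print(check_drb_genotype({"DRB1", "DRB3"}, {"DRB1", "DRB4"}))  # []
print(check_drb_genotype({"DRB1", "DRB3", "DRB5"}, {"DRB1"}))
```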
Mitigating Factors
It's not all bad news:
Many HLA alleles are already well known, both in terms of sequence and of frequencies within the population.
The HLA region is fairly small, so there is a high degree of linkage disequilibrium and therefore many known haplotypes.
Traditional Typing
SSO (sequence-specific oligonucleotide probes): low resolution, high throughput, cheap.
SSP (sequence-specific primers): very fast results, low resolution.
SBT (sequence-based typing): high resolution, usually done by Sanger sequencing.
NGS Typing
High resolution, an alternative to Sanger-based SBT
Why is it needed?
Sanger and HLA
Sanger data is still the gold standard in the genomic sequencing industry, even though it is very expensive compared to NGS.
The error rate is about 1 in 1,000 bases; if both forward and reverse sequencing are done, it drops to about 1 in 1,000,000.
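The improvement from sequencing in both directions follows if we assume that errors in the forward and reverse reads are independent, so an error survives only when both reads are wrong at the same position:

```latex
P(\text{error in both reads}) = P(\text{forward error}) \times P(\text{reverse error})
  = 10^{-3} \times 10^{-3} = 10^{-6}
```

Real errors are not always independent, so this is a best-case estimate.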
So why is it bad for HLA?
Phase Resolution
Two copies of chromosome 6
Many loci, many alleles
Lots of heterozygosity
Figure: the allele phasing problem. A consensus sequence with two heterozygous positions (T/A and G/T), shown against the reference sequence, can be explained by two different Allele 1/Allele 2 pairs.
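The phasing problem in the figure can be made explicit by enumerating every pair of alleles consistent with the unphased genotype. The Python sketch below uses an illustrative function name, `possible_phasings`. It shows that the two heterozygous positions T/A and G/T allow two different allele pairs. In general, n heterozygous positions allow 2^(n−1) pairs.

```python
from itertools import product

# Sketch: enumerate the unordered allele (haplotype) pairs consistent with an
# unphased genotype. Each heterozygous site is a pair of distinct bases.


def possible_phasings(het_sites: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every unordered pair of haplotypes explaining the genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        hap_a = "".join(site[c] for site, c in zip(het_sites, choice))
        hap_b = "".join(site[1 - c] for site, c in zip(het_sites, choice))
        pairs.add(tuple(sorted((hap_a, hap_b))))
    return sorted(pairs)


# The two heterozygous positions from the figure: T/A and G/T.
print(possible_phasings([("T", "A"), ("G", "T")]))
# → [('AG', 'TT'), ('AT', 'TG')]
```

Both answers produce exactly the same mixed signal, which is why a single Sanger trace cannot tell them apart.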
The Problem with Sanger
There is only one signal: both copies of chromosome 6 are read together in a single trace.
A high degree of heterozygosity means a high degree of ambiguity.
Resolving it requires statistical techniques based on known allele frequencies, plus manual intervention by trained operators.
Because ambiguity can only be resolved statistically, rare types can be assigned wrongly.
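Frequency-based resolution can be sketched as picking the candidate genotype with the highest expected frequency under Hardy-Weinberg equilibrium. The allele names and frequencies in this Python example are hypothetical placeholders rather than values from any population database.

```python
# Sketch of frequency-based ambiguity resolution under Hardy-Weinberg
# equilibrium. Allele names and frequencies are hypothetical placeholders.


def genotype_frequency(a1: str, a2: str, freqs: dict[str, float]) -> float:
    """Expected genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    p, q = freqs[a1], freqs[a2]
    return p * p if a1 == a2 else 2 * p * q


def resolve(candidates: list[tuple[str, str]], freqs: dict[str, float]) -> tuple[str, str]:
    """Pick the candidate genotype with the highest expected frequency."""
    return max(candidates, key=lambda g: genotype_frequency(g[0], g[1], freqs))


freqs = {"common_1": 0.20, "common_2": 0.15, "rare_1": 0.001, "rare_2": 0.002}
# Two genotypes assumed to give the same ambiguous Sanger result.
candidates = [("common_1", "common_2"), ("rare_1", "rare_2")]
print(resolve(candidates, freqs))  # ('common_1', 'common_2')
```

If the sample really carries the rare pair, this rule still returns the common one, which is exactly the wrong assignment for rare types described above.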
HLA typing by Sanger method
Figure: number of potential alleles for a Sanger consensus sequence containing IUPAC ambiguity codes at heterozygous positions:
GGACSGGRASACACGGAAWGTGAAGGCCCACTCACAGACTSACCGAGYGRACCTGGGGACCCTGCGCGGCTACTACAACCAGAGCGAGGMCGGT
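The consensus on this slide uses IUPAC ambiguity codes (S = C/G, R = A/G, W = A/T, Y = C/T, M = A/C) to mark positions where the single Sanger trace shows two bases. This Python sketch decodes those codes and counts the ways the heterozygous positions could be split between the two alleles. It does not reproduce the "number of potential alleles" in the chart, which depends on matching against the known allele database.

```python
# Sketch: decode two-base IUPAC ambiguity codes in a Sanger consensus and
# count the heterozygous positions (three-base codes and N are ignored).
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def heterozygous_positions(consensus: str) -> list[tuple[int, str]]:
    """Return (0-based position, possible bases) for every two-base code."""
    return [(i, IUPAC_TWO_BASE[b]) for i, b in enumerate(consensus) if b in IUPAC_TWO_BASE]


def phasing_count(n_het: int) -> int:
    """Distinct unordered allele pairs for n heterozygous positions."""
    return 2 ** (n_het - 1) if n_het > 0 else 1


# Consensus sequence from the slide.
seq = ("GGACSGGRASACACGGAAWGTGAAGGCCCACTCACAGACTSACCGAGYGR"
       "ACCTGGGGACCCTGCGCGGCTACTACAACCAGAGCGAGGMCGGT")
hets = heterozygous_positions(seq)
print(f"{len(hets)} heterozygous positions:", hets)
print(phasing_count(len(hets)), "possible phasings")
```

For this 94-base fragment there are 8 heterozygous positions and therefore 128 possible phasings, before any allele database or frequency information is used.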
NGS Advantages
Can reduce ambiguity.
Phase resolution: two signals, but lots of short reads.
Cheaper and faster than Sanger.
Less manual intervention required.
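The phase-resolution advantage comes from the fact that each NGS read comes from a single chromosome. In this Python sketch the reads and positions are hypothetical, and the bases reuse the T/A and G/T heterozygous sites from the phasing figure. Reads that span both sites show which bases occur together.

```python
from collections import Counter

# Sketch of read-backed phasing at two heterozygous sites. Each read is a
# {position: base} dict of the bases it covers; positions, bases and reads
# are hypothetical.


def phase_two_sites(reads: list[dict[int, str]], pos1: int, pos2: int) -> Counter:
    """Count the base combinations seen on reads that cover both sites."""
    return Counter((r[pos1], r[pos2]) for r in reads if pos1 in r and pos2 in r)


reads = [
    {100: "T", 130: "G"},
    {100: "T", 130: "G"},
    {100: "A", 130: "T"},
    {100: "A", 130: "T"},
    {100: "A"},  # does not reach position 130, so it carries no phase information
]
print(phase_two_sites(reads, 100, 130))
```

Here the reads support the phasing TG/AT. The read that covers only one site contributes nothing, which is why short reads are a limitation: sites further apart than a read (or read pair) cannot be linked this way.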
NGS Data - Unphased
NGS Data - Phased
NGS Approaches
HLA*IMP: a chip-based imputation engine.
Reference-based alignment, followed by an HLA call based on the variants detected during alignment.
Search against a database of known alleles.
NGS Reference-based
Fraught with difficulties:
It is very hard to align reads to this region.
The variant/HLA call is only as good as the alignment.
No coverage = no call.
This approach has been attempted by the Broad Institute (HLA Caller) and Roche.
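"No coverage = no call" can be enforced explicitly by checking depth before calling. In this Python sketch the per-base depths, exon coordinates and the minimum depth of 10 are hypothetical choices. A real pipeline would take the depths from the alignment itself.

```python
# Sketch of a coverage gate before calling: an exon whose lowest per-base
# depth is below a threshold gets "no call". Depths, exon coordinates and
# the threshold are hypothetical.

MIN_DEPTH = 10


def call_status(depth: list[int], exons: dict[str, tuple[int, int]],
                min_depth: int = MIN_DEPTH) -> dict[str, str]:
    """Map each exon, given as a half-open [start, end) interval, to a status."""
    status = {}
    for name, (start, end) in exons.items():
        lowest = min(depth[start:end])
        status[name] = "callable" if lowest >= min_depth else "no call"
    return status


depth = [25] * 50 + [0] * 20 + [3] * 10 + [40] * 20  # 100 bp of per-base depth
exons = {"exon1": (0, 50), "exon2": (50, 70), "exon3": (70, 100)}
print(call_status(depth, exons))
```

exon2 has no reads and exon3 has too few, so neither gets a genotype, however good the calling algorithm is.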
Alignment Efforts
RainDance provides a targeted HLA amplification kit called HLAseq.
Target: the whole MHC superregion (except for some tandem repeat regions).
Goal: align these data before making the variant/HLA call.
Diverse variant “density” in the MHC superregion
Based on a single sample
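Variant density along a region can be summarized by counting variants in fixed-size windows. This Python sketch uses hypothetical positions and a 1,000 bp window; in practice the positions would come from the single sample's variant calls.

```python
from collections import Counter

# Sketch: count variants in fixed-size windows along a region. Positions and
# window size are hypothetical.


def variant_density(positions: list[int], region_start: int, window: int) -> Counter:
    """Map each window's start coordinate to the number of variants inside it."""
    return Counter(region_start + (p - region_start) // window * window for p in positions)


positions = [1_000, 1_050, 1_070, 1_090, 5_200, 9_900, 9_950]
print(sorted(variant_density(positions, region_start=0, window=1_000).items()))
```

Windows with no variants are simply absent from the result.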
Default BWA alignment: no coverage at an exon of HLA-DMB
Low coverage and orphaned reads at an HLA-DRB1 exon
BWA vs more permissive alignment: higher coverage = higher noise
Large targeted region without usable coverage
NGS Reference-based
Does not provide enough coverage everywhere.
What about de novo assembly?
De novo assembly (MIRA)
287 contigs (longest contig: 2,199 bp)
Mean contig size: 268 bp
Median contig size: 209 bp
Total consensus: 77,084 bp
RainDance target: ~3,800,000 bp
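The assembly summary above can be computed from contig lengths. This Python sketch uses a hypothetical set of contig lengths for the statistics (plus N50, which the slide does not report) and the slide's own totals for the last step.

```python
import statistics

# Sketch: summary statistics for a de novo assembly, then the fraction of the
# target that the assembled consensus could cover at most. The contig lengths
# are hypothetical; the totals in the last lines are the slide's figures.


def assembly_stats(lengths: list[int]) -> dict[str, float]:
    """Return contig count, longest, mean, median, total and N50."""
    total = sum(lengths)
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # first contig taking the sum past half
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "longest": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "total": total,
        "n50": n50,
    }


print(assembly_stats([2199, 800, 400, 300, 209, 150, 100]))

consensus_bp, target_bp = 77_084, 3_800_000
print(f"consensus / target = {consensus_bp / target_bp:.1%}")
```

The consensus amounts to roughly 2% of the RainDance target, consistent with the conclusion that de novo assembly alone does not cover the region.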
De novo assembly (MIRA)
NGS De Novo Alignment
Not enough contigs were produced, and they do not cover enough of the target region.
What about a hybrid approach?
De novo assembly with “backbone”
First, reads are aligned to a backbone; then de novo assembly is performed.
Backbone: 2,220 contigs from HG19 chr 6 (total: 3,554,852 bp), covering almost the whole RainDance target.
Results:
Max reads / backbone contig: 197
Max coverage: 71
De novo assembly with “backbone”
NGS Typing - Alignment Based
We tried: a Burrows-Wheeler aligner, a more sensitive seed-and-extend aligner, a de novo aligner and a 'hybrid' de novo aligner.
The variant/HLA call is only as good as the alignment, and the alignments were not good enough.
NGS Database Based
Search against a 'database' of known alleles, such as the IMGT/HLA database, available from the EBI web site.
Stanford, Conexio, JSI Medical, BC Cancer Agency and Omixon have all tried this approach.
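The core of a database-based search is deciding which known alleles each read is compatible with. In this hypothetical Python sketch the allele names and sequences are made up (they are not IMGT/HLA entries). Exact substring matching stands in for the error-tolerant matching that real sequencing data requires.

```python
# Hypothetical sketch of a database-based search: record, for each read,
# which known allele sequences contain it exactly. Allele names and
# sequences are made up for illustration.


def compatible_alleles(reads: list[str], alleles: dict[str, str]) -> dict[str, set[str]]:
    """Map each read to the set of alleles that contain it as a substring."""
    return {read: {name for name, seq in alleles.items() if read in seq} for read in reads}


alleles = {
    "allele_1": "ATGGCGTCCAAGGTA",
    "allele_2": "ATGGCGTTCAAGGTA",
    "allele_3": "ATGACGTCCAAGCTA",
}
reads = ["GCGTCC", "GCGTTC", "CAAGGT"]
for read, hits in compatible_alleles(reads, alleles).items():
    print(read, sorted(hits))
```

Reads such as CAAGGT fit more than one allele, so the final call has to combine the evidence from all reads.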
DB Based Approach
Advantages: fewer mapping headaches, unambiguous results, potential to be fast.
Difficulties: novel allele detection, homozygous alleles.
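Both difficulties show up when choosing a genotype from read/allele compatibility. This hypothetical Python sketch scores every allele pair, including homozygous pairs, by the number of reads it explains. Preferring the homozygous pair on a tie is a parsimony assumption made here, not something stated on the slides. Reads that no pair explains are reported as candidates for a novel allele.

```python
from itertools import combinations_with_replacement

# Hypothetical sketch: choose a genotype from read/allele compatibility.
# hits maps each read to the set of database alleles it matches.


def choose_genotype(
    hits: dict[str, set[str]], alleles: list[str]
) -> tuple[tuple[str, str], int, list[str]]:
    """Return (best allele pair, reads explained, reads the pair cannot explain)."""

    def explained(pair: tuple[str, str]) -> int:
        return sum(1 for found in hits.values() if found & set(pair))

    # Most reads explained wins; on a tie the homozygous pair is preferred
    # (a parsimony assumption, not a rule from the slides).
    pairs = combinations_with_replacement(sorted(alleles), 2)
    best = max(pairs, key=lambda p: (explained(p), p[0] == p[1]))
    unexplained = sorted(r for r, found in hits.items() if not found & set(best))
    return best, explained(best), unexplained


# r1 and r2 both fit allele_1; r3 matches nothing in the database.
print(choose_genotype({"r1": {"allele_1"}, "r2": {"allele_1", "allele_2"}, "r3": set()},
                      ["allele_1", "allele_2"]))
# r1 and r4 each need a different allele.
print(choose_genotype({"r1": {"allele_1"}, "r4": {"allele_2"}},
                      ["allele_1", "allele_2"]))
```

The tie-break matters: in the first example allele_1/allele_2 explains the reads just as well as allele_1/allele_1, so homozygosity cannot be proven from these reads alone.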
Results with Exome data
Exon level detail
Detailed results - short read pileup
Conclusions
The DB-based approach to HLA typing is new but very promising.
NGS approaches can resolve much of the ambiguity of Sanger SBT.
The DB-based approach can also overcome the limitations of NGS reference-based alignment.
Conclusions
Available DB-based HLA typing tools differ in: speed, sequencers supported, types of sequencing data supported (targeted, exome, whole genome), ease of use, ambiguity of results, degree of manual intervention required and novel allele detection capabilities.