unique dual indexing pcr reduces chimeric contamination and...



Contents lists available at ScienceDirect

Talanta

journal homepage: www.elsevier.com/locate/talanta

Unique dual indexing PCR reduces chimeric contamination and improves mutation detection in cell-free DNA of pregnant women

Meijie Du a,1, Yihua He b,1, Jian Chen b, Hairui Sun b,c, Yuwei Fu b, Jianbin Wang a,∗

a School of Life Sciences, Tsinghua University, Beijing, 100084, China
b Department of Echocardiography, and Key Laboratory of Fetal Heart Disease, Maternal and Child Medicine, Beijing AnZhen Hospital Affiliated to Capital Medical University, Beijing, 100029, China
c School of Biological Science and Medical Engineering, Beihang University, Beijing, 100083, China

ARTICLE INFO

Keywords: Multiplexed sequencing; Chimeric amplicon; Dual index; Allele fraction; Cell-free DNA

ABSTRACT

Allele fraction measurement is an essential component of nucleic acid analysis. The formation of chimeric amplicons during multiplex PCR amplification, however, greatly distorts the allele fraction even before downstream analysis. Previous error correction strategies based on unique molecular indexing (UMI) mainly target point mutations rather than chimeras. Since mutant allele detection in the cell-free DNA (cfDNA) of pregnant women is limited by chimeric amplicon contamination, a more direct error correction solution is needed. Here we demonstrate effective reduction of chimeric amplicon contamination by unique dual indexing. With error-corrected deep sequencing analysis, we achieved 100% accuracy in 16 tests of parental mutation inheritance and de novo mutations in cfDNA of pregnant women whose fetuses were at risk of tuberous sclerosis complex or Marfan syndrome. Our error correction strategy could offer a versatile solution for accurate multiplex PCR amplification.

https://doi.org/10.1016/j.talanta.2020.121035
Received 15 March 2020; Received in revised form 5 April 2020; Accepted 11 April 2020

Abbreviations: UMI, unique molecular indexing; cfDNA, cell-free DNA; NIPT, noninvasive prenatal testing; dPCR, digital PCR; EDTA, ethylenediaminetetraacetic acid; gDNA, genomic DNA; WBC, white blood cell; CVD, cardiovascular disease; OMIM, Online Mendelian Inheritance in Man Database; TSC, tuberous sclerosis complex; IDT, Integrated DNA Technologies; MAF, minor allele fraction; SNP, single nucleotide polymorphism; Indel, insertion-deletion variation

∗ Corresponding author. E-mail address: [email protected] (J. Wang).

1 These authors contributed equally to this work.

Talanta 217 (2020) 121035

Available online 14 April 2020
0039-9140/ © 2020 Elsevier B.V. All rights reserved.

1. Introduction

The continuous development of sequencing technology has propelled the adoption of high-throughput nucleic acid analysis in various fields. Multiplex sample preparation is therefore widely used to achieve higher cost efficiency. Index hopping on Illumina sequencing platforms has been widely reported since its first discovery in 2016 [1–4]. In fact, such chimeric amplicons exist not only on sequencing platforms, but also in any PCR procedure involving multiplexed samples. Chimeras are a source of error in PCR, especially in multiple-cycle and multi-sample PCR. Amplification stalls and switches to another template, and in this way recombinant molecules form (Fig. 1A). In multi-sample sequencing, each sample is labeled with a specific index so that it can be distinguished in the demultiplexing step. If indexing and demultiplexing are based on a single end, chimeric amplicons can cause sample cross contamination, leading to false judgement of allele fractions and further errors in nucleic acid analysis (Fig. 1A).

Noninvasive prenatal testing (NIPT) has been gaining popularity because of its noninvasive advantage over conventional methods such as amniocentesis and chorionic villus sampling, which carry a 1% risk of miscarriage [5,6]. NIPT is based on the discovery of fetal DNA circulating in the peripheral blood of pregnant women; by measuring maternal peripheral blood cell-free DNA (cfDNA) [7], whether the fetus has a chromosomal or monogenic mutation can be determined and used for clinical suggestion. Chromosomal NIPT has achieved good performance and wide clinical application [8–10]; in contrast, the development of NIPT for monogenic diseases is hindered by its stringent requirements on target gene coverage and allele fraction accuracy. Targeted deep sequencing was recently established and refined as an effective strategy for monogenic disease NIPT [11–13]. By high coverage targeted sequencing of cfDNA from pregnant women, fetal alleles buried in the maternal genotype background can be measured at each locus of disease-related genes. Targeted deep sequencing outperforms digital PCR (dPCR) in the discovery of de novo sporadic mutations [14,15], and reduces cost compared to traditional whole genome sequencing.

To improve the sequencing accuracy, researchers added unique molecular indexing (UMI) to the panel enrichment method [12]. Unique molecular indexing is reported to effectively detect mutant alleles at fractions as low as 0.1% [16] and is therefore a necessary assisting technique for ultralow frequency mutations, such as somatic mutations of cancer patients, or for ancient DNA analysis at ultralow coverage, where errors cannot be tolerated. In the application of fetal allele detection in pregnant women, the fetal fraction is 4%–30% [17] during pregnancy, increasing with gestational age, and the fetal allele signal in maternal cfDNA is over 2%. We therefore reason that, instead of relying on high-cost UMI, far more effort should be devoted to improving the panel enrichment technique, especially the PCR procedure.

Routine single indexing has a reported read misassignment rate ranging from 0 to 10% [2,18,19], which means that single-end index hopping would produce noise too high for accurate allele counting. Replacing traditional single indexing with non-redundant dual indexing could reduce such errors greatly. A similar method was reported to decrease index errors caused by different sequencing platforms [2]. In this paper, we measured the single indexing hopping rate in various PCR procedures during targeted sequencing, which illustrates why routine targeted sequencing suffers from contamination among samples and cannot achieve practicable accuracy in low-fetal-fraction maternal cfDNA. We optimized PCR procedures with unique dual indexing of multiplexed samples and acquired clean signals for > 1% mutant fraction in cfDNA of pregnant women. We validated our method with clinical samples from pregnant women whose fetuses were at risk of carrying mutations of parental or de novo origin. We obtained 100% accuracy in all types of clinical scenarios, and our method showed great potential in clinical application.

2. Material and methods

2.1. Materials

The study was approved by the Medical Ethics Committees of both Beijing Anzhen Hospital and Tsinghua University. All methods were performed in accordance with the relevant guidelines and regulations. Pregnant women and their family members who came to the Maternal-Fetal Consultation Center, Beijing Anzhen Hospital, were enrolled in this study. Patients and family members were given a full description of the research program, including potential risks. Mock samples were prepared by mixing mother blood and child blood. We obtained informed consent from all patients and family members before genetic testing. All pedigrees were at risk of dominant monogenic cardiovascular diseases, because there was a confirmed proband in the family or the fetus had a definite phenotype in the ultrasound diagnosis during pregnancy.

Fig. 1. Unique dual indexing effectively reduces chimeric amplicon contamination. (A) The generation of chimeric amplicons during PCR. (B) Unique dual indexing labels chimeric amplicons. (C) Contamination levels of different multiplex PCR strategies. (D) At a locus where the mother is homozygous and the fetus is heterozygous, the non-maternal allele B can be detected in maternal cfDNA at 0.5 × fetal fraction; at a locus where both mother and fetus are homozygous, no non-maternal allele should be detected in maternal cfDNA; at a locus where both mother and fetus are heterozygous, the allele fraction is 1:1 because mother and fetus share the same genotype; at a locus where the mother is heterozygous and the fetus is homozygous, the allele fraction is no longer 1:1 because the fetus contributes more allele A, and the difference between the A and B fractions is 1 × fetal fraction.


2.2. Sample collection

From each pregnant woman and her family members, 2–8 ml of peripheral blood was collected into EDTA anticoagulant tubes. From each pregnant woman within the proper gestation stage, we collected 10–20 ml of amniotic fluid with a special linear puncture probe under the guidance of an Aloka 1400 ultrasound instrument. Samples were kept on ice and processed within 2 h. From each newborn baby, we collected umbilical cord tissue (~5 cm in length) and froze it at −80 °C. Sixteen pedigrees were used in clinical case research (Table S1): in pedigrees with a proband, the proband sample was collected as control; in all pedigrees, the cfDNA of the pregnant woman and the fetal sample were collected for testing and validation. Genomic DNA (gDNA) from 33 healthy people was also used in the single indexing and dual indexing measurements.

2.3. cfDNA isolation from plasma

Peripheral blood was centrifuged at 1,600 g for 10 min at 4 °C to separate plasma and white blood cells (WBCs). Plasma underwent another round of centrifugation at 16,000 g for 10 min at 4 °C to remove residual WBCs. Cell-free DNA was extracted using a commercial kit (Magen, D3182-02) following the manufacturer's instructions. Genomic DNA of pedigree members was extracted from whole blood, buffy coat, amniotic fluid or umbilical cord using a commercial kit (Magen, D3018-03) following the manufacturer's instructions.

2.4. Cardiovascular disease gene panel design

We curated a set of cardiovascular disease (CVD) genes from Online Mendelian Inheritance in Man (OMIM) and expanded the list with genes from published research. We also included several genes related to cardiomyopathy and tuberous sclerosis complex (TSC). We designed probes using the IDT (Integrated DNA Technologies) web designer, excluding repetitive regions. Biotin-labeled 120-nt single-stranded RNA probes were manufactured by GeneX Health.

2.5. Unique dual indexing library construction

The library construction workflow begins with standard whole-genome DNA sequencing library construction [20]. Briefly, 50 ng of genomic DNA in 130 μl TE buffer was fragmented to ~180 bp by sonication (Covaris S220) with the following parameter settings: 175 W peak incident power, 10% duty factor, 200 cycles per burst, 300 s treatment time. Libraries were constructed from fragmented genomic DNA or cfDNA using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina following the manufacturer's protocol. Multiplex Oligos were used for unique dual indexing, so that the N5 and N7 indices of each sample constitute a unique dual combination that is used to distinguish samples. Four to eight libraries were pooled together and concentrated to 500–1000 ng in 3 μl for enrichment. In-solution target enrichment was performed using iGeneTech reagents with standard protocols. We sequenced each library in 2 × 150 bp mode on an Illumina HiSeq 4000 or XTen. The logic of unique dual indexing is that each N5 index should pair with only one particular N7 index (Fig. 1B). Chimeric amplicons bring together N5 and N7 indices of different origins. Such recombinant index pairs, together with the chimeric amplicons that carry them, can be easily recognized and filtered out.
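The pairing logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the index names, the sample sheet `EXPECTED_PAIRS` and the read tuples are hypothetical, and real demultiplexing would also have to tolerate sequencing errors within the index reads.

```python
from collections import Counter

# Hypothetical sample sheet: each sample owns exactly one (N5, N7) pair.
EXPECTED_PAIRS = {
    ("N501", "N701"): "sample_1",
    ("N502", "N702"): "sample_2",
    ("N503", "N703"): "sample_3",
}


def assign_sample(n5, n7, expected_pairs=EXPECTED_PAIRS):
    """Return the sample name for a compliant index pair, else None."""
    return expected_pairs.get((n5, n7))


def demultiplex(reads, expected_pairs=EXPECTED_PAIRS):
    """Split reads by sample and count discarded non-compliant index pairs.

    `reads` is an iterable of (read_id, n5, n7) tuples.
    """
    per_sample = {name: [] for name in expected_pairs.values()}
    hybrids = Counter()
    for read_id, n5, n7 in reads:
        sample = assign_sample(n5, n7, expected_pairs)
        if sample is None:
            # Recombinant (or unknown) index combination: filter the read out.
            hybrids[(n5, n7)] += 1
        else:
            per_sample[sample].append(read_id)
    return per_sample, hybrids


reads = [
    ("r1", "N501", "N701"),
    ("r2", "N502", "N702"),
    ("r3", "N501", "N702"),  # chimeric: N5 of sample_1 with N7 of sample_2
]
per_sample, hybrids = demultiplex(reads)
```

In this toy input, demultiplexing on the N7 index alone would assign read `r3` to `sample_2`, whereas the pair lookup discards it and records the hybrid combination.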

2.6. Sequencing data analysis

Raw sequencing reads were first filtered to remove adaptors and low-quality bases (Phred score < 20) with in-house Perl scripts. Reads were then mapped to the human GRCh38 reference genome with Bowtie 2 and further filtered to meet stricter standards: mapping quality (Q ≥ 15), limited mismatches (XM ≤ 3) and best mapping (alignment score AS larger than the second-best alignment score XS). The mapped BAM file was then realigned with GATK, followed by genotype calling with GATK and Pindel. The minor allele fraction (MAF) of all valid loci within the targeted regions and 300 bp of padding was calculated and plotted. The depth cutoff for valid loci in cfDNA of pregnant women was 500. Base calling required at least 5 supporting reads for all mutation types, including single nucleotide polymorphisms (SNPs) and insertion-deletion variations (Indels).
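As an illustration of the filtering criteria and the MAF calculation, the following Python sketch applies the stated thresholds to plain values. The function names and the input layout are assumptions made for this example; it does not parse BAM files, and treating an allele with fewer than 5 supporting reads as uncalled (MAF of 0) is one possible reading of the base calling rule.

```python
def passes_filters(mapq, xm, as_score, xs_score=None):
    """Apply the three post-mapping criteria to one alignment.

    mapq: mapping quality; xm: mismatch count (Bowtie 2 XM tag);
    as_score / xs_score: best and second-best alignment scores (AS / XS tags).
    """
    if mapq < 15:
        return False
    if xm > 3:
        return False
    # Bowtie 2 omits XS when no second alignment was found for the read.
    if xs_score is not None and as_score <= xs_score:
        return False
    return True


def minor_allele_fraction(allele_counts, min_depth=500, min_support=5):
    """Return the MAF at one locus, or None if the depth is below the cutoff.

    allele_counts maps each observed allele to its read count.
    """
    depth = sum(allele_counts.values())
    if depth < min_depth:
        return None  # locus is not valid at this depth
    ranked = sorted(allele_counts.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    if minor < min_support:
        return 0.0  # assumption: too few supporting reads, allele not called
    return minor / depth
```

For example, a locus with 970 reads of allele A and 30 reads of allele B passes the depth cutoff and yields a MAF of 30/1000.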

2.7. Clinical annotation and statistics

We annotated variations with Annovar, supplemented with the RefGene, ESP6500, 1000 Genome, ExAC, COSMIC, ClinVar and OMIM databases. For de novo mutation detection, mutations with an annotated population frequency over 0.5% were filtered out.

Statistical analysis is based on a previous dPCR NIPT study [14], with modifications to account for sequencing instead of dPCR. First, the mean (μ) of the fetal fraction (ε) is calculated from the weighted median of all loci where the fetus is heterozygous and the mother is homozygous (1% < MAF < 25%), using μ = 2MAF. The error of ε is calculated as σ from the STD of the fetal fraction counting and the Poisson noise expected from it (2 × STD/√N, where N is the locus count). Then, for each locus of interest, the likelihood distributions of the two possible fetal genotypes are determined, based on the fetal fraction and the allele count at this locus.

In the maternal case, for the unaffected distribution, μ equals the ideal maternal background value minus the fetal part in maternal DNA (0.5 − ε/2), and σ consists of the Poisson noise of the unaffected μ (which decreases as the allele count at this locus increases) and the σ of ε. For the affected distribution, μ equals the ideal maternal background value of 0.5, because the fetus has the same genotype as the mother, and σ consists of the Poisson noise of the affected μ and the σ of ε.

In the paternal, sibling and de novo cases, that is, all non-maternal cases, for the unaffected distribution μ equals the ideal maternal background (0), because no non-maternal part exists in cfDNA; for the affected distribution, μ equals the ideal maternal background value of 0 plus ε/2. The two σ values are determined in the same way as in the maternal case.

Each distribution gives a likelihood density curve, and the detected allele fraction has one probability density value on each of the two curves. Together these determine a likelihood ratio (the likelihood of affected inheritance over that of unaffected inheritance); a likelihood ratio larger than 8 suggests an affected genotype, whereas a likelihood ratio smaller than 1/8 suggests an unaffected genotype.
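The likelihood ratio test for the maternal case can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: it assumes Gaussian likelihood curves, approximates the Poisson noise of an allele fraction μ at a given depth as √(μ/depth), and combines it in quadrature with half of the fetal fraction error. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sigma_for(mu, depth, sigma_eps):
    """Assumed noise model: Poisson term plus the error carried over from eps."""
    poisson_var = mu / depth  # variance of an allele fraction from counting
    return math.sqrt(poisson_var + (sigma_eps / 2.0) ** 2)


def maternal_likelihood_ratio(mutant_fraction, depth, eps, sigma_eps):
    """Likelihood of the affected genotype over the unaffected genotype.

    Affected: fetus shares the maternal AB genotype, mu = 0.5.
    Unaffected: fetus is homozygous wild type, mu = 0.5 - eps / 2.
    """
    mu_aff = 0.5
    mu_unaff = 0.5 - eps / 2.0
    l_aff = gaussian_pdf(mutant_fraction, mu_aff, sigma_for(mu_aff, depth, sigma_eps))
    l_unaff = gaussian_pdf(mutant_fraction, mu_unaff, sigma_for(mu_unaff, depth, sigma_eps))
    return l_aff / l_unaff


def classify(likelihood_ratio, threshold=8.0):
    """Apply the 8 and 1/8 decision thresholds stated in the text."""
    if likelihood_ratio > threshold:
        return "affected"
    if likelihood_ratio < 1.0 / threshold:
        return "unaffected"
    return "inconclusive"
```

With a fetal fraction of 20%, a depth of 2000 and a fetal fraction error of 1%, an observed mutant allele fraction of 0.50 is classified as affected and one of 0.40 as unaffected, while 0.45 falls between the two thresholds.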

3. Results and discussion

3.1. Unique dual indexing effectively reduces chimeric amplicon contamination

We first established a targeted deep sequencing pipeline with a CVD-oriented (cardiovascular disease) panel of 208 genes (Table S2). The hybridization probes were 120 bp each and covered ~0.9 Mbp of coding regions of these 208 genes. Most genes had over 95% coverage by probes (Fig. S1A). With 1 Gbp of raw sequencing data, we obtained ~600x mean coverage of the targeted genes (Fig. S1B). Higher coverage could be easily achieved with 5–10 Gbp of sequencing data (Figs. S1C and S1D). The loss of effective data (~45%) was mainly due to high-stringency mapping filters (~1% loss) and non-specific hybridization (~44% loss).

To evaluate the levels of chimeric amplicon contamination in multiplex targeted deep sequencing results, we prepared target-enriched libraries from several patient donor genomic DNA samples using unique dual indexing. Multiplexing was implemented at different stages, such as multiplex PCR pre-amplification and multiplex target enrichment (Fig. S2). Multiple libraries were pooled before multiplex high-throughput sequencing. We demultiplexed the sequencing data using only N7 indices to mimic the results of single indexing.

Theoretically, each sample should present a very clean bimodal distribution of minor allele fractions, since the minor allele fraction can only be either 0% (homozygous genotype) or ~50% (heterozygous genotype). When we mixed the DNA samples after adaptor ligation and performed multiplex PCR pre-amplification and multiplex target enrichment, a large number of low fraction variations emerged (Fig. S3A). We calculated the weighted mean minor allele fraction of these affected loci (0% < MAF < 25%) and obtained a quantitative view of the contamination (Fig. 1C). When we performed separate PCR pre-amplification first and then mixed the samples for multiplex target enrichment, many low fraction variations remained (Figs. 1C and S3B). Even when we performed separate PCR pre-amplification and separate target enrichment, lane-sharing DNA sequencing still produced some low fraction variations (Figs. 1C and S3C). Such artefacts could mimic a fetal component with a fraction from 5% to 17% (Figs. S3A–C), which would severely interfere with real maternal cfDNA tests. When we used unique dual indices for sequencing data demultiplexing, low fraction variation was suppressed (Figs. 1C and S3D) to a level that would not interfere with the detection of real mutations in maternal cfDNA.

It is worth noting that unique dual indexing does not prevent the formation of chimeric amplicons. Instead, it filters out recombinant sequencing reads by recognizing non-compliant hybrid index combinations. The unique dual indexing results shown in Fig. 1C were based on the same raw sequencing data as the multiplexed enrichment results. The DNA samples were pre-amplified separately by PCR, followed by multiplexed target enrichment (Fig. S2). With unique dual indexing, we detected hybrid index combinations within and across multiplexed libraries in the raw sequencing data (Fig. S3E), and their levels agreed with the low fraction variations shown in Fig. 1C. Unique dual indexing thus allows multiplexing during early stages of sequencing library construction, greatly increasing the throughput of sample processing.

3.2. Unique dual indexing enabled accurate fetal fraction measurement

To test our ability to measure fetal fraction using deep sequencing with unique dual indexing, we prepared control samples from genomic DNA of a family trio. Genomic DNA was sheared to ~180 bp to mimic cfDNA. Mother and child DNA were mixed at different ratios to represent the fetal presence in maternal cfDNA.

While pure DNA from each family member gave a clean bimodal distribution of minor allele fractions (Fig. 2A), mixture samples showed additional low fraction variations (Fig. 2B). These variations represent the loci where the mother is homozygous (AA) and the fetus is heterozygous (AB). We could therefore determine the fetal genotype where the mother is homozygous, based on the presence (heterozygous fetal genotype AB) or absence (homozygous fetal genotype AA) of low fraction variations (Fig. 1D).

The presence of fetal DNA also distorted the minor allele fraction distribution near the 50% end (mother heterozygous genotype AB), as marked by the yellow lines in Fig. 2B. The decrease of minor allele fractions here is due to the fetus being homozygous (AA). We could therefore determine the fetal genotype where the mother is heterozygous (AB), based on the minor allele fraction being (homozygous fetal genotype AA) or not being (heterozygous fetal genotype AB) significantly different from 50% (Fig. 1D).

Based on the abovementioned analytical strategy, we measured the fetal fraction from the aggregated results of the loci where the mother is homozygous (AA) and the fetus is heterozygous (AB). The mean fraction of these fetal-specific B alleles represents half of the fetal fraction (Fig. 1D). We identified low fraction alleles (143, 122 and 268 fetal alleles in the 6.6%, 13.3% and 27.4% mixtures, respectively) that were deemed fetal-specific and used their fractions to calculate the fetal fraction (Figs. 2B and S4). To evaluate the accuracy of our allele fraction measurement, we measured the Y chromosome fraction in each sample with whole genome sequencing data. The general agreement of the two sets of results validated the accuracy of the minor-allele-based fetal fraction estimates (Fig. 2C).
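The fetal fraction estimate described above can be sketched in a few lines of Python. The sketch assumes a depth-weighted mean over the informative loci (the Methods mention a weighted median) and takes the error as 2 × STD/√N; the function name and the `(maf, depth)` input layout are hypothetical.

```python
import math
import statistics


def estimate_fetal_fraction(loci, low=0.01, high=0.25):
    """Estimate fetal fraction from loci where only the fetus carries allele B.

    loci: list of (maf, depth) tuples for all valid loci.
    Returns (eps, sigma_eps, n_informative).
    """
    # Keep loci in the low fraction window (mother AA, fetus AB).
    informative = [(maf, depth) for maf, depth in loci if low < maf < high]
    n = len(informative)
    if n < 2:
        raise ValueError("need at least two informative loci")
    total_depth = sum(depth for _, depth in informative)
    mean_maf = sum(maf * depth for maf, depth in informative) / total_depth
    std_maf = statistics.stdev(maf for maf, _ in informative)
    eps = 2.0 * mean_maf                     # fetal B allele is half of eps
    sigma_eps = 2.0 * std_maf / math.sqrt(n)  # standard error, scaled by 2
    return eps, sigma_eps, n


example_loci = [(0.030, 1000), (0.035, 1000), (0.040, 1000),
                (0.500, 1000), (0.000, 1000)]
eps, sigma_eps, n_loci = estimate_fetal_fraction(example_loci)
```

In the toy input, the loci at 0% and 50% are ignored as maternal homozygous and heterozygous sites, and the three remaining loci give a fetal fraction of about 7%.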

3.3. Fetal fraction measurement of clinical samples

We then set out to test the performance of unique dual indexing in the measurement of allele fractions in clinical samples. We recruited 16 families whose fetuses were at risk of TSC or Marfan syndrome due to parental mutation inheritance or de novo mutation (Table S1). Using the same strategy as above, we measured the fetal fraction in the maternal cfDNA. As expected, fetal fraction generally increased with gestational age (Fig. 2D, linear regression r = 0.69). Most of the pregnant women, who visited the clinic during their second trimester, had fetal fractions ranging from 6% to 25% (Fig. 2D).

Fig. 2. Unique dual indexing enabled accurate fetal fraction measurement. (A) The distribution of minor allele fractions of pure maternal genomic DNA. (B) The distribution of minor allele fractions of the 6.6% mixture of child and maternal genomic DNA. (C) General agreement of fetal fraction results between minor allele fraction analysis and Y chromosome analysis. (D) Correlation between gestational age and fetal fraction in clinical samples.


3.4. Unique dual indexing is accurate enough to measure paternal inheritance

As mentioned above, NIPT accuracy is likely limited by index hopping rather than sequencing errors. We therefore evaluated NIPT performance with unique dual indexing. We first tested the sequencing depth requirement using the same control sample data as above. Focusing on all SNPs where the mother was AA and the father was AB, we down-sampled the control sample data to different depths and calculated the likelihood ratio between the two possible fetal genotypes. Fetal AB loci were called when the likelihood ratio was greater than 8 and were compared to the true fetal genotype information. False positive and false negative rates were then calculated. As shown in Figs. 3A and S5, the false positive rate was generally low regardless of sequencing depth. Low coverage produced many false negative calls, mainly due to under-sampling of low fraction variations. A sequencing depth of 500x suppressed false positive and false negative rates to an undetectable level, even for the mixture sample with 6.6% fetal fraction (Fig. 3B). We then analyzed 6 clinical cfDNA samples in which the fetuses were at risk of inheriting paternal TSC or Marfan syndrome mutations. All 6 tests, including 2 with low fetal fractions (Fig. 3C and D), had high statistical power, demonstrating the confidence of our test, and were confirmed using amniotic fluid, umbilical cord or cord blood as the true reference (Fig. S6).
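The effect of under-sampling at low depth can be illustrated with a simple binomial calculation in Python. This is only a rough model and not the likelihood ratio classifier used above: it assumes that a paternally inherited allele appears at half the fetal fraction and asks how often fewer than 5 supporting reads (the base calling requirement in Section 2.6) would be observed. The function name is hypothetical.

```python
from math import comb


def prob_fewer_than_k(depth, allele_fraction, k=5):
    """Binomial probability of seeing fewer than k supporting reads."""
    p = allele_fraction
    return sum(
        comb(depth, i) * p ** i * (1.0 - p) ** (depth - i)
        for i in range(k)
    )


# Fetal fraction of 6.6% gives an expected paternal allele fraction of 3.3%.
fetal_allele_fraction = 0.066 / 2
miss_at_100x = prob_fewer_than_k(100, fetal_allele_fraction)
miss_at_500x = prob_fewer_than_k(500, fetal_allele_fraction)
```

Under these assumptions the chance of missing the allele is large at 100x and becomes very small at 500x, which is in line with the trend of the down-sampling experiment.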

3.5. Unique dual indexing is accurate enough to detect de novo mutations

De novo mutation detection is more challenging than inheritance analysis, due to the lack of a specific target and the necessity to scan a large region. Data from control samples (Fig. 3A) have shown that false positive and false negative rates can be well controlled with deep sequencing. Considering that de novo mutation detection requires the screening of a larger region, we performed further evaluation with data from the families with a paternal proband. We included FBN1, besides TSC1/2, to represent a larger region of interest for a clinical symptom. Using the fetal genotypes as ground truth, we calculated the false positive and false negative rates of de novo pathogenic mutation detection within these three genes (Fig. S7A). Considering the sequencing coverage efficiency (Fig. S7B), we chose 1000x (rather than 500x as in the familial inheritance scenario) as the coverage threshold for clinical tests. We analyzed the maternal cfDNA data of six pregnancies that were at risk of carrying de novo TSC mutations. Through multiple steps of clinical annotation filtering, we eventually identified pathogenic TSC mutations in four out of six fetuses (Fig. 4A and Table S1). All four mutations were confirmed by Sanger sequencing of amniotic fluid or fetal tissue. For the two fetuses with no detectable TSC-associated pathogenic point mutations, test results of amniotic fluid were also negative. These pregnancies were carried to term, and the two newborns showed no further abnormalities in postnatal echocardiography.

3.6. Unique dual indexing enabled measurement of maternal inheritance

When the mother is a carrier (genotype AB), the maternally originated mutant allele obscures the detection of the mutant allele from the fetus. Distinguishing the two possible fetal genotypes therefore generally requires a much higher sequencing depth. We conducted a theoretical analysis to determine the sequencing depth requirement for different fetal fractions. As shown in Fig. 4B, higher fetal fraction and sequencing coverage contribute to better separation of the two likelihood distributions, which leads to higher sensitivity (shaded area under the inset curves). A coverage of 1500x or even higher is typically necessary to obtain a definitive judgement. With this criterion, we successfully determined the maternal mutation inheritance status (Fig. 4C and D). All results were confirmed using amniotic fluid, umbilical cord or cord blood as the true reference (Fig. S6).
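The dependence of the depth requirement on fetal fraction can be approximated with a short Python sketch. It is a back-of-the-envelope simplification, not the theoretical analysis behind Fig. 4B: it assumes two Gaussian curves with equal width √(0.5/depth), centered at 0.5 and 0.5 − ε/2, ignores the error of ε, and asks at which depth an observation sitting exactly on the affected mean reaches a likelihood ratio of 8. The function names are hypothetical.

```python
import math


def separation_z(depth, eps):
    """Distance between the two means in units of the assumed sigma."""
    sigma = math.sqrt(0.5 / depth)  # Poisson-like noise of a ~50% allele fraction
    return (eps / 2.0) / sigma


def min_depth_for_lr(eps, lr_threshold=8.0):
    """Smallest depth at which LR = exp(z^2 / 2) reaches the threshold.

    With z^2 = eps^2 * depth / 2, solving exp(z^2 / 2) = threshold
    gives depth = 4 * ln(threshold) / eps^2.
    """
    z_squared = 2.0 * math.log(lr_threshold)
    return math.ceil(2.0 * z_squared / eps ** 2)
```

Under these assumptions a fetal fraction of 10% needs a depth of roughly 800x for the expected observation alone to cross the threshold, and halving the fetal fraction quadruples that figure. Real data scatter around the mean, so a reliable call needs a larger margin, consistent with the 1500x or higher coverage stated above.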

Fig. 3. Unique dual indexing is accurate enough to measure paternal mutation inheritance. (A) Sequencing depth requirement for low fraction mutation detection. (B) Statistical results of individual SNPs. (C) and (D) Paternal mutation inheritance testing results for each clinical sample using a likelihood ratio classifier.

4. Conclusions

In summary, we showed that chimeric amplicon contamination is the major source of error in maternal cfDNA analysis. With unique dual indexing, we greatly suppressed the level of chimeric amplicon contamination while maintaining high-throughput processing through multiplexing. This strategy is easier and more cost-efficient than the use of UMI, demonstrating great potential in future clinical analysis.

CRediT authorship contribution statement

Meijie Du: Data curation, Formal analysis, Writing - original draft, Writing - review & editing. Yihua He: Conceptualization, Writing - review & editing, Funding acquisition. Jian Chen: Resources, Data curation, Writing - review & editing. Hairui Sun: Data curation, Validation, Writing - review & editing. Yuwei Fu: Resources, Data curation, Writing - review & editing. Jianbin Wang: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to thank the Peking University High-throughput Sequencing Center for experimental assistance. This work was supported by the Ministry of Science and Technology of China [2018YFC1002300, 2017YFC1308000 to Y.H.] and the National Natural Science Foundation of China [21675098, 21927802 to J.W.].

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.talanta.2020.121035.

References

[1] E.S. Wright, K.H. Vetsigian, Quality filtering of Illumina index reads mitigates sample cross-talk, BMC Genom. 17 (2016) 876.

[2] M. Costello, M. Fleharty, J. Abreu, Y. Farjoun, S. Ferriera, L. Holmes, B. Granger, L. Green, T. Howd, T. Mason, G. Vicente, M. Dasilva, W. Brodeur, T. DeSmet, S. Dodge, N.J. Lennon, S. Gabriel, Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms, BMC Genom. 19 (2018) 332.

[3] Q. Li, X. Zhao, W. Zhang, L. Wang, J. Wang, D. Xu, Z. Mei, Q. Liu, S. Du, Z. Li, X. Liang, X. Wang, H. Wei, P. Liu, J. Zou, H. Shen, A. Chen, S. Drmanac, J.S. Liu, L. Li, H. Jiang, Y. Zhang, J. Wang, H. Yang, X. Xu, R. Drmanac, Y. Jiang, Reliable multiplex sequencing with rare index mis-assignment on DNB-based NGS platform, BMC Genom. 20 (2019) 215.

[4] V.D.T. Valk, F. Vezzi, M. Ormestad, L. Dalén, K. Guschanski, Index hopping on the Illumina HiseqX platform and its consequences for ancient DNA studies, Mol. Ecol. Resour. 8 (2019) 1–11, https://doi.org/10.1111/1755-0998.13009.

[5] A. Tabor, J. Philip, M. Madsen, J. Bang, E.B. Obel, B.N. Pedersen, Randomised controlled trial of genetic amniocentesis in 4606 low-risk women, Lancet 327 (1986) 1287–1293.

[6] C.W. Kong, T.N. Leung, T.Y. Leung, L.W. Chan, D.S. Sahota, T.Y. Fung, T.K. Lau, Risk factors for procedure-related fetal losses after mid-trimester genetic amniocentesis, Prenat. Diagn. 26 (2006) 925–930.

[7] Y.M. Lo, M.S. Tein, T.K. Lau, C.J. Haines, T.N. Leung, P.M. Poon, J.S. Wainscoat, P.J. Johnson, A.M. Chang, N.M. Hjelm, Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis, Am. J. Hum. Genet. 62 (1998) 768–775.

[8] T.K. Lau, S.W. Cheung, P.S. Lo, A.N. Pursley, M.K. Chan, F. Jiang, H. Zhang, W. Wang, L.F. Jong, O.K. Yuen, H.Y. Chan, W.S. Chan, K.W. Choy, Non-invasive prenatal testing for fetal chromosomal abnormalities by low-coverage whole-genome sequencing of maternal plasma DNA: review of 1982 consecutive cases in a single center, Ultrasound Obstet. Gynecol. 43 (2014) 254–264.

[9] M.E. Norton, B. Jacobsson, G.K. Swamy, L.C. Laurent, A.C. Ranzini, H. Brar, M.W. Tomlinson, L. Pereira, J.L. Spitz, D. Hollemon, H. Cuckle, T.J. Musci, R.J. Wapner, Cell-free DNA analysis for noninvasive examination of trisomy, N. Engl. J. Med. 372 (2015) 1589–1597.

[10] R.M. McCullough, E.A. Almasri, X. Guan, J.A. Geis, S.C. Hicks, A.R. Mazloom, C. Deciu, P. Oeth, A.T. Bombard, B. Paxton, N. Dharajiya, J.S. Saldivar, Non-invasive prenatal chromosomal aneuploidy testing–clinical experience: 100,000 clinical samples, PloS One 9, e109173, https://doi.org/10.1371/journal.pone.0109173.

[11] Y. You, Y. Sun, X. Li, Y. Li, X. Wei, F. Chen, H. Ge, Z. Lan, Q. Zhu, Y. Tang, S. Wang, Y. Gao, F. Jiang, J. Song, Q. Shi, X. Zhu, F. Mu, W. Dong, V. Gao, H. Jiang, X. Yi, W. Wang, Z. Gao, Integration of targeted sequencing and NIPT into clinical practice in a Chinese family with maple syrup urine disease, Genet. Med. 16 (2014) 594–600.

[12] J. Zhang, J. Li, J.B. Saucier, Y. Feng, Y. Jiang, J. Sinson, A.K. McCombs, E.S. Schmitt, S. Peacock, S. Chen, H. Dai, X. Ge, G. Wang, C.A. Shaw, H. Mei, A. Breman, F. Xia, Y. Yang, A. Purgason, A. Pourpak, Z. Chen, X. Wang, Y. Wang, S. Kulkarni, K.W. Choy, R.J. Wapner, I.B.V. Van den, A. Beaudet, S. Parmar, L.J. Wong, C.M. Eng, Non-invasive prenatal sequencing for multiple Mendelian monogenic disorders using circulating cell-free fetal DNA, Nat. Med. 25 (2019) 439–447.

[13] Y. Luo, B. Jia, K. Yan, S. Liu, X. Song, M. Chen, F. Jin, Y. Du, J. Wang, Y. Hong, S. Cao, D. Li, M. Dong, Pilot study of a novel multi-functional noninvasive prenatal test on fetus aneuploidy, copy number variation, and single-gene disorder screening, Mol. Genet. Genomic. Med. 7, e00597, https://doi.org/10.1002/mgg3.597.

Fig. 4. Unique dual indexing is capable of detecting de novo mutation and maternal mutation inheritance. (A) De novo mutation analysis of a clinical sample. (B) Sequencing depth requirement for detecting maternal mutation inheritance. (C) and (D) Maternal mutation inheritance testing results for each clinical sample using a likelihood ratio classifier.


[14] J.C. Soler, H. Lee, L. Hudgins, S.R. Hintz, Y.J. Blumenfeld, Y.Y.E. Sayed, S.R. Quake, Noninvasive prenatal diagnosis of single-gene disorders by use of droplet digital PCR, Clin. Chem. 64 (2018) 336–345.

[15] M.Y. Chang, S. Ahn, M.Y. Kim, J.H. Han, H.R. Park, H.K. Seo, J. Yoon, S. Lee, D.Y. Oh, C. Kang, B.Y. Choi, One-step noninvasive prenatal testing (NIPT) for autosomal recessive homozygous point mutations using digital PCR, Sci. Rep. 8 (2018) 2877.

[16] Y. Yang, D. Zheng, C.Y. Wu, A. Lizaso, J.Y. Ye, S. Chuai, J. Ni, J.F. Xu, G.N. Jiang, Detecting ultralow frequency mutation in circulating cell-free DNA of early-stage nonsmall cell lung cancer patients with unique molecular identifiers, Small Methods 3 (2019) 1900206.e1-1900206.e8, https://doi.org/10.1002/smtd.201900206.

[17] H.C. Fan, W. Gu, J. Wang, Y.J. Blumenfeld, Y.Y.E. Sayed, S.R. Quake, Non-invasive prenatal measurement of the fetal genome, Nature 487 (2012) 320–324.

[18] S. Rahul, S. Geoff, S.G. Gunsagar, E. Camille, J.T. Kyle, W. Eric, K.F.C. Charles, N.N. Ahmad, S. Tianying, M.M. Rachel, D.C. Stephanie, C. Hassan, R.H. Kristy, T.L. Michael, P.S. Michael, A.K. Mark, L.W. Irving, Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing, BioRxiv 19 (2017), https://doi.org/10.1101/125724.

[19] J.A. Griffiths, A.C. Richard, B. Karsten, A.T.L. Lun, J.C. Marioni, Detection and removal of barcode swapping in single-cell RNA-seq data, Nat. Commun. 9 (2018) 2667–2668.

[20] Y.M. Lo, K.C. Chan, H. Sun, E.Z. Chen, P. Jiang, F.M. Lun, Y.W. Zheng, T.Y. Leung, T.K. Lau, C.R. Cantor, R.W. Chiu, Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus, Sci. Transl. Med. 2 (2010) 61ra91, https://doi.org/10.1126/scitranslmed.3001720.
