

CERGENTIS B.V. YALELAAN 62 3584 CM UTRECHT THE NETHERLANDS 0031 - (0)30 - 760 16 36 [email protected] WWW.CERGENTIS.COM

TLA BASED TRANSGENE SEQUENCING ANALYSES

INTRODUCTION

Cergentis’ TLA Technology (Nature Biotech 2014¹) uniquely enables the efficient, targeted and complete Next Generation Sequencing (NGS) of transgenes and their integration sites. TLA is therefore a powerful tool to characterise transgenic cell-lines and animal models and to select those in which the transgene has integrated in the appropriate manner and without unintended sequence changes.

TLA based transgene sequencing analyses:

- identify the integration site(s) of the transgene.

- detect structural changes surrounding the transgene integration site.

- sequence the complete transgene and detect any single nucleotide variants as well as structural changes within the transgene.

- provide information about the transgene copy number.

TLA TECHNOLOGY

The TLA Technology enables the targeted amplification and NGS of any locus of interest using just one primer pair complementary to a short sequence unique to the locus (Figure 1).

¹ http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html

In typical transgene sequencing analyses, primer sets complementary to short transgene-specific sequences are used. Such TLA analyses provide sequence information across the entire transgene sequence and across the locus/loci in the genome where the transgene has integrated (Figure 2).

TARGETED SEQUENCING OF TARGETED KNOCK-OUTS, TARGETED INTEGRATION SITES ETC.

As is sketched in Figure 3, TLA analyses can also be used to perform targeted sequencing of loci in which genetic alterations have been introduced using a targeted gene editing method (e.g. targeted knock-outs or -integrations using CRISPR/Cas9). TLA can thus be used to assess whether genetic alterations have been generated successfully. This approach can also be used to further characterise individual transgene integration sites and assess which variants (i.e. which Single Nucleotide Variants and transgene fusions) occur in which integration site.

Figure 1: Overview of TLA-based amplification and sequencing of a locus of interest. TLA amplifications use one primer pair complementary to a short locus-specific sequence. Generated NGS sequencing coverage (i.e. the number of NGS sequencing reads) is highest in the immediate vicinity of the locus-specific sequence and declines with greater physical distance from it.

[Figure 1 labels: sequencing coverage; PCR primers; locus-specific sequence; locus; 50 - 100 kb on either side.]

Figure 2: TLA-based transgene sequencing. Using one TLA primer pair complementary to a sequence unique to the transgene, complete sequence information is generated across the transgene and its integration site(s).

[Figure 2 labels: TLA primer pair; transgene; host genome; sequenced locus.]

Figure 3: TLA-based analyses of targeted genetic modifications. A TLA analysis with a primer pair in proximity to the targeted site provides sequence information across this locus.

[Figure 3 labels: TLA primer pair; targeted site; host genome; sequenced locus.]


INTEGRATION SITES

Integration sites are detected by analysing the coverage profile across the host genome sequence and by identifying breakpoint sequences between the host genome and the transgene sequence.

Partial integrations can also be detected by performing multiple TLA amplifications with different primer pairs specific for different positions in the transgene sequence.

Identified breakpoint sequences are specified in tables (Figure 4).

In these tables:
- Seq1 specifies the transgene sequence name.
- Pos1 specifies the position in the transgene sequence at which the breakpoint occurs.
- Ori1 specifies the orientation of the read: + indicates that the fusion read continues with increasing positions (downstream) across the transgene; - indicates that the fusion read continues with decreasing positions (upstream) across the transgene.
- Seq2 specifies the chromosome in which the integration breakpoint occurs.
- Pos2 specifies the position in the chromosome at which the breakpoint occurs.
- Ori2 specifies the orientation of the read (see above).

² More detailed information about this approach can be found on www.cergentis.com and in our Nature Biotechnology publication: http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html

Frequently, structural changes occur as a result of an integration of the transgene. Such structural changes include deletions, as shown in Figure 4, as well as more complex rearrangements. Paired-end sequencing analyses can be used to determine which breakpoints occur in physical proximity to each other and therefore which breakpoints constitute the same integration site².

TLA analyses with primers specific for identified integration sites can be performed on wild-type samples to further characterise any rearrangements resulting from transgene integrations.

Figure 4: An example of a mouse whole genome coverage plot and a zoom-in into coverage across the integration site on chromosome 19. The table specifies the breakpoint sequences resulting from the integration. As is apparent from the coverage profile as well as from the positions in chromosome 19 where the breakpoints occur, a deletion has happened as a result of the transgene integration. The integration in chromosome 13 has not resulted in a deletion.

[Figure 4 labels: host genome; transgene; integration site.]

seq1       pos1  ori1  seq2           pos2      ori2
Transgene  200   +     Chromosome 19  59765432  +
Transgene  9000  -     Chromosome 19  59864972  -
Transgene  165   +     Chromosome 13  41896736  -
Transgene  9065  -     Chromosome 13  41896737  +
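The deletion reading of the Figure 4 table can be checked with a little arithmetic on the Pos2 column. The following is a minimal sketch, not Cergentis' analysis pipeline: it assumes exactly two breakpoints per integration site, ignores the orientation columns, and uses a hypothetical helper name (`host_bases_between_breakpoints`).

```python
from collections import defaultdict

# Rows copied from the Figure 4 example table:
# (seq1, pos1, ori1, seq2, pos2, ori2)
BREAKPOINTS = [
    ("Transgene", 200, "+", "Chromosome 19", 59765432, "+"),
    ("Transgene", 9000, "-", "Chromosome 19", 59864972, "-"),
    ("Transgene", 165, "+", "Chromosome 13", 41896736, "-"),
    ("Transgene", 9065, "-", "Chromosome 13", 41896737, "+"),
]


def host_bases_between_breakpoints(rows):
    """Count the host bases lying strictly between the two breakpoints
    reported for each chromosome.

    Assumption for illustration only: each chromosome carries one
    integration site with exactly two breakpoints. Chromosomes with any
    other number of breakpoints are skipped.
    """
    positions_by_chrom = defaultdict(list)
    for _seq1, _pos1, _ori1, seq2, pos2, _ori2 in rows:
        positions_by_chrom[seq2].append(pos2)

    gaps = {}
    for chrom, positions in positions_by_chrom.items():
        if len(positions) == 2:
            low, high = sorted(positions)
            gaps[chrom] = high - low - 1
    return gaps


gaps = host_bases_between_breakpoints(BREAKPOINTS)
for chrom in sorted(gaps):
    print(f"{chrom}: {gaps[chrom]} host bases between breakpoints")
```

With the example rows, the chromosome 13 breakpoints are adjacent (no host bases between them), whereas the chromosome 19 breakpoints lie roughly 99.5 kb apart, which matches the caption's statement that only the chromosome 19 integration is accompanied by a deletion.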


TRANSGENE SEQUENCE: SINGLE NUCLEOTIDE VARIANTS AND INDELS

Identified Single Nucleotide Variants (SNVs) and InDels (insertions or deletions) are specified in tables (Figure 5).

In these tables:
- Seq1 specifies the transgene sequence analysed.
- Pos specifies the position at which the mutation is detected.
- Ref specifies the reference sequence.
- Alt specifies the identified mutation.
- Hom specifies which two positions in the transgene have sequence homology to each other. The mutation likely occurs in one of the two positions rather than in both.
- %SNV specifies the percentage of reads that contain the mutation.
- Cov specifies the sequencing coverage at the position of the mutation.
- Cov and %SNV are specified for data generated in two individual TLA amplifications of the same sample using different transgene-specific primer pairs.

SNVs and InDels are reported when they are found to occur with at least a 1% frequency in two independent TLA amplifications. The %SNV value provides a good estimation of the percentage of copies of the transgene in the cell line that contain the specified mutation³.

³ Three factors determine the sensitivity of NGS analyses:
- Sequencing coverage: the detection of rare sequence variants requires sufficient sequencing coverage across the region in which these rare alleles are to be detected.
- Sequencing errors: a low percentage of reads will contain sequencing errors. No absolute figure can be given as sequencing errors are context-dependent (see for instance http://genomebiology.com/2013/14/5/R51).
- Mapping errors: the analysis of NGS data is based on the high-throughput mapping and processing of large numbers of sequencing reads. Errors in mapping can result in false positives. Potential false positives can be further analysed with a more detailed inspection of generated sequences and the analysis of control samples.

Selection of mutations, identified with two transgene-specific primer sets, increases the reliability of their detection.
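The reporting rule above (at least 1% frequency in both independent amplifications) can be expressed as a simple filter. This is only an illustrative sketch: the `SnvCall` type and `reportable` function are hypothetical names, the first four rows are taken from the Figure 5 example table, and the last row is an invented call added to show a variant being rejected.

```python
from collections import namedtuple

# One table row: position, reference base, alternative base, then coverage
# and %SNV for primer-set 1 and primer-set 2.
SnvCall = namedtuple("SnvCall", "pos ref alt cov1 pct1 cov2 pct2")

CALLS = [
    SnvCall(141, "A", "C", 610, 30, 464, 28),     # from the Figure 5 table
    SnvCall(489, "A", "G", 740, 1, 570, 1),       # from the Figure 5 table
    SnvCall(1013, "T", "C", 780, 100, 202, 100),  # from the Figure 5 table
    SnvCall(4638, "G", "A", 972, 100, 482, 99),   # from the Figure 5 table
    SnvCall(7000, "C", "T", 900, 3, 850, 0),      # hypothetical, NOT in the source
]

MIN_PERCENT = 1  # threshold stated in the text: at least 1% frequency


def reportable(calls, min_percent=MIN_PERCENT):
    """Keep only calls that reach the threshold in both amplifications."""
    return [
        call
        for call in calls
        if call.pct1 >= min_percent and call.pct2 >= min_percent
    ]


kept = reportable(CALLS)
for call in kept:
    print(call.pos, call.ref, call.alt, call.pct1, call.pct2)
```

The invented call at position 7000 is dropped because it is supported by only one of the two primer sets, which is the situation the two-amplification requirement is meant to filter out.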

TRANSGENE-TRANSGENE FUSIONS

Often transgenes concatamerise and multiple copies will integrate in one integration site. Frequently, such concatamerisation will include partial copies of the transgene that have fused in different orientations. Depending on where these fusions occur, they can result in the expression of undesired aberrant proteins. Changes in (the number of) transgene fusions indicate genomic instability of the concatamer and integration site.

If concatamerisation has occurred, each copy will contribute to sequencing coverage. TLA thus provides comprehensive sequence information across the entire concatamer (Figure 6).


                                   primer-set 1   primer-set 2
seq1       pos    ref  alt  hom    cov    %SNV    cov    %SNV
Transgene  141    A    C    -      610    30      464    28
Transgene  489    A    G    -      740    1       570    1
Transgene  816    T    G    5698   462    1       360    2
Transgene  1013   T    C    -      780    100     202    100
Transgene  1304   A    C    -      970    100     380    100
Transgene  1305   G    C    -      960    100     380    100
Transgene  2956   T    C    -      1220   1       490    2
Transgene  3561   C    A    -      897    100     200    100
Transgene  4638   G    A    -      972    100     482    99
Transgene  5698   A    C    816    986    1       195    2
Transgene  8836   T    G    -      812    1       650    1
Transgene  9487   T    C    -      1278   1       850    1
Transgene  11037  A    G    -      870    20      850    21

Figure 5: An example of a coverage profile across a transgene. In this case, a deletion has occurred in the transgene sequence between positions 8400 and 9100. Most sequences within the transgene have been sequenced with > 1000x coverage (i.e. with at least 1000 NGS reads). The table provides an example of SNVs identified in a transgene sequence.

[Figure 5 plot: x-axis, position in T-DNA (0 to 12000); y-axis, NGS coverage (0 to 1000); the deletion is marked.]


TRANSGENE COPY NUMBER ANALYSES

An exact copy number cannot be determined using TLA. However, a good estimation can be made based on the number of integration sites, the number of fusion reads and the ratio of the coverage on the transgene and its integration site.

CONCLUSION

TLA based transgene sequencing provides comprehensive information about the integrated transgene sequences and their integration site(s). The technology is thus highly suited for the characterisation and selection of transgenic cell-lines and animal models.

Transgene fusions are specified in tables (Figure 7).

In these tables:
- Pos1 specifies the position in the transgene sequence at which the first breakpoint occurs.
- Ori1 specifies the orientation of the read: + indicates that the fusion read continues with increasing positions across the transgene; - indicates that the fusion read continues with decreasing positions across the transgene.
- Pos2 specifies the position in the contig at which the second breakpoint occurs.
- Ori2 specifies the orientation of the read (see above).

Fusions   pos1  ori1  pos2  ori2
Fusion 1  2500  -     5000  +
Fusion 2  2500  -     5500  -
Fusion 3  2000  +     5000  +
Fusion 4  2000  +     5500  -

[Figure 7 diagram: Fusions 1 to 4 drawn against a transgene scale of 0 to 10.000, with positions 2000, 2500, 5000 and 5500 marked.]

Figure 7: A graphical depiction of different fusions that can occur between two transgene sequences and how these different fusion events are shown in the table.
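One way to read the orientation columns of the fusion table is to compare Ori1 with Ori2. The sketch below is an interpretation derived from the column definitions, not a procedure described by Cergentis: if the two sides continue in opposite directions away from the junction (one +, one -), positions keep running the same way across the junction, so both segments lie on the same strand; if the signs are equal, one segment must be reverse-complemented relative to the other. The labels "direct" and "inverted" and the function name are hypothetical.

```python
# Rows from the Figure 7 example table: name -> (pos1, ori1, pos2, ori2)
FUSIONS = {
    "Fusion 1": (2500, "-", 5000, "+"),
    "Fusion 2": (2500, "-", 5500, "-"),
    "Fusion 3": (2000, "+", 5000, "+"),
    "Fusion 4": (2000, "+", 5500, "-"),
}


def relative_orientation(ori1, ori2):
    """Label a fusion from the two read orientations.

    Assumed interpretation (not stated in the source):
    - different signs: both segments are on the same strand ("direct")
    - equal signs: one segment is inverted relative to the other ("inverted")
    """
    for ori in (ori1, ori2):
        if ori not in ("+", "-"):
            raise ValueError(f"orientation must be '+' or '-', got {ori!r}")
    return "inverted" if ori1 == ori2 else "direct"


labels = {
    name: relative_orientation(ori1, ori2)
    for name, (_pos1, ori1, _pos2, ori2) in FUSIONS.items()
}
for name, label in labels.items():
    print(name, label)
```

Under this reading, Fusions 1 and 4 join two segments in the same orientation, while Fusions 2 and 3 join segments in opposite orientations, the kind of partial, differently oriented copies the text describes for concatamers.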

Figure 6: TLA-based sequencing of a transgene concatamer. Since each locus-specific sequence will contribute to sequencing coverage, the entire concatamer will be sequenced. The positions of transgene-transgene fusions are highlighted with the red arrows. An example of a table specifying the position and orientation of breakpoints between different copies of a transgene is shown in Figure 7.

[Figure 6 labels: TLA primer pair (one per transgene copy); transgene-transgene fusion; host genome; sequenced locus.]
