www.sciencetranslationalmedicine.org/cgi/content/full/5/197/197ra101/DC1
Supplementary Materials for
Genome-Wide Mutational Signatures of Aristolochic Acid and Its Application as a Screening Tool
Song Ling Poon, See-Tong Pang,* John R. McPherson, Willie Yu, Kie Kyon Huang,
Peiyong Guan, Wen-Hui Weng, Ee Yan Siew, Yujing Liu, Hong Lee Heng, Soo Ching Chong, Anna Gan, Su Ting Tay, Weng Khong Lim, Ioana Cutcutache, Dachuan Huang, Lian Dee Ler, Maarja-Liisa Nairismägi, Ming Hui Lee, Ying-Hsu Chang, Kai-Jie Yu,
Waraporn Chan-on, Bin-Kui Li, Yun-Fei Yuan, Chao-Nan Qian, Kwai-Fong Ng, Ching-Fang Wu, Cheng-Lung Hsu, Ralph M. Bunte, Michael R. Stratton, P. Andrew Futreal, Wing-Kin Sung, Cheng-Keng Chuang, Choon Kiat Ong, Steven G. Rozen,*
Patrick Tan,* Bin Tean Teh*
*Corresponding author. E-mail: [email protected] (B.T.T.); [email protected] (P.T.); [email protected] (S.G.R.); [email protected] (S.-T.P.)
Published 7 August 2013, Sci. Transl. Med. 5, 197ra101 (2013)
DOI: 10.1126/scitranslmed.3006086
This PDF file includes:
Materials and Methods
Fig. S1. Frequency of mutations in carcinogen-induced cancers.
Fig. S2. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the AA-UTUC whole genome.
Fig. S3. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the exomes of nine AA-UTUCs.
Fig. S4. Systematic up-regulation of NMD gene transcripts in AA-UTUC compared to adjacent normal tissue.
Fig. S5. A heterozygous 3′ splice-site mutation results in skipping of RFC2 exon 10 in AA-UTUC.
Fig. S6. Strong association between CAG>CTG mutations at 3′ splice sites and altered splicing.
Fig. S7. Further details of the in vivo model of AA-induced damage.
Fig. S8. Superimposed individual tumor data points for the total nonsynonymous single-nucleotide variants and each of the separate mutation types in AA-HCCs and non–AA-HCCs.
Fig. S9. Nineteen HCCs exhibiting a “weak” AA mutational signature.
Fig. S10. Schematic representation of 3′ splice-site CAGs.
Table S1. Clinical characteristics of AA-UTUC patients analyzed by whole-genome and/or exome sequencing.
Table S2. Sequence analysis summary of whole genome–sequenced AA-UTUC (9T).
Table S3. Breakdown of somatic mutations by genomic region.
Table S4. Somatic nonsynonymous substitutions in protein-coding genes of the whole genome–sequenced AA-UTUC.
Table S5. Somatic substitutions in unspliced transcript regions (transcribed, including introns and untranslated regions) of AA-UTUC (9T).
Table S6. Somatic substitutions in the intergenic regions of AA-UTUC (9T).
Table S7. Sequence analysis summary of nine exome-sequenced AA-UTUCs.
Table S8. Somatic nonsynonymous substitutions in protein-coding genes of nine AA-UTUCs.
Table S9. The effect of +/− one base flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC.
Table S10. The effect of +/− one base flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC.
Table S11. The effect of +/− two bases flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC.
Table S12. The effect of +/− two bases flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC.
Table S13. The effect of +/− one base flanking the mutated TAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC.
Table S14. The effect of +/− one base flanking the mutated CAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC.
Table S15. Hypergeometric analysis for enrichment of CAG splice-site mutations in AA-UTUCs, AA-treated HK2 clones, and non–AA-associated cancers.
Table S16. RPKM gene expression values for 15 NMD pathway genes in the AA-UTUC and matched normal tissue.
Table S17. Identities of 3′ splice sites with CAG>CTG mutations and RPKM > 2.
Table S18. 3′ splice sites without CAG>CTG mutations for evaluating the proportion of unmutated sites associated with aberrant splicing.
Table S19. Sequence analysis summary of two exome-sequenced AA-treated HK2 clones.
Table S20. Somatic nonsynonymous substitutions in protein-coding genes of AA-treated HK2 clones.
Table S21. Comparison of mutation rates in AA-UTUC, carcinogen-induced cancers, mismatch repair–defective colorectal cancers, and POLE/POLD1 mutated colorectal cancers.
Table S22. Primer sequences.
Other Supplementary Material for this manuscript includes the following: (available at www.sciencetranslationalmedicine.org/cgi/content/full/5/197/197ra101/DC1)
Tables S1 to S22 (Microsoft Excel format)
Materials and Methods
Study Design
This was an observational, hypothesis-generating study based on archival tissue samples. The
study did not involve treatments for UTUC or HCC and did not consider prognosis or clinical
endpoints. The rationale for this study was the hypothesis that examination of genome and
exome sequences of UTUC tumors from patients with likely AA exposure, compared to matched
non-malignant tissues, would reveal new information about the mechanisms by which AA
induces tumorigenesis. The finding that some HCCs also show strong evidence of AA exposure
was serendipitous. Most HCC sequence data was from a previously published study (31), as
specified in the Main Text. After we detected likely AA signatures in the published HCC data,
we examined five additional HCC tumors sequenced by our group and found one that was likely
to have been exposed to AA.
The purpose of the mouse studies was to confirm that the compounds used for the in vitro
(cell line) experiments were indeed nephrotoxic. We planned to assess nephrotoxicity at three
time points, and therefore included 3 controls in the study. We planned to examine
nephrotoxicity in 4 mice after 10 days’ exposure, reasoning that there would be little change in
physiological status of the kidney. Based on previous literature (9), we expected clear
nephrotoxicity after 30 days’ exposure, and we planned to examine 10 mice to confirm this.
Again, based on previous literature, we expected severe nephrotoxicity after 90 days’ exposure
and planned to examine 6 mice to confirm this.
Whole genome sequencing
We used Illumina TruSeq DNA Sample Prep Kit (Illumina Inc.) to prepare DNA for whole
genome shotgun (WGS) libraries. In brief, 1 µg of DNA from each sample was sheared using the
Covaris E210 instrument (Covaris) to a range of 100-700 base pairs. The resulting DNA
fragments were end-repaired, phosphorylated, and adenylated at the 3’ ends. Standard paired-end
adaptors were ligated to both ends, and fragments were subjected to gel electrophoresis (2%
agarose, 120 volts, 1 hour) and size selected by gel excision of the bands (400-500 bp). The
selected fragments were purified with the MinElute Gel Extraction Kit (Qiagen Inc.), followed by
enrichment with PCR amplification (10 cycles) as per the Illumina protocol. This produced a WGS
library for each sample, with inserts averaging 300 bp to 400 bp. AMPure XP Beads (Illumina)
were used for the PCR clean-up. WGS libraries were sequenced on an Illumina HiSeq 2000 as
paired-end 76-base pair reads, resulting in an average haploid coverage of 33X for the tumor
genome and 33X for the normal genome.
Exome sequencing
Genomic DNA from the AA-UTUC and the adjacent normal tissues from nine patients (3 µg per
sample) was used to prepare fragment libraries suitable for massively parallel paired-end
sequencing as previously described (1, 2). The coding sequences were enriched by the SureSelect
Human All Exon kit v3 and sequenced using the Illumina Genome Analyzer IIx using 76-bp
paired-end reads.
Bioinformatic analysis of genome and exome
We used the Burrows-Wheeler Aligner (BWA, http://bio-bwa.sourceforge.net/) to align the
sequence reads to the human reference genome NCBI GRC Build 37 (hg19) and employed
SAMtools (http://samtools.sourceforge.net/) to remove PCR duplicates. To detect single-nucleotide
variants (SNVs), we used a discovery pipeline based on the Genome Analysis Toolkit
(GATK). Our pipeline first recalibrated base qualities and realigned the sequence reads around
micro-indels. Next, we employed the GATK Unified Genotyper, which performs consensus
calling, in order to identify SNVs. Only well-mapped reads (mapping quality ≥30 and number of
mismatches within a 40-bp window ≤3) were used as input for the genotyper. We retained SNVs
that passed additional quality filters (quality by depth ≥3, variant depth ≥10 and normal depth ≥5)
and discarded any SNV close to a micro-indel or to several other SNVs. The quality-by-depth
score is GATK's consensus quality score for a variant divided by the (unfiltered) read depth at
that position (http://www.broadinstitute.org/gsa/wiki/index.php/main_page). The consensus
quality score increases when there is more evidence for the existence of a variant at that position.
If quality by depth is low, the inference is that evidence for the existence of a variant is weak in
proportion to the number of reads available. The variant depth refers to the number of variant
reads in the tumor sample, and the normal depth refers to the number of reads in the same
position in the corresponding normal sample. We compared our variants against the common
polymorphisms present in dbSNPv135 (http://www.ncbi.nlm.nih.gov/projects/SNP/) and in the
1000 Genomes Project databases (http://www.1000genomes.org/), in order to discard any
common SNPs. Several cancer somatic mutations are also present in dbSNP, and we retained any
common variants also found to be present in COSMIC v52. All variants retained after this step
were considered to be newly identified here. Whenever possible, we used the Consensus CDS
(CCDS) gene annotation database to determine amino acid changes. In the few instances where
CCDS annotations were not available for SureSelect capture targets, we used RefSeq annotations,
or, if those also were not available, Ensembl or UCSC annotations were used. For exome
sequencing data, only SNVs in exons or in canonical splice sites were further analyzed. Amino
acid changes corresponding to SNVs were annotated according to the largest transcript of the
gene. In order to determine somatic variants for analysis, we compared the lists of newly
identified non-synonymous SNVs detected in the tumor with those found in the corresponding
normal sample and retained the ones that were present only in the tumor. All such somatic SNVs
were submitted to PolyPhen (http://genetics.bwh.harvard.edu/pph2/indext.shtml) for functional
prediction. Sanger capillary sequencing was used to validate selected mutations predicted by
deep sequencing.
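The quality filters described above (quality by depth ≥3, variant depth ≥10, normal depth ≥5) can be illustrated with a minimal Python sketch. The `Candidate` class and its field names are hypothetical and chosen only for illustration; they are not GATK's output format, and only the three thresholds come from the text. The sketch assumes the unfiltered read depth is greater than zero.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all names below are hypothetical.
MIN_QUALITY_BY_DEPTH = 3.0
MIN_VARIANT_DEPTH = 10
MIN_NORMAL_DEPTH = 5


@dataclass
class Candidate:
    """A candidate SNV with the quantities used by the filters."""
    consensus_quality: float  # consensus quality score for the variant
    total_depth: int          # unfiltered read depth at the position (> 0)
    variant_depth: int        # variant-supporting reads in the tumor
    normal_depth: int         # reads at the same position in the normal

    @property
    def quality_by_depth(self) -> float:
        # Quality by depth = consensus quality / unfiltered read depth.
        return self.consensus_quality / self.total_depth


def passes_filters(c: Candidate) -> bool:
    """Return True if the candidate meets all three thresholds."""
    return (
        c.quality_by_depth >= MIN_QUALITY_BY_DEPTH
        and c.variant_depth >= MIN_VARIANT_DEPTH
        and c.normal_depth >= MIN_NORMAL_DEPTH
    )
```

The proximity filters (discarding SNVs near a micro-indel or near several other SNVs) are not modeled here because the text does not give their distances.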
Sanger sequencing validation
Sanger sequencing primers were designed using Primer3 software (http://frodo.wi.mit.edu).
Purified PCR products were sequenced in the forward and reverse directions using ABI PRISM
BigDye Terminator Cycle Sequencing Ready Reaction kits (Version 3) and an ABI PRISM 3730
Genetic Analyzer.
Given the high mutation rates observed in the AA-UTUC, it is impractical to validate all
candidate mutations. Thus, for exome sequencing data, we randomly selected at least 100
candidate somatic coding mutations from each tumor for validation. Thirty mutations could not
be tested due to PCR failure, and of the remaining 970 mutations, 946 were confirmed by Sanger
sequencing (97.5%). For whole genome sequencing, we randomly selected 50 coding mutations,
100 intragenic mutations and 100 intergenic mutations for validation. Approximately 98% of the
WGS-predicted somatic substitutions were confirmed as genuine somatic mutations.
Splice site mutation enrichment analysis
We used a hypergeometric test to determine the probability of observing > X somatic CAG
mutations at splice sites from among a total of N somatic CAG mutations. To determine the
number of assayed CAGs at splice sites and the total number of assayed CAGs at splice and non-
splice sites, we used the CCDS entries corresponding to the Agilent SureSelect v3 all-exon
capture kit to generate a list of genes to be considered. We used the CCDS (Consensus CDS)
database (http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) to extract the corresponding
ENSEMBL (http://www.ensembl.org/index.html) transcript annotations, including the following
information: genomic start and stop coordinates for 3′ UTR, 5′ UTR, and exons. After removal of
3′ and 5′ UTRs, a CAG was considered to be at a 3′ splice site according to fig. S10.
In a total of 16,546 genes, the total number of CAGs at splice sites is 93,292, while the
total number of CAGs at non-splice sites is 776,280. Given these background counts of splice-
site and non-splice-site CAGs in the captured exome, for each tumor we used the phyper()
function in the R statistical package (http://www.r-project.org/) to calculate the probability of
observing greater than X somatic CAG splice site mutations in a total of N somatic CAG
mutations.
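As an illustration of this calculation, the following Python sketch computes the upper-tail hypergeometric probability with exact integer arithmetic from the standard library. It is intended to mirror what R's `phyper(X, m, n, N, lower.tail = FALSE)` returns; the exact arguments the authors passed to `phyper()` are not given in the text, so the function name and argument layout here are assumptions. Only the two background counts come from the text.

```python
from fractions import Fraction
from math import comb

# Background counts reported in the text for the captured exome.
SPLICE_SITE_CAGS = 93_292
NON_SPLICE_SITE_CAGS = 776_280


def prob_more_than(x: int, n_mutated: int,
                   n_splice: int = SPLICE_SITE_CAGS,
                   n_non_splice: int = NON_SPLICE_SITE_CAGS) -> float:
    """P(more than x of n_mutated CAG mutations fall at splice sites).

    Hypergeometric upper tail: n_mutated CAGs are drawn without
    replacement from n_splice + n_non_splice assayed CAGs.
    """
    total = n_splice + n_non_splice
    upper = min(n_mutated, n_splice)
    favorable = sum(
        comb(n_splice, k) * comb(n_non_splice, n_mutated - k)
        for k in range(x + 1, upper + 1)
    )
    # Exact ratio of integers, converted to float only at the end.
    return float(Fraction(favorable, comb(total, n_mutated)))
```

For a given tumor, `prob_more_than(X, N)` would be called with that tumor's observed number of splice-site CAG mutations X and its total number of somatic CAG mutations N.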
RNA sequencing and analysis
Libraries for sequencing were prepared using the Illumina Tru-Seq RNA Sample Preparation v2,
according to the manufacturer’s instructions. Briefly, poly-A RNA was recovered from 1 µg of
total RNA using poly-T oligo attached magnetic beads. The recovered poly-A RNA was then
chemically fragmented and converted to cDNA using SuperScript II and random primers. The
second strand was synthesized using the Second Strand Master Mix provided in the kit, followed
by purification with AMPure XP beads. The ends of the cDNA were repaired using 3’ to 5’
exonuclease. A single adenosine was added to the 3′ end, and the adaptors were attached to the
ends of the cDNA using T4 DNA ligase. The fragments with adapters ligated onto both ends
were enriched by PCR. Libraries were validated with an Agilent Bioanalyzer (Agilent
Technologies). Libraries were diluted to 11 pM and applied to an Illumina flow cell using the
Illumina Cluster Station. Sequencing was performed on an Illumina HiSeq 2000 sequencer
with the paired-end 76-bp read option, according to the manufacturer’s instructions.
For the analysis of the RNA sequencing data, we converted Illumina-format “bcl” (base
call) files to fastq files using Illumina’s CASAVA 1.8 software
(http://support.illumina.com/sequencing/sequencing/sequencing_software/casava.ilmn). We used
Tophat 1.2 (http://tophat.cbcb.umd.edu/) to map the reads to hg19 and the ENSEMBL60 gene
annotation. We visualized the exons surrounding splice site mutations using the UCSC genome
browser (http://genome.ucsc.edu/). Mapped reads were analyzed via RNA-SeQC
(https://confluence.broadinstitute.org/display/CGATools/RNA-SeQC) for quality control and
exon level quantification. We used Cufflinks 1.3 (http://cufflinks.cbcb.umd.edu/) for whole-
transcript quantification of NMD pathway genes.
RT-qPCR
Total purified RNA was reverse-transcribed to cDNA using iScript™ cDNA Synthesis Kit (Bio-
Rad) following the manufacturer’s instructions. Real-time PCR was performed with SsoFast
EvaGreen Supermix according to the manufacturer’s recommendations using a CFX96™ Real-
Time PCR Detection System (Bio-Rad), and the primers for UPF1, UPF2, UPF3A and MAGOH
used in this study were designed accordingly (table S22). All RT-qPCR experiments were run in
triplicate, and a mean value was used for the determination of mRNA levels. Negative controls
containing water instead of sample cDNAs were used in each experiment. Relative quantification
of the mRNA levels of UPF1, UPF2, UPF3A and MAGOH was performed using the
comparative Cq method with GAPDH as the reference gene and with the formula $2^{-\Delta\Delta C_q}$.
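The comparative Cq calculation can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function and argument names are hypothetical, "sample" and "control" are assumed to correspond to tumor and matched normal tissue, and the formula assumes equal, near-perfect amplification efficiency for the target and reference genes.

```python
from statistics import mean


def relative_expression(target_sample, reference_sample,
                        target_control, reference_control):
    """Fold change by the comparative Cq method, 2 ** (-ddCq).

    Each argument is a sequence of replicate Cq values (for example
    triplicates); replicates are averaged before the deltas are taken.
    """
    # dCq: target gene normalized to the reference gene (e.g. GAPDH).
    d_cq_sample = mean(target_sample) - mean(reference_sample)
    d_cq_control = mean(target_control) - mean(reference_control)
    # ddCq: normalized sample relative to normalized control.
    dd_cq = d_cq_sample - d_cq_control
    return 2.0 ** (-dd_cq)
```

A target whose normalized Cq is two cycles lower in the sample than in the control gives a fold change of 4.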
Human proximal tubule epithelial cells (HK2) exposed to AA
Human immortalized proximal tubule epithelial cells (HK2) were obtained from the American
Type Culture Collection and grown in 10% Keratinocyte medium supplemented with bovine
pituitary extract and human epidermal growth factor (Life Technologies) at 37 °C in a humidified
atmosphere of 5% CO2. After reaching 90% confluence, the cells were sub-cultured using a
trypsin/EDTA solution (0.05% trypsin, 0.5 mM EDTA). HK2 cells were treated with aristolochic
acid-I (Sigma-Aldrich Co., Cat# A9451) at a subtoxic dose (10 µM) for six months. Two
individual clones from these treated cells were then randomly selected and subjected to exome
analysis as described above.
Induction of AA-associated nephropathy in C57/B6 mice
A total of 23 mice were used in this experiment. Twenty mice were gavaged with AA at a dose of 50
mg/kg of body weight in phosphate-buffered saline (PBS) for three days, while 3 non-treated
mice served as controls, one at each indicated time point. The mice were killed under diethyl ether
anesthesia, on days 10 (4 mice), 30 (10 mice), and 90 (6 mice) after AA treatment. The AA-
treated and non-treated mouse kidneys were fixed with 4% formaldehyde and sectioned at 4 µm
throughout the kidney. To evaluate glomerulonephritis and tubulointerstitial nephritis in mouse
kidneys, one section in every 20 sequential sections was selected for hematoxylin and eosin
staining. The procedures for the present study were approved by the Animal Committee under
Singapore Health Institute, and all animals were treated according to the guidelines for animal
experimentation of Singapore Health Institute (IACUC protocol #2012/SHS/773).
Statistical analysis
Statistical analysis was carried out in the R statistical programming environment
(http://r-project.org). Enrichment for splice-site mutations (Fig. 3A, table S15) was determined by
hypergeometric tests (R function phyper). Up-regulation of NMD genes (Fig. 3B) was analyzed
by one-sided Wilcoxon rank-sum tests (function wilcox.test). Up-regulation of 13 out of 15
NMD genes (fig. S4 and table S16) was analyzed by a two-sided Wilcoxon signed-rank test
(function wilcox.test). The analysis in fig. S6 used a two-sided Fisher’s exact test (function fisher.test).
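To make the two-sided Fisher's exact test concrete, here is a small standard-library Python sketch for a 2×2 table. It uses the common definition in which the p-value sums the probabilities of all tables, with the observed margins, that are no more probable than the observed table. The function name is hypothetical, and the counts in the usage note are invented for illustration; they are not the counts behind fig. S6.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> Fraction:
        # Hypergeometric probability of k in the top-left cell.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), denom)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [prob(k) for k in range(lowest, highest + 1)]
    observed = prob(a)
    # Exact fractions make the "no more probable" comparison safe.
    total = sum((p for p in probs if p <= observed), Fraction(0))
    return float(total)
```

With the invented table `fisher_exact_two_sided(3, 1, 1, 3)`, the tables at least as extreme have total probability 34/70, about 0.486.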
Supplementary Figures
Supplementary Figure 1. Frequency of mutations in carcinogen-induced cancers.
Supplementary Figure 2. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the AA-UTUC whole genome.
Supplementary Figure 3. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the exomes of nine AA-UTUCs.
Supplementary Figure 4. Systematic up-regulation of NMD gene transcripts in AA-UTUC
compared to adjacent normal tissue. The same 13 genes reported to be up-regulated in the
activation of NMD in myelodysplasia (26) were also up-regulated in AA-UTUC tumor. P-value
by Wilcoxon signed-rank test.
Supplementary Figure 5. A heterozygous 3′ splice-site mutation resulting in skipping of RFC2
exon 10 in AA-UTUC. Bridging reads, confirming the exon skipping, are shown.
Supplementary Figure 6. Strong association between CAG>CTG mutations at 3′ splice sites
and altered splicing. (A) We looked for altered splicing in tumor transcripts at the locations of 15
CAG>CTG mutations at 3′ splice sites preceding internal exons. These were mutations in the
15 genes with adequate coverage to assess splicing alteration (whole-gene RPKM by
Cufflinks > 2; table S17). (B) As a control data set, we searched for altered splicing near 29
unmutated 3′ splice sites with similar RPKMs (table S18). (C) We tested for enrichment of
altered splicing at the sites of the 15 CAG>CTG mutations.
Supplementary Figure 7. Further details of the in vivo model of AA-induced damage. (A,B) Non-treated and AA-treated mouse kidneys. (C,D) Non-treated vs. AA-treated mouse kidney, H&E staining, 100X, scale bar = 100 µm and 200X, scale bar = 50 µm. At day 10, there was a dramatic accumulation of proteinaceous fluid within the tubules of the AA-treated kidney (*). At days 30 and 90, the tubular epithelial cells were necrotic and had collapsed, leaving the tubular basement membrane naked (#). There were few signs of regeneration. The glomeruli demonstrated ischemic shrinkage (arrowheads). At day 90, local inflammatory infiltration (arrows) was evident in the interstitium. In addition, urothelial dysplasia developed.
Supplementary Figure 8. Superimposed individual tumor data points for (A) total
nonsynonymous single-nucleotide variants and (B) each of the separate mutation types in AA-
HCCs and non–AA-HCCs.
Supplementary Figure 9. Nineteen HCCs exhibiting a “weak” AA mutational signature. (A) Numbers of mutations in each of six possible mutation classes. (B) The sequence contexts for A > T mutations in the HCC exomes. Mutation rates are expressed as fractions of the counts of the given triplet. For example, the rate of C[A>T]G mutations is the rate per million CAG triplets. (C) Strand bias: there are about twice as many A > T mutations on the sense (s; non-transcribed) strand as on the antisense (a; transcribed) strand.
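The per-triplet rates described in this caption can be illustrated with a short Python sketch. It assumes, as is conventional but not stated in the caption, that a T>A change read on the forward strand is counted as an A>T change in the reverse-complemented triplet. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def a_to_t_context(triplet: str, alt: str):
    """Return the triplet context of an A>T mutation, or None.

    `triplet` is the reference base plus one flanking base on each
    side, read on the forward strand; `alt` is the mutant base.
    A T>A change is reported as A>T on the opposite strand.
    """
    ref = triplet[1]
    if ref == "A" and alt == "T":
        return triplet
    if ref == "T" and alt == "A":
        return reverse_complement(triplet)
    return None  # not an A>T (or T>A) substitution


def rate_per_million(mutation_count: int, triplet_count: int) -> float:
    """Mutations per million occurrences of the given triplet."""
    return mutation_count / triplet_count * 1_000_000
```

Under this convention, a T>A change in a forward-strand CTG is counted in the C[A>T]G class, because the reverse complement of CTG is CAG.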
Supplementary Figure 10. Schematic representation of 3′ splice-site CAG.
Supplementary Table 1
Clinical characteristics of AA-UTUC patients analyzed by whole-genome and/or exome sequencing
Sample | Age at diagnosis | Gender | Characteristic | Grade | Herb intake | Renal function before operation | Surgery time* | Exome sequenced | Whole genome sequenced | MSI
3T | 60 | Female | papillary | Low | Yes | Normal | 2009/03/18 | Yes | No | MSS
6T | 56 | Female | papillary | Low | Yes | Poor (Cr 2.41 mg/dl) | 2008/09/03 | Yes | No | MSS
9T | 43 | Female | infiltrating | High | Yes | Poor (Cr 4.2 mg/dl) | 2007/08/15 | Yes | Yes | MSS
10T | 67 | Female | infiltrating | High | Yes | Normal | 2007/04/10 | Yes | No | MSS
13T | 64 | Female | papillary | Low | Yes | Normal | 2008/05/20 | Yes | No | MSS
20T | 71 | Female | papillary | Low | Yes | Normal | 2008/06/04 | Yes | No | MSS
79T | 70 | Female | papillary | Low | Yes | Normal | 2009/08/14 | Yes | No | MSS
80T | 78 | Female | infiltrating | High | Yes | ESRD | 2009/09/21 | Yes | No | MSS
100T | 72 | Female | infiltrating | High | Yes | Poor (Cr 3.7 mg/dl) | 2010/06/21 | Yes | No | MSS
Note: Cr unit: creatinine ratio. ESRD: End stage renal disease. MSS: Microsatellite stable.
*Surgery time is when tissue was collected.
Supplementary Table 2
Sequence analysis summary of whole genome–sequenced AA-UTUC (9T)
Sample | Tissue | Ave. Depth Per Targeted Base | Targeted Bases with Depth at Least 1X | Targeted Bases with Depth at Least 20X
9T | Normal | 33 | 96.9% | 81.3%
9T | Tumor | 33 | 96.8% | 85.2%
Supplementary Table 3
Breakdown of somatic mutations by genomic region
Category | Class | Total | Mutation Rates (per Mb) | Region Size
Base Substitutions | Intragenic (transcribed) | 201,192 | 130.7 | 1,538,519,851
 | Coding region | 2,858 | 75.44 | 37,883,392
 | Intron, 5′ UTR, 3′ UTR | 198,334 | 132.17 | 1,500,636,459
 | Intergenic Region | 237,680 | 174.9 | 1,358,790,611
 | Total Mutations | 438,872 | 151.4 | 2,897,310,462
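The rates in the table above appear to be the mutation count divided by the region size in megabases. That reading is an inference from the numbers rather than a formula stated in the text, and the function name below is hypothetical. A minimal Python check:

```python
def mutations_per_mb(mutation_count: int, region_size_bp: int) -> float:
    """Mutation rate expressed as mutations per million bases."""
    return mutation_count / (region_size_bp / 1_000_000)


# Counts and region sizes (in base pairs) copied from the table above.
REGIONS = {
    "Coding region": (2_858, 37_883_392),
    "Intron and UTRs": (198_334, 1_500_636_459),
    "Intergenic region": (237_680, 1_358_790_611),
}

# Recompute the per-Mb rate for each region.
rates = {
    name: mutations_per_mb(count, size)
    for name, (count, size) in REGIONS.items()
}
```

The recomputed values agree with the tabulated rates to within rounding.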
Supplementary Table 4
Somatic nonsynonymous substitutions in protein-coding genes of the whole genome–sequenced AA-
UTUC
Gene Symbol  Sample ID  Nucleotide (Genomic)  AA Change  Change Type
AP3B2 9T g.chr15: 83331919 A>T L836Q Missense
ATP2B3 9T g.chrX: 152818503 A>G I612V Missense
BCAP31 9T g.chrX: 152966174 T>A H165L Missense
BTBD7 9T g.chr14: 93730138 T>A Y455F Missense
CIT 9T g.chr12: 120241041 T>A S422C Missense
CUL9 9T g.chr6: 43189421 A>T - Splice site
DNAH6 9T g.chr2: 84954839 A>T Q3340L Missense
DNAH6 9T g.chr2: 84936672 T>A - Splice site
FBXO9 9T g.chr6: 52938277 A>T - Splice site
GOLGA6L2 9T g.chr15: 23688976 T>A E180V Missense
HDAC6 9T g.chrX: 48673239 A>T - Splice site
IKBKAP 9T g.chr9: 111653686 T>A - Splice site
KCNA6 9T g.chr12: 4919578 A>T Y124F Missense
KCNH6 9T g.chr17: 61619736 T>A Y697N Missense
KCNQ1 9T g.chr11: 2591856 A>T - Splice site
KDM6A 9T g.chrX: 44928975 A>T Q692L Missense
KRT18P29 9T g.chr2: 182826925 T>A Y94F Missense
KRT37 9T g.chr17: 39577829 A>T L344Q Missense
LRRC1 9T g.chr6: 53764543 A>T - Splice site
MED24 9T g.chr17: 38179458 T>A S726C Missense
MERTK 9T g.chr2: 112767579 C>T P672L Missense
MLL 9T g.chr11: 118371808 A>T T2086P Missense
MSH2 9T g.chr2: 47635644 A>T R106X Nonsense
MYO7B 9T g.chr2: 128341759 T>A V469E Missense
NCAPD2 9T g.chr12: 6627094 A>T I520F Missense
NCRNA00219 9T g.chr5: 111497846 A>T - Splice site
PIK3C2A 9T g.chr11: 17190802 T>A S163C Missense
PPIP5K2 9T g.chr5: 102482229 A>T - Splice site
PRPF4B 9T g.chr6: 4041004 A>T - Splice site
PSD4 9T g.chr2: 113958889 T>A L1023Q Missense
RFX4 9T g.chr12: 107113771 A>T Q391L Missense
RP11-141O11.2 9T g.chr5: 68265748 T>C - Splice site
RP11-569O4.6 9T g.chr13: 21523056 T>A H80L Missense
RP1-238O23.5 9T g.chr6: 41100612 A>T - Splice site
RPS4XP21 9T g.chr19: 34583858 A>T Y131N Missense
SEC14L1 9T g.chr17: 75202813 A>T S449C Missense
SLC4A1 9T g.chr17: 42330725 T>A K691I Missense
SOLH 9T g.chr16: 599377 A>T Y583F Missense
SRPK3 9T g.chrX: 153049776 T>A V392E Missense
TBC1D21 9T g.chr15: 74180019 T>A L279Q Missense
TECPR2 9T g.chr14: 102916105 A>T Q1072L Missense
TP53 9T g.chr17: 7578556 T>A - Splice site
TP53 9T g.chr17: 7579329 T>A K120X Nonsense
TRPM4 9T g.chr19: 49703872 A>T K928M Missense
U52112.12 9T g.chrX: 153154005 T>A M149K Missense
USP20 9T g.chr9: 132637837 A>T - Splice site
WDR6 9T g.chr3: 49051910 C>T H951Y Missense
WDR87 9T g.chr19: 38383525 A>T C901S Missense
ZBTB44 9T g.chr11: 130131321 T>A N150Y Missense
Supplementary Table 5
Somatic substitutions in unspliced transcript regions (transcribed, including introns and untranslated
regions) of AA-UTUC (9T)
Gene symbol  Sample ID  Nucleotide (Genomic)
SLC9A6 9T g.chrX: 135103323 A>T
HIP1 9T g.chr7: 75275455 T>A
EFCAB2 9T g.chr1: 245226587 A>T
uc002lcx.2 9T g.chr18: 45111034 A>T
IL1RAPL2 9T g.chrX: 103935618 T>A
CHSY3 9T g.chr5: 129280152 T>A
HS3ST4 9T g.chr16: 25911747 T>A
C8orf46 9T g.chr8: 67427820 T>A
HECW2 9T g.chr2: 197274587 T>A
IDE 9T g.chr10: 94264889 T>A
TRPM3 9T g.chr9: 73666998 A>T
NP_005699 9T g.chr13: 94463217 A>T
RP11-445O3.2 9T g.chr5: 4730066 T>A
SMOC2 9T g.chr6: 168893650 A>T
LINGO2 9T g.chr9: 28177909 T>A
RBMS3 9T g.chr3: 28979525 G>A
B4DJW3 9T g.chr10: 89901054 T>A
Q59GJ1 9T g.chr9: 87545086 A>T
NBEA 9T g.chr13: 35733188 A>T
NRXN3 9T g.chr14: 79573597 T>A
SORCS2 9T g.chr4: 7287997 A>T
MBOAT1 9T g.chr6: 20144580 T>A
RP3-410B11.1 9T g.chrX: 18066741 T>A
REEP1 9T g.chr2: 86552956 A>T
TNIK 9T g.chr3: 170958372 G>T
NDFIP2 9T g.chr13: 80093913 A>T
GCLC 9T g.chr6: 53401065 T>A
RNASEN 9T g.chr5: 31414219 T>A
ABCC9 9T g.chr12: 22068897 T>A
FSTL4 9T g.chr5: 132584785 A>T
ETV2 9T g.chr19: 36133111 T>A
CDH18 9T g.chr5: 19742210 T>A
NP_001139395 9T g.chr15: 76688922 T>A
AC011294.3 9T g.chr7: 46766363 T>A
AL355916.1 9T g.chr14: 62026870 A>T
NPAS3 9T g.chr14: 33915975 A>T
ENOX2 9T g.chrX: 129871217 T>A
RAP1GDS1 9T g.chr4: 99358642 A>T
ACO2 9T g.chr22: 41885492 A>T
CADM2 9T g.chr3: 85287890 T>A
AC133680.1 9T g.chr3: 24902748 A>G
RALGPS2 9T g.chr1: 178850123 T>A
RP11-400D2.3 9T g.chr4: 135345443 T>A
uc001pkv.1 9T g.chr11: 110207442 A>T
Q67FW5 9T g.chr17: 80989752 T>A
FANCC 9T g.chr9: 98058513 T>A
PDE8B 9T g.chr5: 76602773 T>A
SLC25A36 9T g.chr3: 140694351 A>T
ABCA4 9T g.chr1: 94521077 T>A
ZNF473 9T g.chr19: 50538507 A>T
GABRB1 9T g.chr4: 47204188 G>T
MSH3 9T g.chr5: 79969597 A>G
RP11-20B7.1 9T g.chr3: 73914159 T>A
NP_443068 9T g.chr10: 73178045 T>A
NEK11 9T g.chr3: 130918963 A>T
uc003xyc.1 9T g.chr8: 69905012 A>T
MITF 9T g.chr3: 69815171 A>T
RP11-978I15.10 9T g.chr1: 247774573 A>T
C9orf171 9T g.chr9: 135308436 A>T
Q8WWA6 9T g.chr7: 111940387 A>T
RP11-779P15.2 9T g.chr3: 99226142 A>T
NCRNA00210 9T g.chr1: 218093698 A>C
RPL14 9T g.chr3: 40500832 A>T
uc010kgo.2 9T g.chr6: 135900536 C>T
RP11-454C18.2 9T g.chr3: 151613997 T>A
C1orf57 9T g.chr1: 233093436 A>T
Q8NB90-3 9T g.chr4: 123885255 A>T
MOBKL2B 9T g.chr9: 27475826 A>T
RBM6 9T g.chr3: 50031686 A>T
CCNB3 9T g.chrX: 50089427 A>T
WDR64 9T g.chr1: 241962096 T>C
HECW1 9T g.chr7: 43423583 T>T
CTD-2245E15.3 9T g.chr5: 1546591 A>G
GREB1L 9T g.chr18: 18890938 T>A
KIAA0802 9T g.chr18: 8771099 A>T
PIK3C2G 9T g.chr12: 18496516 A>G
DGKB 9T g.chr7: 14976051 A>T
CTD-2340E1.3 9T g.chr5: 101977310 T>A
NELL1 9T g.chr11: 21236047 A>T
HS6ST3 9T g.chr13: 97412407 T>A
Q0VG04 9T g.chr18: 48134672 T>A
TMEM64 9T g.chr8: 91650736 T>A
SETD3 9T g.chr14: 99941572 T>A
KCNJ6 9T g.chr21: 39141463 T>C
PALM2 9T g.chr9: 112544722 A>T
CDH18 9T g.chr5: 20465415 A>T
ZNF438 9T g.chr10: 31162692 A>T
AC068987.1 9T g.chr12: 52744781 A>T
Supplementary Table 6
Somatic substitutions in the intergenic regions of AA-UTUC (9T)
Gene symbol  Sample ID  Nucleotide (Genomic)
Intra_1 9T g.chr10: 65943421 A>T
Intra_2 9T g.chr18: 12394557 A>T
Intra_3 9T g.chrX: 132126132 T>C
Intra_4 9T g.chr6: 164071634 A>T
Intra_5 9T g.chr3: 104590294 T>A
Intra_6 9T g.chr4: 60329949 T>A
Intra_7 9T g.chr14: 45940707 A>T
Intra_8 9T g.chr13: 63012417 T>A
Intra_9 9T g.chr11: 59241667 A>T
Intra_10 9T g.chr9: 29710104 A>G
Intra_11 9T g.chrX: 76611475 A>T
Intra_12 9T g.chr14: 62637522 T>A
Intra_13 9T g.chrX: 42634291 A>T
Intra_14 9T g.chr4: 185814004 A>T
Intra_15 9T g.chr4: 182088275 A>T
Intra_16 9T g.chr12: 41534083 A>T
Intra_17 9T g.chr15: 79596062 T>A
Intra_18 9T g.chr14: 103730618 T>A
Intra_19 9T g.chr14: 96738041 T>A
Intra_20 9T g.chr13: 96729807 T>A
Intra_21 9T g.chr7: 67921420 T>A
Intra_22 9T g.chr12: 11544214 A>T
Intra_23 9T g.chr2: 79295864 T>A
Intra_24 9T g.chr3: 137440666 T>A
Intra_25 9T g.chr4: 61236737 A>C
Intra_26 9T g.chr11: 25686709 T>A
Intra_27 9T g.chr9: 11174306 A>T
Intra_28 9T g.chr4: 60814990 C>A
Intra_29 9T g.chr7: 9431510 T>A
Intra_30 9T g.chr1: 159712935 T>A
Intra_31 9T g.chr2: 138590367 A>T
Intra_32 9T g.chr5: 110294140 A>T
Intra_33 9T g.chrX: 15910211 A>T
Intra_34 9T g.chr4: 185891404 A>T
Intra_35 9T g.chr3: 11946867 T>A
Intra_36 9T g.chr6: 155694030 A>T
Intra_37 9T g.chr1: 80454030 T>G
Intra_38 9T g.chr9: 28756586 G>A
Intra_39 9T g.chr1: 51639186 T>A
Intra_40 9T g.chr6: 49781341 G>A
Intra_41 9T g.chr11: 10929472 A>G
Intra_42 9T g.chr2: 194100276 T>A
Intra_43 9T g.chrX: 113647575 A>G
Intra_44 9T g.chr7: 49373001 C>T
Intra_45 9T g.chr15: 39711715 T>A
Intra_46 9T g.chr7: 1218518 T>A
Intra_47 9T g.chr8: 33126381 T>A
Intra_48 9T g.chr6: 87758958 A>T
Intra_49 9T g.chr9: 5244825 A>T
Intra_50 9T g.chr15: 26147197 A>T
Intra_51 9T g.chrX: 62336318 A>G
Intra_52 9T g.chr3: 5733085 A>T
Intra_53 9T g.chr10: 89844130 A>T
Intra_54 9T g.chr2: 62666724 G>A
Intra_55 9T g.chr7: 17783888 T>A
Intra_56 9T g.chr5: 14580920 A>T
Intra_57 9T g.chr15: 80293494 A>T
Intra_58 9T g.chr7: 6105635 C>A
Intra_59 9T g.chr4: 153229559 A>T
Intra_60 9T g.chr7: 88233656 A>C
Intra_61 9T g.chr7: 9088664 A>T
Intra_62 9T g.chrX: 20609773 C>T
Intra_63 9T g.chr6: 92265045 T>A
Intra_64 9T g.chr2: 173016965 T>A
Intra_65 9T g.chr4: 12804502 T>A
Intra_66 9T g.chr7: 108844198 A>T
Intra_67 9T g.chr13: 89950282 T>A
Intra_68 9T g.chr4: 146977164 C>A
Intra_69 9T g.chr1: 160444224 A>T
Intra_70 9T g.chr5: 119014089 A>T
Intra_71 9T g.chr8: 97197166 A>T
Intra_72 9T g.chr5: 16275367 C>T
Intra_73 9T g.chr4: 180521786 A>T
Intra_74 9T g.chr19: 31229836 A>T
Intra_75 9T g.chr13: 59405509 A>T
Intra_76 9T g.chr15: 97568375 A>T
Intra_77 9T g.chr5: 108052849 A>T
Intra_78 9T g.chrX: 1694957 T>A
Intra_79 9T g.chr16: 26262204 T>A
Intra_80 9T g.chr9: 38704742 T>A
Intra_81 9T g.chr5: 63025558 T>C
Intra_82 9T g.chr12: 94484571 T>A
Intra_83 9T g.chr14: 29059801 T>A
Intra_84 9T g.chrX: 108137918 T>A
Intra_85 9T g.chr8: 140419290 A>T
Intra_86 9T g.chr13: 44609566 T>A
Intra_87 9T g.chr1: 63428116 T>A
Intra_88 9T g.chr1: 85382956 T>A
Intra_89 9T g.chr19: 39292235 T>A
Intra_90 9T g.chr7: 98172016 T>A
Intra_91 9T g.chr5: 86141968 T>A
Intra_92 9T g.chr10: 113298318 T>A
Intra_93 9T g.chr4: 45618818 T>A
Intra_94 9T g.chr12: 30124771 T>A
Intra_95 9T g.chr1: 68018886 T>A
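As a reading aid for the substitution lists, the sketch below tallies rows by substitution class, counting each change together with its reverse-complement partner (so that A>T and T>A fall into one class). It is only an illustration: the row pattern `ROW`, the helpers `fold` and `tally`, and the class labels are assumptions introduced here, not part of the study's analysis pipeline, and the four sample rows are copied from the table above.

```python
import re
from collections import Counter

# Hypothetical pattern for rows such as "Intra_93 9T g.chr4: 45618818 T>A".
# It tolerates an optional space after "g." and after the colon.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+g\.\s*chr(\w+):\s*(\d+)\s+([ACGT])>([ACGT])")

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fold(ref, alt):
    """Return one label for a substitution and its reverse-complement partner."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return "/".join(sorted((forward, reverse)))


def tally(lines):
    """Count substitution classes over table rows; rows that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            counts[fold(match.group(5), match.group(6))] += 1
    return counts


# Four rows copied from the table above, used only to exercise the sketch.
rows = [
    "Intra_91 9T g.chr5: 86141968 T>A",
    "Intra_85 9T g.chr8: 140419290 A>T",
    "Intra_81 9T g.chr5: 63025558 T>C",
    "Intra_72 9T g.chr5: 16275367 C>T",
]
counts = tally(rows)
for label, n in sorted(counts.items()):
    print(label, n)
```

Rows that lack a nucleotide change are skipped rather than guessed at, so the tally covers only the rows that parse.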
Supplementary Table 7
Sequence analysis summary of nine exome-sequenced AA-UTUCs
Sample | Tissue | Bases in Target Region | Bases Mapped to Target Region | Ave. Depth Per Targeted Base | Targeted Bases with Depth at Least 1X (%) | Targeted Bases with Depth at Least 20X (%) | Somatic Mutations Identified in Targeted Region
3T | Normal | 37804019 | 37192007 | 40 | 92.4 | 64 | -
3T | Tumor | 37804019 | 31149098 | 33 | 92.5 | 60 | 437
6T | Normal | 37804019 | 19051035 | 35 | 94.1 | 58 | -
6T | Tumor | 37804019 | 19025237 | 35 | 94.1 | 57 | 1364
9T | Normal | 37804019 | 18557621 | 34 | 94.5 | 58 | -
9T | Tumor | 37804019 | 18537281 | 34 | 94.9 | 59 | 1775
10T | Normal | 37804019 | 18674154 | 35 | 94.8 | 59 | -
10T | Tumor | 37804019 | 18877699 | 35 | 94.6 | 58 | 403
13T | Normal | 37804019 | 33166677 | 35 | 92.2 | 61 | -
13T | Tumor | 37804019 | 35941590 | 37 | 93.2 | 64 | 955
20T | Normal | 37804019 | 18924524 | 35 | 95.7 | 61 | -
20T | Tumor | 37804019 | 18851564 | 35 | 96.7 | 55 | 1383
79T | Normal | 37804019 | 39151870 | 42 | 93.1 | 68 | -
79T | Tumor | 37804019 | 35165500 | 37 | 92.5 | 63 | 1037
80T | Normal | 37804019 | 31127604 | 33 | 92.7 | 61 | -
80T | Tumor | 37804019 | 32370161 | 34 | 92.8 | 58 | 1102
100T | Normal | 37804019 | 31670901 | 33 | 93.8 | 59 | -
100T | Tumor | 37804019 | 32050111 | 34 | 92.7 | 62 | 1477
Supplementary Table 8
Somatic nonsynonymous substitutions in protein-coding genes of nine AA-UTUCs
Gene Symbol  Sample ID  Nucleotide (Genomic)  AA Change  Change Type
RP11-790I12.1 3T g.chr4: 70265447 A>T K11X Nonsense
CCNB3 3T g.chrX: 50052086 A>T Q306L Missense
TET1 3T g.chr10: 70332976 A>T H294L Missense
ARHGEF38 3T g.chr4: 106588667 A>T D652V Missense
FAM178A 3T g.chr10: 102684332 A>T K525I Missense
ARSK 3T g.chr5: 94927237 A>T H335L Missense
LRRTM4 3T g.chr2: 77746288 A>T L236X Nonsense
RP4-555N2.2 3T g.chrX: 118355124 A>T Q127L Missense
WDR60 3T g.chr7: 158705715 A>T I544F Missense
OVCH1 3T g.chr12: 29630115 A>T S433T Missense
OR7A17 3T g.chr19: 14991680 A>T V163E Missense
TTN 3T g.chr2: 179466753 A>T D15847E Missense
GRHL1 3T g.chr2: 10101201 A>T Q102L Missense
HIVEP1 3T g.chr6: 12122558 A>T T844S Missense
PKD1L3 3T g.chr16: 71984095 A>T L1102Q Missense
DOCK3 3T g.chr3: 51395271 A>T - Splice site
F8 3T g.chrX: 154221283 A>T Y177N Missense
GRID2 3T g.chr4: 94693392 A>T R923W Missense
OR4C10P 3T g.chr11: 48454529 A>T M57K Missense
SCN8A 3T g.chr12: 52156408 A>T E831V Missense
TSHB 3T g.chr1: 115576652 A>T Y74F Missense
OXSM 3T g.chr3: 25833031 A>T K174X Nonsense
C11orf30 3T g.chr11: 76169243 A>T S88C Missense
CDH11 3T g.chr16: 64982547 A>T Y680N Missense
GK2 3T g.chr4: 80328321 A>T I345N Missense
SLITRK2 3T g.chrX: 144905604 A>T N554T Missense
DMD 3T g.chrX: 32490358 A>T S958T Missense
FMN2 3T g.chr1: 240492706 A>T T1459S Missense
PRRC2C 3T g.chr1: 171491384 A>T L271F Missense
TMEM225 3T g.chr11: 123755985 A>T W50R Missense
AC010872.2 3T g.chr2: 21362956 A>T S873C Missense
KCNU1 3T g.chr8: 36793232 A>T T1082S Missense
PDE1C 3T g.chr7: 31862741 A>T W510R Missense
RP11-513I15.3 3T g.chr6: 34187365 A>T I50L Missense
SCN1A 3T g.chr2: 166901793 A>T S474R Missense
TTC3 3T g.chr21: 38538777 A>T T1421S Missense
ACTRT1 3T g.chrX: 127186125 A>T C21S Missense
C4orf44 3T g.chr4: 3265639 A>T R253X Nonsense
FRAS1 3T g.chr4: 79296985 A>G N1082H Missense
ADAM9 3T g.chr8: 38947672 A>T R725S Missense
VPS8 3T g.chr3: 184542427 A>T N3Y Missense
ZNF99 3T g.chr19: 22939396 A>T F925L Missense
C11orf82 3T g.chr11: 82644226 A>T R616W Missense
CEP192 3T g.chr18: 13073006 A>T - Splice site
DGKK 3T g.chrX: 50113475 A>T V1222E Missense
KCMF1 3T g.chr2: 85273323 A>T S175C Missense
KDM6A 3T g.chrX: 44949966 A>T - Splice site
USH2A 3T g.chr1: 216390880 A>T C1002X Nonsense
PDE6B 3T g.chr4: 663865 C>A A845E Missense
HCN1 3T g.chr5: 45303881 C>A D480Y Missense
HDAC4 3T g.chr2: 239988551 C>A C952F Missense
KIAA1377 3T g.chr11: 101828996 G>A E202K Missense
OR2J1 3T g.chr6: 29069377 G>T G220C Missense
LPHN3 3T g.chr4: 62598993 G>A D306N Missense
LPHN3 3T g.chr4: 62598912 G>A E279K Missense
NBEA 3T g.chr13: 35806645 G>C E1889Q Missense
MGAM 3T g.chr7: 141708506 G>T - Splice site
MYO6 3T g.chr6: 76596696 G>T L881F Missense
PCDH18 3T g.chr4: 138449633 T>A - Splice site
ATP10B 3T g.chr5: 160112022 T>C Y69C Missense
KRBA2 3T g.chr17: 8273402 T>A T177S Missense
COL4A3BP 3T g.chr5: 74677020 T>A N670Y Missense
FAT4 3T g.chr4: 126336256 T>A D2046E Missense
NEB 3T g.chr2: 152528966 T>A K1406X Nonsense
PRR14L 3T g.chr22: 32110638 T>A K1063X Nonsense
PHKA2 3T g.chrX: 18944568 T>A - Splice site
TAS2R30 3T g.chr12: 11286345 T>A K167X Nonsense
ZNF274 3T g.chr19: 58723646 T>G L126V Missense
OR52A4 3T g.chr11: 5142647 T>A K54N Missense
SCN3A 3T g.chr2: 165947154 T>A M1837L Missense
CDC42BPA 3T g.chr1: 227333321 T>A K338X Nonsense
MUC16 3T g.chr19: 8996369 T>A T13735S Missense
PLG 3T g.chr6: 161127436 T>A - Splice site
FAT4 3T g.chr4: 126412143 T>A H4722Q Missense
NPFFR2 3T g.chr4: 73013419 T>A F487I Missense
OR6B3 3T g.chr2: 240984538 T>A S318C Missense
ANKRD34B 3T g.chr5: 79855624 T>A E72V Missense
N6AMT2 3T g.chr13: 21306187 T>C I101V Missense
CDK19 3T g.chr6: 110953242 T>A K213X Nonsense
USP29 3T g.chr19: 57640739 T>A C232X Nonsense
GABRQ 3T g.chrX: 151820129 T>A Y348N Missense
RFTN2 3T g.chr2: 198511279 T>A Q84L Missense
ZFHX4 3T g.chr8: 77763949 T>A C1553S Missense
VTCN1 3T g.chr1: 117695815 T>A M208L Missense
CCDC54 3T g.chr3: 107096438 T>A Y2N Missense
HGD 3T g.chr3: 120394676 T>A E17V Missense
KRBA2 3T g.chr17: 8273480 T>A K151X Nonsense
RP11-4H14.1 3T g.chr3: 140621361 T>A L25Q Missense
SIN3A 3T g.chr15: 75702483 T>A S385C Missense
TRIM3 3T g.chr11: 6478534 T>A K230X Nonsense
ATP13A3 3T g.chr3: 194180616 T>A N104Y Missense
C6orf112 3T g.chr6: 105614442 T>A S58T Missense
OR51J1 3T g.chr11: 5424630 T>A H268Q Missense
STAG2 3T g.chrX: 123200222 T>A L734Q Missense
SYF2 3T g.chr1: 25553933 T>A H156L Missense
BTBD9 3T g.chr6: 38548045 T>A Q328L Missense
GPR110 3T g.chr6: 46989725 T>A Y174F Missense
HSPA13 3T g.chr21: 15746182 T>A E391V Missense
CREBBP 3T g.chr16: 3789633 T>G F1409C Missense
CDH10 3T g.chr5: 24509775 T>A R386W Missense
KDM6A 6T g.chrX: 44922793 A>T S552C Missense
ABCF1 6T g.chr6: 30553654 A>T - Splice site
GRIN3A 6T g.chr9: 104433085 A>T C537S Missense
LRRK2 6T g.chr12: 40716983 A>T D1844V Missense
LRRK2 6T g.chr12: 40742246 A>G M2106V Missense
MBD5 6T g.chr2: 149248144 A>T Q1415L Missense
SPTA1 6T g.chr1: 158615301 A>T L1327X Nonsense
SYNE2 6T g.chr14: 64516447 A>T Q2499L Missense
SYNE2 6T g.chr14: 64430615 A>T - Splice site
USH2A 6T g.chr1: 216172244 A>T Y2214X Nonsense
BSN 6T g.chr3: 49693462 A>T H2158L Missense
CNTLN 6T g.chr9: 17457560 A>T K1051N Missense
COL5A1 6T g.chr9: 137710896 A>T - Splice site
COL6A3 6T g.chr2: 238271909 A>T L2017Q Missense
COL6A3 6T g.chr2: 238265910 A>T L43Q Missense
CPS1 6T g.chr2: 211542689 A>T S1501C Missense
DAPK1 6T g.chr9: 90296322 A>T T669S Missense
DAPK1 6T g.chr9: 90321813 A>T K1276M Missense
FLG 6T g.chr1: 152280558 A>T H2268Q Missense
GPR180 6T g.chr13: 95275557 A>T - Splice site
HERC2 6T g.chr15: 28422635 A>T L3062V Missense
LAMC2 6T g.chr1: 183196728 A>T D455V Missense
LRFN5 6T g.chr14: 42356084 A>T T86S Missense
LRP1B 6T g.chr2: 141027893 A>T S4389T Missense
LRRC7 6T g.chr1: 70484405 A>G K404E Missense
NF1 6T g.chr17: 29556961 A>T S987C Missense
NF1 6T g.chr17: 29509652 A>T K286I Missense
NRP1 6T g.chr10: 33475318 A>T C721S Missense
PCNX 6T g.chr14: 71479729 A>T R936X Nonsense
PDCD11 6T g.chr10: 105173795 A>T R420X Nonsense
PIK3CA 6T g.chr3: 178952085 A>T H1047L Missense
PIKFYVE 6T g.chr2: 209190552 A>T Q1006L Missense
PPFIA1 6T g.chr11: 70184484 A>T Q499L Missense
ROS1 6T g.chr6: 117715413 A>T L359Q Missense
ADAMTSL1 6T g.chr9: 84690254 N1456Y Missense
SCN10A 6T g.chr3: 38812891 A>T F160I Missense
SEZ6L 6T g.chr22: 26689017 A>T E247V Missense
SLC22A14 6T g.chr3: 38347971 A>T T152S Missense
TRPM6 6T g.chr9: 77411805 A>T I748N Missense
ABCA9 6T g.chr17: 67028333 A>T L454H Missense
KCNA2 6T g.chr1: 111146380 A>T F342Y Missense
SCN10A 6T g.chr3: 38812891 A>T F160I Missense
SLC12A2 6T g.chr5: 127484544 A>T L660F Missense
SLC22A14 6T g.chr3: 38347971 A>T T152S Missense
TRPM1 6T g.chr15: 31294399 A>T S1480T Missense
TRPM6 6T g.chr9: 77411805 A>T I748N Missense
ATP1A2 6T g.chr1: 160105048 A>T Q693L Missense
CACNA1S 6T g.chr1: 201046136 A>T L580H Missense
CACNG4 6T g.chr17: 65014347 A>T E88V Missense
KCTD7 6T g.chr7: 66098401 A>T Y95F Missense
SLC17A4 6T g.chr6: 25769280 A>T Q53H Missense
SLC6A16 6T g.chr19: 49793914 A>T M630K Missense
SLC4A1AP 6T g.chr2: 27888030 A>T K297X Nonsense
ATP6V0A4 6T g.chr7: 138444559 A>T C193S Missense
SCN3A 6T g.chr2: 165986519 A>T C951X Nonsense
IGSF10 6T g.chr3: 151155382 C>T V2323M Missense
SCN1A 6T g.chr2: 166848516 C>T G1746R Missense
RIPK1 6T g.chr6: 3105964 C>T P419S Missense
SCN1A 6T g.chr2: 166848516 C>T G1746R Missense
SNTG1 6T g.chr8: 51314780 G>A G13E Missense
SORBS2 6T g.chr4: 186515083 G>A R1031C Missense
TAS1R1 6T g.chr1: 6639019 G>A R634H Missense
SLC6A2 6T g.chr16: 55703518 G>T G106W Missense
KDM6A 6T g.chrX: 44950111 T>A - Splice site
DNAH9 6T g.chr17: 11783478 T>A L3521Q Missense
MYO5C 6T g.chr15: 52556407 T>A N343Y Missense
TP53 6T g.chr17: 7578291 T>A - Splice site
ANKRD26 6T g.chr10: 27368093 T>A - Splice site
IGSF10 6T g.chr3: 151163661 T>A T1370S Missense
SH3TC2 6T g.chr5: 148427447 T>A Q86L Missense
SPG11 6T g.chr15: 44921537 T>A R595S Missense
SPTA1 6T g.chr1: 158653252 T>A E100V Missense
SYNE1 6T g.chr6: 152708252 T>A Q2814H Missense
SYNE1 6T g.chr6: 152738003 T>A S1857C Missense
USH2A 6T g.chr1: 216108136 T>A - Splice site
ALS2 6T g.chr2: 202587806 T>A Y1221F Missense
CHD6 6T g.chr20: 40122545 G>T D378Y Missense
CASK 6T g.chrX: 41428922 T>A - Splice site
COL6A3 6T g.chr2: 238287413 T>A Q788L Missense
FAM5B 6T g.chr1: 177250326 T>A S672T Missense
HERC2 6T g.chr15: 28380829 T>A T4009S Missense
ITPR2 6T g.chr12: 26640004 T>A - Splice site
MYH1 6T g.chr17: 10395849 T>A R1902W Missense
MYO5B 6T g.chr18: 47479655 T>A Q576L Missense
RB1CC1 6T g.chr8: 53573219 T>A - Splice site
RPE65 6T g.chr1: 68912474 T>A E55V Missense
RPE65 6T g.chr1: 68904766 T>A - Splice site
SCN2A 6T g.chr2: 166170519 T>A Y428X Nonsense
TG 6T g.chr8: 133961126 T>A V1780E Missense
UBR4 6T g.chr1: 19494518 T>A - Splice site
UBR4 6T g.chr1: 19407930 T>A E5049V Missense
WDR72 6T g.chr15: 53901780 T>A E961V Missense
ABCC11 6T g.chr16: 48226622 T>A S839C Missense
ATP6V1C2 6T g.chr2: 10923387 T>A - Missense
ATP8B2 6T g.chr1: 154310133 T>A C383S Missense
KCNQ5 6T g.chr6: 73904885 T>A S849R Missense
ATP8B2 6T g.chr1: 154313388 T>A C398S Missense
KCNJ1 6T g.chr11: 128709031 T>A T389S Missense
SLC12A3 6T g.chr16: 56918015 T>A I575N Missense
SLC12A5 6T g.chr20: 44664504 T>A V146E Missense
SLC17A1 6T g.chr6: 25813324 T>A - Splice site
SLC17A8 6T g.chr12: 100790131 T>A H204Q Missense
SLC22A8 6T g.chr11: 62763484 T>C Y96C Missense
SLC5A6 6T g.chr2: 27429792 T>A N138Y Missense
SLC6A13 6T g.chr12: 344372 T>A T239S Missense
SCN2A 6T g.chr2: 166170519 T>A Y428X Nonsense
SLC4A5 6T g.chr2: 74483029 T>A K300X Nonsense
KCNQ5 6T g.chr6: 73787616 T>A - Splice site
SLC12A6 6T g.chr15: 34547459 T>A - Splice site
SLC5A5 6T g.chr19: 17999266 T>A - Splice site
DCHS2 6T g.chr4: 155155838 T>A E2867D Missense
ARID1A 6T g.chr1: 27097616 A>T K1069X Nonsense
SETX 6T g.chr9: 135158647 A>T - Splice site
CREBBP 6T g.chr16: 3786686 T>A K1509X Nonsense
KDM6A 9T g.chrX: 44928975 A>T Q692L Missense
DNAH9 9T g.chr17: 11757413 A>T R3201W Missense
DNAH9 9T g.chr17: 11511538 A>T Q170H Missense
MYO5C 9T g.chr15: 52571196 A>T M108K Missense
GRIN3A 9T g.chr9: 104449238 A>T L315H Missense
IGSF10 9T g.chr3: 151162912 A>T S1619R Missense
MBD5 9T g.chr2: 149247504 A>T S1202C Missense
SH3TC2 9T g.chr5: 148407581 A>T Y572N Missense
SPTA1 9T g.chr1: 158653289 A>T - Splice site
SYNE2 9T g.chr14: 64465690 A>T T1138S Missense
USH2A 9T g.chr1: 215847599 A>T W4552R Missense
ARMC3 9T g.chr10: 23248385 A>T Y140F Missense
CASK 9T g.chrX: 41485949 A>T V308E Missense
COL5A1 9T g.chr9: 137734011 A>T K1793N Missense
CPS1 9T g.chr2: 211476878 A>T Q816L Missense
DAPK1 9T g.chr9: 90312118 A>T - Splice site
FLG2 9T g.chr1: 152323217 A>T Y2349N Missense
GPR180 9T g.chr13: 95275532 A>T E355V Missense
ADAMTSL1 9T g.chr9: 84592659 A>T H664L Missense
HMCN1 9T g.chr1: 186055437 A>T I2982F Missense
HMCN1 9T g.chr1: 186151407 A>T K5468X Nonsense
LAMA1 9T g.chr18: 6985242 A>T L1885Q Missense
LAMC2 9T g.chr1: 183209276 A>C E1057D Missense
MACF1 9T g.chr1: 39782061 A>T - Splice site
MACF1 9T g.chr1: 39910443 A>T E4957V Missense
MACF1 9T g.chr1: 39833821 A>T - Splice site
NF1 9T g.chr17: 29509547 A>G D251G Missense
PCNX 9T g.chr14: 71568795 A>T H1893L Missense
PDCD11 9T g.chr10: 105183303 A>C Q884P Missense
PDE3B 9T g.chr11: 14852316 A>T K627M Missense
PPFIA1 9T g.chr11: 70201892 A>T Q821H Missense
RELN 9T g.chr7: 103206814 A>T M1598K Missense
RELN 9T g.chr7: 103214542 A>C - Splice site
RELN 9T g.chr7: 103163883 A>T I2482K Missense
RIPK1 9T g.chr6: 3085645 A>T - Splice site
SCN10A 9T g.chr3: 38763750 A>T - Splice site
SLC22A14 9T g.chr3: 38347882 A>T Q122L Missense
SNTG1 9T g.chr8: 51449364 A>T S226C Missense
SPEG 9T g.chr2: 220352936 A>T I2588F Missense
SRCAP 9T g.chr16: 30740854 A>T T2030S Missense
TRPM6 9T g.chr9: 77397732 A>T I986K Missense
TRPM6 9T g.chr9: 77415213 A>T M732K Missense
ABCD2 9T g.chr12: 39980033 A>T H571Q Missense
ATP11A 9T g.chr13: 113490562 A>T - Splice site
ATP1A4 9T g.chr1: 160141428 A>G T579A Missense
KCNH7 9T g.chr2: 163302648 A>T N478K Missense
KCNJ6 9T g.chr21: 38997661 A>T Y358N Missense
SCN10A 9T g.chr3: 38763750 A>T - Splice site
SLC17A3 9T g.chr6: 25862661 A>T C35S Missense
SLC19A3 9T g.chr2: 228567028 A>T C3S Missense
SLC22A14 9T g.chr3: 38347882 A>T Q122L Missense
SLC47A2 9T g.chr17: 19584847 A>T M448K Missense
SLC7A11 9T g.chr4: 139144472 A>T V176E Missense
TRPM6 9T g.chr9: 77415213 A>T M732K Missense
TRPM6 9T g.chr9: 77397732 A>T I986K Missense
ATP2B3 9T g.chrX: 152818503 A>G I612V Missense
ATP6V0A1 9T g.chr17: 40620122 A>T - Splice site
ATP6V0A4 9T g.chr7: 138433945 A>T Y383N Missense
ATP8A1 9T g.chr4: 42571217 A>T V434D Missense
CACNA1F 9T g.chrX: 49070726 A>T Y1212N Missense
KCNB2 9T g.chr8: 73849709 A>T S707C Missense
SLC20A1 9T g.chr2: 113410198 A>T Q66H Missense
SLC25A14 9T g.chrX: 129492624 A>T Q170L Missense
SLC45A2 9T g.chr5: 33947323 A>T L438Q Missense
TRPM4 9T g.chr19: 49703872 A>T K928M Missense
KCNH8 9T g.chr3: 19491657 A>T R479X Nonsense
SLC10A4 9T g.chr4: 48487157 A>T K267X Nonsense
AQP10 9T g.chr1: 154295715 A>T - Splice site
SLC22A3 9T g.chr6: 160831759 A>T - Splice site
TRPC5 9T g.chrX: 111155885 A>T C178X Nonsense
USH2A 9T g.chr1: 216011333 C>A R3124I Missense
APOB 9T g.chr2: 21232009 C>T M2577I Missense
SLC6A17 9T g.chr1: 110717451 C>T R208X Nonsense
FLG 9T g.chr1: 152282665 G>T T1566N Missense
SCN2A 9T g.chr2: 166246035 G>A V1907M Missense
SORBS2 9T g.chr4: 186508806 G>A P1093L Missense
SCN2A 9T g.chr2: 166246035 G>A V1907M Missense
SLC5A5 9T g.chr19: 17985491 G>T W165L Missense
DCHS2 9T g.chr4: 155253631 T>A - Splice site
MYO5C 9T g.chr15: 52532060 T>A K858M Missense
TP53 9T g.chr17: 7579329 T>A K120X Nonsense
ANKRD26 9T g.chr10: 27342321 T>A - Splice site
ATRX 9T g.chrX: 76944313 T>A - Splice site
IGSF10 9T g.chr3: 151162889 T>A Q1627L Missense
LRRK2 9T g.chr12: 40646807 T>A L426X Nonsense
SCN1A 9T g.chr2: 166897815 T>A N770Y Missense
SPG11 9T g.chr15: 44888412 T>A K1435X Nonsense
SYNE1 9T g.chr6: 152754941 T>A S1484C Missense
SYNE1 9T g.chr6: 152652963 T>A Q4286L Missense
ALS2 9T g.chr2: 202592003 T>A - Splice site
ARMC3 9T g.chr10: 23295742 T>A S23T Missense
COL6A3 9T g.chr2: 238243488 T>A S3004C Missense
CPS1 9T g.chr2: 211469972 T>A - Splice site
CUBN 9T g.chr10: 16893316 T>A N3194I Missense
DDX60 9T g.chr4: 169195161 T>A Y793F Missense
EPHA7 9T g.chr6: 94120446 T>A Y202F Missense
HERC2 9T g.chr15: 28380628 T>A R4076X Nonsense
IKBKAP 9T g.chr9: 111679834 T>A E286V Missense
IKBKAP 9T g.chr9: 111653686 T>A - Splice site
ITPR2 9T g.chr12: 26640123 T>A Y1811F Missense
ITPR2 9T g.chr12: 26648195 T>A - Splice site
KRTAP10-11 9T g.chr21: 46066685 T>A C104S Missense
LRFN5 9T g.chr14: 42360707 T>A I547N Missense
MAGI1 9T g.chr3: 65361589 T>A - Splice site
MLL3 9T g.chr7: 151877842 T>A Q2368L Missense
MYH1 9T g.chr17: 10397768 T>A - Splice site
PIK3C2A 9T g.chr11: 17190802 T>A S163C Missense
RB1CC1 9T g.chr8: 53574107 T>A H449L Missense
RELN 9T g.chr7: 103207089 T>G K1569T Missense
RELN 9T g.chr7: 103124129 T>A Q3384H Missense
RPE65 9T g.chr1: 68904738 T>A K295N Missense
SCN2A 9T g.chr2: 166245507 T>A C1731S Missense
SCN2A 9T g.chr2: 166245286 T>A L1657H Missense
SEZ6L 9T g.chr22: 26736587 T>A M734K Missense
SNTG1 9T g.chr8: 51621455 T>A Y401N Missense
SORBS2 9T g.chr4: 186536279 T>A I892F Missense
STAT2 9T g.chr12: 56742590 T>A - Splice site
UBR4 9T g.chr1: 19518874 T>A - Splice site
ABCA10 9T g.chr17: 67187398 T>A I644F Missense
ABCA8 9T g.chr17: 66928642 T>A H195L Missense
ABCB1 9T g.chr7: 87196278 T>A Y118F Missense
ABCC4 9T g.chr13: 95686889 T>A E1280D Missense
ATP12A 9T g.chr13: 25274951 T>A M591K Missense
ATP5E 9T g.chr20: 57605482 T>A Y12F Missense
ATP6V1H 9T g.chr8: 54682222 T>A R377S Missense
CATSPER1 9T g.chr11: 65793850 T>A M1L Missense
CATSPER4 9T g.chr1: 26527439 T>A L369H Missense
CATSPERB 9T g.chr14: 92102815 T>A S566C Missense
KCNH1 9T g.chr1: 211280614 T>A Q62L Missense
SCN1A 9T g.chr2: 166897815 T>A N770Y Missense
SCN2A 9T g.chr2: 166245507 T>A C1731S Missense
SCN2A 9T g.chr2: 166245286 T>A L1657H Missense
SCN8A 9T g.chr12: 52200293 T>A Y1675N Missense
SCN9A 9T g.chr2: 167055808 T>A M1770L Missense
SLC11A2 9T g.chr12: 51389416 T>A Q329L Missense
SLC38A11 9T g.chr2: 165755161 T>A H314L Missense
TRPA1 9T g.chr8: 72987592 T>A Q18L Missense
TRPC4 9T g.chr13: 38211320 T>A Q890L Missense
TRPM6 9T g.chr9: 77442805 T>A M244L Missense
ABCG1 9T g.chr21: 43711705 T>A M543K Missense
ATP13A5 9T g.chr3: 193029642 T>A Q803L Missense
ATP7B 9T g.chr13: 52539118 T>A T587S Missense
KCNB2 9T g.chr8: 73848740 T>A Y384N Missense
KCNB2 9T g.chr8: 73480015 T>A S16T Missense
KCND2 9T g.chr7: 119914973 T>A L96Q Missense
KCNH6 9T g.chr17: 61619736 T>A Y697N Missense
KCNK10 9T g.chr14: 88652253 T>A S420C Missense
KCTD1 9T g.chr18: 24039649 T>A N792Y Missense
SLC1A2 9T g.chr11: 35327669 T>A M228L Missense
SLC25A12 9T g.chr2: 172666140 T>A Q18L Missense
SLC30A8 9T g.chr8: 118174094 T>A S230R Missense
ABCB4 9T g.chr7: 87046833 T>A - Splice site
ATP13A4 9T g.chr3: 193232515 T>A E69V Missense
SLC39A6 9T g.chr18: 33703455 T>A - Splice site
CDH10 9T g.chr5: 24491891 T>C H557R Missense
KDM6A 10T g.chrX: 44966653 A>T - Splice site
DCHS2 10T g.chr4: 155157973 A>T Y2156N Missense
SCN1A 10T g.chr2: 166912988 A>T C136S Missense
COL5A1 10T g.chr9: 137593015 A>T - Splice site
CUBN 10T g.chr10: 16919073 A>T C2977S Missense
HMCN1 10T g.chr1: 186121990 A>T E5002V Missense
MACF1 10T g.chr1: 39896469 A>T R4182S Missense
MYH2 10T g.chr17: 10428845 A>T L1487Q Missense
PIKFYVE 10T g.chr2: 209200622 A>T - Splice site
SNTG1 10T g.chr8: 51362289 A>T - Splice site
TNKS 10T g.chr8: 9565879 A>T - Splice site
ABCC9 10T g.chr12: 22089572 A>T Y13N Missense
SCN1A 10T g.chr2: 166912988 A>T C136S Missense
SLC13A1 10T g.chr7: 122757637 A>T L513H Missense
SLC36A1 10T g.chr5: 150847473 A>T Q237L Missense
ATP6V0A4 10T g.chr7: 138444626 A>T - Splice site
MBD5 10T g.chr2: 149247139 C>T P1080L Missense
FAM5B 10T g.chr1: 177226406 C>G N185K Missense
HERC2 10T g.chr15: 28436146 C>T D2872N Missense
ATP2B4 10T g.chr1: 203678616 C>T P582L Missense
HMCN1 10T g.chr1: 186024775 G>C M2371I Missense
MACF1 10T g.chr1: 39549979 G>A R30Q Missense
KCNB1 10T g.chr20: 47989776 G>T P774H Missense
MYO5C 10T g.chr15: 52511984 T>A S1253C Missense
TP53 10T g.chr17: 7577100 T>A R280X Nonsense
ATRX 10T g.chrX: 76939787 T>A S321C Missense
LRRK2 10T g.chr12: 40668734 T>A L627Q Missense
SH3TC2 10T g.chr5: 148408046 T>A R417X Nonsense
SPTA1 10T g.chr1: 158605793 T>C D1781G Missense
ALS2 10T g.chr2: 202608999 T>A R718W Missense
CUBN 10T g.chr10: 16957975 T>A H2352L Missense
STAT2 10T g.chr12: 56750063 T>A E46D Missense
TAS1R1 10T g.chr1: 6639211 T>A L698Q Missense
TMEM132D 10T g.chr12: 129566383 T>A Q615L Missense
WDR72 10T g.chr15: 53997395 T>A N380Y Missense
ZNF608 10T g.chr5: 123983113 T>A Q988H Missense
SCN11A 10T g.chr3: 38888399 T>C Y1721C Missense
TRPM3 10T g.chr9: 73225527 T>A - Splice site
SETX 10T g.chr9: 135205772 T>A N405Y Missense
C3orf35 13T g.chr3: 37476357 A>T K83N Missense
SERAC1 13T g.chr6: 158569947 A>T V102E Missense
ITGA11 13T g.chr15: 68605190 A>T L965Q Missense
MRPS5P3 13T g.chr5: 126479587 A>G C15G Missense
GPKOW 13T g.chrX: 48972313 A>T L358M Missense
RP11-543B16.2 13T g.chr1: 211380761 A>T R61W Missense
MYOM3 13T g.chr1: 24413244 A>T L563Q Missense
KBTBD5 13T g.chr3: 42730146 A>T Y453F Missense
PARVG 13T g.chr22: 44602220 A>T S304C Missense
KCNG4 13T g.chr16: 84270838 A>T L85Q Missense
REL 13T g.chr2: 61118816 A>T - Splice site
ABCB8 13T g.chr7: 150733629 A>T - Splice site
NLRC5 13T g.chr16: 57079358 A>T - Splice site
TNR 13T g.chr1: 175360422 A>T - Splice site
RBBP7 13T g.chrX: 16863081 A>T - Splice site
MYH14 13T g.chr19: 50812935 A>T Q1967L Missense
SIN3B 13T g.chr19: 16962260 A>T Q255L Missense
PIK3R6 13T g.chr17: 8722405 A>T F663Y Missense
RNF112 13T g.chr17: 19319213 A>T M541L Missense
SH2D4B 13T g.chr10: 82394257 A>T H400L Missense
KIF1A 13T g.chr2: 241664775 A>T M1289K Missense
APOB 13T g.chr2: 21235181 A>T L1520Q Missense
RP11-302I18.2 13T g.chr1: 220487703 A>T L17H Missense
ZFYVE19 13T g.chr15: 41105034 A>T R322W Missense
PANX1 13T g.chr11: 93862626 A>T I50F Missense
NPTX1 13T g.chr17: 78444723 A>T C397S Missense
RP11-432I13.3 13T g.chr10: 45742206 A>T W56R Missense
KCNMA1 13T g.chr10: 78846244 A>T - Splice site
USH2A 13T g.chr1: 216465699 A>T L553X Nonsense
ODF2 13T g.chr9: 131235199 A>T K173X Nonsense
WNT2 13T g.chr7: 116918365 A>T C309X Nonsense
SHROOM4 13T g.chrX: 50341538 A>T - Splice site
TET1 13T g.chr10: 70404793 A>T E769D Missense
MAST1 13T g.chr19: 12976569 A>T T615S Missense
PTGFR 13T g.chr1: 78963602 A>T Y281F Missense
CDH23 13T g.chr10: 73501502 A>T I1602F Missense
C1orf129 13T g.chr1: 170931064 A>T N108Y Missense
DNTTIP1 13T g.chr20: 44439543 A>T E272V Missense
DMD 13T g.chrX: 31497171 A>G L2866P Missense
HS2ST1 13T g.chr1: 87570369 A>T K354M Missense
TGFBI 13T g.chr5: 135394898 A>T S600C Missense
GTF2H3 13T g.chr12: 124139477 A>C K165Q Missense
CXCR2P1 13T g.chr2: 218925611 A>T L37Q Missense
ABCG8 13T g.chr2: 44099141 A>T S331C Missense
UBXN2A 13T g.chr2: 24199914 A>T S86C Missense
ARHGEF10L 13T g.chr1: 17953828 A>T M472L Missense
DNM3 13T g.chr1: 172037957 A>T - Splice site
PTEN 13T g.chr10: 89720654 A>T K269X Nonsense
CHD2 13T g.chr15: 93486296 A>T Q350H Missense
ARID1B 13T g.chr6: 157528178 A>G H1950R Missense
SORL1 13T g.chr11: 121454205 C>T R1207X Nonsense
COL27A1 13T g.chr9: 117002747 C>T R939C Missense
COPS7B 13T g.chr2: 232653609 C>G P110R Missense
SLC24A4 13T g.chr14: 92920322 C>T T303I Missense
NKX2-2 13T g.chr20: 21494154 C>A A52S Missense
DUSP27 13T g.chr1: 167097723 C>T Q1119X Nonsense
SLC18A1 13T g.chr8: 20038417 C>T R20Q Missense
HDAC7 13T g.chr12: 48191205 G>C P180R Missense
OR2AG2 13T g.chr11: 6789911 G>C S93C Missense
RP11-572H4.2 13T g.chr9: 31253974 G>A V26M Missense
C1orf54 13T g.chr1: 150248149 G>C - Splice site
CABYR 13T g.chr18: 21736705 G>A E414K Missense
ROBO4 13T g.chr11: 124766568 T>A - Splice site
PTPN3 13T g.chr9: 112190951 T>A K260X Nonsense
WDR45 13T g.chrX: 48934338 T>A K104X Nonsense
ATRX 13T g.chrX: 76875968 T>A I1723F Missense
MUC5B 13T g.chr11: 1155619 T>A C145S Missense
ABCC4 13T g.chr13: 95673932 T>A Y1292F Missense
GP2 13T g.chr16: 20334208 T>A Q213L Missense
EIF2C2 13T g.chr8: 141549543 T>A H682L Missense
BZRAP1 13T g.chr17: 56386657 T>A S1326C Missense
HAPLN4 13T g.chr19: 19371792 T>A Q105L Missense
P2RX5 13T g.chr17: 3593938 T>A S133C Missense
RPS6KA2 13T g.chr6: 166836808 T>A Q568L Missense
C15orf38 13T g.chr15: 90451608 T>A I69F Missense
AC005400.1 13T g.chr7: 35187404 T>A - Splice site
MED12 13T g.chrX: 70356734 T>A Y1802X Nonsense
TECPR1 13T g.chr7: 97861181 T>A T637S Missense
MYF5 13T g.chr12: 81111087 T>A M82K Missense
SLC7A3 13T g.chrX: 70149486 T>A Y121F Missense
RP11-226L15.1 13T g.chr1: 159990364 T>A V51E Missense
ABP1 13T g.chr7: 150554807 T>G C417G Missense
PTCHD3 13T g.chr10: 27702383 T>A N266I Missense
TEK 13T g.chr9: 27218797 T>A W1029R Missense
UNC80 13T g.chr2: 210840906 T>A L2818H Missense
GRIPAP1 13T g.chrX: 48837713 T>A K615M Missense
NEGR1 13T g.chr1: 72400854 T>A Q106L Missense
ATG2A 13T g.chr11: 64664084 T>A Q1759H Missense
AJ239329.1 13T g.chrX: 37351507 T>A H47Q Missense
SMC1A 13T g.chrX: 53436189 T>A E450V Missense
ANKRD31 13T g.chr5: 74484441 T>A S481C Missense
DUSP27 13T g.chr1: 167095806 T>A W480R Missense
DMKN 13T g.chr19: 35990861 T>A - Splice site
C2CD3 13T g.chr11: 73844604 T>A - Splice site
SPG11 13T g.chr15: 44914098 T>A K827X Nonsense
HDAC9 13T g.chr7: 18767270 T>A M600K Missense
CHD1 13T g.chr5: 98228232 T>A - Splice site
DCHS2 13T g.chr4: 155156196 T>A E2748V Missense
KDM6A 13T g.chrX: 44949979 A>T K1250X Nonsense
ADAMTSL1 13T g.chr9: 18770776 A>T - Splice site
KDM6A 20T g.chrX: 44918297 A>T R308W Missense
DNAH9 20T g.chr17: 11603157 A>T E1661V Missense
LRRK2 20T g.chr12: 40716170 A>T E1789D Missense
SCN1A 20T g.chr2: 166896089 A>T F800L Missense
SYNE2 20T g.chr14: 64520317 A>T K3229I Missense
APOB 20T g.chr2: 21231467 A>T L2758Q Missense
APOB 20T g.chr2: 21252632 A>G M499T Missense
APOB 20T g.chr2: 21230211 A>T Y3177N Missense
COL6A3 20T g.chr2: 238275367 A>T H1821Q Missense
DAPK1 20T g.chr9: 90272978 A>T N620I Missense
FAM5B 20T g.chr1: 177249988 A>T Y559F Missense
LRFN5 20T g.chr14: 42356415 A>T K196M Missense
LRFN5 20T g.chr14: 42361138 A>T T691S Missense
MYH1 20T g.chr17: 10411976 A>T F534Y Missense
PCNX 20T g.chr14: 71502846 A>T H1280L Missense
PDCD11 20T g.chr10: 105183312 A>T E887V Missense
PDE3B 20T g.chr11: 14865386 A>T R778S Missense
PIKFYVE 20T g.chr2: 209200846 A>T Q1481L Missense
SCN2A 20T g.chr2: 166226646 A>T D1229V Missense
SRCAP 20T g.chr16: 30747572 A>T K2261X Nonsense
SRCAP 20T g.chr16: 30748505 A>T S2382C Missense
TNKS 20T g.chr8: 9590849 A>T L736F Missense
TRPM6 20T g.chr9: 77377164 A>T C1475S Missense
TRPM6 20T g.chr9: 77391042 A>T F10I Missense
WDR72 20T g.chr15: 53889332 A>T L1031Q Missense
ATP10D 20T g.chr4: 47578794 A>T Y1124F Missense
ATP11B 20T g.chr3: 182605478 A>T L940F Missense
ATP6V0A2 20T g.chr12: 124233220 A>T K608M Missense
CACNG2 20T g.chr22: 36960768 A>T F201Y Missense
KCNAB2 20T g.chr1: 6158599 A>T N357Y Missense
KCNQ5 20T g.chr6: 73751755 A>T R196W Missense
SCN4B 20T g.chr11: 118014760 A>T V84E Missense
SLC2A2 20T g.chr3: 170727818 A>T M142K Missense
SLC35A1 20T g.chr6: 88187080 A>T - Splice site
SLC44A1 20T g.chr9: 108127919 A>T - Splice site
SLC4A1 20T g.chr17: 42337223 A>T L188Q Missense
ABCA12 20T g.chr2: 215823144 A>T S1992T Missense
ABCA4 20T g.chr1: 94508359 A>G S1096P Missense
ABCC8 20T g.chr11: 17432181 A>T L859Q Missense
ABCD2 20T g.chr12: 39998687 A>T W428R Missense
AQP4 20T g.chr18: 24442173 A>T S140R Missense
ATP11B 20T g.chr3: 182583486 A>T - Splice site
CACNG7 20T g.chr19: 54445336 A>T E206V Missense
NALCN 20T g.chr13: 101714399 A>T L1559Q Missense
SLC38A8 20T g.chr16: 84050785 A>T S305T Missense
SLC45A1 20T g.chr1: 8390964 A>T S471C Missense
SLC6A15 20T g.chr12: 85260867 A>T L534Q Missense
SLC7A3 20T g.chrX: 70147714 A>T L326H Missense
SLC8A2 20T g.chr19: 47944660 A>T Y601N Missense
TRPM6 20T g.chr9: 77377164 A>T C1475S Missense
TRPM8 20T g.chr2: 234894401 A>T D944V Missense
ATP11A 20T g.chr13: 113470418 A>T K155X Nonsense
KCTD5 20T g.chr16: 2752423 A>T K207X Nonsense
ABCG1 20T g.chr21: 43645779 A>T - Splice site
SLC25A48 20T g.chr5: 135186112 A>T - Splice site
DNAH9 20T g.chr17: 11757708 C>T A3299V Missense
USH2A 20T g.chr1: 216363588 C>A G1458V Missense
COL6A3 20T g.chr2: 238285856 C>A V877L Missense
IKBKAP 20T g.chr9: 111640407 C>A K1241N Missense
RB1CC1 20T g.chr8: 53540721 C>A V1503L Missense
SEZ6L 20T g.chr22: 26702041 C>T T482M Missense
ZNF521 20T g.chr18: 22807406 C>T R159H Missense
SLC9A3 20T g.chr5: 484719 C>T R283H Missense
KCNF1 20T g.chr2: 11053888 C>T R446C Missense
KCNJ4 20T g.chr22: 38823596 C>T R181Q Missense
SLC29A3 20T g.chr10: 73082758 C>T R83C Missense
SEZ6L 20T g.chr22: 26771655 G>C - Splice site
AQP11 20T g.chr11: 77301461 G>A G142S Missense
MYO5C 20T g.chr15: 52529767 T>A Q927L Missense
TP53 20T g.chr17: 7578208 T>A H214L Missense
ANKRD26 20T g.chr10: 27324267 T>A R1038X Nonsense
ATRX 20T g.chrX: 76952194 T>A - Splice site
COL11A1 20T g.chr1: 103488300 T>A - Splice site
IGSF10 20T g.chr3: 151163277 T>A M1498L Missense
LRRK2 20T g.chr12: 40734258 T>A - Splice site
SCN1A 20T g.chr2: 166901676 T>A E513D Missense
SH3TC2 20T g.chr5: 148386493 T>A Y1209F Missense
SPG11 20T g.chr15: 44890850 T>A S1291C Missense
SPTA1 20T g.chr1: 158645938 T>A T369S Missense
SYNE1 20T g.chr6: 152646266 T>A K5204X Nonsense
USH2A 20T g.chr1: 216371708 T>A M1344L Missense
CASK 20T g.chrX: 41394169 T>A K733M Missense
CPS1 20T g.chr2: 211456674 T>A V362D Missense
CUBN 20T g.chr10: 17171671 T>A R32X Nonsense
CUBN 20T g.chr10: 17157479 T>A Q237H Missense
DDX60 20T g.chr4: 169177011 T>A L1136F Missense
EPHA7 20T g.chr6: 94120890 T>A - Splice site
ITPR2 20T g.chr12: 26875388 T>A N156I Missense
KRTAP10-11 20T g.chr21: 46066685 T>A C104S Missense
LAMA1 20T g.chr18: 6958623 T>A E2606V Missense
LAMC2 20T g.chr1: 183196772 T>A C470S Missense
LRFN5 20T g.chr14: 42357204 T>A L459H Missense
MAGI1 20T g.chr3: 65433698 T>A - Splice site
MYH2 20T g.chr17: 10443262 T>A E377V Missense
MYH2 20T g.chr17: 10440607 T>G K614Q Missense
NRP1 20T g.chr10: 33552708 T>A Y175F Missense
PIK3C2A 20T g.chr11: 17139093 T>A Q1054L Missense
ROS1 20T g.chr6: 117609792 T>A R2303W Missense
RPE65 20T g.chr1: 68910270 T>G T147P Missense
SCN10A 20T g.chr3: 38797361 T>A H460L Missense
STAT2 20T g.chr12: 56740405 T>A K622M Missense
TAS2R1 20T g.chr5: 9630054 T>A N31Y Missense
TG 20T g.chr8: 133882039 T>A L81Q Missense
TMEM132D 20T g.chr12: 129566579 T>A - Splice site
TRPM6 20T g.chr9: 77397384 T>A S1035C Missense
ZNF608 20T g.chr5: 123980182 T>A E1293V Missense
ABCA10 20T g.chr17: 67183857 T>A L765F Missense
CACNA1S 20T g.chr1: 201030546 T>A Y1035F Missense
CACNA1S 20T g.chr1: 201009109 T>A E1824D Missense
KCNA10 20T g.chr1: 111060436 T>A Q325L Missense
KCNH8 20T g.chr3: 19389326 T>A L227H Missense
KCNJ12 20T g.chr17: 21319391 T>A L246Q Missense
SCN1A 20T g.chr2: 166901676 T>A E513D Missense
SLC12A8 20T g.chr3: 124826598 T>A R478W Missense
SLC27A6 20T g.chr5: 128302153 T>A V108E Missense
SLC30A4 20T g.chr15: 45814216 T>A R113W Missense
SLC34A2 20T g.chr4: 25674799 T>A L380Q Missense
SLC6A7 20T g.chr5: 149580653 T>A - Splice site
TRPC3 20T g.chr4: 122820799 T>A N839Y Missense
ABCA8 20T g.chr17: 66925823 T>A Y273F Missense
ATP13A4 20T g.chr3: 193185209 T>A Q337L Missense
CATSPER1 20T g.chr11: 65787795 T>A K686M Missense
KCNQ2 20T g.chr20: 62038530 T>A T696S Missense
SCN10A 20T g.chr3: 38797361 T>A H460L Missense
SLC12A3 20T g.chr16: 56913515 T>A L466H Missense
SLC13A3 20T g.chr20: 45224960 T>A Q188L Missense
SLC19A3 20T g.chr2: 228563932 T>A M167L Missense
SLC1A7 20T g.chr1: 53580467 T>A I132F Missense
SLC26A9 20T g.chr1: 205904913 T>A R12S Missense
SLC7A8 20T g.chr14: 23612372 T>A T184S Missense
SLC8A3 20T g.chr14: 70633441 T>A R567W Missense
SLC8A3 20T g.chr14: 70517824 T>A - Splice site
SLC9A9 20T g.chr3: 143185967 T>A M461L Missense
ABCC8 20T g.chr11: 17470203 T>A K398X Nonsense
ATP7B 20T g.chr13: 52511764 T>A K1251X Nonsense
SLCO1B1 20T g.chr12: 21358925 T>A C485X Nonsense
SLC12A3 20T g.chr16: 56904009 T>A - Splice site
SLC14A2 20T g.chr18: 43258991 T>A - Splice site
SLC7A8 20T g.chr14: 23597255 T>A K472X Nonsense
DCHS2 20T g.chr4: 155161818 A>T F1955L Missense
CHD6 20T g.chr20: 40162122 T>A S41C Missense
DAGLA 79T g.chr11: 61496472 A>T K281X Nonsense
MGAT5B 79T g.chr17: 74909775 A>T Q350H Missense
MYOZ3 79T g.chr5: 150042513 A>T K4X Nonsense
ASPSCR1 79T g.chr17: 79954475 A>T Q229L Missense
GIMAP5 79T g.chr7: 150439885 A>T S220C Missense
ABCC12 79T g.chr16: 48134841 A>T S994T Missense
B4GALNT3 79T g.chr12: 665985 A>T Q778L Missense
PRICKLE4 79T g.chr6: 41753148 A>T Q151L Missense
KBTBD5 79T g.chr3: 42730169 A>T M461L Missense
AUTS2 79T g.chr7: 70255695 A>T T1165S Missense
FAM135B 79T g.chr8: 139189617 A>T L359H Missense
TRAF3IP1 79T g.chr2: 239237760 A>T Q231L Missense
XIRP2 79T g.chr2: 168103624 A>T T1908S Missense
SLC24A3 79T g.chr20: 19664954 A>T M346L Missense
AATK 79T g.chr17: 79094987 A>T Y917N Missense
SOX11 79T g.chr2: 5832960 A>T E36V Missense
TIPRL 79T g.chr1: 168154055 A>G Y108C Missense
APOH 79T g.chr17: 64216826 A>C F150L Missense
GAPVD1 79T g.chr9: 128067358 A>T Q349L Missense
MMRN1 79T g.chr4: 90830441 A>T Y213F Missense
HHATL 79T g.chr3: 42735194 A>T L388Q Missense
PACSIN1 79T g.chr6: 34498289 A>T K321M Missense
RP13-228J13.10 79T g.chrX: 154539707 A>T N7Y Missense
DPP3 79T g.chr11: 66264867 A>T - Splice site
PAEP 79T g.chr9: 138457182 A>G - Splice site
COL11A1 79T g.chr1: 103469998 A>T - Splice site
USP47 79T g.chr11: 11901748 A>T R16X Nonsense
CEP250 79T g.chr20: 34061712 A>T Q469L Missense
ACAN 79T g.chr15: 89417121 A>T Q2461L Missense
FLYWCH1 79T g.chr16: 2979894 A>T M70L Missense
MYO18B 79T g.chr22: 26177749 A>T S754C Missense
STAB1 79T g.chr3: 52551593 A>T S1531C Missense
CUBN 79T g.chr10: 17024572 A>T C1536S Missense
CDH16 79T g.chr16: 66946629 A>T L407H Missense
ACAN 79T g.chr15: 89398382 A>T S856C Missense
MOXD1 79T g.chr6: 132619050 A>T L518H Missense
ADAMTS13 79T g.chr9: 136291398 A>T S207C Missense
C7 79T g.chr5: 40959685 A>T S542C Missense
IL22 79T g.chr12: 68647218 A>T L4Q Missense
CDH26 79T g.chr20: 58560054 A>T - Splice site
TNFRSF9 79T g.chr1: 7993305 A>T L199X Nonsense
NAV3 79T g.chr12: 78598880 A>T K2312X Nonsense
FMN2 79T g.chr1: 240256570 A>T Q387H Missense
DACH1 79T g.chr13: 72038864 A>T - Splice site
ARID1A 79T g.chr1: 27100818 A>T - Splice site
PTEN 79T g.chr10: 89692768 A>T - Splice site
SET 79T g.chr9: 131455250 A>T K174I Missense
CHD9 79T g.chr16: 53260328 A>G I649M Missense
ASPHD1 79T g.chr16: 29913022 C>G P244A Missense
EBF1 79T g.chr5: 158158119 C>A K361N Missense
COL12A1 79T g.chr6: 75804894 C>A - Splice site
MLL2 79T g.chr12: 49416411 C>A E5434X Nonsense
MAGEA11 79T g.chrX: 148794916 G>T - Splice site
DPM2 79T g.chr9: 130697930 G>A P20S Missense
METT10D 79T g.chr17: 2344819 G>T L255M Missense
CAD 79T g.chr2: 27460414 G>T - Splice site
TTLL8 79T g.chr22: 50480200 G>A P227L Missense
HRAS 79T g.chr11: 533875 G>T Q61K Missense
DNAH5 79T g.chr5: 13866393 T>A - Splice site
CHD6 79T g.chr20: 40065941 G>T D1347E Missense
COL6A3 79T g.chr2: 238275672 T>A K1720X Nonsense
MAML2 79T g.chr11: 95825260 T>A Q645H Missense
BPI 79T g.chr20: 36963945 T>A S17T Missense
ANKMY1 79T g.chr2: 241421606 T>G Q871P Missense
SCN1A 79T g.chr2: 166170484 A>T I417L Missense
ACVRL1 79T g.chr12: 52306296 T>A L13Q Missense
TLN1 79T g.chr9: 35713279 T>A Q1089L Missense
SIRPB1 79T g.chr20: 1600557 T>A S12C Missense
GADL1 79T g.chr3: 30898599 T>A Q82L Missense
MAPK15 79T g.chr8: 144804377 T>A S531T Missense
PRRT2 79T g.chr16: 29825053 T>A D226E Missense
GFOD2 79T g.chr16: 67709689 T>A H176L Missense
PKD1L1 79T g.chr7: 47941982 T>A K686N Missense
AC092718.1 79T g.chr16: 81150983 T>A R49W Missense
ASAP3 79T g.chr1: 23769010 T>A Q190L Missense
NFATC1 79T g.chr18: 77170957 T>A C215S Missense
SHANK2 79T g.chr11: 70333244 T>A S694C Missense
S1PR2 79T g.chr19: 10335163 T>A Y140F Missense
PTPRS 79T g.chr19: 5258140 T>A - Splice site
PYCR2 79T g.chr1: 226109073 T>A - Splice site
RASGEF1C 79T g.chr5: 179555612 T>A - Splice site
GS1-541M1.2 79T g.chrX: 26675416 T>A Q67L Missense
MEFV 79T g.chr16: 3293995 T>A K400M Missense
MAP7 79T g.chr6: 136698890 T>A - Splice site
PDE3B 79T g.chr11: 14839832 T>A D542E Missense
NFKBID 79T g.chr19: 36388650 T>A Y122F Missense
BEND3 79T g.chr6: 107391344 T>A S351C Missense
CNNM1 79T g.chr10: 101090451 T>A L436Q Missense
OLA1 79T g.chr2: 174943758 T>A I343F Missense
SCGB1C1 79T g.chr11: 193804 T>A L50M Missense
KDR 79T g.chr4: 55963840 T>A K868I Missense
IGLV7-46 79T g.chr22: 22724367 T>A L88H Missense
DHX15 79T g.chr4: 24538798 T>A L18F Missense
CELSR3 79T g.chr3: 48685734 T>A Y2313F Missense
AFAP1L2 79T g.chr10: 116057049 T>C K746R Missense
ECEL1 79T g.chr2: 233347320 T>A - Splice site
FGFR4 79T g.chr5: 176517643 T>A L115X Nonsense
GHRHR 79T g.chr7: 31011710 T>A - Splice site
NPHS1 79T g.chr19: 36332614 T>A - Splice site
NCOR1 79T g.chr17: 15968256 T>A R1677X Nonsense
ATRX 79T g.chrX: 76855240 T>A D1916V Missense
NCOA2 79T g.chr8: 71039266 T>A Q1233L Missense
LRRK2 79T g.chr12: 40668799 T>A - Splice site
DCHS2 79T g.chr4: 155225939 A>T F1374L Missense
USH2A 79T g.chr1: 216256802 A>T L1765H Missense
CREBBP 79T g.chr16: 3828009 T>A - Splice site
CDH10 79T g.chr5: 24509898 T>A T345S Missense
GATA1 80T g.chrX: 48650728 A>G - Splice site
DNAI2 80T g.chr17: 72306154 A>T - Splice site
WBP5 80T g.chrX: 102612541 A>T - Splice site
RP13-77O11.2 80T g.chrX: 52862803 A>T Y49X Nonsense
GUCY1B2 80T g.chr13: 51594629 A>T L353X Nonsense
PRRC2A 80T g.chr6: 31593091 A>T - Splice site
KIF19 80T g.chr17: 72356353 A>T - Splice site
AC036111.1 80T g.chr11: 55522736 A>T E25V Missense
PARP14 80T g.chr3: 122414327 A>T Q218L Missense
ZNF229 80T g.chr19: 44932813 A>T C715S Missense
SEMA3F 80T g.chr3: 50220092 A>T H260L Missense
ACSF2 80T g.chr17: 48541261 A>T M377L Missense
STK19 80T g.chr6: 31946765 A>T Y218F Missense
KCNH8 80T g.chr3: 19432059 A>T T300S Missense
GPR149 80T g.chr3: 154138891 A>T S520R Missense
ALOX12B 80T g.chr17: 7984263 A>T Y156N Missense
KRT8P14 80T g.chrX: 45491705 A>T Y422N Missense
UGT2B4 80T g.chr4: 70361452 A>T L43Q Missense
ZNF41 80T g.chrX: 47307450 A>T H573Q Missense
SYT9 80T g.chr11: 7324516 A>T H131L Missense
LAMB4 80T g.chr7: 107703374 A>T C1043S Missense
DBH 80T g.chr9: 136508539 A>T E250V Missense
GRM7 80T g.chr3: 7188338 A>T Q240L Missense
BMPER 80T g.chr7: 33976899 A>T - Splice site
C7orf63 80T g.chr7: 89929179 A>T - Splice site
IL21R 80T g.chr16: 27455861 A>T - Splice site
CTC-348L14.1 80T g.chr5: 82837510 A>T L29X Nonsense
GLI2 80T g.chr2: 121726477 A>T S148C Missense
RAD51L1 80T g.chr14: 69061307 A>T Q381L Missense
C14orf159 80T g.chr14: 91626745 A>T - Splice site
SLC22A10 80T g.chr11: 63065197 A>T - Splice site
USP13 80T g.chr3: 179479042 A>T - Splice site
USP30 80T g.chr12: 109509505 A>G H190R Missense
ARHGAP9 80T g.chr12: 57872524 A>T H111Q Missense
UNC5A 80T g.chr5: 176289631 A>T Q26L Missense
OTOG 80T g.chr11: 17632801 A>T Q1997L Missense
IMPG1 80T g.chr6: 76640727 A>T L729H Missense
KDM6A 80T g.chrX: 44896898 A>T - Splice site
NCOA7 80T g.chr6: 126202293 A>T S173C Missense
SETD1A 80T g.chr16: 30972738 A>T K133X Nonsense
LRRK2 80T g.chr12: 40699751 A>T K1314N Missense
HERC6 80T g.chr4: 89319318 A>T H350L Missense
FLT4 80T g.chr5: 180046770 C>A - Splice site
DSCR4 80T g.chr21: 39325190 C>A D117Y Missense
PDCD11 80T g.chr10: 105201680 C>G P1552R Missense
CACNA1G 80T g.chr17: 48674220 C>A A1065E Missense
SETD7 80T g.chr4: 140439168 C>G G264A Missense
WDR27 80T g.chr6: 170038699 G>A S602F Missense
RP5-1158E12.2 80T g.chrX: 45772886 G>C W68S Missense
HSPA7 80T g.chr1: 161576646 G>A R189Q Missense
ITFG2 80T g.chr12: 2927518 G>T - Splice site
HMCN1 80T g.chr1: 186010203 G>A R2080K Missense
CHD5 80T g.chr1: 6185888 G>C S1370X Nonsense
NPC2 80T g.chr14: 74946993 T>A - Splice site
INSL5 80T g.chr1: 67263854 T>A K84X Nonsense
FSIP2 80T g.chr2: 186661198 T>A L3201X Nonsense
HPX 80T g.chr11: 6461466 T>A R89X Nonsense
BCAS1 80T g.chr20: 52601879 T>A K363X Nonsense
KBTBD6 80T g.chr13: 41706643 T>A Q2L Missense
EPB41L4B 80T g.chr9: 111947779 T>A K803I Missense
RP11-118F2.3 80T g.chr9: 94789814 T>A L12H Missense
FAP 80T g.chr2: 163046259 T>C K486E Missense
SLC25A39 80T g.chr17: 42399905 T>A Q69L Missense
OR51B5 80T g.chr11: 5364529 T>A M76L Missense
IPCEF1 80T g.chr6: 154481085 T>A T399S Missense
HTR1A 80T g.chr5: 63257471 T>A T26S Missense
GSDMC 80T g.chr8: 130762242 T>A I403L Missense
MYOM2 80T g.chr8: 2021455 T>A F332Y Missense
ABCC9 80T g.chr12: 21981977 T>A K1195I Missense
GFRA1 80T g.chr10: 117825133 T>A Q401L Missense
GPR142 80T g.chr17: 72368656 T>A C436S Missense
PTGFR 80T g.chr1: 79002259 T>A C323S Missense
BCAR3 80T g.chr1: 94037273 T>A D643V Missense
LRRC66 80T g.chr4: 52861278 T>A Q637L Missense
POSTN 80T g.chr13: 38158986 T>A E325D Missense
SPOCK2 80T g.chr10: 73832299 T>C Y69C Missense
STAB2 80T g.chr12: 104098351 T>A C1287S Missense
DMD 80T g.chrX: 31676108 T>A - Splice site
MYH11 80T g.chr16: 15847367 T>A - Splice site
COL2A1 80T g.chr12: 48373349 T>A - Splice site
C1orf61 80T g.chr1: 156374287 T>A Q12L Missense
AP002512.1 80T g.chr11: 56216444 T>A C99X Nonsense
RP11-293F5.8 80T g.chr1: 21737997 T>C R153G Missense
TM6SF2 80T g.chr19: 19380986 T>A R133X Nonsense
AP001482.1 80T g.chr11: 88845898 T>A C6S Missense
ARHGAP17 80T g.chr16: 24979666 T>A - Splice site
TNR 80T g.chr1: 175334147 T>A - Splice site
TTC5 80T g.chr14: 20766956 T>A - Splice site
WDR52 80T g.chr3: 113152477 T>A E12V Missense
CLNK 80T g.chr4: 10509653 T>A Q305L Missense
ANPEP 80T g.chr15: 90335737 T>A E769V Missense
TXNDC11 80T g.chr16: 11823782 T>A N255I Missense
MLLT3 80T g.chr9: 20448215 T>A E109V Missense
EPHA5 80T g.chr4: 66230888 T>C K695E Missense
NCOA2 80T g.chr8: 71041023 T>A R1173X Nonsense
ATR 80T g.chr3: 142211983 T>A K2023N Missense
SETD2 80T g.chr3: 47162530 T>A Q1199L Missense
CHD8 80T g.chr14: 21875046 T>A K680M Missense
JMJD6 80T g.chr17: 74720048 T>A K204M Missense
MLL2 80T g.chr12: 49442554 T>A - Splice site
TP53 80T g.chr17: 7579494 A>T R65X Nonsense
SETX 80T g.chr9: 135205119 T>A - Splice site
DNAH9 80T g.chr17: 11572858 A>T I1034F Missense
CDH10 80T g.chr5: 24509750 T>A H394L Missense
ADAMTSL1 80T g.chr9: 18723054 A>T L416H Missense
USP5 100T g.chr12: 6969313 A>T - Splice site
RAB11FIP4 100T g.chr17: 29854858 A>T - Splice site
DIRC2 100T g.chr3: 122564608 A>T - Splice site
PTPN7 100T g.chr1: 202127440 A>T L44Q Missense
FGD4 100T g.chr12: 32729289 A>T Q46H Missense
LYPD6B 100T g.chr2: 150061614 A>T K67X Nonsense
KALRN 100T g.chr3: 124114272 A>T N95Y Missense
DLG3 100T g.chrX: 69712449 A>T - Splice site
PPRC1 100T g.chr10: 103904062 A>T - Splice site
SSTR3 100T g.chr22: 37603714 A>T S43R Missense
PLEKHA2 100T g.chr8: 38801329 A>T Q74L Missense
DTX1 100T g.chr12: 113515429 A>T N154Y Missense
FANCI 100T g.chr15: 89807826 A>T Q248L Missense
PPARD 100T g.chr6: 35391914 A>T S206C Missense
MED12L 100T g.chr3: 151148098 A>T Q2105H Missense
TRPC1 100T g.chr3: 142503867 A>T I394F Missense
TNFSF14 100T g.chr19: 6667432 A>T L83Q Missense
KRT7 100T g.chr12: 52628958 A>T Q115L Missense
ADARB1 100T g.chr21: 46603371 A>T T448S Missense
OR5I1 100T g.chr11: 55703374 A>T L168Q Missense
TNR 100T g.chr1: 175348866 A>T D595E Missense
IGHV5-51 100T g.chr14: 107034944 A>T Y46N Missense
GSG1L 100T g.chr16: 27895848 A>T L170H Missense
LMNA 100T g.chr1: 156105102 A>T - Splice site
G3BP1 100T g.chr5: 151178771 A>T - Splice site
DNAH17 100T g.chr17: 76565246 A>T - Splice site
DECR2 100T g.chr16: 455541 A>T S60C Missense
SLC47A2 100T g.chr17: 19618217 A>T L146H Missense
WAS 100T g.chrX: 48544472 A>T R170X Nonsense
ADAMTSL1 100T g.chr9: 18574247 A>T I153F Missense
IDH3G 100T g.chrX: 153055624 A>T C29S Missense
FITM1 100T g.chr14: 24602021 A>T K290X Nonsense
PLXNA1 100T g.chr3: 126741156 A>T K1400X Nonsense
SCML4 100T g.chr6: 108042098 A>T L261X Nonsense
SERPINB11 100T g.chr18: 61387273 A>T S54C Missense
CHD6 100T g.chr20: 53358140 A>T Q2660L Missense
AE000659.8 100T g.chr14: 22309774 A>T Q53L Missense
MLL 100T g.chr11: 118374588 A>T K2658X Nonsense
KDM6A 100T g.chrX: 44870204 A>T - Splice site
SETD1A 100T g.chr16: 30982741 A>T Y1020F Missense
ARID4A 100T g.chr14: 58814448 A>T E419V Missense
MLLT10 100T g.chr10: 22002766 A>T T605S Missense
SETBP1 100T g.chr18: 42531454 A>T K717X Nonsense
LRRK2 100T g.chr12: 40626129 A>T Q97H Missense
NCOA1 100T g.chr2: 24930115 A>T L592F Missense
ARID1A 100T g.chr1: 27097784 A>T K1125X Nonsense
ARID2 100T g.chr12: 46285700 A>T - Splice site
VSIG4 100T g.chrX: 65253560 C>T W56X Nonsense
MOGAT3 100T g.chr7: 100839260 C>G G264R Missense
DNAH6 100T g.chr2: 84756185 C>A P186H Missense
IARS 100T g.chr9: 95025273 C>A V589L Missense
PSD 100T g.chr10: 104175876 C>A - Splice site
MLL 100T g.chr11: 118343798 C>T L642F Missense
SUPT3H 100T g.chr6: 45289535 G>T - Splice site
SYNPO2L 100T g.chr10: 75406610 G>A P934S Missense
RASL11A 100T g.chr13: 27847371 G>T V157L Missense
EPHA5 100T g.chr4: 66356195 G>T H434Q Missense
SAFB2 100T g.chr19: 5598897 T>G - Splice site
REXO4 100T g.chr9: 136276195 T>A - Splice site
DDX60 100T g.chr4: 169206651 T>A - Splice site
ATP8B4 100T g.chr15: 50294422 T>A - Splice site
SHPK 100T g.chr17: 3533642 T>A - Splice site
LRRIQ1 100T g.chr12: 85547825 T>A L1558X Nonsense
LRRC48 100T g.chr17: 17907855 T>A L244Q Missense
CENPI 100T g.chrX: 100375419 T>A L207X Nonsense
XIST 100T g.chrX: 73047745 T>A M26L Missense
COL22A1 100T g.chr8: 139606430 T>A Y1482F Missense
RGS9 100T g.chr17: 63159213 T>A - Splice site
HMSD 100T g.chr18: 61627528 T>C F120S Missense
APC 100T g.chr5: 112179558 T>C I2756T Missense
TRIM9 100T g.chr14: 51450125 T>A T599S Missense
CCDC135 100T g.chr16: 57735981 T>A L213Q Missense
SACS 100T g.chr13: 23914405 T>A S1204C Missense
RYR1 100T g.chr19: 39009953 T>A V3373E Missense
XRN1 100T g.chr3: 142090087 T>A E1021V Missense
ADAMTS9 100T g.chr3: 64606882 T>A E907D Missense
KDR 100T g.chr4: 55970893 T>A Q635L Missense
OR2A13P 100T g.chr7: 143839246 T>A C7S Missense
DHX37 100T g.chr12: 125441411 T>A Q760L Missense
IGLV4-60 100T g.chr22: 22516934 T>A S74R Missense
YME1L1 100T g.chr10: 27436564 T>A I68F Missense
WDR49 100T g.chr3: 167245702 T>A E485V Missense
AE000660.10 100T g.chr14: 22600551 T>A F13I Missense
LRIG3 100T g.chr12: 59282252 T>A Q269L Missense
ITGA4 100T g.chr2: 182343486 T>A Y187N Missense
GPR77 100T g.chr19: 47844093 T>A Y13N Missense
FAM123A 100T g.chr13: 25744200 T>A S401C Missense
SLC28A2 100T g.chr15: 45557339 T>A L252Q Missense
ASTN2 100T g.chr9: 119249665 T>A K1153M Missense
DBH 100T g.chr9: 136516823 T>A V420E Missense
TYR 100T g.chr11: 88911227 T>A C36S Missense
SLC25A22 100T g.chr11: 792199 T>A Q254L Missense
HSPBAP1 100T g.chr3: 122496755 T>A - Splice site
TMEM44 100T g.chr3: 194344041 T>A - Splice site
PHF21B 100T g.chr22: 45285664 T>A - Splice site
DAB2 100T g.chr5: 39392567 T>A - Splice site
FBN2 100T g.chr5: 127599350 T>A - Splice site
MAL 100T g.chr2: 95719326 T>A C15S Missense
OTOF 100T g.chr2: 26702137 T>A K737X Nonsense
ERCC3 100T g.chr2: 128036750 T>A K577X Nonsense
RYR1 100T g.chr19: 38958256 T>A V1062E Missense
PRF1 100T g.chr10: 72360180 T>A Q160L Missense
ATP13A1 100T g.chr19: 19770717 T>A H159L Missense
NFXL1 100T g.chr4: 47896219 T>A Q477L Missense
PHKA2 100T g.chrX: 18917328 T>A Q1025L Missense
SHANK2 100T g.chr11: 70332859 T>A Q822L Missense
DNMT3A 100T g.chr2: 25497835 T>A E205V Missense
NCOA6 100T g.chr20: 33338025 T>A Q658L Missense
ARID4B 100T g.chr1: 235331959 T>A T1274S Missense
HERC1 100T g.chr15: 63950744 T>A - Splice site
MLL2 100T g.chr12: 49427737 T>A Q3584L Missense
PIK3CG 100T g.chr7: 106545627 T>A L1035Q Missense
SMARCA1 100T g.chrX: 128602884 T>A - Splice site
BRIP1 100T g.chr17: 59861787 T>A - Splice site
SETX 100T g.chr9: 135205675 T>A Y437F Missense
DCHS2 100T g.chr4: 155242017 C>G D1057H Missense
USH2A 100T g.chr1: 216074128 T>A R2474W Missense
DNAH9 100T g.chr17: 11597768 A>T - Splice site
MYO5C 100T g.chr15: 52543631 T>A R540X Nonsense
CREBBP 100T g.chr16: 3832778 T>A N494Y Missense
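As an illustration of how rows of Supplementary Table 8 might be tallied, the sketch below parses the flattened row layout used in this transcript (gene symbol, sample ID, genomic coordinate, base change, amino-acid change, change type) and flags A>T and T>A changes, the two strand orientations of an A:T>T:A transversion. The regular expression, the `parse_row` and `is_at_transversion` helpers, and the field names are assumptions made for this example; they are not part of the published materials. The four example rows are copied from the table above.

```python
import re
from collections import Counter

# Assumed row layout: GENE SAMPLE g.chrN: POSITION REF>ALT AA_CHANGE CHANGE_TYPE
ROW = re.compile(
    r"^(?P<gene>\S+) (?P<sample>\S+) g\.chr(?P<chrom>\w+): (?P<pos>\d+) "
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT]) (?P<aa>\S+) (?P<kind>.+)$"
)


def parse_row(line):
    """Split one flattened table row into named fields (hypothetical helper)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    record = match.groupdict()
    record["pos"] = int(record["pos"])
    return record


def is_at_transversion(record):
    """True for A>T or T>A, i.e. an A:T>T:A transversion on either strand."""
    return (record["ref"], record["alt"]) in {("A", "T"), ("T", "A")}


# Rows copied verbatim from Supplementary Table 8 (sample 20T).
rows = [
    "SLC38A8 20T g.chr16: 84050785 A>T S305T Missense",
    "ABCG1 20T g.chr21: 43645779 A>T - Splice site",
    "DNAH9 20T g.chr17: 11757708 C>T A3299V Missense",
    "TP53 20T g.chr17: 7578208 T>A H214L Missense",
]

records = [parse_row(r) for r in rows]
by_kind = Counter(r["kind"] for r in records)          # rows per change type
n_at = sum(is_at_transversion(r) for r in records)     # A>T plus T>A rows
print(by_kind, n_at)
```

Applied to the full table, the same tally would give the share of A>T and T>A changes per sample, provided every row follows the assumed layout.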
Supplementary Table 9
The effect of +/− one base flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC
Triplet sequence | Number of occurrences in reference unspliced transcript regions | Whole-genome mutations in unspliced transcripts (sense strand) per million triplet occurrences
TAG 21,385,856 489.1
CAG 33,081,154 457.4
TAC 17,558,516 361.5
CTA 19,584,816 252.6
TAT 32,436,138 247.3
CAT 28,802,535 236.9
TAA 31,522,442 231.6
CTG 34,400,990 225.8
CAC 23,010,091 222.3
CAA 26,804,923 214.6
GTA 18,835,554 182.5
AAG 30,295,166 161.1
GAG 28,017,769 160.6
ATA 30,643,440 130.1
ATG 29,057,712 129.5
GTG 26,485,820 116.1
TTA 33,840,449 112.1
TTG 32,831,444 106.7
CTT 33,446,798 83.3
GAA 30,134,519 81.7
CTC 27,481,282 80.6
GAC 15,016,072 74.8
CCG 5,119,091 63.9
AAC 21,105,949 56.6
GAT 21,869,739 55.7
AAA 54,647,870 51.2
AAT 37,056,182 44.9
CGG 5,052,009 44.9
TTC 32,138,122 43
GCG 4,495,083 41.2
GTC 15,927,567 39.4
ACG 4,145,742 38.6
CCC 21,932,371 35.1
CGT 4,584,943 32.1
GTT 25,336,011 31.8
ATC 20,434,461 26.8
GGG 22,783,978 26.6
CGC 4,320,937 26.4
TTT 66,478,696 26.1
CCA 28,970,783 26
ATT 40,028,353 24.6
TCG 3,948,305 23.8
TGG 31,610,504 20.7
CGA 3,752,064 19.5
CCT 29,864,527 17.6
AGG 29,156,183 16.5
GGA 25,390,838 14.1
TCC 25,158,132 14.1
ACC 18,338,132 10.7
TCT 36,335,394 10.2
AGA 33,959,533 10
GCC 20,457,256 9.8
GGT 20,012,098 9.2
GGC 20,417,785 9
TGA 31,935,743 8.5
ACA 29,224,997 8.1
TCA 30,539,328 7.6
TGT 35,004,257 6.6
ACT 24,980,528 5.7
AGT 26,982,802 5.1
GCA 22,962,367 5.1
TGC 24,226,432 4.5
AGC 22,681,450 3.7
GCT 23,733,151 3.3
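The last column of Supplementary Table 9 is a rate normalized by how often each triplet occurs in the reference. A minimal sketch of that arithmetic follows, assuming the rate is simply the mutation count divided by the occurrence count and scaled to one million. The function names `per_million` and `implied_count` are hypothetical. The raw mutation counts are not listed in the table, so the count below is back-calculated from the TAG row and is only approximate.

```python
def per_million(mutations, occurrences):
    """Mutations per million occurrences of a sequence context."""
    return mutations / occurrences * 1_000_000


def implied_count(rate_per_million, occurrences):
    """Approximate mutation count implied by a tabulated rate (derived, not reported)."""
    return rate_per_million * occurrences / 1_000_000


# TAG row of Supplementary Table 9: 21,385,856 occurrences, 489.1 per million.
tag_occurrences = 21_385_856
tag_rate = 489.1

approx_mutations = round(implied_count(tag_rate, tag_occurrences))
roundtrip_rate = round(per_million(approx_mutations, tag_occurrences), 1)
print(approx_mutations, roundtrip_rate)
```

Normalizing by occurrences is what makes triplets comparable: a rare context such as ACG (4,145,742 occurrences) and a common one such as AAA (54,647,870 occurrences) can then be ranked on the same scale.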
Supplementary Table 10
The effect of +/− one base flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC
Triplet sequence | Number of occurrences in reference outside transcribed regions | Whole-genome intergenic mutations per million triplet occurrences
CTA : TAG 35,286,346 514.2
CAG : CTG 52,807,266 509.3
GTA : TAC 30,703,409 398.2
ATG : CAT 50,755,626 265.4
CAC : GTG 39,654,427 264.4
ATA : TAT 58,565,776 259.8
TAA : TTA 57,491,649 245
CAA : TTG 52,404,647 233.8
CTC : GAG 44,367,158 180.3
AAG : CTT 54,359,348 177.7
CCG : CGG 6,372,732 91
GAA : TTC 54,396,287 90.1
GAC : GTC 25,085,578 82.5
AAC : GTT 39,807,488 69.1
ATC : GAT 36,676,580 59.5
CGC : GCG 5,451,325 57.3
AAA : TTT 105,971,797 54.3
AAT : ATT 70,126,249 49.6
ACG : CGT 6,231,337 48.5
CCC : GGG 33,505,642 47.7
CCA : TGG 48,825,083 35.2
AGG : CCT 46,388,989 29.7
CGA : TCG 5,451,878 29.7
GGA : TCC 41,028,220 22.7
ACC : GGT 30,626,984 16.4
AGA : TCT 60,587,491 15.5
GCC : GGC 29,932,276 14.6
TCA : TGA 53,457,500 14
ACA : TGT 55,112,234 11.4
ACT : AGT 43,279,543 9.4
GCA : TGC 38,228,283 8.7
AGC : GCT 36,563,127 6.3
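Each row label in Supplementary Table 10 pairs a triplet with its reverse complement (for example, CTA : TAG). This is the natural grouping where there is no transcribed strand to define a sense orientation. The sketch below shows one way to derive such labels. The helper names are hypothetical, and ordering the two members alphabetically is an assumption that happens to match the labels printed in the table.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence given in upper-case A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_pair_label(triplet):
    """Label shared by a triplet and its reverse complement, e.g. 'CTA : TAG'."""
    first, second = sorted((triplet, reverse_complement(triplet)))
    return f"{first} : {second}"


# All 64 triplets collapse into strand-symmetric pairs.
labels = {strand_pair_label("".join(t)) for t in product("ACGT", repeat=3)}
print(len(labels), strand_pair_label("TAG"), strand_pair_label("CTG"))
```

No triplet is its own reverse complement because the middle base would have to pair with itself, so the 64 triplets fall into exactly 32 pairs, matching the 32 rows of the table.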
Supplementary Table 11
The effect of +/− two bases flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC
Quintuplet sequence | Number of occurrences in reference unspliced transcript regions | Whole-genome mutations in unspliced transcripts (sense strand) per million quintuplet occurrences
ATAGG 1,196,711 819.7
CTAGG 1,156,560 721.1
CTAGC 824,340 691.5
ATAGC 1,097,065 674.5
ACAGG 2,179,044 660.8
ATAGA 1,825,784 660
ACAGC 1,468,809 630.4
CTAGA 1,343,347 616.4
GTAGG 980,728 599.6
ACAGT 1,893,468 538.7
ATACA 2,175,087 530.6
CTACA 1,386,997 523.4
CCAGG 2,920,014 515.8
ACAGA 2,556,782 506.5
GCAGG 2,005,285 505.2
GTACA 1,125,872 472.5
ATAGT 1,607,995 458.3
CCAGA 1,968,757 455.1
CTATG 1,252,622 441.5
GTAGA 1,498,367 433.8
GCAGC 1,521,819 427.8
GTAGC 1,033,949 427.5
TCAGG 2,157,273 426.5
CTAAG 1,180,278 425.3
CCAGT 1,678,066 424.3
CTAGT 1,023,498 421.1
CTATA 1,354,815 420
ATACC 1,008,599 414.4
GCAGA 1,956,583 413.5
CCTAT 1,123,682 412.9
CTACC 915,021 407.6
ATACT 1,540,424 403.8
ACATA 2,028,968 396.8
CCAGC 2,637,820 396.2
ATATG 2,076,349 394.9
TTAGG 1,442,396 394.5
ATACG 216,776 382.9
ATATA 3,382,612 382.2
ATAAG 1,576,073 375.6
ACATG 2,054,573 370.4
GCAGT 1,907,864 367.4
CCATG 1,955,349 364.1
GTAGT 1,169,626 363.4
TCAGC 1,997,426 358
CTACT 1,566,215 357.5
GTATG 1,181,594 357.1
CTAAC 863,735 356.6
CCATA 1,210,080 355.3
GCTAT 1,101,373 355
TCAGA 2,192,906 350.2
TCAGT 2,036,941 350
TTAGC 1,393,011 345.3
ACACA 2,558,406 340.4
ACAAG 1,505,533 338.1
CCTAG 1,143,756 335.7
GTAAG 1,125,649 334
ACACG 336,252 330.1
ACACT 1,512,147 328
ATAAC 1,245,568 327.6
TCTAT 1,793,844 327.2
GTATA 1,468,668 326.1
GTACT 948,726 322.5
GCTGT 1,782,660 316.4
TCTAG 1,414,524 310.4
GCTAG 862,816 308.3
CTATC 912,009 300.4
CCTGT 2,378,443 297.7
ATATC 1,408,600 294.6
CCAAG 1,787,484 292
TTAGA 1,930,896 287.9
GCATA 1,110,806 287.2
GCATG 1,655,026 285.8
CCTAC 861,548 279.7
ACATC 1,285,663 279.2
ACAAC 1,126,435 277
GTATC 875,253 270.8
ACACC 1,223,095 269
GTACC 681,562 268.5
GCACA 1,534,212 266.6
CCTGG 3,036,580 266.1
TGTAT 2,627,279 263
TCTAC 1,310,024 261.1
ACAAT 1,751,803 260.9
CAAGG 1,629,941 260.1
AGAGG 2,235,125 259
ACTAT 1,395,163 258.8
TAAGG 1,264,331 257.8
TTACA 2,386,112 255.2
CCTGC 2,068,227 254.8
ACTGT 1,935,065 251.2
CTAAA 2,028,241 249.5
CTATT 1,894,484 248.1
CTACG 161,886 247.1
ATAAA 3,870,624 246.7
TCTGT 3,129,571 246.4
CCACA 1,918,264 239.8
GTAAC 839,636 239.4
TGTAG 1,700,884 237.5
GCAAG 1,332,614 236.4
CTAAT 1,719,130 235.6
ACTGG 1,557,983 234.3
GTACG 124,926 232.1
CATAG 1,321,681 229.3
TCTGG 2,123,671 227.9
TTAGT 1,860,278 227.4
ATAAT 2,868,687 227.3
CCAAT 1,139,360 226.4
TATAG 1,511,726 226.2
TTACC 1,148,826 223.7
CCACT 1,839,226 223.5
GCATC 1,025,348 222.4
GCACT 1,497,985 222.3
GCTAC 909,960 222
TTAAG 1,914,895 220.9
CATAT 2,024,901 219.3
GGTAT 1,103,551 219.3
CCAAC 1,150,566 217.3
CATGT 2,232,017 215.9
GTAAA 1,899,997 215.8
TCAAG 1,950,784 215.8
TGTAC 1,176,813 215
CGTAT 237,460 214.8
CTTAG 1,340,465 214.1
AAAGG 2,275,558 214
ACATT 2,614,712 213.8
CCAAA 2,334,491 213.8
TCATA 1,719,856 213.4
CAAGC 1,283,238 212
TTACT 1,960,114 211.7
GCTGC 1,599,146 211.4
ATATT 3,544,566 211
ACAAA 3,127,568 209.4
CCATC 1,669,035 209.1
GCACG 335,795 208.5
CCTAA 1,239,069 207.4
TATGT 2,347,689 202.8
GGTAG 1,019,912 202
GCAAT 1,324,115 201.6
CCACG 460,938 199.6
TCTGC 2,065,759 197.5
CCTGA 2,157,244 195.2
TTACG 210,494 194.8
TTATG 1,996,409 194.8
GCTGG 2,881,658 194
AGTAT 1,664,023 193.5
GTAAT 1,772,965 193.5
TAAGC 1,028,832 193.4
CTTAT 1,727,514 192.8
GCAAC 1,044,025 192.5
ACTAG 926,779 192.1
AGAGC 1,613,921 191.5
CAAGA 1,991,200 190.8
CGAGG 457,087 188.1
TGAGG 2,446,641 187.2
TCATG 1,981,533 185.7
ACTGA 1,774,841 183.7
CATGG 1,927,451 183.1
GAAGG 1,927,260 182.1
TATAT 3,554,080 182
AGAGA 3,208,362 178.3
GCACC 1,082,671 178.3
ACTAC 948,651 178.1
GCTGA 1,965,043 178.1
CGTAC 118,678 176.9
AGTAG 1,773,315 175.9
TATGG 1,313,356 175.9
CGTAG 188,618 175
TTAAC 1,432,050 174.6
GATAG 1,022,660 174.1
TCAAC 1,105,819 173.6
GTATT 2,106,637 169.9
GCTAA 1,302,400 166.6
GTTAG 1,048,024 166
TCTGA 2,219,897 165.8
ACTGC 1,723,894 165.3
TAAGA 1,851,381 165.3
CAAGT 1,671,961 165.1
CATAC 1,022,951 164.2
TCACA 2,061,956 163.4
TCACT 2,368,410 163.4
TGTGT 3,431,103 162.9
CTTAC 1,102,458 162.4
TATAC 1,335,228 161.8
TCAAT 1,563,226 161.2
AGTGT 1,851,038 160.5
CCATT 2,129,170 160.2
TCTAA 1,759,834 159.7
TGAGC 1,986,513 158.6
TATGC 1,101,358 158
CATGC 1,588,965 156.1
AGAGT 1,891,490 155.4
GATAT 1,500,074 155.3
AAAGC 1,845,144 155
CCACC 2,170,438 153
GGTAC 661,436 152.7
GCAAA 1,869,491 152.4
AGTAC 924,188 151.5
TTATA 2,784,296 149.1
GGAGG 2,880,335 147.6
CTTGT 1,862,880 147.1
TCACG 447,573 145.2
AGAAG 2,624,991 144.8
CGTGT 425,183 143.5
CGAGA 478,705 142
GGAGC 1,316,451 141.3
TGAGA 2,868,040 140.9
CGAAG 235,543 140.1
CCTTG 1,843,788 139.9
GAAGC 1,415,080 139.2
GATGT 1,487,539 139.2
TAAGT 1,652,007 139.2
CGAGC 216,337 138.7
CCTTA 1,282,901 138
GTTAC 897,669 137
TTAAA 4,220,733 136.9
TTATC 1,622,110 136.9
CTTGG 2,031,681 134.9
GCATT 1,757,313 134.3
TGTGG 2,301,303 134.3
TGAGT 1,938,112 133.6
TGTGC 1,796,759 133.6
TCATC 1,640,929 133.5
AGTGG 2,051,485 133.1
TCAAA 2,689,897 132.3
ATTGT 2,211,104 132.1
TTAAT 2,961,609 132
TCACC 1,661,133 131.2
TTTAG 2,491,824 130.8
TGTAA 2,445,904 130
CATGA 1,899,001 129
GTTGT 1,533,215 128.5
GGTGT 1,506,228 127.5
AATAG 1,871,604 126.6
CCTCT 2,297,690 124
CTTGC 1,459,495 124
AAAGA 3,599,258 123.4
ATTGG 1,422,826 123
GTTAT 1,464,484 122.9
GATAC 832,386 122.5
GGAGA 2,345,781 117.2
TGAAG 2,273,317 117
CAAAG 2,378,782 116.9
GCCGT 248,314 116.8
TTTAC 1,995,555 116.8
CGAGT 284,243 116.1
GAAGA 2,341,241 115.8
ATTAG 1,746,436 115.7
GATGC 1,029,408 115.6
GTTGG 1,459,066 113.8
CAATG 1,452,710 113.6
ACTAA 1,490,471 112.7
GGAGT 1,848,924 112.5
TTTAT 4,782,503 111.7
GGTAA 1,140,608 111.3
TGACA 1,798,112 110.7
CCTTT 2,671,626 108.2
CGACC 157,280 108.1
ATTAT 3,002,047 107.6
CGTAA 204,686 107.5
AATGT 2,624,276 107.1
CTTGA 2,081,248 107.1
AATAT 3,333,236 106.2
CAACA 1,851,159 105.9
AGTGC 1,663,404 105.8
AGTAA 1,847,337 105
GATGG 1,813,464 104.2
TTTGT 4,264,759 103.9
AGACA 2,211,856 103.5
CGTGG 517,850 102.3
GCTCT 1,700,767 102.3
TCATT 2,961,630 100.6
ATTGC 1,523,508 99.8
TAAAG 2,183,525 99.8
GAAGT 1,726,675 99.6
TTTGG 2,875,060 99.5
TCTCT 3,430,559 99.1
CAATA 1,541,229 98.6
GCTTG 1,439,693 98.6
CCTCA 2,368,321 98.4
AATGG 2,008,464 97.6
GTTAA 1,481,224 97.2
GCTTA 1,123,982 97
CCTCG 467,262 96.3
ACCGT 281,526 95.9
AGAAC 1,604,320 95.4
ACGGC 201,551 94.3
CATAA 1,700,150 94.1
TAACA 1,753,204 94.1
GTTGA 1,393,457 94
TATGA 1,744,778 94
AAAGT 2,789,924 93.6
TGTGA 2,405,210 93.5
CGACG 53,661 93.2
GTTGC 1,225,081 93.1
CGATA 140,380 92.6
GGAAG 2,276,439 92.2
GAAAG 2,390,000 92.1
TCTTG 2,409,905 92.1
ACTTG 1,741,874 91.3
AGATA 1,916,407 90.8
CGATG 220,508 90.7
CGTGC 367,049 89.9
CCTTC 1,995,653 89.7
CCCGC 462,974 88.6
GGTGG 2,286,103 88.4
AGATG 2,280,579 88.1
ACTTA 1,591,657 88
CCCGT 343,210 87.4
TGATA 1,712,570 86.4
CGACT 163,151 85.8
CGACA 211,039 85.3
TTTGC 2,269,755 85
GCTTC 1,480,116 84.5
AGTGA 2,455,577 84.3
AGAAT 2,807,086 84.1
CTTAA 1,871,029 83.4
TGATG 2,012,580 83
ATTGA 1,915,648 82
TATAA 2,589,257 81.1
AATAC 1,750,376 80
TAATG 1,893,230 79.8
TGACT 1,717,370 79.8
ACTCT 1,784,576 79.6
CAAAT 2,490,445 79.5
GGTGC 1,169,212 79.5
ACGGT 251,817 79.4
TAAAC 1,564,335 78.6
CAACT 1,275,167 78.4
TTATT 4,573,537 78.1
ACTCA 1,644,045 77.9
ATTAC 1,710,030 77.8
TGAAC 1,625,556 77.5
ACCGC 246,239 77.2
AATGC 1,573,543 76.9
AGACG 416,851 76.8
CGAAC 185,317 75.5
TCTTA 2,043,125 75.4
TGAAT 2,537,051 75.3
GCCGG 439,091 75.2
TACGC 119,692 75.2
TCCGT 320,449 74.9
TATCG 147,372 74.6
TTTGA 3,301,776 74.2
TGACG 229,315 74.1
CAAAC 1,575,197 73.6
GCTCG 217,848 73.4
AAAAG 3,557,820 72.8
CGAAT 192,395 72.8
CGTGA 497,670 72.3
AGACT 1,734,014 72.1
GCTCA 1,887,082 72.1
GATGA 1,692,502 71.5
CGTCG 56,374 71
CTTCT 2,826,827 70.8
GCCGC 312,289 70.4
GCGGT 258,027 69.8
CTTCA 2,187,966 69
TGAAA 3,474,874 67.9
TCTCA 2,756,477 67.5
TTTAA 4,480,310 67.4
CCTCC 2,927,702 67.3
GGAAC 1,084,026 67.3
GGTGA 1,759,283 67.1
GCTTT 2,240,670 66.9
GGATG 1,508,733 66.9
AGAAA 4,454,267 66.7
TCTTT 4,488,015 66.4
TAATA 2,516,522 66
TACGT 278,358 64.7
GGCGT 454,366 63.8
AAACG 329,619 63.7
TCTCC 2,350,751 63
CACGT 430,088 62.8
AGACC 1,363,176 62.4
AAACA 3,220,352 61.8
CTTCG 261,669 61.1
ACCGG 181,150 60.7
TCCGC 329,980 60.6
TCGGT 198,165 60.6
GGATA 1,124,692 60.5
ATTAA 2,697,424 60.4
TCTTC 2,516,257 60.4
ACGGG 333,595 60
CGCGC 184,400 59.7
TGTTA 2,077,631 59.2
CGAAA 254,428 59
GATAA 1,538,897 58.5
AATGA 2,670,950 58.4
ACTCG 256,893 58.4
CAATC 995,390 58.3
CTTTG 2,842,703 57.3
CAACC 1,106,715 56.9
TGCGT 298,767 56.9
CTTTA 2,397,027 56.7
GAACA 1,657,449 56.7
GACGT 231,783 56.1
GAATA 1,862,184 55.3
GGACA 1,373,640 55.3
ACCGA 181,101 55.2
TGACC 1,325,013 54.3
AAAAC 2,843,978 53.4
TAACT 1,553,687 53.4
ACTTT 2,955,966 53.1
TAAAT 3,470,005 53
TGTCT 2,636,640 52.7
GTTCT 1,863,119 52.6
CATTG 1,805,412 52.1
TCTCG 521,574 51.8
GCTCC 1,237,058 51.7
CCCCT 1,549,404 51
TGTTG 2,472,227 51
TAACG 176,992 50.8
TCCCT 2,206,139 50.8
GAAAC 1,814,590 50.7
GCGTA 137,955 50.7
TATTG 1,959,216 50
ACGTA 260,967 49.8
CGATC 240,798 49.8
TGTCA 1,968,231 49.8
CCGTC 343,390 49.5
GGAAT 1,760,350 49.4
TGCGC 264,078 49.2
AACGC 162,959 49.1
ACTCC 1,631,453 49
CTTCC 2,335,942 48.8
CCGGC 435,045 48.3
TAACC 896,878 47.9
GAATG 1,924,126 47.8
GGAAA 2,742,646 47.4
GTTCA 1,709,786 47.4
ACGCT 232,572 47.3
GCGCG 190,401 47.3
CCGTA 169,716 47.1
CACGG 363,943 46.7
AAATG 3,606,681 46.6
CAACG 193,710 46.5
ACTTC 1,620,365 46.3
AGGGG 1,515,646 46.2
ATTCT 2,940,287 45.9
GGACG 240,819 45.7
AGATC 1,363,087 45.5
AGTCG 176,040 45.4
GGACT 1,218,661 45.1
TGCGG 310,425 45.1
AGCGC 223,129 44.8
AAATA 4,768,519 44.7
GTTCG 201,328 44.7
AGTCA 1,637,059 44.6
TAAAA 4,800,838 44.6
GTCGG 180,588 44.3
GCCGA 361,825 44.2
AATAA 3,733,046 43.9
GGCGC 433,487 43.8
TGTCG 255,141 43.1
ACGCG 69,711 43
TCCGA 188,225 42.5
ACCAT 1,799,498 42.2
CCGGT 189,741 42.2
ACCCT 1,376,458 42.1
GCGGG 479,786 41.7
ACGTT 339,505 41.2
CAAAA 3,576,736 41.1
GATTG 1,293,906 41
CCGGA 246,274 40.6
GACGC 172,356 40.6
CGTCT 419,530 40.5
AACGT 323,346 40.2
CGTTC 224,657 40.1
TGATC 1,502,826 39.9
CATTA 1,789,486 39.7
GAAAT 2,882,921 39.5
AGCGG 278,882 39.4
TAATC 1,589,175 39
CATCA 1,749,553 38.9
ATCGC 257,954 38.8
CGCGT 77,386 38.8
GCCCT 1,391,836 38.8
GCGAT 284,533 38.7
GGACC 801,101 38.7
GTCGC 180,882 38.7
TCGGG 336,464 38.6
TATCT 1,978,794 38.4
TCGCG 78,192 38.4
GAACG 209,638 38.2
GCGGA 314,482 38.2
TATCA 1,597,745 38.2
TGTCC 1,494,977 38.1
TATTA 2,615,304 37.9
ATTCA 2,369,453 37.6
CATCT 2,215,828 37.5
GCCAT 1,487,994 37
AGTTG 1,572,993 36.9
TCGGA 189,833 36.9
ATTTA 3,714,156 36.6
TTCGT 300,726 36.6
TTTCA 3,702,732 36.5
CAATT 1,788,994 36.3
GCGTC 192,694 36.3
ATTTG 2,988,525 36.1
CCCAT 1,612,870 36
GGCGG 614,549 35.8
CTTTT 4,484,575 35.7
GTTTG 2,074,984 35.7
CGTTA 197,330 35.5
TCCCG 481,128 35.3
GTTCC 1,172,656 35
GGTCT 1,491,396 34.9
ACGAT 229,873 34.8
CCCCG 492,991 34.5
TACCG 144,941 34.5
CTTTC 2,631,978 34.2
ACCCC 1,201,127 34.1
TCCCC 1,639,875 34.1
TTTCT 5,383,927 34
AGCGT 265,750 33.9
CCCCA 2,003,088 33.9
GCGTT 209,754 33.4
TCCAT 2,063,377 33.4
TGGGT 1,951,465 33.3
ACGTC 211,048 33.2
AGATT 2,232,198 33.2
AGTCT 1,866,281 33.2
TCGTA 180,801 33.2
GGGGT 1,336,146 32.9
TATCC 1,094,063 32.9
CCCGG 639,671 32.8
ACCCG 336,523 32.7
GACGA 183,720 32.7
AAACC 1,748,056 32.6
AGGGA 2,148,555 32.6
TGGGG 2,211,306 32.6
CCCCC 1,201,327 32.5
CGTTG 247,037 32.4
CTCGT 308,730 32.4
GCGCA 246,785 32.4
CCGCT 279,398 32.2
ACGGA 281,671 32
CCCTG 2,141,641 31.8
ATCGT 252,737 31.7
ATGGG 1,676,500 31.6
GCCCC 1,204,419 31.6
AGGGT 1,458,919 31.5
TACGA 159,672 31.3
CCGCA 288,662 31.2
GACGG 352,833 31.2
CATCG 224,820 31.1
TATTC 1,928,243 31.1
GAACT 1,584,270 30.9
ATGGA 2,050,759 30.7
AGTTA 1,637,716 30.5
AAACT 2,499,781 30.4
CCGAC 166,458 30
CGTCA 233,678 30
CCGGG 634,882 29.9
CCCTT 1,745,886 29.8
CCCGA 336,597 29.7
GGTTG 1,318,526 29.6
CGTCC 237,658 29.5
GGTCG 169,286 29.5
AGTCC 1,197,074 29.2
CGTTT 411,262 29.2
TCGGC 379,732 29
ATTCC 1,666,678 28.8
TCCCA 2,890,728 28.7
TGTTC 1,924,955 28.6
CCCTA 947,329 28.5
GAAAA 3,838,493 28.4
CATTT 3,987,777 28.3
AGCGA 356,441 28.1
CATCC 1,430,845 28
TAATT 3,322,941 28
TGTTT 4,287,921 28
AAAAT 5,813,462 27.9
ACCAC 1,542,993 27.9
GTTTA 1,861,719 27.9
GCGCC 432,566 27.7
TCGTG 361,038 27.7
ACCCA 1,704,439 27.6
GGTTA 978,764 27.6
TTCGC 185,740 26.9
ACGCC 410,905 26.8
ATCTA 1,424,270 26.7
GTTTC 2,106,256 26.6
CACGC 490,095 26.5
GTTTT 3,843,083 26.5
ATGGC 1,446,599 26.3
GCCAC 1,670,766 26.3
CAGGT 1,963,502 26
ACGCA 231,978 25.9
ATGGT 1,889,813 25.9
AACGG 195,438 25.6
GGTCA 1,289,385 25.6
CCGAG 549,026 25.5
GATTA 1,722,546 25.5
CGCGA 78,619 25.4
CTGGT 1,774,314 25.4
CCGTG 438,594 25.1
GGCGA 278,461 25.1
GCGGC 320,154 25
GGGGG 1,200,687 25
ACCAG 1,610,502 24.8
CCGTT 241,819 24.8
TTTTA 5,777,747 24.8
TTTCC 3,005,026 24.6
TCCGG 245,895 24.4
ACGTG 452,092 24.3
TCGTT 289,165 24.2
TGCGA 207,264 24.1
TTTTG 4,805,218 24.1
AGTTT 3,018,806 23.9
GGTCC 798,015 23.8
TCCAA 1,681,218 23.8
GTCGT 168,842 23.7
CCCAC 1,692,493 23.6
GGATT 1,905,074 23.6
CATTC 1,880,272 23.4
GCGTG 555,896 23.4
GTGGG 1,889,457 23.3
CGGCA 258,868 23.2
AGGAG 2,808,406 23.1
GGTTT 2,168,060 23.1
GCCCA 1,780,207 23
TTCGA 261,131 23
CAGGG 2,049,422 22.9
CGGGT 348,816 22.9
TCCAC 1,482,678 22.9
CCGCC 613,395 22.8
ATCCG 220,629 22.7
ATTTC 3,021,513 22.5
TGATT 2,484,674 22.5
AAAAA 9,462,714 22.2
TTCGG 225,076 22.2
CCCAA 1,818,902 22
GAACC 1,047,764 22
GTGGT 1,815,701 22
AACGA 228,240 21.9
CCCAG 3,254,121 21.8
AGGGC 1,333,814 21.7
TAGGT 1,244,207 21.7
AAATC 2,084,708 21.6
TAGGA 1,535,787 21.5
CAGAT 1,915,927 21.4
AATTG 2,014,516 21.3
GCGCT 235,541 21.2
GCCAG 1,989,326 21.1
TCCTG 2,988,450 21.1
GATCA 1,394,788 20.8
GCCAA 1,441,148 20.8
TAGGG 1,061,118 20.7
ATCGG 149,114 20.1
CAGGA 2,701,023 20
TCCAG 2,350,691 20
TTTCG 301,797 19.9
CGCAT 202,138 19.8
CGGGA 456,824 19.7
CTCCA 2,278,862 19.7
CGGGC 458,659 19.6
CCGAA 205,761 19.4
TCGAA 258,183 19.4
CGGAT 207,047 19.3
CGATT 260,957 19.2
CGGGG 520,613 19.2
TATTT 5,616,489 19.2
TTGGG 2,131,650 19.2
AATCA 2,042,160 19.1
GCCCG 471,701 19.1
CTCCT 2,887,270 19
GTGGA 1,681,036 19
CCCTC 1,688,997 18.9
AAATT 4,196,357 18.8
AGTTC 1,645,584 18.8
GCCTA 905,644 18.8
TGGGC 1,860,194 18.8
TTGGT 1,917,795 18.8
CTGGG 3,394,548 18.6
TAGAT 1,614,361 18.6
ATCCT 1,783,775 18.5
GGGGC 1,242,732 18.5
CTCCG 434,620 18.4
TGGAT 1,845,048 18.4
CGGAG 438,057 18.3
CTCTA 1,583,490 18.3
TGGGA 3,069,293 18.2
CGGCT 444,780 18
GCGAC 166,650 18
GGGGA 1,668,896 18
GATCT 1,453,710 17.9
GTCCG 167,450 17.9
TACCA 1,341,094 17.9
CGCGG 168,320 17.8
ATCCA 1,653,448 17.5
GCGAA 172,071 17.4
AATTA 3,028,733 17.2
TCGCC 290,342 17.2
AATCT 2,044,248 17.1
ATTTT 6,889,566 17.1
ACCAA 1,470,794 17
TTTTC 4,633,855 17
GAATC 1,256,563 16.7
GGGAT 1,619,645 16.7
TCGAC 121,003 16.5
GATCC 977,327 16.4
GGCCG 486,575 16.4
GTGGC 1,770,658 16.4
TTGGA 2,021,706 16.3
CTGGA 2,422,025 16.1
AAGGT 1,565,091 16
ACCTT 1,627,537 16
TCGAG 314,039 15.9
ACCTG 1,902,889 15.8
GTCGA 126,568 15.8
GGGAG 2,548,300 15.7
TCCTT 2,740,778 15.7
TCGCA 191,240 15.7
GAGGG 1,728,181 15.6
GGTTC 1,151,677 15.6
AAGGG 1,618,064 15.5
TAGAG 1,744,127 15.5
AAGGA 2,465,830 15.4
CGCCT 649,561 15.4
AGGAT 1,828,630 15.3
AAGGC 1,448,102 15.2
ATCCC 1,444,244 15.2
CGGTT 199,182 15.1
ACCTC 1,802,771 15
TCGTC 200,023 15
ACCTA 1,071,888 14.9
CTCCC 2,546,263 14.9
GGGCC 1,148,697 14.8
TCCTC 2,104,325 14.7
TCCTA 1,443,269 14.6
TTGGC 1,717,496 14.6
ACGAG 275,645 14.5
ATCTG 1,927,416 14.5
GAGGA 2,135,035 14.5
GGCCT 1,694,825 14.2
ATCAA 1,629,493 14.1
GGGAC 1,132,001 14.1
TGGTA 1,488,127 14.1
ATGAT 2,077,816 14
TCGCT 358,226 14
GTCCA 1,081,576 13.9
GAATT 2,253,811 13.8
CTCGC 293,386 13.6
GGATC 961,929 13.5
TACCT 1,407,871 13.5
TGGAG 2,596,589 13.5
GATTT 2,533,510 13.4
CACCT 2,032,511 13.3
GTCTA 916,887 13.1
ACGAA 231,638 13
ATCAG 1,609,797 13
CACCA 2,173,231 12.9
CGGCG 154,655 12.9
TGCCT 2,791,274 12.9
TGGTG 2,471,478 12.9
CACCG 389,594 12.8
CGGAA 234,656 12.8
CTCGA 312,122 12.8
GATTC 1,332,981 12.8
GACAT 1,424,213 12.6
GGGCA 1,688,438 12.4
AGGTG 2,171,788 12
CAGAC 1,336,578 12
GTGAC 1,166,789 12
ATCTT 2,352,939 11.9
CTGGC 2,009,055 11.9
AACAT 2,448,809 11.8
GCCTC 2,464,770 11.8
TAGGC 935,262 11.8
TGGCA 2,040,974 11.8
TGGAC 1,118,195 11.6
AAGAT 2,182,717 11.5
GAGAG 2,174,136 11.5
GTCCC 1,128,048 11.5
CAGGC 2,547,809 11.4
AATCC 1,599,750 11.3
ATGTA 2,115,619 11.3
GGGAA 2,043,736 11.3
GTGAT 1,856,529 11.3
CACAT 2,075,826 11.1
CGCAA 179,679 11.1
TAGAA 2,336,356 11.1
TTCCT 3,255,059 11.1
GTCAA 1,086,031 11
GTCTT 2,002,883 11
AACCG 183,118 10.9
CTCGG 549,387 10.9
AGCCA 2,317,199 10.8
CTCTC 2,135,735 10.8
AGCCT 2,521,864 10.7
CACAA 1,685,692 10.7
CTCTT 2,613,324 10.7
GCCTG 2,607,454 10.7
TAGTG 1,303,038 10.7
TTTTT 12,403,974 10.7
ATGCA 1,784,314 10.6
TTCCC 2,099,250 10.5
CACCC 1,542,080 10.4
AGGCC 1,570,698 10.2
CTGAT 1,673,282 10.2
ATCAT 1,981,919 10.1
ATCTC 1,988,006 10.1
CACGA 296,505 10.1
GCGAG 297,587 10.1
CAGAG 2,709,767 10
CGGTG 399,373 10
GAGGC 2,308,648 10
GGGTA 901,530 10
GTCAT 1,393,797 10
TCGAT 201,684 9.9
TGGAA 2,615,669 9.9
AGGAA 2,969,380 9.8
AACTA 1,447,514 9.7
AACCT 1,674,876 9.6
AGGCA 2,512,670 9.6
ATGAA 2,702,783 9.6
ATTCG 207,881 9.6
CTCAT 1,985,339 9.6
GACAC 1,041,277 9.6
TACAG 2,072,679 9.6
TTGAG 2,408,269 9.6
GCCTT 1,679,701 9.5
TTGAT 2,105,257 9.5
TTCCA 2,564,195 9.4
AGGTA 1,397,835 9.3
GTCAG 1,516,244 9.2
TAGTT 1,837,905 9.2
ATGAG 1,986,080 9.1
GTGTA 1,312,975 9.1
AAGAG 2,320,735 9
GACAA 1,334,055 9
GTCTG 1,548,023 9
GTGAG 2,113,466 9
TACTA 1,335,826 9
AATTT 4,476,001 8.9
AAGAC 1,611,101 8.7
CTGTC 1,951,171 8.7
GACCC 915,662 8.7
GAGAT 2,079,231 8.7
GAGGT 1,847,409 8.7
AGCCC 1,414,929 8.5
CACTA 1,057,476 8.5
CTCAA 2,002,943 8.5
TGGTC 1,409,723 8.5
TTGTC 1,673,883 8.4
ATCAC 1,562,962 8.3
TGCCA 2,050,488 8.3
TGGCT 2,530,312 8.3
GATCG 242,933 8.2
TTCTT 4,488,245 8.2
ATGAC 1,240,787 8.1
CACAG 2,229,476 8.1
GTGAA 1,980,737 8.1
CTCTG 2,881,012 8
TAGTC 994,806 8
AACCA 1,640,714 7.9
AGGCT 2,518,182 7.9
GACCA 1,268,804 7.9
AATTC 2,042,794 7.8
TTCAC 1,912,944 7.8
AACAA 2,583,271 7.7
AGGTT 1,824,771 7.7
GGGTT 1,564,662 7.7
TAGAC 903,455 7.7
TTCTA 2,353,747 7.6
CTGTG 2,669,503 7.5
TTGTA 2,392,407 7.5
GGGTC 951,210 7.4
TGGTT 2,028,333 7.4
TACTT 2,056,964 7.3
TGCCC 1,769,066 7.3
TTGAC 1,238,357 7.3
AGCAT 1,809,041 7.2
CGGTA 138,026 7.2
GAGAA 2,772,870 7.2
GGCCA 1,798,589 7.2
GGGTG 1,678,020 7.2
TTGAA 2,940,009 7.1
AGCCG 426,921 7
GACCT 1,289,808 7
GGCCC 1,138,358 7
TTCTC 2,989,593 7
TTGTG 2,276,551 7
AACCC 1,300,674 6.9
AACTC 1,585,053 6.9
AAGTG 2,306,384 6.9
ATGTC 1,452,773 6.9
AGCAA 2,052,753 6.8
ATGTT 2,786,964 6.8
CCGAT 146,543 6.8
TAGCG 146,093 6.8
GACAG 1,795,098 6.7
TGCCG 298,743 6.7
TGCAC 1,505,446 6.6
TGGCG 452,214 6.6
GACTA 920,327 6.5
GAGCG 309,210 6.5
GAGAC 1,874,728 6.4
CACAC 2,081,905 6.2
CGGTC 161,217 6.2
TTGTT 3,529,061 6.2
GTGTG 2,651,781 6
TTCTG 3,182,478 6
CACTC 1,512,674 5.9
CAGAA 2,712,980 5.9
CCGCG 170,170 5.9
TACAT 2,035,214 5.9
GACCG 172,337 5.8
TACCC 860,100 5.8
TTCAT 2,943,069 5.8
AACTG 1,756,292 5.7
CACTG 2,438,855 5.7
CTGTT 2,475,015 5.7
GAGCA 1,574,195 5.7
GGCAG 2,285,671 5.7
GTCTC 1,946,130 5.7
TTGCA 2,099,278 5.7
GGGCG 535,296 5.6
TGCAT 1,971,144 5.6
AACAC 1,449,098 5.5
AAGTT 2,181,688 5.5
CAGCG 361,790 5.5
CTGAG 2,731,706 5.5
CTGCG 367,134 5.4
TACTG 1,488,913 5.4
TTGCC 1,678,964 5.4
CAGTG 2,646,157 5.3
TACTC 1,133,773 5.3
CTCAC 1,931,040 5.2
CTGTA 2,130,012 5.2
TACAA 1,908,493 5.2
TGCAA 1,917,811 5.2
AAGCA 2,170,505 5.1
ATCGA 195,088 5.1
AACAG 2,000,834 5
AGGAC 1,231,398 4.9
CTGCT 2,245,964 4.9
TTCAG 2,522,192 4.8
AAGCT 1,712,564 4.7
AGGCG 638,469 4.7
CAGTA 1,494,909 4.7
CGCCA 422,958 4.7
GGCTG 2,741,900 4.7
GTGCT 1,714,266 4.7
TACAC 1,057,643 4.7
TGGCC 1,920,427 4.7
AGCAG 2,164,259 4.6
GACTG 1,307,666 4.6
GGGCT 1,523,231 4.6
TGCAG 2,605,444 4.6
CTCAG 2,736,400 4.4
GTCCT 1,350,689 4.4
TGCTC 1,585,672 4.4
TTGCT 2,485,128 4.4
AAGTC 1,400,112 4.3
GAGTC 1,154,013 4.3
AATCG 239,739 4.2
TGCTA 1,429,971 4.2
AGCAC 1,475,385 4.1
CACTT 2,208,939 4.1
GAGTA 1,235,963 4
GAGTG 1,765,602 4
ATGCT 1,790,013 3.9
ATGTG 2,336,429 3.9
CTGAA 2,287,643 3.9
GAGTT 1,807,249 3.9
TTCAA 2,591,506 3.9
CTGCC 2,363,179 3.8
GGCAT 1,566,285 3.8
TTCCG 261,801 3.8
GAGCT 1,605,030 3.7
GGCTA 1,071,955 3.7
TGCTG 2,682,543 3.7
AAGAA 3,668,520 3.5
CTGAC 1,424,020 3.5
GGCAA 1,420,064 3.5
GTCAC 1,132,235 3.5
GTGCG 287,419 3.5
TAGCA 1,427,378 3.5
TGCTT 2,537,248 3.5
CAGCA 2,329,412 3.4
CAGCT 2,359,725 3.4
CAGTT 2,048,159 3.4
AACTT 2,124,044 3.3
AAGCG 306,975 3.3
AGCTG 2,432,398 3.3
ATGCC 1,507,838 3.3
TAGTA 1,525,667 3.3
AGCTC 1,543,819 3.2
AGCTT 1,856,882 3.2
CAGCC 2,575,196 3.1
GGCTC 1,621,320 3.1
GTGCA 1,635,546 3.1
CGCAG 336,093 3
GAGCC 1,644,944 3
GTGTT 1,971,703 3
CTGCA 2,480,807 2.8
GACTC 1,085,413 2.8
TAGCC 1,078,376 2.8
AAGTA 1,952,402 2.6
AGGTC 1,225,867 2.4
TAGCT 1,696,538 2.4
CAGTC 1,327,380 2.3
GTGTC 1,277,120 2.3
AAGCC 1,382,283 2.2
GTGCC 1,359,530 2.2
GACTT 1,519,825 2
AGCTA 1,540,926 1.9
CGCCC 525,704 1.9
GGCTT 1,581,596 1.9
GGCAC 1,228,974 0.8
ACGAC 130,986 0
ATGCG 211,435 0
CGCAC 240,881 0
CGCCG 149,145 0
CGCTA 133,733 0
CGCTC 291,869 0
CGCTG 371,534 0
CGCTT 308,712 0
CGGAC 152,601 0
CGGCC 478,383 0
TACGG 156,372 0
TTGCG 214,486 0
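The last column of the table above is a rate normalized to quintuplet abundance: mutations per million occurrences of the quintuplet in the reference. The following is a minimal Python sketch of that normalization, not the authors' code. The function names are hypothetical, and the raw mutation counts are not listed in the table, so any count recovered from a tabulated rate is only approximate because the rates are rounded. The `reverse_complement` helper is included on the assumption that paired entries of the form `ATAGG : CCTAT` denote a quintuplet and its reverse complement.

```python
def per_million(mutations: int, occurrences: int) -> float:
    """Mutations per million occurrences of a quintuplet in the reference."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return mutations / occurrences * 1_000_000


def implied_mutations(rate_per_million: float, occurrences: int) -> float:
    """Approximate mutation count implied by a tabulated (rounded) rate.

    Assumption: the tabulated rate was computed as in per_million().
    """
    return rate_per_million * occurrences / 1_000_000


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    # Row "GGAGG 2,880,335 147.6" from the table above.
    approx = implied_mutations(147.6, 2_880_335)
    print(f"GGAGG: roughly {approx:.0f} mutations implied by the tabulated rate")
    print(reverse_complement("ATAGG"))
```

Working in rates rather than raw counts keeps rare quintuplets (for example, those containing CpG) comparable with abundant ones such as AAAAA or TTTTT.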
Supplementary Table 12
The effect of +/− two bases flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC
Columns: Quintuplet sequence; Number of occurrences in reference outside transcript regions; Whole genome intergenic mutations per million quintuplet occurrences
ATAGG : CCTAT 2,052,364 849.7
CCTAG : CTAGG 1,896,141 771.6
ACAGG : CCTGT 3,450,530 751.5
ATAGC : GCTAT 1,959,877 743.4
ACAGC : GCTGT 2,532,402 706.1
CCTAC : GTAGG 1,484,355 655.5
ATAGA : TCTAT 3,476,362 647.0
CTAGC : GCTAG 1,398,035 633.7
CTAGA : TCTAG 2,416,288 626.5
ATACA : TGTAT 4,300,830 615.0
CCAGG : CCTGG 4,241,635 593.7
ACAGA : TCTGT 4,777,684 592.8
CTACA : TGTAG 2,626,430 571.1
CCTGC : GCAGG 2,958,690 565.1
ACAGT : ACTGT 3,070,569 546.8
ATACG : CGTAT 375,014 522.6
CCAGA : TCTGG 3,326,103 510.5
GTAGA : TCTAC 2,365,186 510.3
GCTAC : GTAGC 1,509,832 490.1
GCAGC : GCTGC 2,293,528 485.8
ACTGG : CCAGT 2,650,172 484.9
CATAG : CTATG 2,315,218 476.9
CCTGA : TCAGG 3,432,582 468.8
CTATA : TATAG 2,594,873 468.2
CTAAG : CTTAG 2,099,113 465.9
CCAGC : GCTGG 3,965,825 463.5
GTACA : TGTAC 1,938,474 460.2
ACTAT : ATAGT 2,711,116 458.8
ACATG : CATGT 3,734,730 450.7
ACTAG : CTAGT 1,673,935 439.0
AGTAT : ATACT 2,747,714 438.2
GCAGA : TCTGC 3,286,245 437.8
CTACC : GGTAG 1,566,082 432.9
ACATA : TATGT 4,063,726 425.7
ATATG : CATAT 3,878,372 422.6
ATACC : GGTAT 1,862,340 419.3
CGTAG : CTACG 255,747 414.4
GCTGA : TCAGC 3,122,946 397.0
ACTGC : GCAGT 2,766,283 392.6
CATGG : CCATG 3,238,291 390.3
CCTAA : TTAGG 2,290,472 390.3
ACACA : TGTGT 5,179,407 388.5
ATAAG : CTTAT 2,991,670 384.8
AGTAG : CTACT 2,655,704 384.5
ACTGA : TCAGT 3,143,432 382.7
CCATA : TATGG 2,275,110 378.4
AGTAC : GTACT 1,476,881 370.4
ATATA : TATAT 6,772,263 367.7
ACACG : CGTGT 546,593 364.1
ACTAC : GTAGT 1,744,628 359.4
GCATA : TATGC 1,989,600 358.3
CATAC : GTATG 1,887,210 357.6
ACAAG : CTTGT 2,877,958 352.0
CTTAC : GTAAG 1,832,124 347.2
ACACT : AGTGT 2,860,776 343.3
TCAGA : TCTGA 3,786,073 343.1
GCTAA : TTAGC 2,215,166 339.0
GCACA : TGTGC 2,675,440 331.1
GTATA : TATAC 2,573,081 329.2
ATAAC : GTTAT 2,423,998 321.4
CTATC : GATAG 1,774,630 320.6
CTAAC : GTTAG 1,612,806 315.0
CCAAG : CTTGG 3,021,796 314.7
CATGC : GCATG 2,630,545 309.8
ACATC : GATGT 2,408,340 309.3
GGTAC : GTACC 1,097,889 308.8
CAAGG : CCTTG 2,819,989 308.5
TCTAA : TTAGA 3,209,633 305.0
CCACA : TGTGG 3,442,619 303.5
ACAAC : GTTGT 2,299,513 301.8
ATATC : GATAT 2,846,993 297.5
CTTGC : GCAAG 2,325,023 291.6
ACAAT : ATTGT 3,739,513 289.6
CCTTA : TAAGG 2,140,251 286.9
CTAAA : TTTAG 3,845,536 285.3
GTAAC : GTTAC 1,442,530 282.1
AGAGG : CCTCT 3,609,743 281.7
GATAC : GTATC 1,475,654 279.2
ACACC : GGTGT 2,190,637 273.4
CGTAC : GTACG 170,187 270.3
AGTGC : GCACT 2,404,275 267.9
AATAG : CTATT 3,515,013 262.9
ATTGG : CCAAT 2,413,363 259.0
ATAAA : TTTAT 8,099,851 258.9
AGTGG : CCACT 3,095,771 257.8
ATTAG : CTAAT 3,039,701 257.3
TGTAA : TTACA 4,148,426 257.2
CCACG : CGTGG 642,335 250.7
CCAAC : GTTGG 2,121,332 249.4
CCATC : GATGG 2,829,160 243.9
AGAGC : GCTCT 2,615,718 240.9
GTAAA : TTTAC 3,406,648 236.0
ACTAA : TTAGT 2,881,846 236.0
AAAGG : CCTTT 4,226,614 234.7
GCACC : GGTGC 1,682,128 233.6
GATGC : GCATC 1,677,346 231.3
CCTCA : TGAGG 3,771,129 230.7
GCAAC : GTTGC 1,815,344 230.2
ATTGC : GCAAT 2,588,999 229.8
ATAAT : ATTAT 5,489,421 227.9
CTTGA : TCAAG 3,322,833 227.6
ACAAA : TTTGT 6,652,102 227.4
CATGA : TCATG 3,425,400 226.8
AGTAA : TTACT 3,229,297 226.1
CAAGC : GCTTG 2,176,986 224.6
CGTGC : GCACG 452,367 223.3
ATTAC : GTAAT 2,947,816 222.9
CGAGC : GCTCG 267,150 220.8
GGTAA : TTACC 1,932,429 216.9
GCTTA : TAAGC 1,817,182 216.3
TATGA : TCATA 3,145,460 212.7
AATGT : ACATT 4,725,048 211.2
CCAAA : TTTGG 4,516,292 209.6
CTTAA : TTAAG 3,198,367 208.5
AATAT : ATATT 6,575,896 206.3
CAAGA : TCTTG 3,748,442 206.2
CCTCG : CGAGG 589,094 203.7
TCACA : TGTGA 3,913,802 202.9
CCACC : GGTGG 3,273,847 193.3
AGTGA : TCACT 3,915,720 191.5
GTTAA : TTAAC 2,494,623 191.2
CCTTC : GAAGG 3,205,842 190.9
GTTGA : TCAAC 2,242,837 190.4
AGAGA : TCTCT 5,807,492 187.2
CATAA : TTATG 3,392,975 186.9
ATTGA : TCAAT 3,409,665 184.2
ACTTG : CAAGT 2,801,997 182.3
TAAGA : TCTTA 3,367,481 180.9
AAAGC : GCTTT 3,462,030 179.9
AATGG : CCATT 3,768,892 175.9
ACTCT : AGAGT 3,097,200 174.6
GCTCA : TGAGC 2,942,445 174.0
CGTAA : TTACG 318,896 172.5
AATAC : GTATT 3,349,997 171.0
TATAA : TTATA 4,968,257 169.5
GCAAA : TTTGC 3,718,838 168.9
CCTCC : GGAGG 4,217,404 165.5
ACTTA : TAAGT 2,796,406 163.8
ACTCA : TGAGT 3,077,142 163.1
GATGA : TCATC 2,849,649 159.7
AATGC : GCATT 2,961,725 157.4
CGTGA : TCACG 681,218 157.1
CGAGA : TCTCG 697,460 154.8
ACCGT : ACGGT 374,711 152.1
TTAAA : TTTAA 7,445,848 148.0
ACTCG : CGAGT 381,872 146.6
AGAAG : CTTCT 4,707,560 145.5
TCTCA : TGAGA 4,855,169 145.4
TCAAA : TTTGA 5,337,344 143.5
GAAGC : GCTTC 2,372,194 142.1
GATAA : TTATC 2,890,095 141.9
GGTGA : TCACC 2,695,528 140.6
AAAGA : TCTTT 7,414,758 138.1
GCTCC : GGAGC 1,910,076 137.7
GGAGA : TCTCC 3,817,593 137.0
ATTAA : TTAAT 5,119,815 134.1
ACGGC : GCCGT 276,090 134.0
GAAGA : TCTTC 4,233,476 130.4
CTTCA : TGAAG 3,837,423 129.5
CGTTA : TAACG 294,433 125.7
CGAAG : CTTCG 349,354 120.3
CAACA : TGTTG 3,717,821 119.7
CGATA : TATCG 243,357 119.2
AGAAC : GTTCT 2,999,610 118.7
CAAAG : CTTTG 4,519,070 116.2
AGACA : TGTCT 4,209,019 115.2
ACGGG : CCCGT 442,822 115.2
ACTTC : GAAGT 2,862,069 113.5
ACTCC : GGAGT 2,708,221 112.6
AATGA : TCATT 5,030,821 110.5
CTTTA : TAAAG 4,018,590 110.2
CTTCC : GGAAG 3,779,279 109.8
CAATG : CATTG 3,026,134 107.8
AGTCG : CGACT 233,226 107.2
TAACA : TGTTA 3,362,077 105.3
CGACA : TGTCG 324,069 104.9
CGAAC : GTTCG 270,384 103.6
GCCGC : GCGGC 318,992 103.4
CCCGC : GCGGG 541,822 101.5
TGACA : TGTCA 3,205,410 101.4
CAATA : TATTG 3,400,301 100.9
AGATG : CATCT 3,743,691 99.9
CCGGC : GCCGG 501,117 99.8
CAACG : CGTTG 332,131 99.4
ACGGA : TCCGT 433,275 99.3
CCGCG : CGCGG 162,136 98.6
AAAGT : ACTTT 4,911,127 97.2
GTTCA : TGAAC 2,767,227 96.2
CGACG : CGTCG 54,041 92.6
ACCGC : GCGGT 327,027 91.7
ATTCA : TGAAT 4,511,329 90.2
ACGCA : TGCGT 379,989 89.5
AGTTG : CAACT 2,433,741 88.8
AATAA : TTATT 7,753,623 87.8
CATCG : CGATG 325,063 86.2
AGATA : TATCT 3,622,646 85.9
CATTA : TAATG 3,292,676 85.6
CTTTC : GAAAG 4,424,502 85.2
GCGCA : TGCGC 306,633 84.8
CAAAC : GTTTG 3,214,849 84.6
AGCGC : GCGCT 272,542 84.4
AGTCA : TGACT 2,858,999 83.9
AAAAG : CTTTT 7,052,340 83.3
AGACT : AGTCT 3,037,692 83.3
AGAAT : ATTCT 5,136,604 82.4
AGACG : CGTCT 585,786 82.0
CATCA : TGATG 3,323,362 81.8
ACCGA : TCGGT 269,661 81.5
GCGGA : TCCGC 405,725 81.3
TATCA : TGATA 3,078,532 80.5
AGACC : GGTCT 2,239,082 80.4
GCGTA : TACGC 186,896 80.2
AGAAA : TTTCT 8,982,776 79.9
GGTTA : TAACC 1,578,124 79.8
ATTCG : CGAAT 340,773 79.3
ACGCT : AGCGT 343,503 78.6
ATTTG : CAAAT 5,046,087 77.7
TAATA : TATTA 4,751,072 75.8
CCCGG : CCGGG 755,245 75.4
CCGGA : TCCGG 308,370 74.6
GGACA : TGTCC 2,342,223 74.3
CGATC : GATCG 338,139 71.0
GGAAC : GTTCC 1,880,006 70.7
CAACC : GGTTG 1,995,173 70.2
ACCGG : CCGGT 242,556 70.1
CGTCA : TGACG 328,474 70.0
CGAAA : TTTCG 430,148 69.7
ATTCC : GGAAT 3,077,476 69.2
GTTTA : TAAAC 3,025,257 69.1
AAACA : TGTTT 6,593,719 67.3
CATCC : GGATG 2,372,318 66.6
GAACA : TGTTC 3,130,709 64.9
CAATC : GATTG 2,087,332 64.7
TGAAA : TTTCA 6,328,700 63.9
AGTTA : TAACT 2,739,276 63.9
CGTTC : GAACG 316,884 63.1
CCCGA : TCGGG 431,561 62.6
CATTC : GAATG 3,442,173 62.5
ATTTA : TAAAT 6,541,833 62.4
ACGTC : GACGT 320,149 62.4
TCCGA : TCGGA 258,527 61.9
ACGTG : CACGT 654,147 61.1
AGGGG : CCCCT 2,293,071 61.0
CGTCC : GGACG 294,768 61.0
CCGCA : TGCGG 379,538 60.6
GAATA : TATTC 3,514,487 60.1
GCCGA : TCGGC 484,164 59.9
CACGC : GCGTG 671,787 59.5
ATGGG : CCCAT 2,777,449 59.1
CAAAA : TTTTG 7,444,256 56.9
CACGG : CCGTG 517,811 56.0
AACGC : GCGTT 267,967 56.0
AGGGA : TCCCT 3,562,788 55.6
GGTCA : TGACC 2,040,429 55.4
ACGCC : GGCGT 549,598 54.6
ACCCT : AGGGT 2,229,226 53.4
GGATA : TATCC 2,042,852 53.3
ACCCC : GGGGT 1,894,959 53.3
AGATC : GATCT 2,365,504 52.9
ACGTA : TACGT 435,280 52.8
GCGCC : GGCGC 498,139 52.2
AAATG : CATTT 6,725,462 51.8
ACCCA : TGGGT 3,008,267 51.5
GGAAA : TTTCC 4,995,248 51.4
ACGAT : ATCGT 371,807 51.1
CCCCA : TGGGG 3,180,062 50.9
CCGTA : TACGG 236,122 50.8
AGCGG : CCGCT 339,581 50.1
CCCTA : TAGGG 1,665,007 49.8
GAAAC : GTTTC 3,351,605 49.5
AGTCC : GGACT 1,942,058 49.4
GGGGA : TCCCC 2,523,449 48.7
ATGGA : TCCAT 3,722,767 48.3
CGGGA : TCCCG 601,104 48.2
AGGGC : GCCCT 1,981,241 47.4
AGTTC : GAACT 2,728,094 47.3
ACCAT : ATGGT 3,157,808 46.8
CCGTC : GACGG 448,462 46.8
AAACG : CGTTT 581,809 46.4
GGCGA : TCGCC 367,842 46.2
AAAAC : GTTTT 5,844,209 46.0
GATTA : TAATC 2,814,962 45.8
AACGT : ACGTT 525,445 45.7
CCCCC : GGGGG 1,673,893 45.4
AGCGA : TCGCT 488,795 45.0
CCGCC : GGCGG 712,310 44.9
ACCCG : CGGGT 446,585 44.8
TCCCA : TGGGA 4,695,858 44.5
TAAAA : TTTTA 9,212,257 44.4
CCCCG : CGGGG 585,338 44.4
CACGA : TCGTG 478,740 43.9
AAACT : AGTTT 4,918,718 43.7
CGACC : GGTCG 207,990 43.3
AATTG : CAATT 3,637,696 43.1
ATTTC : GAAAT 5,379,726 42.8
CAGGG : CCCTG 3,061,530 41.5
ACCAG : CTGGT 2,654,089 41.1
CCCTC : GAGGG 2,562,870 41.0
ACCTG : CAGGT 2,945,489 40.8
ACGAG : CTCGT 417,911 40.7
GATCA : TGATC 2,385,310 40.2
ATCGA : TCGAT 325,380 39.9
GAACC : GGTTC 1,741,491 39.0
CCGAA : TTCGG 309,817 38.7
AAATA : TATTT 9,528,640 38.4
ACCTA : TAGGT 2,014,228 37.7
AATTA : TAATT 5,738,091 37.3
AATCT : AGATT 3,824,640 36.6
CGCGC : GCGCG 193,803 36.1
CCCAG : CTGGG 4,805,933 35.8
GATCC : GGATC 1,536,655 35.8
GACGC : GCGTC 223,131 35.8
ATGGC : GCCAT 2,453,540 35.5
ACCAA : TTGGT 2,998,119 35.0
ACCAC : GTGGT 2,627,802 35.0
CTCGC : GCGAG 372,293 34.9
AAACC : GGTTT 3,341,606 34.7
GGACC : GGTCC 1,180,875 34.7
TACGA : TCGTA 261,572 34.4
TCGCA : TGCGA 291,523 34.3
AACGG : CCGTT 324,489 33.9
ATCCG : CGGAT 297,731 33.6
AAGGA : TCCTT 4,524,298 33.3
CCCAA : TTGGG 3,196,868 33.2
GAAAA : TTTTC 7,580,956 32.8
AATCA : TGATT 4,144,717 32.8
GCCCA : TGGGC 2,648,929 32.5
GTGGA : TCCAC 2,589,941 32.4
AAAAT : ATTTT 11,341,687 32.3
ATCCA : TGGAT 3,084,545 32.1
ATCGG : CCGAT 219,179 31.9
CGGGC : GCCCG 538,369 31.6
GAATC : GATTC 2,265,912 30.9
CTGGA : TCCAG 3,820,895 30.6
AAGGG : CCCTT 2,776,940 30.6
CAGGA : TCCTG 4,405,951 30.4
ATCTA : TAGAT 2,810,436 30.2
ATCCC : GGGAT 2,478,217 29.8
CTCCA : TGGAG 3,906,542 28.5
TAGGA : TCCTA 2,521,865 28.1
CCCAC : GTGGG 2,752,863 28.0
GCCCC : GGGGC 1,640,122 27.4
TCGAA : TTCGA 401,739 27.4
GAGGA : TCCTC 3,348,583 27.2
AGCCG : CGGCT 553,889 27.1
ACCTC : GAGGT 2,823,718 26.9
CACCG : CGGTG 522,284 26.8
CTCCC : GGGAG 3,808,670 26.5
CTGGC : GCCAG 2,902,899 26.5
AAGGT : ACCTT 2,611,045 26.5
TCCAA : TTGGA 3,302,901 26.3
ATCTG : CAGAT 3,249,764 26.2
ACGAA : TTCGT 421,152 26.1
AAATC : GATTT 4,129,568 25.9
CCGAG : CTCGG 700,550 25.7
ATCGC : GCGAT 389,448 25.6
CTCTA : TAGAG 2,773,224 24.6
GCCAC : GTGGC 2,484,104 24.5
AGGAG : CTCCT 4,385,115 24.2
AAAAA : TTTTT 17,969,434 24.0
ACGAC : GTCGT 208,177 24.0
AGGAT : ATCCT 3,020,489 23.8
CGCCG : CGGCG 126,036 23.8
ATCAT : ATGAT 3,677,654 23.4
CCGAC : GTCGG 214,350 23.3
AATCC : GGATT 2,929,127 22.9
GGGAA : TTCCC 3,400,156 22.6
AATCG : CGATT 399,232 22.6
ATCAG : CTGAT 2,847,458 22.1
GACGA : TCGTC 271,370 22.1
ATCAA : TTGAT 3,532,513 21.8
AAGAT : ATCTT 4,004,025 21.7
TACCA : TGGTA 2,440,256 21.3
AACCG : CGGTT 281,066 21.3
CACCA : TGGTG 3,709,651 21.0
AGCCA : TGGCT 3,859,218 20.5
CTCGA : TCGAG 444,875 20.2
CGGAG : CTCCG 552,978 19.9
ATCTC : GAGAT 3,492,313 19.8
GCCAA : TTGGC 2,575,039 19.8
AACGA : TCGTT 407,359 19.6
GGGAC : GTCCC 1,696,168 19.5
GCGAA : TTCGC 257,972 19.4
AGGTC : GACCT 1,975,778 19.2
AATTC : GAATT 3,843,858 18.5
GAGGC : GCCTC 3,452,260 18.5
GCGAC : GTCGC 215,949 18.5
AGGCA : TGCCT 4,143,383 18.3
GTCCA : TGGAC 1,814,974 18.2
AAGGC : GCCTT 2,480,421 18.1
CTCTC : GAGAG 3,640,518 17.6
AGGTG : CACCT 3,202,165 17.5
GGGTA : TACCC 1,491,394 17.4
AAATT : AATTT 7,925,716 17.3
AGGAA : TTCCT 5,349,454 17.2
AACTA : TAGTT 2,982,814 17.1
AGGAC : GTCCT 2,045,582 17.1
GCCTA : TAGGC 1,522,285 17.1
GTCTA : TAGAC 1,650,946 17.0
CACTA : TAGTG 2,012,473 16.9
ATGAG : CTCAT 3,452,201 16.8
TGCCA : TGGCA 3,270,540 16.8
GTCGA : TCGAC 180,867 16.6
ATGTA : TACAT 3,772,157 16.5
CACCC : GGGTG 2,407,840 16.2
CGCTC : GAGCG 372,157 16.2
ATGAC : GTCAT 2,264,189 15.9
AGCAT : ATGCT 3,100,402 15.8
TGGAA : TTCCA 4,630,455 15.5
AAGAG : CTCTT 4,263,642 15.5
CTGTA : TACAG 3,350,982 15.2
CAGAG : CTCTG 4,452,819 14.9
TAGAA : TTCTA 4,232,848 14.8
ATGTC : GACAT 2,573,351 14.8
CGGAC : GTCCG 203,247 14.8
CGGTA : TACCG 202,040 14.8
ATGTG : CACAT 3,956,837 14.7
AACCA : TGGTT 3,195,409 14.7
AGCCT : AGGCT 3,773,234 14.6
CGGCC : GGCCG 549,181 14.5
CAGGC : GCCTG 3,670,279 14.4
CGGTC : GACCG 208,544 14.4
GACCA : TGGTC 2,092,573 14.3
AGGTA : TACCT 2,325,061 14.2
ATCAC : GTGAT 2,874,220 14.0
AACCT : AGGTT 2,891,494 13.8
AGCCC : GGGCT 2,137,993 13.1
ACGCG : CGCGT 76,690 13.0
TACTA : TAGTA 2,388,396 12.6
AGGCC : GGCCT 2,401,729 12.5
GTCAA : TTGAC 2,080,186 12.5
CACAG : CTGTG 3,867,255 12.4
GTGAA : TTCAC 3,329,761 12.3
CTCAA : TTGAG 3,703,373 12.2
CTGAC : GTCAG 2,300,700 11.8
CGCAG : CTGCG 426,776 11.7
CTGTC : GACAG 2,867,756 11.5
CGCGA : TCGCG 87,447 11.5
TTCAA : TTGAA 4,996,620 11.4
AACCC : GGGTT 2,288,382 11.4
CACAC : GTGTG 3,837,959 11.2
AAGAC : GTCTT 3,105,255 11.2
GTCAC : GTGAC 1,788,265 11.1
CGCCC : GGGCG 629,376 11.1
AGCAG : CTGCT 3,469,011 11.0
GGGCA : TGCCC 2,539,788 11.0
CTCAG : CTGAG 4,276,152 10.9
CTCAC : GTGAG 3,214,093 10.9
AAGAA : TTCTT 7,391,191 10.8
AACAT : ATGTT 4,629,550 10.8
AACTT : AAGTT 3,778,240 10.8
GAGAC : GTCTC 2,984,778 10.8
GACAA : TTGTC 2,690,605 10.4
CAGAC : GTCTG 2,330,738 10.3
GACAC : GTGTC 1,948,916 10.3
ATGAA : TTCAT 5,057,318 9.9
GGCCA : TGGCC 2,720,333 9.9
CACAA : TTGTG 3,548,989 9.8
AACTG : CAGTT 3,186,755 9.7
CAGAA : TTCTG 5,142,565 9.6
CTGAA : TTCAG 4,060,876 9.6
AGCTA : TAGCT 2,655,556 9.4
AGCAC : GTGCT 2,450,745 9.4
AACAC : GTGTT 2,920,967 9.2
AACAA : TTGTT 5,508,395 9.1
ATGCA : TGCAT 3,390,737 9.1
CAGCG : CGCTG 443,621 9.0
TGCAA : TTGCA 3,612,657 8.9
AGCAA : TTGCT 3,967,310 8.8
AAGTG : CACTT 3,706,350 8.7
GTGTA : TACAC 2,069,745 8.7
AAGCT : AGCTT 3,011,869 8.6
GTGCA : TGCAC 2,549,578 8.6
GGCCC : GGGCC 1,501,415 8.6
AGGCG : CGCCT 806,814 8.6
CACTG : CAGTG 3,907,592 8.5
AACAG : CTGTT 3,745,316 8.5
CGGCA : TGCCG 351,139 8.5
TACAA : TTGTA 3,821,239 8.4
GAGAA : TTCTC 5,060,061 8.3
CACTC : GAGTG 2,650,103 8.3
GAGCA : TGCTC 2,535,818 8.3
CTGCA : TGCAG 4,023,532 8.2
AAGTC : GACTT 2,548,544 8.2
AAGTA : TACTT 3,338,487 7.8
GACTC : GAGTC 1,793,884 7.8
CAGTC : GACTG 2,101,763 7.6
GACTA : TAGTC 1,627,837 7.4
GACCC : GGGTC 1,391,465 7.2
AGCTG : CAGCT 3,659,524 7.1
CAGTA : TACTG 2,434,289 7.0
AAGCA : TGCTT 3,978,654 6.8
CGCAA : TTGCG 294,062 6.8
ATGCC : GGCAT 2,461,463 6.5
CTGCC : GGCAG 3,385,498 6.2
GGCAA : TTGCC 2,574,229 6.2
GAGTA : TACTC 1,948,447 6.1
CAGCA : TGCTG 3,994,634 6.0
AAGCC : GGCTT 2,412,704 5.8
TAGCA : TGCTA 2,478,445 5.7
CGGAA : TTCCG 351,915 5.7
AACTC : GAGTT 2,872,014 4.9
CGCTA : TAGCG 203,508 4.9
AAGCG : CGCTT 425,150 4.7
GGCTA : TAGCC 1,745,360 4.6
AGCTC : GAGCT 2,477,629 4.4
CAGCC : GGCTG 3,816,659 3.9
GGCAC : GTGCC 1,883,650 3.8
CGCCA : TGGCG 565,505 3.5
ATGCG : CGCAT 306,587 3.2
CGCAC : GTGCG 330,260 3.0
GAGCC : GGCTC 2,349,760 2.6
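
The pairing and normalization used in Supplementary Table 12 can be sketched as follows. Each row appears to list a quintuplet together with its reverse complement (alphabetically smaller sequence first), and the last column is a count scaled to one million reference occurrences. The function names below, and the mutation count used in the usage note, are illustrative assumptions and are not taken from the source.

```python
# Minimal sketch; helper names are hypothetical, not from the source.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_pair_label(quintuplet: str) -> str:
    """Build a 'SEQ : REVCOMP' label with the alphabetically smaller
    sequence first (assumed to be the table's convention)."""
    forward = quintuplet.upper()
    first, second = sorted((forward, reverse_complement(forward)))
    return f"{first} : {second}"


def per_million(mutations: int, occurrences: int) -> float:
    """Scale a raw mutation count to mutations per million occurrences."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return mutations / occurrences * 1_000_000
```

For example, `strand_pair_label("CCATT")` yields the row label `AATGG : CCATT`, and a hypothetical count of 6 mutations over the 2,349,760 reference occurrences of GAGCC : GGCTC would correspond to about 2.6 per million.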
Supplementary Table 13
The effect of +/− one base flanking the mutated TAG on the rates of unspliced transcript (transcribed
regions, including intron) mutations in AA-UTUC
Quintuplet sequence | Number of occurrences in reference unspliced transcript regions | Whole genome mutations in unspliced transcript (sense strand) per million quintuplet occurrences
ATAGG 1,196,711 819.7
CTAGG 1,156,560 721.1
CTAGC 824,340 691.5
ATAGC 1,097,065 674.5
ATAGA 1,825,784 660.0
CTAGA 1,343,347 616.4
GTAGG 980,728 599.6
ATAGT 1,607,995 458.3
GTAGA 1,498,367 433.8
GTAGC 1,033,949 427.5
CTAGT 1,023,498 421.1
TTAGG 1,442,396 394.5
GTAGT 1,169,626 363.4
TTAGC 1,393,011 345.3
TTAGA 1,930,896 287.9
TTAGT 1,860,278 227.4
Supplementary Table 14
The effect of +/− one base flanking the mutated CAG on the rates of unspliced transcript (transcribed
regions, including intron) mutations in AA-UTUC
Quintuplet sequence | Number of occurrences in reference unspliced transcript regions | Whole genome mutations in unspliced transcript (sense strand) per million quintuplet occurrences
ACAGG 2,179,044 660.8
ACAGC 1,468,809 630.4
ACAGT 1,893,468 538.7
CCAGG 2,920,014 515.8
ACAGA 2,556,782 506.5
GCAGG 2,005,285 505.2
CCAGA 1,968,757 455.1
GCAGC 1,521,819 427.8
TCAGG 2,157,273 426.5
CCAGT 1,678,066 424.3
GCAGA 1,956,583 413.5
CCAGC 2,637,820 396.2
GCAGT 1,907,864 367.4
TCAGC 1,997,426 358.0
TCAGA 2,192,906 350.2
TCAGT 2,036,941 350.0
Supplementary Table 15
Hypergeometric analysis for enrichment of CAG splice-site mutations in AA-UTUCs, AA-treated HK2 clones, and non–AA-associated cancers
A>T:T>A
splice non splice total p-value <x
AA-UTUCs
3T 12 34 46 1.55E-02
6T 32 103 135 4.76E-06
9T 40 189 229 7.58E-04
10T 8 25 33 6.58E-03
13T 24 99 128 1.27E-03
20T 39 171 210 2.41E-04
79T 29 111 140 1.70E-04
80T 32 132 164 2.89E-04
100T 49 149 198 6.60E-09
AA-treated HK2 clones
HK2_clone 1 7 8 15 5.62E-05
HK2_clone 2 3 15 18 2.29E-04
non-AA-associated cancers
H. pylori associated gastric cancer (n=15) 1 13 14 0.76
OV associated cholangiocarcinoma (n=8) 3 49 52 0.21
Exome CAG triad 93,292 776,280
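
A hypergeometric enrichment calculation of the kind summarized in Supplementary Table 15 can be sketched as below. The sketch assumes that the background population is the exome CAG triad count in the last row (93,292 splice-site and 776,280 non-splice-site CAGs) and that the test is a one-sided upper tail, P(X ≥ k). The source does not state the exact configuration, so values computed this way are not guaranteed to reproduce the tabulated p-values. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size (all exome CAG triads)
    K: 'successes' in the population (CAG triads at splice sites)
    n: number of draws (mutations observed in a sample)
    k: observed successes (mutations falling at splice sites)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lo = max(k, 0, n - (N - K))   # smallest feasible count at or above k
    hi = min(n, K)                # largest feasible count
    # Exact integer arithmetic; a single division at the end avoids underflow.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favourable / comb(N, n)


# Assumed background, taken from the "Exome CAG triad" row.
SPLICE_CAG = 93_292
NON_SPLICE_CAG = 776_280
TOTAL_CAG = SPLICE_CAG + NON_SPLICE_CAG

# Sample 100T: 49 of 198 mutations at splice sites.
p_100t = hypergeom_upper_tail(49, 198, SPLICE_CAG, TOTAL_CAG)
```

Exact integer binomial coefficients are used here rather than floating-point approximations because the tail probabilities of interest are very small.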
Supplementary Table 16
RPKM gene expression values for 15 NMD pathway genes in the AA-UTUC and matched normal tissue
Gene | 80N RPKM value | 80T RPKM value
MAGOH 6.87443 33.0464
WIBG 10.9648 28.1009
SMG5 3.94804 19.5251
UPF1 4.83847 19.7272
EIF4A3 3.58962 14.3719
DHX34 2.08127 10.5602
RBM8A 4.06941 10.9308
UPF2 2.40556 3.72954
SMG7 1.01325 2.12839
UPF3B 0.403335 1.39435
CASC3 2.20616 3.01697
SMG6 1.4214 2.00396
SMG8 0.085067 0.283679
SMG1 0.170046 0.164403
SMG9 14.6615 12.3358
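
The tumor-versus-normal comparison implied by Supplementary Table 16 can be expressed as a log2 fold change per gene. The sketch below transcribes the RPKM values from the table and assumes that 80N is the matched normal tissue and 80T the tumor. The function names and the optional pseudocount are illustrative choices, not part of the source analysis.

```python
import math

# RPKM values transcribed from Supplementary Table 16: gene -> (80N, 80T).
RPKM = {
    "MAGOH": (6.87443, 33.0464),
    "WIBG": (10.9648, 28.1009),
    "SMG5": (3.94804, 19.5251),
    "UPF1": (4.83847, 19.7272),
    "EIF4A3": (3.58962, 14.3719),
    "DHX34": (2.08127, 10.5602),
    "RBM8A": (4.06941, 10.9308),
    "UPF2": (2.40556, 3.72954),
    "SMG7": (1.01325, 2.12839),
    "UPF3B": (0.403335, 1.39435),
    "CASC3": (2.20616, 3.01697),
    "SMG6": (1.4214, 2.00396),
    "SMG8": (0.085067, 0.283679),
    "SMG1": (0.170046, 0.164403),
    "SMG9": (14.6615, 12.3358),
}


def log2_fold_change(normal: float, tumor: float, pseudocount: float = 0.0) -> float:
    """log2(tumor / normal); a pseudocount can stabilize near-zero RPKMs."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def fold_changes(rpkm: dict) -> dict:
    """Map each gene to its log2 fold change (tumor over normal)."""
    return {gene: log2_fold_change(n, t) for gene, (n, t) in rpkm.items()}


LFC = fold_changes(RPKM)
UP_REGULATED = sorted(gene for gene, value in LFC.items() if value > 0)
```

With these values, 13 of the 15 listed genes have a higher RPKM in 80T than in 80N; SMG1 and SMG9 are the exceptions.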
Supplementary Table 17
Identities of 3′ splice sites with CAG>CTG mutations and RPKM > 2
Gene symbol | Exon | RPKM | Finding | Ensembl transcript
METAP2 4 2.13 Exon skipping by reads spanning flanking exons ENST00000261220
TRAM1 7 2.13 Exon skipping by read depth and reads spanning flanking exons ENST00000262213
GGNBP2 4 2.4 Exon skipping by reads spanning flanking exons ENST00000304718
CNOT8 4 2.62 Mutated intron retained ENST00000523698
OXSM 2 4.01 Exon skipping by reads spanning flanking exons ENST00000420173
MARS 15 4.63 Mutated intron retained ENST00000262027
RBM10 18 5.31 Mutated intron retained ENST00000377604
AIDA 4 5.38 IC; high 3' RPKM ENST00000340020
SEC31A 3 5.39 IC ENST00000348405
MLL2 13 6.18 IC; high 3' RPKM ENST00000301067
AEBP1 8 8.96 All exons 5' of the mutation skipped by read depth ENST00000223357
C3orf19 6 9.27 IC ENST00000285042
RFC2 9 15.13 Exon skipping by read depth and reads spanning flanking exons ENST00000352131
MBOAT7 2 25.41 Exon skipping by read depth and reads spanning flanking exons ENST00000245615
SRRT 4 27.55 No aberration ENST00000423692
Note: IC = inadequate coverage at site of hypothetical mutation; high 3' RPKM = overall RPKM is high because of very high coverage in the 3'-most exon
Supplementary Table 18
3′ splice sites without CAG>CTG mutations for evaluating the proportion of unmutated sites associated
with aberrant splicing
Gene symbol | Exon | RPKM | Finding | Ensembl transcript
HTRA3 2 2.13 No aberration ENST00000307358
PPFIBP2 17 2.13 No aberration ENST00000299492
SMG7 20 2.13 IC ENST00000367537
KIF3B 5 2.13 IC; high 3' RPKM ENST00000375712
LSP1 2 2.4 IC; high 3' RPKM ENST00000381775
ARHGAP26 13 2.4 No aberration ENST00000274498
DHDS 5 2.62 No aberration ENST00000374194
RSL24D1 2 2.62 IC; high 3' RPKM ENST00000260443
ITGA1 27 4 No aberration ENST00000282588
CCBP2 3 4.01 IC ENST00000496604
AGPAT6 7 4.62 No aberration ENST00000396987
RHBDL2 2 4.63 No aberration ENST00000372985
MEF2D 6 5.3 No aberration ENST00000348159
TMEM98 3 5.32 No aberration ENST00000439138
HDAC7 21 5.38 IC ENST00000380610
KDM5C 20 5.38 No aberration ENST00000375401
TIAM1 4 5.3 No aberration ENST00000399841
AP3D1 6 5.4 No aberration ENST00000355272
PRKDC 45 6.11 IC ENST00000523565
PDE4DIP 6 6.33 IC ENST00000369359
UBAP2L 14 8.95 No aberration ENST00000271877
TRIM26 5 9.24 No aberration ENST00000454678
PCIG1 13 9.28 No aberration ENST00000443130
RCC1 12 15.12 No aberration ENST00000398962
INPP5K 12 15.13 No aberration ENST00000406424
IDH1 4 25.4 No aberration ENST00000345146
RTN2 5 25.42 No aberration ENST00000245923
SFRS16 5 27.52 No aberration ENST00000221455
BRF1 2 27.9 No aberration ENST00000327359
Note: IC = inadequate coverage at site of hypothetical mutation; high 3' RPKM = overall RPKM is high because of very high coverage in the 3'-most exon
Supplementary Table 19
Sequence analysis summary of two exome-sequenced AA-treated HK2 clones
                                                   HK2_ctrl     HK2_clone 1  HK2_clone 2
Bases in Target Region                             37,804,019   37,804,019   37,804,019
Bases Mapped to Target Region                      29,125,955   30,308,167   23,749,988
Ave. Depth Per Targeted Base                       34           34           28
Targeted Bases with Depth at Least 1X              91.6         91.7         91.1
Targeted Bases with Depth at Least 20X             56           57           49
Somatic Mutations Identified in Targeted Region    -            168          219
Supplementary Table 20
Somatic nonsynonymous substitutions in protein-coding genes of AA-treated HK2 clones
Gene Symbol | Sample ID | Nucleotide (Genomic) | AA Change | Change Type
ITGA1 AA_HK2_clone 1 g.chr5: 52235727 A>W S1080C Missense
PTPRB AA_HK2_clone 1 g.chr12: 70932712 A>W I1954K Missense
EBAG9 AA_HK2_clone 1 g.chr8: 110575694 A>W Y197F Missense
JAZF1 AA_HK2_clone 1 g.chr7: 28111249 A>W W2R Missense
NT5DC1 AA_HK2_clone 1 g.chr6: 116466615 A>W T182S Missense
FAM73A AA_HK2_clone 1 g.chr1: 78326961 A>W Q443L Missense
ZSWIM3 AA_HK2_clone 1 g.chr20: 44506288 A>W N364I Missense
KIAA0586 AA_HK2_clone 1 g.chr14: 58896090 A>W D70V Missense
TET2 AA_HK2_clone 1 g.chr4: 106157825 A>W Q909L Missense
JAK2 AA_HK2_clone 1 g.chr9: 5069984 A>W T525S Missense
ACTR10 AA_HK2_clone 1 g.chr14: 58701154 A>W Q380L Missense
C22orf23 AA_HK2_clone 1 g.chr22: 38349178 A>W F58I Missense
KRT1 AA_HK2_clone 1 g.chr12: 53070958 A>R L380P Missense
RTP4 AA_HK2_clone 1 g.chr3: 187088871 A>W T151S Missense
PKHD1 AA_HK2_clone 1 g.chr6: 51513899 A>W V3765E Missense
PLEKHG1 AA_HK2_clone 1 g.chr6: 151152519 A>W S758C Missense
MTF2 AA_HK2_clone 1 g.chr1: 93584947 A>W T31S Missense
TG AA_HK2_clone 1 g.chr8: 134145907 A>W - Splice site
PAN2 AA_HK2_clone 1 g.chr12: 56713400 A>W - Splice site
B3GAT2 AA_HK2_clone 1 g.chr6: 71571491 A>W H309Q Missense
HMGCS2 AA_HK2_clone 1 g.chr1: 120307153 A>W Y67X Nonsense
DKK1 AA_HK2_clone 1 g.chr10: 54076441 A>W R225S Missense
RUNX2 AA_HK2_clone 1 g.chr6: 45479981 A>W - Splice site
ZNF766 AA_HK2_clone 1 g.chr19: 52793724 A>W K227M Missense
CHD8 AA_HK2_clone 1 g.chr14: 21878104 T>W E478V Missense
MLL2 AA_HK2_clone 1 g.chr12: 49420693 G>R A5019V Missense
JMJD1C AA_HK2_clone 1 g.chr10: 64974567 T>W T454S Missense
HIVEP2 AA_HK2_clone 1 g.chr6: 143092547 T>W H1110L Missense
CASQ1 AA_HK2_clone 1 g.chr1: 160167402 T>W - Splice site
C1orf186 AA_HK2_clone 1 g.chr1: 206241623 T>W S56C Missense
ROCK2 AA_HK2_clone 1 g.chr2: 11332613 T>W M1305L Missense
KRT36 AA_HK2_clone 1 g.chr17: 39643829 T>W Q287L Missense
FAM26D AA_HK2_clone 1 g.chr6: 116879255 T>K S90A Missense
FEZF1 AA_HK2_clone 1 g.chr7: 121942943 T>W R323X Nonsense
BNC1 AA_HK2_clone 1 g.chr15: 83931855 T>W E716D Missense
OR2AE1 AA_HK2_clone 1 g.chr7: 99473789 T>W T290S Missense
MAGEB6B AA_HK2_clone 1 g.chrX: 26179580 T>W L288Q Missense
MUC16 AA_HK2_clone 1 g.chr19: 9086915 T>W T1634S Missense
RHEB AA_HK2_clone 1 g.chr7: 151174464 T>W D77V Missense
CENPJ AA_HK2_clone 1 g.chr13: 25480785 T>W N464I Missense
PARL AA_HK2_clone 1 g.chr3: 183585748 T>W R76X Nonsense
JAM2 AA_HK2_clone 1 g.chr21: 27078366 T>W V258E Missense
SLC9A9 AA_HK2_clone 1 g.chr3: 143271281 C>Y V338I Missense
VPS24 AA_HK2_clone 1 g.chr2: 86737516 C>M A125S Missense
C13orf40 AA_HK2_clone 1 g.chr13: 103382804 C>S R6748T Missense
IL22RA2 AA_HK2_clone 1 g.chr6: 137476239 C>Y W104X Nonsense
DNAH7 AA_HK2_clone 2 g.chr2: 196753635 A>T V1706E Missense
FANCM AA_HK2_clone 2 g.chr14: 45650702 A>T K1431I Missense
AC073343.2 AA_HK2_clone 2 g.chr7: 6715977 A>T T160S Missense
ITGA2B AA_HK2_clone 2 g.chr17: 42463385 A>T F101L Missense
PMAIP1 AA_HK2_clone 2 g.chr18: 57569877 A>T - Splice site
PTPRB AA_HK2_clone 2 g.chr12: 70932712 A>T I1954K Missense
HOXB5 AA_HK2_clone 2 g.chr17: 46670858 A>G S63P Missense
IKZF5 AA_HK2_clone 2 g.chr10: 124753324 A>T F411Y Missense
TRIO AA_HK2_clone 2 g.chr5: 14330935 A>T T594S Missense
KRT38 AA_HK2_clone 2 g.chr17: 39593694 A>T C447X Nonsense
UPK3A AA_HK2_clone 2 g.chr22: 45689079 A>T I197F Missense
KTN1 AA_HK2_clone 2 g.chr14: 56108481 A>T - Splice site
BICC1 AA_HK2_clone 2 g.chr10: 60556187 A>T T423S Missense
NLE1 AA_HK2_clone 2 g.chr17: 33462448 A>C V345G Missense
TRO AA_HK2_clone 2 g.chrX: 54957697 A>T K683X Nonsense
TMEM86A AA_HK2_clone 2 g.chr11: 18723407 A>T T192S Missense
SNCAIP AA_HK2_clone 2 g.chr5: 121799229 A>T K1004X Nonsense
KRT1 AA_HK2_clone 2 g.chr12: 53070958 A>G L380P Missense
PKP2 AA_HK2_clone 2 g.chr12: 32974403 A>C W678G Missense
SLC5A12 AA_HK2_clone 2 g.chr11: 26743137 A>T L42Q Missense
ABCC4 AA_HK2_clone 2 g.chr13: 95840754 T>A R436X Nonsense
ANPEP AA_HK2_clone 2 g.chr15: 90340900 T>A E688V Missense
IFNA8 AA_HK2_clone 2 g.chr9: 21409555 T>A V127E Missense
AC104667.3 AA_HK2_clone 2 g.chr2: 238499912 T>C - Splice site
CDC27 AA_HK2_clone 2 g.chr17: 45201328 T>A - Splice site
REEP1 AA_HK2_clone 2 g.chr2: 86444235 T>A - Splice site
DPP3 AA_HK2_clone 2 g.chr11: 66260238 T>G V347G Missense
SERPINB11 AA_HK2_clone 2 g.chr18: 61383361 T>A N36K Missense
FEN1 AA_HK2_clone 2 g.chr11: 61563225 T>G V131G Missense
BID AA_HK2_clone 2 g.chr22: 18222132 T>G T162P Missense
ZMYM4 AA_HK2_clone 2 g.chr1: 35870649 T>G V1185G Missense
RNF216 AA_HK2_clone 2 g.chr7: 5792480 T>A - Splice site
ABCA10 AA_HK2_clone 2 g.chr17: 67181728 T>A K796I Missense
RCBTB2 AA_HK2_clone 2 g.chr13: 49070341 T>A K501X Nonsense
ABCA2 AA_HK2_clone 2 g.chr9: 139909983 T>A I1194F Missense
ZBTB17 AA_HK2_clone 2 g.chr1: 16271027 T>A H380L Missense
PAPOLG AA_HK2_clone 2 g.chr2: 60988881 T>G V62G Missense
DNHD1 AA_HK2_clone 2 g.chr11: 6589392 T>G V4153G Missense
C12orf51 AA_HK2_clone 2 g.chr12: 112641549 T>A D2344V Missense
ATP8B2 AA_HK2_clone 2 g.chr1: 154303054 T>C S72P Missense
NCOA6 AA_HK2_clone 2 g.chr20: 33345199 T>A Q451L Missense
HERC4 AA_HK2_clone 2 g.chr10: 69750124 T>A K493X Nonsense
MGAT4A AA_HK2_clone 2 g.chr2: 99279644 T>A - Splice site
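
The clone 1 rows in Supplementary Table 20 write the variant with IUPAC ambiguity codes (for example A>W), whereas the clone 2 rows give the alternate base directly (for example A>T). A minimal sketch for converting one notation to the other is shown below. It assumes that the ambiguity code denotes a heterozygous call made up of the reference base plus one alternate base; that interpretation and the function name are assumptions, not statements from the source.

```python
# Two-base IUPAC ambiguity codes (standard nomenclature).
IUPAC_TWO_BASE = {
    "R": "AG",
    "Y": "CT",
    "S": "CG",
    "W": "AT",
    "K": "GT",
    "M": "AC",
}


def alt_allele(change: str) -> str:
    """Return the alternate base for a change written as 'REF>CODE'.

    'CODE' may be a plain base (A, C, G, T) or a two-base IUPAC code.
    Assumes the IUPAC code encodes reference + alternate (heterozygous call).
    """
    ref, code = (part.strip().upper() for part in change.split(">"))
    if len(code) == 1 and code in "ACGT":
        return code
    bases = IUPAC_TWO_BASE[code]  # KeyError for unsupported codes
    if ref not in bases:
        raise ValueError(f"reference {ref!r} is not part of code {code!r}")
    return bases.replace(ref, "")
```

For example, the KRT1 entry for clone 1 (A>R) decodes to A>G, which matches the clone 2 entry at the same coordinate (g.chr12: 53070958 A>G).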
Supplementary Table 21
Comparison of mutation rates in AA-UTUC, carcinogen-induced cancers, mismatch repair–defective
colorectal cancers, and POLE/POLD1 mutated colorectal cancers
AA-UTUC (n=9) 1103.67
UV-melanoma (n=7) 335.85
tobacco-lung cancer (n=12) 213.83
OV-CCA (n=8) 128
H. pylori-gastric cancer (n=15) 68
MSI-colorectal cancer (n=34) 1596.5
POLE mutated colorectal cancer (n=8) 3678
Supplementary Table 22
Primer sequences
Primer sequences used in RT-qPCR
Primers ID Sequences
UPF1_forward 5' AAC GAG CAC CAA GGC ATT GGC T 3'
UPF1_reverse 5' GGC TGC TTT GAT AGT GCC TTC G 3'
UPF2_forward 5' TCT CAC CTG AGG ACC AGT GTA C 3'
UPF2_reverse 5' AGC TGG AGG TGG GTT GCA GTA G 3'
UPF3A_forward 5' TAC TGG AGG TGG CAA GCA GGA A 3'
UPF3A_reverse 5' CCT GTG CTC TTT ATC ACT GCC G 3'
MAGOH_forward 5' CCT GGA GTT TGA GTT TCG ACC G 3'
MAGOH_reverse 5' TCT CTT CAG TTC CTC CAT CAC GC 3'
Primer sequences used in verification of MBOAT7 exon skipping
Primers ID Sequences
MBOAT7_E2_forward 5' TCT TAT CTC CAT CCC CAT CG 3'
MBOAT7_E5_reverse 5' CTG AGG CCA TTT CCT TCC T 3'