SUPPLEMENTARY INFORMATION
Extensive genomic and transcriptional diversity identified through massively parallel DNA and
RNA sequencing of eighteen Korean individuals
Young Seok Ju, Jong-Il Kim, Sheehyun Kim, Dongwan Hong, Hansoo Park, Jong-Yeon Shin,
Seungbok Lee, Won-Chul Lee, Sujung Kim, Saet-Byeol Yu, Sung-Soo Park, Seung-Hyun Seo, Ji-
Young Yun, Hyun-Jin Kim, Dong-Sung Lee, Maryam Yavartanoo, Hyunseok Peter Kang, Omer
Gokcumen, Diddahally. R. Govindaraju, Jung Hee Jung, Hyonyong Chong, Kap-Seok Yang, Hyungtae
Kim, Charles Lee and Jeong-Sun Seo
Nature Genetics: doi:10.1038/ng.872
Table of Contents
Supplementary Note
1. Genome Analysis
Sensitivity of SNP calling
Super nsSNP genes
Estimating the number of novel variants as the number of personal genomes increases
Linkage disequilibrium of novel variants
Detection of large deletions
Identification of large deletion breakpoints
Inferring molecular mechanisms for large deletion formation
Microhomology-sequence motifs for NHEJ deletions
De novo assembly of short reads
2. Transcriptome analysis
Sequence alignment for transcriptome
Expression mapping
Unknown transcripts
Genes escape X-inactivation
3. Comprehensive analysis of genome and transcriptome
Transcriptional base modification (TBM)
Allele-specific expression (ASE)
References
Supplementary Figures
Suppl. Figure 1: Experimental overview of the study
Suppl. Figure 2: Sensitivity of SNP detection in population-level
Suppl. Figure 3: Description of PRIM2 gene - Example of super nsSNP gene
Suppl. Figure 4: Super nsSNP genes
Suppl. Figure 5: Overview of the strategy for detecting large deletion breakpoints
Suppl. Figure 6: Comparison of RNA sequence alignments on cDNA and genome sequences
Suppl. Figure 7: Validation of 4 unknown transcripts
Suppl. Figure 8: Distance between unknown transcripts and nearest gene
Suppl. Figure 9: Validation of 15 TBMs
Supplementary Tables
Suppl. Table 1: Whole Genome Sequencing Statistics (10 individuals)
Suppl. Table 2: WGS SNP Accuracy from validation using Illumina 610K genotyping array
Suppl. Table 3: Primers for validations
Suppl. Table 4: Indel list of 10 individuals extracted by whole genome sequencing
Suppl. Table 5: Exome sequencing statistics
Suppl. Table 6: Non-synonymous SNP list detected from 18 individuals (10 whole genome sequencing, 8 whole exome sequencing)
Suppl. Table 7: Functional assessment of nsSNP of 18 individuals
Suppl. Table 8: Super nsSNP gene list
Suppl. Table 9: List of Korean common novel nsSNP LD
Suppl. Table 10: Total 5,496 large deletion list of 8 individuals including breakpoints information
Suppl. Table 11: Validation of large deletions using 24M CGH array data
Suppl. Table 12: Breakpoints list of NA10851
Suppl. Table 13: Motifs on flanking regions of NHEJ large deletions
Suppl. Table 14: RNA Sequencing Statistics
Suppl. Table 15: Comparison of transcriptome alignment methods using pseudogene expression
Suppl. Table 16: Expression map represented in RPKM value on all RefSeq genes
Suppl. Table 17: List of Korean common novel transcripts
Suppl. Table 18: 23 Genes Escape X-inactivation
Suppl. Table 19: 1,809 TBM sites
Suppl. Table 20: 580 Allele Specific Expression sites
Suppl. Table 21: Contig list generated by de novo assembly
Suppl. Table 22: Alignment result of de novo assemble contigs
Supplementary Note
1. Genome Analysis
1.1. Sensitivity of SNP calling
The number of SNPs detected in an individual genome depends on the SNP filter conditions. Even for
the same individual, the number of SNPs detected can vary substantially between algorithms¹
(3.84 million SNPs by ELAND, 4.13 million SNPs by MAQ, 3.61 million SNPs in common). In this
project, we focused on the identification of “rare” SNPs. Because false-positive SNPs tend to appear
in an individual-specific manner (and are therefore interpreted as rare), the number of rare SNPs is
easily overestimated if the filter conditions admit more false positives into the SNP list. We
therefore sought to reduce the false positives in our SNP list, maximizing the positive predictive
value (PPV) rather than detection sensitivity.
We achieved a high PPV (>99.94%) for SNP detection, based on the comparison of whole-genome
sequencing and microarray data. Experimental validation by PCR and Sanger sequencing also
supported the accuracy of our SNP calls (100%). Given the PPV of SNP detection (~99.94%), we
expect the number of false positives in each individual genome to be approximately 2,000
(3.5 million SNPs x 0.0006).
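The false-positive estimate above is simple arithmetic on the PPV. A minimal sketch follows; the function name is hypothetical, and the inputs are the rounded figures quoted in the text.

```python
def expected_false_positives(n_called: int, ppv: float) -> float:
    """Expected number of false-positive calls among n_called SNPs,
    given the positive predictive value (PPV) of the caller."""
    return n_called * (1.0 - ppv)


# Figures quoted in the text: ~3.5 million SNPs per genome, PPV ~99.94%.
fp = expected_false_positives(3_500_000, 0.9994)
print(round(fp))  # about 2,100, i.e. "approximately 2,000"
```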
Excluding data from AK1 and AK2 (sequenced on earlier platforms), the sensitivity of SNP detection
for the other Korean individuals is ~97%. Most of the variants that we could not detect (false
negatives) appear to be covered by only a few reads, too few to call SNPs accurately. For
example, in AK4, 62.4% (n=4,930) of the mismatch sites between sequencing and the 610K microarray
were covered fewer than 10 times. The lower-than-30x coverage of this genome (23.1x) may partly
account for the insufficient read depth for SNP calling. Interestingly, 74.3% (n=3,665) of these
regions showed high (>55%) or low (<25%) GC content. The Genome Analyzer (GA) technology does not
provide robust sequence data in genomic regions of high or low GC content¹.
Because we sequenced 10 individual genomes, the population-level sensitivity of SNP detection is
much higher than that of each individual. Compared with microarray data, we identified ~95% of
singletons. However, we could identify >99% of SNPs when more than 2 individuals carried the
variant in their genomes (Supplementary Figure 2).
1.2. Super nsSNP genes
The density of nsSNPs on coding sequences was 0.254/kb on average. During the course of
summarizing nsSNPs, we identified a subset of genes with more nsSNPs than expected. These
“super nsSNP” genes were defined as those in which 1) the average number of nsSNPs was ≥2
among all the individuals whole-genome sequenced, and 2) the nsSNP density was ≥4/kb of coding
sequence.
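The two criteria can be written as a small predicate. This is a sketch only: the function and argument names are hypothetical, and it assumes that the density is the number of distinct nsSNP sites in the gene divided by its coding length, which the text does not spell out.

```python
def is_super_nssnp(nssnp_counts, n_sites, cds_length_bp,
                   min_mean=2.0, min_density_per_kb=4.0):
    """Apply the two 'super nsSNP' criteria to one gene.

    nssnp_counts  -- nsSNP count in each whole-genome-sequenced individual
    n_sites       -- distinct nsSNP sites in the gene (assumed density numerator)
    cds_length_bp -- coding-sequence length in base pairs
    """
    mean_count = sum(nssnp_counts) / len(nssnp_counts)
    density_per_kb = n_sites / (cds_length_bp / 1000.0)
    return mean_count >= min_mean and density_per_kb >= min_density_per_kb


# Toy gene: mean of 2.5 nsSNPs per individual, 8 sites in 1.5 kb of CDS.
print(is_super_nssnp([3, 2, 4, 2, 3, 2, 2, 3, 2, 2], 8, 1500))
```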
We found that, in some cases, super nsSNP genes may be artifacts of hidden duplications: genes that
are frequently duplicated in the population but whose additional copies are absent from the human
reference genome. For example, PRIM2 is a super nsSNP gene. However, the read depth for PRIM2 is
highly elevated in all Korean individuals whole-genome sequenced, suggesting that copy number
gain of PRIM2 may be frequent in this population (Supplementary Figure 3). Interestingly, the
human reference genome includes only a single copy of PRIM2 on chromosome 6, whereas the
C. Venter genome² includes PRIM2-homologous DNA segments on chromosome 5. Therefore, during
alignment, short reads generated from the homologous segments would be mapped to chromosome 6;
thus, all mismatches between the segments of chromosomes 5 and 6 would appear as “SNPs” (mostly
heterozygous) even though there are actually few “variants” of PRIM2 in the human genome.
Because the reference genome is not a perfect reference, human resequencing data should be
interpreted carefully. Of the 86 super nsSNP genes we identified, 15 showed more than a 30%
increase in read depth (Supplementary Figure 4). In addition, 33 were located in segmental
duplications of the human genome, and 75 overlapped with known CNV regions archived in the
Database of Genomic Variants (DGV). (Only 9 super nsSNP genes were not related to increased
read depth, segmental duplications, or CNV regions in the DGV.) These observations suggest that
structural variants could partially account for super nsSNP genes.
1.3. Estimating the number of novel variants as the number of personal genomes increases
We simulated the number of novel variants that would be “discovered” as the number of personal
genomes increases. To accomplish this, we first randomly permuted the order of 10 personal
genomes 1000 times. At each step, we obtained the average number of “new” variants; that is, those
that were not archived in genome databases and were not discovered in the previous step. This
method was applied to SNPs, short Indels, and nsSNPs. For nsSNPs, we performed identical tests
using nsSNPs from 18 individuals (10 whole-genomes and 8 whole-exomes) to further confirm the 10
individual estimates.
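The permutation procedure can be sketched as follows. This is an illustrative reading, not the authors' script: each genome is represented as a set of variant identifiers, `known` stands in for the variants already archived in databases, and a variant counts as “new” if it has not been seen in any earlier genome of the current permutation.

```python
import random


def mean_new_variants(genomes, known, n_perm=1000, seed=0):
    """Average number of newly discovered variants at each step when the
    genomes are added one at a time in random order.

    genomes -- list of sets of variant identifiers (one set per genome)
    known   -- set of variants already archived in databases
    """
    rng = random.Random(seed)
    totals = [0] * len(genomes)
    for _ in range(n_perm):
        order = list(genomes)
        rng.shuffle(order)
        seen = set(known)
        for step, variants in enumerate(order):
            new = variants - seen      # not in databases, not seen earlier
            totals[step] += len(new)
            seen |= new
    return [t / n_perm for t in totals]


toy = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
print(mean_new_variants(toy, known={"a"}))
```

The per-step averages returned here are the values to which a trend curve would then be fitted.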
Then, using the number of novel variants at each incremental step, we fitted trend curves by the
least-squares method. Surprisingly, extrapolation of the trend showed that a considerable number
of SNPs would still be identified as novel even after many personal genomes have been deeply
sequenced. For example, ~54,000 novel SNPs would be identified after sequencing 100 haploid
genomes (50 individuals; <1% allele frequency). Similarly, ~28,000, ~6,000 and ~700 SNPs would be
discovered as novel when 100 (<0.5%), 500 (<0.1%) and 5,000 (<0.01%) individuals are sequenced at
high depth, respectively. However, these numbers should be interpreted cautiously, particularly
where extensive extrapolation was used, since approximately 2,000 SNPs are expected to be false
positives in each individual genome, given the PPV of SNP detection (99.94%).
The number of novel nsSNPs decreases more slowly than that of SNPs overall. From these
observations, we may conclude that nsSNPs are relatively more diverse than SNPs.
1.4. Linkage disequilibrium of novel variants
We examined the linkage disequilibrium (correlation, r²) between novel but common (allele frequency
≥ 2/20 among Koreans) non-synonymous SNPs and surrounding SNPs (within 20 kb, both upstream and
downstream) among the 10 whole genomes sequenced. We were particularly interested in the
linkage relationship between novel nsSNPs and known variants, or tagging SNPs on common
genotyping arrays, since the novel nsSNPs are likely to be functional. However, their linkage
with known variants must be tight if they are to be detected as candidates for complex diseases in
genome-wide association studies (GWAS).
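One common way to compute r² from unphased data is the squared Pearson correlation of allele dosages (0, 1 or 2 copies of the variant allele). The sketch below uses that definition as an assumption; the text does not say whether phased haplotypes or genotype dosages were used, and the function name is hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two vectors of allele dosages
    (0/1/2), one entry per individual. Returns NaN if either site is
    monomorphic in the sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


novel = [0, 1, 2, 0, 1, 0, 0, 0, 1, 0]   # 10 individuals
known = [0, 1, 2, 0, 1, 0, 0, 1, 1, 0]
print(genotype_r2(novel, known))
```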
1.5. Detection of large deletions
Because of the incompleteness of the human reference genome (e.g., PRIM2) and genomic
characteristics that make some regions “inaccessible” to sequencing technologies (e.g.,
extremely high GC ratio, repetitive sequence, gaps), whole-genome resequencing of a single
individual against the human reference genome is not a feasible and accurate approach for
identifying structural variations that represent real polymorphisms among human populations. In
this manuscript, by comparing multiple genomes in parallel, we could identify reliable
“polymorphisms” existing in human populations.
To find structural variations (SVs), we used read-depth (RD), paired-end (PE) read, and split-read
information from each personal genome and performed pairwise comparisons between genomes. First,
we calculated the normalized (to 25.0x) personal whole-genome read depth of coverage in 30-bp
windows as follows:
$$\text{normalized RD}_{\text{30bp window},\,\text{person}} = \frac{\text{RD}_{\text{30bp window},\,\text{person}} \times 25.0\text{x}}{\text{average whole-genome RD}_{\text{person}}}$$
Then we compared the 30-bp window coverage between two individuals,
$$\text{RD deviation} = \frac{\text{normalized RD}_{\text{30bp window},\,\text{person A}}}{\text{normalized RD}_{\text{30bp window},\,\text{person B}}},$$
using all available combinations ($N = {}_{8}C_{2} = 28$). To be defined as a deletion candidate, the
read-depth deviation should be increased (4/3 = 1.33x) or decreased (3/4 = 0.75x) for more than 33
windows in a row (>1 kb long).
Likewise, to identify regions of homozygous deletion shared by all individuals considered, we also
counted regions in which the read depth for all individuals was less than 5x for more than 33
windows in a row (>1 kb long). Because CN loss predominates over CN gain in the human genome, we
proceeded to the next step under the assumption that all candidate regions identified above
correspond to CN loss.
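The read-depth screen can be sketched as below. Several details are assumptions made for illustration: the deviation is taken as the ratio of the two normalized depths, the thresholds are treated as inclusive, the mean of the supplied windows stands in for the whole-genome average, and the function names are hypothetical.

```python
def normalize_depth(window_depths, target=25.0):
    """Scale per-window read depths so that their mean equals `target`."""
    mean = sum(window_depths) / len(window_depths)
    return [d * target / mean for d in window_depths]


def candidate_runs(norm_a, norm_b, min_windows=34, hi=4 / 3, lo=3 / 4):
    """Find runs of consecutive 30-bp windows in which the A/B depth ratio
    stays at or above `hi` (state +1) or at or below `lo` (state -1).

    Returns (start, end, state) tuples, end exclusive; runs must span at
    least `min_windows` windows ("more than 33 windows in a row").
    """
    runs = []
    start, state = 0, 0
    pairs = list(zip(norm_a, norm_b)) + [(1.0, 1.0)]  # sentinel closes last run
    for i, (a, b) in enumerate(pairs):
        ratio = a / b if b > 0 else (float("inf") if a > 0 else 1.0)
        s = 1 if ratio >= hi else (-1 if ratio <= lo else 0)
        if s != state:
            if state != 0 and i - start >= min_windows:
                runs.append((start, i, state))
            start, state = i, s
    return runs


person_a = [25.0] * 10 + [10.0] * 40 + [25.0] * 10   # depth drop over 40 windows
person_b = [25.0] * 60
print(candidate_runs(person_a, person_b))
```

A run with state -1 marks a region where person A has reduced depth relative to person B, that is, a deletion candidate in A under the CN-loss assumption stated above.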
For the next step, we investigated stretched paired-end reads whose two ends aligned to the
opposite flanking regions of a large deletion candidate (defined as <1 kb from the estimated
junction). We regarded deletion candidate regions with ≥2 stretched paired-end reads as
suggestive. However, stretched paired-end reads are not an essential prerequisite for a large
deletion; thus, we did not remove candidates with one or no stretched paired-end reads in this step.
Thereafter, we checked read-depth changes and paired-end reads near deletion candidate regions for
all individuals in parallel. Regions whose nearby read depths were unstable, making it difficult to
determine a deletion, were removed. If a region in an individual exhibited a marked read-depth
decline compared to the flanking regions, or if read depths were variable among individuals, it was
considered a large deletion. If any individual had stretched paired-end reads consistent with the
large deletion region, all individuals who showed a clear decline in read depth for the region
were regarded as carrying the deletion in their genome. If all individuals showed a read depth of
approximately zero (RD < 3), we regarded the region as a unimorphic CN loss.
1.6. Identification of large deletion breakpoints
To identify nucleotide-resolution breakpoints of large deletions, we used “orphan reads”, which are
paired-end reads for which only one end is successfully aligned to the reference genome. We
collected all orphan reads that mapped within 1 kb of an estimated large deletion boundary. The
unmapped ends of the orphan reads were then re-aligned to the reference genome using the BLAT³
program to check whether they could be split and separately aligned (“split reads”) to both sides
of the large deletion region. The alignment information of the split reads provides
nucleotide-resolution breakpoints of large deletions.
To collect the most reliable breakpoints for each large deletion, we picked the split-read group
that had the greatest number of split reads with the same gap coordinates. When more than one
split-read group with different gap coordinates had the same number of split reads, we chose the
group whose gap size was closest to the estimated deletion size. In this way, we summarized the
BLAT results of orphan reads into one best split-read group for each detected large deletion.
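The selection rule for the best split-read group can be sketched as follows. The representation is an assumption: each split read is reduced to its `(gap_start, gap_end)` coordinates, and the function name is hypothetical.

```python
from collections import Counter


def best_split_group(split_reads, estimated_size):
    """Pick the gap coordinates supported by the most split reads; break
    ties by choosing the gap whose size is closest to the estimated
    deletion size.

    split_reads    -- list of (gap_start, gap_end) tuples, one per split read
    estimated_size -- deletion size estimated from read depth (bp)
    """
    if not split_reads:
        return None
    counts = Counter(split_reads)
    top = max(counts.values())
    tied = [gap for gap, c in counts.items() if c == top]
    return min(tied, key=lambda g: abs((g[1] - g[0]) - estimated_size))


reads = [(100, 1200)] * 3 + [(90, 1500)] * 3 + [(100, 1300)]
print(best_split_group(reads, estimated_size=1150))
```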
1.7. Inferring molecular mechanisms for large deletion formation
To infer mechanisms of large deletion formation, we classified the 5,496 large deletions we detected
into four mechanisms (VNTR, NAHR, TEI and NHEJ), using the DNA sequences of the large deletions and
their breakpoint junctions through an algorithm slightly modified from BreakSeq⁴. If >80% of a
large deletion is covered by simple repeat sequences predicted by Tandem Repeats Finder⁵, it is
categorized as “VNTR”. Then, we compared the flanking sequences of both ends of a deletion using
BLAST. If the two sequences share exact homology across the breakpoints, we classified the deletion
as NAHR. Large deletions not yet categorized as “VNTR” or “NAHR” are annotated by RepeatMasker,
and if a deletion is completely covered by any transposable element, e.g. Alu or LINE, it is
classified as TEI. Finally, the remaining large deletions lacking the former patterns are
classified as NHEJ.
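The classification is a cascade in which the first matching rule wins. A minimal sketch, assuming the three inputs have already been computed from the Tandem Repeats Finder, BLAST and RepeatMasker outputs (the function and argument names are hypothetical):

```python
def classify_deletion(simple_repeat_fraction, flanks_homologous,
                      te_covered_fraction):
    """Assign a formation mechanism to one large deletion.

    simple_repeat_fraction -- fraction of the deletion covered by simple repeats
    flanks_homologous      -- True if the two flanks share exact homology
                              across the breakpoints
    te_covered_fraction    -- fraction covered by a transposable element
    """
    if simple_repeat_fraction > 0.80:
        return "VNTR"
    if flanks_homologous:
        return "NAHR"
    if te_covered_fraction >= 1.0:      # completely covered, e.g. Alu or LINE
        return "TEI"
    return "NHEJ"                       # none of the former patterns


print(classify_deletion(0.10, False, 1.0))
```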
1.8. Microhomology-sequence motifs for NHEJ deletions
We assessed microhomology-sequence motifs (≤10 bp) within 400 bp upstream and downstream of the
breakpoints of NHEJ large deletions (or of the estimated boundaries if breakpoints were not
detected). Our large deletion set contains 3,664 large deletions attributed to NHEJ. Large
deletions with identical positions were collapsed, yielding a set of 1,022 non-redundant NHEJ
deletions and 2,044 flanking DNA sequences of 400 bp each.
We used MEME (Motif-based sequence analysis tools, http://meme.ncbr.net/)⁶ to detect the
microhomology sequences. The three most significant motifs are shown in Supplementary Table 13.
1.9. De novo assembly of short reads not aligned onto human genome reference
In order to find novel contigs that might be present in Korean individuals, we conducted a de novo
sequence assembly using read data not aligned to the reference human genome, gathered from the
eight Korean individuals sequenced. Before merging vertices into contigs, we discarded all reads
that contained any ambiguities (“N”s) and those with the lowest base quality scores (“B”s),
obtaining about 15G reads of sequence data. De novo sequence assembly of the filtered read data was
carried out using the ABySS⁷ version 1.2.1 short-read assembler and the MPI (Message Passing
Interface) protocol.
To assess assembly performance across overlapping sub-string lengths (k-mer sizes), we compared
assemblies of 181.2 million paired-end reads for k-values ranging from 25 to 34 bp, and found the
optimal k-mer size (32 bp) with parameters of four coverage depths and two erode bases. We then
aligned the assembled contigs greater than 1000 bp in length to the reference human genome (hg19)
and the HuRef genome sequence using NCBI BLAST version 2.2.22 (parameters: e = 1 x 10⁻²⁰, a = 23,
F = false, -X = 1000), and chose those contigs that mapped onto these genomes by less than 99%
using our own scripts. In addition, we aligned these contigs to the common chimpanzee whole-genome
shotgun draft assembly (Pan troglodytes 2), ultimately retaining contigs with a mapping ratio
greater than 99%. To analyze the context of the remaining contigs, we aligned DNA and RNA
sequencing data from the eight Koreans to those contigs. Moreover, we aligned the sequence read
data of YH⁸, NA10851⁹, NA12878¹⁰, NA18507¹, NA19240¹⁰, ABT¹¹, KB1¹¹, and Eskimo¹² onto each contig
using the GSNAP¹³ alignment tool, selecting the options “exact matches” and “unique alignment”, and
then compared the number of aligned contigs in each genome.
2. Transcriptome Analysis
2.1. Sequence alignment for transcriptome
Using the GSNAP alignment tool, we aligned short reads from transcriptome sequencing to a set of
constructed mRNA sequences instead of the reference human genome to avoid mapping errors
resulting from mRNA splicing. As introduced in the main manuscript (Figure 5a), if short reads are
mapped to genomic sequences, short reads containing splice junctions usually cannot be aligned in
situ. The reads appear to be non-mapped, or mapped to ectopic sites, such as pseudogenes, which
have sequences that are highly homologous to the in situ sequences but lack introns. As a result, (1)
read depths for loci near splice junctions usually decrease; (2) pseudogenes appear to be transcribed,
especially near the sequences of splice junctions in real genes; and (3) false-positive variants are
detected due to ectopic alignments (Supplementary Figure 6 and Supplementary Table 15).
We generated the mRNA sequence set using exon information from the RefSeq, UCSC, and Ensembl gene
databases. All information was downloaded from the UCSC genome browser. Exons for a total of
161,250 genes were available (33,907 from RefSeq, 65,271 from UCSC and 62,072 from Ensembl). The
mRNA sequences were generated from human reference genome NCBI Build 36.3 based on their exonic
positions.
After mapping the short reads from transcriptome sequencing onto the set of 161,250 mRNA
sequences, the mapping information for each base (i.e., read depth, type and number of mismatches)
was converted from mRNA coordinates to genomic coordinates. Results of transcriptome sequencing,
such as expression levels and variant information, were obtained from this mapping information.
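Converting a position on a spliced mRNA back to the genome amounts to walking through the exon list. The sketch below illustrates the idea under stated assumptions: 0-based coordinates, half-open exon intervals sorted by genomic position, and a hypothetical function name.

```python
def mrna_to_genome(pos, exons, strand="+"):
    """Map a 0-based offset in a spliced mRNA to a 0-based genomic position.

    exons  -- list of (start, end) half-open genomic intervals, ascending
    strand -- "+" or "-"; on "-" the mRNA starts at the end of the last exon
    """
    ordered = exons if strand == "+" else exons[::-1]
    for start, end in ordered:
        length = end - start
        if pos < length:
            return start + pos if strand == "+" else end - 1 - pos
        pos -= length          # skip this exon and continue in the next one
    raise ValueError("position lies beyond the end of the mRNA")


exons = [(100, 110), (200, 220)]        # two exons, 30 bp of mRNA in total
print(mrna_to_genome(10, exons))        # first base of the second exon
```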
2.2. Expression mapping
We examined the expression level of human genes (RefSeq genes), normalized as reads per
kilobase of exon per million mapped reads (RPKM)¹⁴, calculated by applying the following
equation:
$$\text{RPKM} = 10^{9}\,\frac{C}{N L}$$
where C is the number of reads mapped to the gene, N is the total number of mapped reads in the
experiment, and L is the length of the gene in base pairs. Using a threshold of > 1 RPKM¹⁴, we
identified 11,101 genes in active transcription in lymphoblastoid cell lines.
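As a worked instance of the RPKM definition, the sketch below evaluates it for made-up counts (the function name and the numbers are illustrative only).

```python
def rpkm(gene_reads, total_mapped_reads, gene_length_bp):
    """Reads per kilobase of exon per million mapped reads:
    RPKM = 1e9 * C / (N * L)."""
    return 1e9 * gene_reads / (total_mapped_reads * gene_length_bp)


# 500 reads on a 2.5-kb gene in a library of 20 million mapped reads.
value = rpkm(500, 20_000_000, 2_500)
print(value)            # 10.0
print(value > 1.0)      # counted as actively transcribed under the > 1 threshold
```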
2.3. Unknown transcripts
Unknown transcripts were detected by collecting short reads from transcriptomes that aligned
outside of currently known genic regions. First, reads that failed to align to the 161,250 mRNA
sequences in the transcriptome pipeline were re-aligned to human reference genome NCBI build 36.3.
Then, short reads overlapping with any known gene regions were removed based on information
obtained from four different databases: (1) known genes from RefSeq (downloaded on 12 Sep. 2010),
(2) UCSC (downloaded on 10 May 2009), (3) Ensembl (downloaded on 9 Aug. 2009), and (4) known mRNAs
from GenBank (downloaded on 12 Sep. 2010). All database information was downloaded from
repositories at the UCSC genome browser (http://genome.ucsc.edu). To be conservative in identifying
unknown transcripts, we also removed short reads that overlapped with any human expressed
sequence tags (ESTs) (downloaded from the UCSC genome browser). Short reads that overlapped
known genic regions by ≥ 1 bp were filtered out. As a result, we obtained short reads that did not
overlap any known genic regions. These reads were collapsed to construct unknown transcript
regions for each individual. Unknown transcript regions with average read depths < 4 were
considered insignificant and were removed. Finally, our interpretations were made more conservative
by removing unknown transcript regions found in only a single individual.
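The collapsing and depth filter for one individual can be sketched as below. This is an illustration under assumptions: reads are half-open `(start, end)` intervals on a single chromosome that have already passed the known-gene and EST filters, the function names are hypothetical, and the final cross-individual filter is not shown.

```python
def collapse_reads(reads):
    """Merge overlapping read intervals into regions.

    Returns a list of (start, end, mean_depth), where mean_depth is the
    total number of read bases divided by the region length.
    """
    regions = []
    for start, end in sorted(reads):
        if regions and start < regions[-1][1]:      # overlaps current region
            region = regions[-1]
            region[1] = max(region[1], end)
            region[2] += end - start
        else:
            regions.append([start, end, end - start])
    return [(s, e, bases / (e - s)) for s, e, bases in regions]


def significant_regions(reads, min_mean_depth=4.0):
    """Drop regions whose average read depth is below the threshold."""
    return [r for r in collapse_reads(reads) if r[2] >= min_mean_depth]


reads = [(0, 36)] * 5 + [(100, 136)]
print(significant_regions(reads))
```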
2.4. Genes escape X-inactivation
To find X-chromosome genes that are expressed at higher levels in females than in males, we
compared gene expression levels between the sexes. First, we removed 585 genes with < 1 RPKM in
gene expression from among the 948 genes on the X chromosome. Then, we removed 186 genes for
which the average expression in females was lower than that in males. Using the expression levels
of the remaining 177 genes, we performed a Wilcoxon rank-sum test to analyze differences between
the two groups. This non-parametric method was used because the sample size was not very large for
either group (9 males, 6 females). From these tests, we determined that the minimum attainable
significance level was 0.0018 (e.g. XIST), reached when the expression level in all six females was
greater than that in all males. We selected 23 genes as candidates for escape from X-inactivation
at a significance level of 0.05.
To estimate the false discovery rates (FDR) for establishing a cut-off value, we calculated q-values
from the p-values of 162 genes tested using the QVALUE software¹⁵. The FDR values for the most
significant (XIST, p-value = 0.0018) and minimally suggestive (ALG13, p-value = 0.0392) genes were
0.017 and 0.151, respectively.
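The minimum p-value quoted above can be checked with a small rank-sum implementation. The sketch assumes a two-sided test using the normal approximation with continuity correction and no tie correction of the variance; the text does not state which variant of the test was used, but this variant gives a value close to the 0.0018 reported for complete separation of 6 females from 9 males. The function name is hypothetical.

```python
import math


def rank_sum_p(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation with
    continuity correction; variance not corrected for ties)."""
    values = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):                 # assign midranks to tied values
        j = i
        while j < len(values) and values[j][0] == values[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0        # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    n_a, n_b = len(group_a), len(group_b)
    w = sum(r for r, (_, g) in zip(ranks, values) if g == 0)
    mean = n_a * (n_a + n_b + 1) / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = max(abs(w - mean) - 0.5, 0.0) / sd
    return math.erfc(z / math.sqrt(2.0))


# Complete separation: every female value exceeds every male value.
females = [10, 11, 12, 13, 14, 15]
males = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(rank_sum_p(females, males))
```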
3. Comprehensive analysis of genome and transcriptome
3.1. Transcriptional base modification (TBM)
We compared the genome and transcriptome sequences of 15 individuals (seven sets of whole genomes
and transcriptomes, and eight sets of whole exomes and transcriptomes), sequenced using an Illumina
Genome Analyzer. To identify modifications of RNA relative to the genomic sequence, we identified
loci where variations were present in the transcriptome but not in the genome sequence. SNPs in the
transcriptome were defined as loci containing more than three identical mismatches in high-quality
reads (Q score > 15) with a mismatch allele frequency ≥ 20%. Genomic regions with no variants were
defined as loci where (1) at least five high-quality reads existed; (2) one or fewer mismatches
were found; and (3) the mismatch allele frequency was < 10%, which allows one mismatch only when
there are more than 10 reads and supports the existence of a wild-type allele. For the eight
exomes, we included an additional criterion, since many UTR regions were not covered by the
hybridization capture system: we considered all known SNPs in dbSNP130 to be potential genomic
variants in the exome-sequenced individuals. Conversely, loci not reported as variants in dbSNP130
were considered to be wild-type.
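The per-locus thresholds for whole-genome-sequenced individuals can be expressed as two predicates. This is a sketch with hypothetical function names; the counts are assumed to come from high-quality reads only (Q score > 15), and the dbSNP130 rule for exome-sequenced individuals is not shown.

```python
def transcriptome_variant(mismatch_reads, total_reads):
    """RNA side: more than three identical mismatches and a mismatch
    allele frequency of at least 20%."""
    return (total_reads > 0 and mismatch_reads > 3
            and mismatch_reads / total_reads >= 0.20)


def genome_wild_type(mismatch_reads, total_reads):
    """DNA side: at least five reads, at most one mismatch, and a
    mismatch allele frequency below 10%."""
    return (total_reads >= 5 and mismatch_reads <= 1
            and mismatch_reads / total_reads < 0.10)


def tbm_candidate(rna_mismatch, rna_depth, dna_mismatch, dna_depth):
    """A locus is a TBM candidate if it is variant in RNA but wild-type in DNA."""
    return (transcriptome_variant(rna_mismatch, rna_depth)
            and genome_wild_type(dna_mismatch, dna_depth))


print(tbm_candidate(6, 20, 0, 12))
```

Note that a single genomic mismatch passes only with 11 or more reads (1/10 is not below 10%), which matches the "more than 10 reads" remark in the text.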
Because most of the errors in transcriptome sequencing would be detected as false positives, we
further filtered the results using conservative filter criteria. First, we filtered out singletons; that is,
those found in only one of the 15 individuals. Candidate loci found in only exome-sequenced
individuals were also removed. Then, we aligned 61-bp long sequences, comprising 30 bp upstream,
the variant allele, and 30 bp downstream of each candidate site, with the Human Reference Genome
Build 36.3 using BLAT. If the sequence matched perfectly to any region, it was removed, since the
modified RNA sequence might be transcribed from the perfectly matched region rather than from the
candidate TBM site.
3.2. Allele-specific expression (ASE)
We explored the expression level of each allele at heterozygous nsSNPs. Among 28,042 nsSNPs found
in 15 individuals, we identified 4,867 loci where two or more individuals had heterozygous SNPs and
the corresponding gene was actively transcribed. We calculated the read counts for wild-type and
variant alleles in both genome and transcriptome sequencing results for each individual carrying
the heterozygous SNP. If the read depth for either genome or transcriptome was < 5, the individual
was regarded as non-informative for the given variant, and the corresponding locus was removed.
Using the read counts for each category, we constructed a 2 x 2 table for each individual at each
heterozygous locus, as shown below.
                 Wild type   Variant   Total
Genome           A           B         A+B
Transcriptome    C           D         C+D
Total            A+C         B+D       A+B+C+D
We carried out Fisher’s exact test to test for unbalanced expression. If the resulting p-value was
<0.10, the individual was considered suggestive for allele-specific expression at the locus. Loci
with fewer than two suggestive individuals were considered to be non-informative, and were removed.
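Fisher's exact test on the 2 x 2 read-count table can be computed directly from the hypergeometric distribution. The sketch below is a generic two-sided implementation using only the standard library (a table counts toward the p-value if it is no more probable than the observed one); it is not the authors' code, and the example counts are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]
    (rows: genome, transcriptome; columns: wild type, variant)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x wild-type reads in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Genome reads 15 wild type / 15 variant; transcriptome reads 40 / 10.
p = fisher_exact_two_sided(15, 15, 40, 10)
print(p, p < 0.10)
```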
Preferential expression (PE) was quantified using the following formula:
$$\text{PE} = \frac{1}{n}\sum_{\text{individual}=1}^{n}\left(\text{VariantFrequency}_{\text{individual},\,\text{transcriptome}} - \text{VariantFrequency}_{\text{individual},\,\text{genome}}\right),$$
where n is the number of suggestive individuals for the locus. If the PE for a locus was between
-0.2 and 0.2, the magnitude of allele-specific expression was regarded as insufficient and the
locus was removed.
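The PE statistic is the mean difference between transcriptome and genome variant-allele frequencies over the suggestive individuals. A minimal sketch with hypothetical names follows; it treats the boundary values of exactly ±0.2 as insufficient, which is an assumption.

```python
def preferential_expression(individuals):
    """Mean of (transcriptome variant frequency - genome variant frequency)
    over the suggestive individuals at one locus.

    individuals -- list of (rna_variant_freq, dna_variant_freq) pairs
    """
    n = len(individuals)
    return sum(rna - dna for rna, dna in individuals) / n


def passes_pe_filter(pe, min_abs=0.2):
    """Keep the locus only if |PE| exceeds the threshold."""
    return abs(pe) > min_abs


pe = preferential_expression([(0.8, 0.5), (0.9, 0.5)])
print(pe, passes_pe_filter(pe))
```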
Finally, we constructed a new 2 x 2 table using the read counts from all suggestive individuals for
each suggestive region, and then repeated Fisher’s exact test to obtain a significance level.
Applying Bonferroni’s correction for multiple testing (n = 4,867), we removed loci with p-values >
1.027 x 10⁻⁵. In total, we identified 580 loci that satisfied the criteria for allele-specific
expression.
References
1 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59 (2008).
2 Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol 5, e254 (2007).
3 Kent, W. J. BLAT--the BLAST-like alignment tool. Genome Res 12, 656-664 (2002).
4 Lam, H. Y. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat Biotechnol 28, 47-55 (2010).
5 Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573-580 (1999).
6 Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36 (1994).
7 Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res 19, 1117-1123 (2009).
8 Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60-65 (2008).
9 Ju, Y. S. et al. Reference-unbiased copy number variant analysis using CGH microarrays. Nucleic Acids Res (2010).
10 Durbin, R. M. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073 (2010).
11 Schuster, S. C. et al. Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943-947 (2010).
12 Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757-762 (2010).
13 Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873-881 (2010).
14 Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621-628 (2008).
15 Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100, 9440-9445 (2003).
Supplementary Figures
Supplementary Figure 1. Experimental overview of the study. To identify functional genomic
variants, we performed whole-genome sequencing, targeted-exome capture sequencing, and
transcriptome sequencing.
Supplementary Figure 2. Sensitivity of SNP detection in population-level
Supplementary Figure 3. Description of the PRIM2 gene, an example of a super nsSNP gene
Supplementary Figure 4. Super nsSNP genes compared to Segmental Duplication (SD), Increase
of Read Depth (RD) and Database of Genomic Variants (DGV)
Supplementary Figure 5. Overview of the strategy for detecting nucleotide-resolution large
deletion breakpoints
Supplementary Figure 6. Aligning RNA short reads to cDNA sequences provides better results
(higher expression levels) than aligning to the reference genome sequence. Gene expression levels
(genes on chromosome 1) assessed by cDNA alignment and by genome alignment are compared. Genome
alignment loses many short reads, especially those spanning splice junctions.
Supplementary Figure 7. Validation results for 4 unknown transcripts using PCR and gel
electrophoresis
Supplementary Figure 8. Distance between each unknown transcript and its nearest gene
Supplementary Figure 9. Validation results of 15 TBM sites by Sanger Sequencing of DNA (above)
and RNA (below)
Supplementary Tables
Supplementary Table 1. Whole Genome Sequencing Statistics (10 individuals)
Per-individual totals
ID     Gender  Total short read data (bases)  Aligned short read data (bases)
AK1    Male    107,635,576,460                79,278,726,228
AK2*   Female  328,846,011,200                78,911,480,450**
AK3    Male    89,154,943,968                 73,664,437,785
AK4    Female  77,201,068,724                 66,125,950,244
AK5    Male    89,182,771,792                 74,703,930,798
AK6    Female  73,502,467,582                 63,796,933,848
AK7    Male    103,902,771,632                87,557,181,002
AK9    Male    92,883,089,498                 77,220,870,831
AK14   Female  84,084,604,278                 74,670,029,486
AK20   Female  74,985,049,228                 64,940,569,874

Per-run statistics
ID     Read length  Reads          Bases            Aligned reads  Aligned bases
AK1    1 x 36       519,486,218    18,701,503,848   364,705,892    13,129,412,112
AK1    2 x 36       1,646,543,336  59,275,560,096   1,360,491,421  48,977,691,156
AK1    2 x 88       123,322,768    10,852,403,584   99,363,440     8,743,982,720
AK1    2 x 106      177,416,122    18,806,108,932   79,506,040     8,427,640,240
AK2*   2 x 25       6,371,995,780  159,299,894,500  1,451,098,162  36,277,454,050
AK2*   2 x 50       3,390,922,334  169,546,116,700  852,680,528    42,634,026,400
AK3    2 x 76       1,173,091,368  89,154,943,968   969,340,034    73,664,437,785
AK4    2 x 76       444,312,562    33,767,754,712   403,774,458    30,684,050,534
AK4    2 x 101      430,032,812    43,433,314,012   350,934,340    35,441,899,710
AK5    2 x 36       297,272,572    10,701,812,592   270,105,201    9,723,384,303
AK5    2 x 76       1,032,644,200  78,480,959,200   855,072,031    64,980,546,495
AK6    2 x 36       55,752,362     2,007,085,032    52,073,655     1,874,563,411
AK6    2 x 76       540,079,624    41,046,051,424   486,485,356    36,969,646,182
AK6    2 x 101      301,478,526    30,449,331,126   247,079,721    24,952,724,255
AK7    2 x 76       1,367,141,732  103,902,771,632  1,152,156,028  87,557,181,002
AK9    2 x 151      615,119,798    92,883,089,498   511,452,344    77,220,870,831
AK14   2 x 76       287,229,228    21,829,421,328   262,486,993    19,947,194,343
AK14   2 x 101      616,387,950    62,255,182,950   541,867,353    54,722,835,143
AK20   2 x 36       320,810,256    11,549,169,216   302,486,715    10,889,034,343
AK20   2 x 76       216,223,192    16,432,962,592   194,812,391    14,804,360,307
AK20   2 x 101      465,375,420    47,002,917,420   388,626,600    39,247,175,224

* Statistics for AK2, sequenced on the SOLiD platform, are based on the current SRA010321 data (SRX018824, SRX018829, SRX018830, SRX018821)
** AK2 statistics exclude redundant pairs from the calculation
Supplementary Table 2. WGS SNP Accuracy from validation using Illumina 610K genotyping array
Rows give the whole genome sequencing (WGS) genotype; the three 610K columns give the Illumina 610K array genotype.
Individual  WGS genotype  610K Wildtype  610K Hetero SNP  610K Homo SNP  PPV        Genotype accuracy  Sensitivity_total  Sensitivity_homo  Sensitivity_hetero
AK1         Wildtype      281159         10409            1428           0.9997275  0.9938772          0.959271378        0.98891           0.935696
AK1         Hetero SNP    63             150036           281
AK1         Homo SNP      13             1426             127051
AK2         Wildtype      281212         15498            956            0.9996608  0.991047292        0.943371616        0.992618          0.903771
AK2         Hetero SNP    83             144696           1594
AK2         Homo SNP      10             860              126957
AK4         Wildtype      281099         7181             1331           0.9996718  0.996173154        0.970826736        0.9896            0.956157
AK4         Hetero SNP    83             156453           929
AK4         Homo SNP      10             155              125725
AK5         Wildtype      281733         8117             2405           0.9996295  0.995281657        0.963857946        0.981356          0.949936
AK5         Hetero SNP    97             153923           1233
AK5         Homo SNP      7              91               125360
AK6         Wildtype      281588         10141            2377           0.9996486  0.994877351        0.957024169        0.981505          0.937692
AK6         Hetero SNP    91             152428           1240
AK6         Homo SNP      7              188              124906
AK7         Wildtype      281215         9027             4191           0.9996589  0.995934463        0.954679485        0.967176          0.944949
AK7         Hetero SNP    90             154914           1099
AK7         Homo SNP      5              33               122392
AK14        Wildtype      280873         5585             936            0.9996148  0.996034498        0.977666508        0.992774          0.96562
AK14        Hetero SNP    100            156749           1016
AK14        Homo SNP      10             116              127581
AK20        Wildtype      281511         6923             828            0.999496   0.989956306        0.973392788        0.993618          0.95715
AK20        Hetero SNP    134            154497           2703
AK20        Homo SNP      9              145              126216

Definition (cell labels)
WGS genotype  610K Wildtype  610K Hetero SNP  610K Homo SNP
Wildtype      a              b                c
Hetero SNP    d              e                f
Homo SNP      g              h                i

PPV                = (e+f+h+i)/(d+e+f+g+h+i)
Genotype accuracy  = (e+i)/(e+f+h+i)
Sensitivity_total  = (e+f+h+i)/(b+c+e+f+h+i)
Sensitivity_homo   = (f+i)/(c+f+i)
Sensitivity_hetero = (e+h)/(b+e+h)
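The definitions in Supplementary Table 2 can be checked with a few lines of Python. The following is a minimal sketch, not the authors' code: the function name `snp_call_metrics` is hypothetical, and it assumes that rows hold the WGS genotype and columns hold the array genotype (wildtype, hetero, homo), which is the reading consistent with the sensitivity formulas. The AK1 counts are copied from the table.

```python
def snp_call_metrics(matrix):
    """Compute the Table 2 metrics from a 3x3 genotype count matrix.

    Assumed layout: rows = WGS genotype, columns = array genotype,
    both in the order wildtype, hetero SNP, homo SNP (cells a..i).
    """
    (a, b, c), (d, e, f), (g, h, i) = matrix
    both_snp = e + f + h + i  # sites called SNP by WGS and by the array
    return {
        "ppv": both_snp / (d + g + both_snp),
        "genotype_accuracy": (e + i) / both_snp,
        "sensitivity_total": both_snp / (b + c + both_snp),
        "sensitivity_homo": (f + i) / (c + f + i),
        "sensitivity_hetero": (e + h) / (b + e + h),
    }


# AK1 counts as listed in Supplementary Table 2
ak1 = [
    [281159, 10409, 1428],
    [63, 150036, 281],
    [13, 1426, 127051],
]
metrics = snp_call_metrics(ak1)
for name, value in metrics.items():
    print(f"{name}: {value:.7f}")
```

Running the same function on the other individuals' matrices gives a quick consistency check of the tabulated values.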
Supplementary Table 3. Primer list used for validation of SNPs, indels, novel transcripts and
RNA modification
SuppTable3_Validation_Primer_List.xls
Depicted below is a preview of the full version.
ID individual position F_primer R_primer
SNP01 AK3 chr1:61693622 ACGTAGATCCTGATTTCGTGGT TGACCATAATGCTTGCTGTTTC
SNP02 AK3 chr10:88920229 TTGTACATGTTTTAGAGAAAGCAAA TTGAAGGTGCTCCAATTCTACA
SNP03 AK3 chr15:49853231 AGGAATCTCGGTTGGATATGAA TCAGGACTAACCTGCAAGATCA
SNP04 AK3 chr21:42786256 ATCCAGTCAAGTCAACGGTTCT CAACTTTAAGGTGGGAAAGGTG
SNP05 AK3 chr3:101756793 CAGATTCTGGCAATGAAATGTCT TTTTGGAACCAAGATAGCAGGT
SNP06 AK3 chr5:66091330 ACAGTGTTGGGAGAATGGAGTT GCAGGACCTTGTAAAGAAATGC
SNP07 AK5 chr1:215871347 GGAGTGAACAAAAAGTCGAACC CGCTGAGGAACAACTGGTATAA
SNP08 AK5 chr19:61165270 TGAAATGAGAAACCTCGTGATG CAGTAGCTTTTGCAGTTTGCAC
SNP09 AK5 chr3:39518603 TTTCCTGATGTGCGTTTATGTC TCCAGTCCTCCTCTTTCTTCTG
SNP10 AK5 chr5:172045840 CTCTCGTTAGACGGGAAAGCTA ATTTCCTTACCCAGGGATGACT
SNP11 AK7 chr11:6609168 AGTGAGGGTGCAGAGAGAAAAG TAGGTGAGTTGTGTTCCCACAG
SNP12 AK7 chr15:31665551 CCCTCAGGAAGGACTGTTCATA AGGGAGATGATCGACTTGATGT
SNP13 AK7 chr22:40510576 CATCTTCATTGTCTCCCATCCT CTTACCTGCACCAGGAGGTTCT
SNP14 AK9 chr1:110034661 GCCTTCTGCAGATCACTTTTGT AGGACTGGGAAAACATCTGAAA
SNP15 AK9 chr7:48320193 CTGTCTGGAAAGTGTGATCAGG TTGTAAATGAAAGCTCGCACAT
SNP16 AK9 chr10:5425823 CCAGTTGGACACCAATCTACAA TCTGTGTCAGACATCACCACTG
SNP17 AK9 chr22:37807148 AAGGTTACCCTGACCATCTTTG CCCATCACAGACACTTAAGCAG
SNP18 AK20 chr12:8705937 TGCTATTCCTCTTCGACCTCAT TCTGAGCATCACATTCTCTGCT
SNP19 AK20 chr17:31987125 TGCACAAGGATAGGAACCAGTA GTTACAGACAGGACTCCCTTGG
SNP20 AK20 chr6:30654121 CCCTGATCTTCAAGTTGGATTC GGATGTTTTTCTTCCTCCTCCT
SNP21 AK20 chr7:134269011 GAGAAAGAATTAAAGCCGAGCA CCTTAGCCTTCTCTTCCTCCTC
SNP22 AK4 chr11:133543212 TGAGCTTCCTGCATGACTACAT TTGTGGGAATTACACCTCCTCT
SNP23 AK4 chr17:7621663 TGGTGACGATAGAAATTCATGC TCCTTGCATTTACAGAACATGG
SNP24 AK4 chr3:11276768 TGCTTTCCACTTGATATTGTGC TTCATGTGCAACCCAGATACAT
SNP25 AK4 chr6:431719 TGGACTTCTTTTATGTGGCAGA GGGCTGTAAAACAAGTGTCTCC
SNP26 AK6 chr14:63553000 TGCTTTGTTGGGTATTGTTTTT ACATCAAGCCATCTATCCACAA
SNP27 AK6 chr18:31080233 GTGGCAAAACTTTCAAAAGGAG GCACAAAGCAAGCTAGACTCAA
SNP28 AK6 chr2:43846079 AGAGAAGGGAATTCTGGTAGCC ATGGAGGACAAGGAGTGAATGT
SNP29 AK6 chr6:127679300 TGATTATCCTCTACGGCACAAA GCATTATACCTTTGTGTTTCTGCTT
SNP30 AK6 chrX:131040253 GCTATGTGGACTTGTCCTTTCC TCTAAGCTCCTTCCAAACAAGC
SNP31 AK14 chr1:198644404 GTATGGTTTGATGACCCAGGTT AGAGAAGCCATTTGGATGTGAT
SNP32 AK14 chr5:65386257 AATCTTGGTGATCCAGGCTCTA CTTGATGCATTTGGACCATCTA
SNP33 AK14 chr5:95250409 CTGGTCCAAGCAGAGTTCTAGG TCTGACCTGTGGTTGAAAAATG
Indel01 AK3 chr7:80141323 AAAAGGGTGATAGGCAATTGAA TGGCCTAATATGTAACTTCTCTTTG
Indel02 AK5 chr19:52466505 ACTTGCCTGTGTCCCCAAAG CTCCACCTCTTCACCCCAAT
Indel03 AK5 chr20:74155 AAGTGTCTAAACGACGTTGGAA TCGAAGCAGTAGTCATCATCAAA
Indel04 AK5 chr3:191588765 CAGGGCGTGAGAAAAAGTAAAA TACCTCCAGAGAGTCATCAGCA
Indel05 AK7 chr1:8638908 TGTTGTCATTGTCCTCGTCTTC TCCCTGGAATTGAGTGAGAAAT
Indel06 AK7 chr11:55518142 CAGAGTCCCCACAATATTCACA TCCAGGACCGTCTACCTAAAAA
Indel07 AK7 chr21:44882040 AATCAGGCTACACCAGCTCCT AGACGGACTTAGAGCAGACAGG
Indel08 AK7 chr5:98220064 TTCCGACTACTCCAGGTATGCT TCATCGGTTACATTCAGACCAC
Indel09 AK7 chr6:159580769 AAGCCAATTTTGAGTCTTGGAG GATCCCATTGGAGCTCATTATC
Indel10 AK9 chr1:33068543 GAGGGATCTAAGCACGTTTACAA AATGCTGAAGGTAACAGGAGAAA
Supplementary Table 4. Indel list of 10 individuals extracted by whole genome sequencing
SuppTable4_Indel_Table_1_20.txt
Depicted below is a preview of the full version.
Chromosome type position position_alternative size allele AK1 AK3 AK5 AK7 AK9 AK2 AK4 AK6 AK14 AK20 total sample_count max_total
chr1 del 228 228 27 - 0 0 0 0 0 0 1 1 2 1 5 4 20
chr1 del 328 328 24 - 0 0 1 0 0 0 0 0 0 0 1 1 20
chr1 ins 353 353 1 A 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 del 39377 39377 1 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 42096 42096 2 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 43001 43001 2 - 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 del 44575 44575 16 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 51213 51213 1 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 ins 51224 51224 1 A 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 del 53598 53601 3 - 0 0 1 1 1 0 1 0 1 1 6 6 20
chr1 del 56023 56023 6 - 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 del 62003 62022 4 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 63705 63705 1 - 0 0 0 1 0 0 0 0 0 1 2 2 20
chr1 del 71453 71453 1 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 71996 71996 3 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 73692 73692 20 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 88784 88784 1 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 ins 94048 94048 8 CACACACA 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 220913 220913 2 - 0 0 0 0 0 0 0 1 1 0 2 2 20
chr1 del 223548 223551 1 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 233713 233713 4 - 0 0 0 1 1 0 0 1 0 0 3 3 20
chr1 del 235119 235123 1 - 0 0 0 1 0 0 0 0 0 0 1 1 20
chr1 ins 239148 239148 1 T 0 1 0 0 0 0 0 0 0 0 1 1 20
chr1 del 241490 241491 1 - 0 0 0 0 1 0 2 1 2 0 6 4 20
chr1 ins 245787 245787 2 CT 0 0 0 0 0 0 0 1 1 0 2 2 20
chr1 ins 245792 245792 2 TG 0 0 0 0 0 0 0 0 0 1 1 1 20
chr1 del 333889 333889 3 - 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 del 530208 530210 5 - 0 1 0 0 0 0 0 0 0 0 1 1 20
chr1 del 536003 536003 5 - 0 0 0 0 0 1 0 0 0 0 1 1 20
chr1 ins 537496 537496 2 GT 0 0 0 0 0 2 0 0 0 0 2 1 20
chr1 del 537701 537701 2 - 0 0 0 0 0 2 0 0 0 0 2 1 20
chr1 del 537719 537719 2 - 0 0 0 0 0 1 0 0 0 0 1 1 20
chr1 del 557102 557102 1 - 2 0 0 0 0 0 0 0 0 0 2 1 20
chr1 del 557879 557884 1 - 0 0 0 0 0 0 0 2 0 0 2 1 20
chr1 del 602551 602555 3 - 0 0 0 0 1 0 0 0 0 1 2 2 20
chr1 ins 670110 670110 6 GTGTGT 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 ins 703655 703655 3 GCT 0 0 0 0 2 0 0 0 0 0 2 1 20
chr1 ins 710936 710936 2 GC 0 0 0 0 0 1 0 0 0 0 1 1 20
chr1 del 713661 713666 2 - 0 2 0 1 2 1 1 1 1 1 10 8 20
chr1 del 714067 714067 5 - 0 0 0 0 0 1 0 0 0 0 1 1 20
chr1 del 714811 714829 5 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 ins 715418 715418 5 GGAAT 0 0 0 0 0 0 0 0 1 0 1 1 20
chr1 del 716096 716096 5 - 0 0 0 0 0 1 0 0 0 0 1 1 20
chr1 del 716176 716176 10 - 0 0 0 0 1 0 0 0 0 0 1 1 20
chr1 del 716176 716176 15 - 0 0 0 0 0 0 1 0 1 0 2 2 20
chr1 del 716888 716888 5 - 0 0 0 0 0 1 0 0 0 0 1 1 20
chr1 ins 724802 724803 1 T 0 0 0 0 1 0 1 1 0 0 3 3 20
chr1 del 735233 735234 1 - 0 0 0 0 0 0 0 0 0 1 1 1 20
chr1 ins 739834 739834 2 AA 0 0 0 0 2 0 0 0 0 1 3 2 20
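The summary columns of the indel preview can be recomputed from the per-individual columns. The sketch below is illustrative only: `summarize_indel_row` is a hypothetical helper, and it assumes that each per-individual value is an allele count (0, 1 or 2), that `total` is their sum, that `sample_count` is the number of individuals with a non-zero value, and that `max_total` is twice the number of individuals. These assumptions match the preview rows but are not stated in the source.

```python
# Sample order taken from the preview header of Supplementary Table 4.
SAMPLES = ["AK1", "AK3", "AK5", "AK7", "AK9", "AK2", "AK4", "AK6", "AK14", "AK20"]


def summarize_indel_row(line):
    """Parse one whitespace-delimited preview row and recompute its summary columns."""
    fields = line.split()
    n = len(SAMPLES)
    # Fields 0-5: chromosome, type, position, position_alternative, size, allele
    genotypes = dict(zip(SAMPLES, (int(x) for x in fields[6:6 + n])))
    reported = [int(x) for x in fields[6 + n:6 + n + 3]]
    recomputed = [
        sum(genotypes.values()),                      # total (assumed: summed allele counts)
        sum(1 for g in genotypes.values() if g > 0),  # sample_count (assumed: carriers)
        2 * n,                                        # max_total (assumed: 2 alleles per person)
    ]
    return genotypes, reported, recomputed


row = "chr1 del 713661 713666 2 - 0 2 0 1 2 1 1 1 1 1 10 8 20"
genotypes, reported, recomputed = summarize_indel_row(row)
print(reported == recomputed)
```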
Supplementary Table 5. Exome Sequencing Statistics
Individuals Gender Read length reads bases Aligned reads Aligned bases aligned coverage
AK_N1 Male 2 x 78 65,740,784 5,127,781,152 62,465,112 4,872,092,077 62.7
AK_N2 Male 2 x 78 67,124,814 5,235,735,492 63,544,668 4,956,295,050 63.8
AK_N5 Male 2 x 78 68,444,468 5,338,668,504 64,553,546 5,034,976,940 64.8
AK_N6 Male 2 x 78 66,995,750 5,225,668,500 62,752,082 4,894,465,964 63.0
AK_N7 Male 2 x 78 66,884,224 5,216,969,472 62,973,127 4,911,704,660 63.2
AK_N9 Female 2 x 78 70,142,332 5,471,101,896 63,828,875 4,978,499,064 64.1
AK_N14 Female 2 x 78 71,856,586 5,604,813,708 64,188,537 5,006,552,917 64.5
AK_N15 Male 2 x 78 71,876,294 5,606,350,932 64,543,996 5,034,263,798 64.8
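In the exome statistics above, the bases column equals the read count multiplied by the read length. A minimal sketch of that arithmetic follows; `total_bases` is a hypothetical helper, and it assumes the reads column counts individual reads (both mates of each pair), which is what the tabulated numbers imply.

```python
def total_bases(read_length_spec, read_count):
    """Return sequenced bases for a read-length label such as '2 x 78'.

    Assumes read_count counts individual reads, so only the per-read
    length (the number after 'x') enters the product.
    """
    _, length = read_length_spec.lower().split("x")
    return read_count * int(length)


# AK_N1 row of Supplementary Table 5
print(total_bases("2 x 78", 65_740_784))
```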
Supplementary Table 6. Non-synonymous SNP list detected from 18 individuals (10 whole genome sequencing, 8 whole exome sequencing)
SuppTable6_nsSNP_from_18individuals.xls
Depicted below is a preview of the full version.
chr pos ref allele AK1 AK3 AK5 AK7 AK9 AK2 AK4 AK6 AK14 AK20 AK_N1 AK_N2 AK_N5 AK_N6 AK_N7 AK_N15 AK_N9 AK_N14 annotations ref_aa snp_aa blosum nssnp
chr1 59374 A G 0 2 2 0 1 0 2 0 2 2 2 2 2 2 2 2 2 2 CDS:OR4F5 T A 0 nsSNP
chr1 855557 C T 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 CDS:SAMD11 H Y 2 nsSNP
chr1 867694 T C 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CDS:SAMD11 W R -3 nsSNP
chr1 878522 T C 2 2 2 0 0 2 2 0 2 2 2 2 2 2 2 2 2 2 CDS:NOC2L I V 3 nsSNP
chr1 879101 G A 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 CDS:NOC2L A V 0 nsSNP
chr1 891991 C T 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CDS:PLEKHN1 A V 0 nsSNP
chr1 895986 G C 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 CDS:PLEKHN1 A P -1 nsSNP
chr1 899101 G C 2 2 1 2 0 2 0 0 2 0 2 2 2 2 2 2 2 2 CDS:PLEKHN1 R P -2 nsSNP
chr1 899172 T C 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CDS:PLEKHN1 S P -1 nsSNP
chr1 925085 C A 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 CDS:HES4::Intron:HES4 "T"R S -1 nsSNP
chr1 939471 G A 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 CDS:ISG15 S N 1 nsSNP
chr1 966461 C T 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 CDS:AGRN T I -1 nsSNP
chr1 970994 A G 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 CDS:AGRN Q R 1 nsSNP
chr1 979070 G C 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 CDS:AGRN S T 1 nsSNP
chr1 1110294 G A 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 CDS:TTLL10 S N 1 nsSNP
chr1 1122801 G A 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 CDS:TTLL10 G D -1 nsSNP
chr1 1212130 G C 2 0 0 0 0 1 0 0 0 1 2 0 0 0 1 0 0 0 CDS:SCNN1D R P -2 nsSNP
chr1 1213248 G C 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 CDS:SCNN1D E Q 2 nsSNP
chr1 1252564 G A 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 CDS:GLTPD1 R K 2 nsSNP
Supplementary Table 7. Trait-O-Matic results on nsSNP of 18 individuals
SuppTable7_TraitOMatic_18individuals.xls
Depicted below is a preview of the full version.
Coordinates     Genotype  Gene, amino acid change  Trait-associated allele  Associated trait
chr1:9246497    G/A       H6PD, R453Q              A                        CORTISONE REDUCTASE DEFICIENCY
chr1:25589952   C/G       RHCE, P226A              C                        RH E/e POLYMORPHISM
chr1:46643348   C/A       FAAH, P129T              A                        DRUG ADDICTION, SUSCEPTIBILITY TO
chr1:65809029   G/G       LEPR, K109R              G                        LEPTIN RECEPTOR POLYMORPHISM
chr1:65831101   G/G       LEPR, Q223R              G                        LEPTIN RECEPTOR POLYMORPHISM
chr1:167963570  G/A       SELE, H468Y              A                        IgA NEPHROPATHY, SUSCEPTIBILITY TO
chr1:194925860  C/T       CFH, Y402H               C                        BASAL LAMINAR DRUSEN, INCLUDED
chr1:194925860  C/T       CFH, Y402H               C                        MACULAR DEGENERATION, AGE-RELATED, 4, SUSCEPTIBILITY TO
chr1:205173101  A/A       PIGR, A580V              A                        IgA NEPHROPATHY, SUSCEPTIBILITY TO
chr1:224086256  T/C       EPHX1, Y113H             C                        EMPHYSEMA, SUSCEPTIBILITY TO, INCLUDED
chr1:224086256  T/C       EPHX1, Y113H             C                        LYMPHOPROLIFERATIVE DISORDERS, SUSCEPTIBILITY TO
chr1:224086256  T/C       EPHX1, Y113H             C                        PREECLAMPSIA, SUSCEPTIBILITY TO, INCLUDED
chr1:224086256  T/C       EPHX1, Y113H             C                        PULMONARY DISEASE, CHRONIC OBSTRUCTIVE, SUSCEPTIBILITY TO, INCLUDED
chr2:108880033  G/G       EDAR, V370A              G                        HAIR MORPHOLOGY 1, HAIR THICKNESS
chr2:230758959  G/G       SP110, L425S             G                        MYCOBACTERIUM TUBERCULOSIS, SUSCEPTIBILITY TO
chr3:46374212   G/A       CCR2, V64I               A                        HUMAN IMMUNODEFICIENCY VIRUS TYPE 1, RESISTANCE TO
chr4:2876505    T/T       ADD1, G460W              T                        HYPERTENSION, SALT-SENSITIVE ESSENTIAL, SUSCEPTIBILITY TO
chr4:100458342  T/C       ADH1B, R48H              T                        AERODIGESTIVE TRACT CANCER, SQUAMOUS CELL, ALCOHOL-RELATED, PROTECTION AGAINST; INCLUDED
chr4:100458342  T/C       ADH1B, R48H              T                        ALCOHOL DEPENDENCE, PROTECTION AGAINST
chr4:100479812  T/C       ADH1C, I350V             C                        ALCOHOL DEPENDENCE, PROTECTION AGAINST
chr4:100482988  C/T       ADH1C, R272Q             T                        ALCOHOL DEPENDENCE, PROTECTION AGAINST
chr4:102970099  G/A       BANK1, R61H              A                        SYSTEMIC LUPUS ERYTHEMATOSUS, ASSOCIATION WITH
chr5:7923973    A/G       MTRR, I22M               G                        DOWN SYNDROME, SUSCEPTIBILITY TO, INCLUDED
chr5:7923973    A/G       MTRR, I22M               G                        NEURAL TUBE DEFECTS, FOLATE-SENSITIVE, SUSCEPTIBILITY TO
Supplementary Table 8. Super nsSNP gene list
SuppTable8_Super_nsSNP_Gene_List.xls
Depicted below is a preview of the full version.
Gene Code Chr Start_of_first_exon(bp) Stop_of_last_exon(bp) Number_of_exons Total_exon_length(bp) Average_#nsSNP Average_density(/Kb) AK1_#nsSNP AK1_density(/Kb) AK3_#nsSNP AK3_density(/Kb) AK5_#nsSNP AK5_density(/Kb) AK7_#nsSNP AK7_density(/Kb)
ZNF717 NM_001128223 3 75868718 75915203 5 2749 75 27.28 85 30.92 67 24.37 79 28.74 75 27.28
OR4C3 NM_001004702 11 48303068 48304058 1 991 16.5 16.65 17 17.15 14 14.13 18 18.16 11 11.1
CDC27 NM_001256 17 42553299 42621537 19 2494 40.6 16.28 65 26.06 31 12.43 59 23.66 31 12.43
FRG2C NM_001124759 3 75796220 75797882 4 853 13.5 15.83 9 10.55 12 14.07 16 18.76 14 16.41
OR4C45 NM_001005513 11 48323475 48330575 2 921 13.7 14.88 15 16.29 16 17.37 18 19.54 15 16.29
OR9G9 NM_001013358 11 56224439 56225357 1 919 12.9 14.04 16 17.41 9 9.79 7 7.62 14 15.23
FAM104B NM_001166702 X 55189241 55204143 3 342 4.6 13.45 4 11.7 4 11.7 4 11.7 4 11.7
PRIM2 NM_000947 6 57291202 57620661 15 1543 17.7 11.47 15 9.72 18 11.67 19 12.31 19 12.31
HLA-DRB1 NM_002124 6 32654845 32665497 6 807 8.5 10.53 6 7.43 8 9.91 9 11.15 5 6.2
HLA-DPA1 NM_033554 6 33144404 33149325 5 787 7.8 9.91 8 10.17 8 10.17 9 11.44 5 6.35
CTBP2 NM_001329 10 126668076 126717613 11 1347 12.6 9.35 30 22.27 9 6.68 11 8.17 11 8.17
SEC22B NM_004892 1 143807903 143827246 5 653 6 9.19 7 10.72 6 9.19 6 9.19 7 10.72
HLA-DQB1 NM_002123 6 32735990 32742362 5 791 7.1 8.98 4 5.06 8 10.11 9 11.38 10 12.64
OR13C5 NM_001004482 9 106400558 106401515 1 958 8.4 8.77 5 5.22 9 9.39 8 8.35 4 4.18
KCNJ12 NM_021012 17 21259247 21260549 3 1303 11.2 8.6 13 9.98 9 6.91 13 9.98 10 7.67
MUC4 NM_138297 3 196959717 197023085 23 3401 28.4 8.35 34 10 2 0.59 30 8.82 22 6.47
TAS2R31 NM_176885 12 11074271 11075201 1 931 7.3 7.84 6 6.44 7 7.52 7 7.52 7 7.52
HLA-DQA1 NM_002122 6 32713213 32718519 5 772 6 7.77 5 6.48 2 2.59 8 10.36 6 7.77
OR51Q1 NM_001004757 11 5400006 5400960 1 955 7.4 7.75 7 7.33 7 7.33 7 7.33 8 8.38
HLA-A NM_002116 6 30018309 30021211 8 1106 8.4 7.59 16 14.47 6 5.42 13 11.75 7 6.33
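The density columns in the super nsSNP gene list are consistent with the number of nsSNPs divided by the total exon length in kilobases. A minimal sketch under that assumption (the function name `nssnp_density_per_kb` is hypothetical; the inputs are copied from the ZNF717 and PRIM2 rows):

```python
def nssnp_density_per_kb(n_nssnp, exon_length_bp):
    """nsSNP density per kilobase of exonic sequence."""
    return n_nssnp / (exon_length_bp / 1000.0)


# ZNF717: 75 nsSNPs on average over 2,749 bp of exon
print(round(nssnp_density_per_kb(75, 2749), 2))
# PRIM2: 17.7 nsSNPs on average over 1,543 bp of exon
print(round(nssnp_density_per_kb(17.7, 1543), 2))
```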
Supplementary Table 9. List of Korean common novel nsSNP LD
SuppTable9_KoreanCommonNovel_nsSNP_LD.xls
Depicted below is a preview of the full version.
chr pos ref_allele var_allele frequency annotation ref_aa snp_aa blosum rsSNP_maxLD r2
chr1 9755922 A T 2//20 CDS:CLSTN1 F Y 3 rs77601527 1
chr1 11694516 G A 2//20 CDS:C1orf187 G R -2 rs77681396 1
chr1 11949647 A G 2//20 CDS:PLOD1 Y C -2 rs116892868 0.44444
chr1 12260475 G A 2//20 CDS:VPS13D D N 1 rs7545503 0.44444
chr1 12760062 C G 3//20 CDS:PRAMEF12 N K 0 rs80177200 1
chr1 12776677 T A 8//20 CDS:PRAMEF1 L stop -4 rs1063776 1
chr1 12776775 T C 2//20 CDS:PRAMEF1 C R -3 rs1613050 1
chr1 12777066 C G 4//20 CDS:PRAMEF1 R G -2 rs1063774 1
chr1 12778353 C T 2//20 CDS:PRAMEF1 A V 0 rs74850310 1
chr1 12778480 T G 2//20 CDS:PRAMEF1 I M 1 rs848426 0.86538
chr1 12810958 T G 2//20 CDS:PRAMEF11 Q H 0 rs1769772 0.64286
chr1 12810984 C T 4//20 CDS:PRAMEF11 D N 1 rs2076063 0.64286
chr1 12811002 C T 5//20 CDS:PRAMEF11 A T 0 rs1736809 0.42857
chr1 12829871 T C 7//20 CDS:LOC649330,HNRNPCL1 T A 0 rs12745844 0.64286
chr1 12829872 G C 7//20 CDS:LOC649330,HNRNPCL1 S R -1 rs12745844 0.64286
chr1 12829903 T C 7//20 CDS:LOC649330,HNRNPCL1 E G -2 rs12745844 0.64286
chr1 12830036 C G 2//20 CDS:LOC649330,HNRNPCL1 D H -1 rs1630264 1
chr1 12830083 T C 2//20 CDS:LOC649330,HNRNPCL1 K R 2 rs1630264 1
chr1 12830120 T C 2//20 CDS:LOC649330,HNRNPCL1 I V 3 rs1630264 1
chr1 12830385 A C 5//20 CDS:LOC649330,HNRNPCL1 F L 0 rs61777008 0.60494
chr1 12830389 C T 6//20 CDS:LOC649330,HNRNPCL1 G D -1 rs1737113 0.64286
chr1 12830390 C T 6//20 CDS:LOC649330,HNRNPCL1 G S 0 rs1737113 0.64286
chr1 12842229 G A 2//20 CDS:PRAMEF2 A T 0 rs61781252 1
chr1 12843952 A C 2//20 CDS:PRAMEF2 S R -1 rs116865587 1
chr1 12843954 C A 2//20 CDS:PRAMEF2 S R -1 rs116865587 1
chr1 12843969 T G 2//20 CDS:PRAMEF2 I M 1 rs58112782 1
chr1 12862091 A C 2//20 CDS:PRAMEF4 F C -2 rs3928864 1
chr1 13105825 T C 2//20 CDS:LOC440563 E G -2 rs113741404 1
chr1 13106059 G A 5//20 CDS:LOC440563 P L -3 rs113259710 1
chr1 13106115 A C 6//20 CDS:LOC440563 F L 0 rs28434299 0.53552
chr1 13106119 C T 5//20 CDS:LOC440563 G D -1 rs78443402 0.66667
chr1 13106120 C T 5//20 CDS:LOC440563 G S 0 rs78443402 0.66667
chr1 16773608 A G 2//20 CDS:NBPF1 S P -1 rs58145953 0.58333
chr1 16786264 T C 4//20 CDS:NBPF1 S G 0 rs598052 0.84848
chr1 17439736 G A 2//20 CDS:PADI1 R H 0 rs4363467 1
chr1 46911224 C T 2//20 CDS:C1orf223 R W -3 rs12562113 0.69262
chr1 47375775 C T 2//20 CDS:CYP4A22 R C -3 rs2224622 1
chr1 52078667 T A 2//20 CDS:NRD1 E V -2 rs117346555 0.58333
chr1 64247576 T G 2//20 CDS:ROR1 S A 1 rs1341511 0.25
chr1 89221866 C T 2//20 CDS:RBMXL1::Intron:CCBL2 "T"A T 0 rs77567101 1
chr1 89221886 C G 3//20 CDS:RBMXL1::Intron:CCBL2 "T"G A 0 rs112636230 1
chr1 89297183 A G 2//20 CDS:GBP1 L P -3 rs12125301 0.76563
chr1 111830738 G A 2//20 CDS:ADORA3 R C -3 rs1415793 0.47368
Supplementary Table 10. List of all 5,496 large deletions in 8 individuals, including breakpoint information
SuppTable10_Large_Deletion_List.xls
Depicted below is a preview of the full version.
Columns whole-gene through promoter(<1kb) are gene annotations.
index individual chr size start stop breakpoint_start breakpoint_stop #split_reads whole-gene CDS UTR intron promoter(<1kb) inferred_mechanism
LargeDeletion_1 AK3 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK5 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK7 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK9 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK4 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK6 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK14 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_1 AK20 chr1 1261 534631 535891 N/S N/S N/S - - - - - VNTR
LargeDeletion_2 AK3 chr1 1051 859231 860281 N/S N/S N/S - - - SAMD11 - VNTR
LargeDeletion_2 AK5 chr1 1051 859231 860281 859230 859817 1 - - - SAMD11 - VNTR
LargeDeletion_2 AK7 chr1 1051 859231 860281 859248 859835 1 - - - SAMD11 - VNTR
LargeDeletion_2 AK9 chr1 1051 859231 860281 N/S N/S N/S - - - SAMD11 - VNTR
LargeDeletion_2 AK4 chr1 1051 859231 860281 N/S N/S N/S - - - SAMD11 - VNTR
LargeDeletion_2 AK6 chr1 1051 859231 860281 859553 860196 1 - - - SAMD11 - VNTR
LargeDeletion_2 AK14 chr1 1051 859231 860281 859721 860196 2 - - - SAMD11 - VNTR
LargeDeletion_2 AK20 chr1 1051 859231 860281 N/S N/S N/S - - - SAMD11 - VNTR
LargeDeletion_3 AK3 chr1 1591 955171 956761 N/S N/S N/S - - - AGRN - VNTR
LargeDeletion_3 AK5 chr1 1591 955171 956761 N/S N/S N/S - - - AGRN - VNTR
LargeDeletion_3 AK7 chr1 1591 955171 956761 955260 955805 2 - - - AGRN - VNTR
LargeDeletion_3 AK9 chr1 1591 955171 956761 955142 956475 1 - - - AGRN - VNTR
LargeDeletion_3 AK4 chr1 1591 955171 956761 N/S N/S N/S - - - AGRN - VNTR
LargeDeletion_3 AK6 chr1 1591 955171 956761 955072 956133 1 - - - AGRN - VNTR
LargeDeletion_3 AK14 chr1 1591 955171 956761 N/S N/S N/S - - - AGRN - VNTR
LargeDeletion_3 AK20 chr1 1591 955171 956761 955018 956460 1 - - - AGRN - VNTR
LargeDeletion_4 AK3 chr1 1231 1064821 1066051 1065396 1065944 1 - - - - - VNTR
LargeDeletion_4 AK5 chr1 1231 1064821 1066051 1064327 1065607 1 - - - - - VNTR
LargeDeletion_4 AK7 chr1 1231 1064821 1066051 N/S N/S N/S - - - - - VNTR
Supplementary Table 11. Validation of large deletions using 24M CGH array data
Individual  Total deletions  Subject regions (≥5 array probes)  Validated regions (p-value* <0.05)  Validated regions (p-value* <0.01)  Accuracy (p-value <0.05)  Accuracy (p-value <0.01)
AK4         674              330                                272                                 255                                 82.42%                    77.27%
AK6         693              331                                285                                 271                                 86.10%                    81.87%
AK14        700              318                                262                                 245                                 82.39%                    77.04%
AK20        683              318                                268                                 245                                 84.28%                    77.04%
Total       2,750            1,297                              1,087                               1,016                               83.81%                    78.33%
* Wilcoxon Rank Sum Test using R statistics
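The accuracy columns in Supplementary Table 11 correspond to validated regions divided by subject regions. The sketch below recomputes them from the counts in the table; the names `validation`, `accuracy_pct` and `summarize` are hypothetical, and the Wilcoxon rank sum test itself is not reproduced here.

```python
# (subject regions, validated at p<0.05, validated at p<0.01), copied from the table
validation = {
    "AK4": (330, 272, 255),
    "AK6": (331, 285, 271),
    "AK14": (318, 262, 245),
    "AK20": (318, 268, 245),
}


def accuracy_pct(validated, subject):
    """Percentage of subject regions that were validated, to two decimals."""
    return round(100.0 * validated / subject, 2)


def summarize(table):
    """Per-individual and pooled accuracy at both p-value thresholds."""
    rows = {
        name: (accuracy_pct(v05, subject), accuracy_pct(v01, subject))
        for name, (subject, v05, v01) in table.items()
    }
    subject_total = sum(s for s, _, _ in table.values())
    v05_total = sum(v for _, v, _ in table.values())
    v01_total = sum(v for _, _, v in table.values())
    rows["Total"] = (
        accuracy_pct(v05_total, subject_total),
        accuracy_pct(v01_total, subject_total),
    )
    return rows


summary = summarize(validation)
for name, (acc05, acc01) in summary.items():
    print(f"{name}: {acc05:.2f}% {acc01:.2f}%")
```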
Supplementary Table 12. Breakpoint list of NA10851 deletions (CNV Loss)
SuppTable12_NA10851_Deletion(CNV)_Breakpoints.xls
Depicted below is a preview of the full version.
cnv_id chr cnv_type cnv_size cnv_start cnv_end gap_size gap_start gap_end
NA10851_BP_1 chr1 LOSS 3085 2042821 2045905 987 2042811 2043797
NA10851_BP_2 chr1 LOSS 606 7847539 7848144 310 7847572 7847881
NA10851_BP_3 chr1 LOSS 82381 13219599 13301979 162082 13220037 13382118
NA10851_BP_4 chr1 LOSS 3646 54864917 54868562 3693 54864862 54868554
NA10851_BP_5 chr1 LOSS 901 58516499 58517399 913 58516498 58517410
NA10851_BP_6 chr1 LOSS 1188 59878725 59879912 851 59878834 59879684
NA10851_BP_7 chr1 LOSS 1088 61855369 61856456 844 61855446 61856289
NA10851_BP_8 chr1 LOSS 763 67780751 67781513 856 67780549 67781404
NA10851_BP_9 chr1 LOSS 468 72222105 72222572 412 72222209 72222620
NA10851_BP_10 chr1 LOSS 45875 72538815 72584689 45516 72538912 72584427
NA10851_BP_11 chr1 LOSS 2410 79993560 79995969 1248 79994369 79995616
NA10851_BP_12 chr1 LOSS 2433 89248781 89251213 2716 89248503 89251218
NA10851_BP_13 chr1 LOSS 3298 94060875 94064172 2884 94060962 94063845
NA10851_BP_14 chr1 LOSS 585 104244674 104245258 526 104244723 104245248
NA10851_BP_15 chr1 LOSS 1065 104470632 104471696 439 104470723 104471161
NA10851_BP_16 chr1 LOSS 827 105469153 105469979 912 105469073 105469984
NA10851_BP_17 chr1 LOSS 4341 108534537 108538877 3927 108534849 108538775
NA10851_BP_18 chr1 LOSS 12815 112493555 112506369 12913 112493315 112506227
NA10851_BP_19 chr1 LOSS 34690 150822121 150856810 32198 150822167 150854364
NA10851_BP_20 chr1 LOSS 3337 157134152 157137488 2451 157134158 157136608
NA10851_BP_21 chr1 LOSS 964 157915302 157916265 950 157915332 157916281
NA10851_BP_22 chr1 LOSS 1246 167270761 167272006 871 167270985 167271855
NA10851_BP_23 chr1 LOSS 4730 173338098 173342827 6154 173337124 173343277
NA10851_BP_24 chr1 LOSS 767 186806093 186806859 775 186806079 186806853
NA10851_BP_25 chr1 LOSS 2158 190183033 190185190 2109 190183069 190185177
NA10851_BP_26 chr1 LOSS 1715 197040280 197041994 1665 197040373 197042037
NA10851_BP_27 chr1 LOSS 1521 201330618 201332138 844 201330970 201331813
NA10851_BP_28 chr1 LOSS 3580 205608618 205612197 3601 205608592 205612192
NA10851_BP_29 chr1 LOSS 8527 208144233 208152759 7922 208144679 208152600
NA10851_BP_30 chr1 LOSS 2736 208789026 208791761 2420 208789084 208791503
NA10851_BP_31 chr1 LOSS 6837 220440446 220447282 6374 220440789 220447162
NA10851_BP_32 chr1 LOSS 495 223740078 223740572 331 223740095 223740425
NA10851_BP_33 chr1 LOSS 1696 234985539 234987234 1610 234985557 234987166
NA10851_BP_34 chr1 LOSS 1116 241849373 241850488 1018 241849369 241850386
NA10851_BP_35 chr1 LOSS 2642 244204375 244207016 1560 244204866 244206425
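In the preview rows above, both size columns follow the 1-based inclusive convention, size = end - start + 1. A small sketch that checks this for a few rows (the helper `inclusive_length` and the tuple layout are illustrative choices, not part of the source):

```python
def inclusive_length(start, end):
    """Length of a 1-based, closed interval [start, end]."""
    return end - start + 1


# (cnv_id, cnv_size, cnv_start, cnv_end, gap_size, gap_start, gap_end)
rows = [
    ("NA10851_BP_1", 3085, 2042821, 2045905, 987, 2042811, 2043797),
    ("NA10851_BP_2", 606, 7847539, 7848144, 310, 7847572, 7847881),
    ("NA10851_BP_3", 82381, 13219599, 13301979, 162082, 13220037, 13382118),
]

checks = [
    (cnv_id,
     inclusive_length(cnv_start, cnv_end) == cnv_size,
     inclusive_length(gap_start, gap_end) == gap_size)
    for cnv_id, cnv_size, cnv_start, cnv_end, gap_size, gap_start, gap_end in rows
]
for cnv_id, cnv_ok, gap_ok in checks:
    print(cnv_id, cnv_ok, gap_ok)
```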
Supplementary Table 13. Motif on flanking regions of NHEJ large deletions
Supplementary Table 14. RNA Sequencing Statistics
<8AKs Transcriptome Seq. Run1>
Individual Read length reads bases Aligned reads Aligned bases Coverage
AK3 2 x 101 51,601,132 5,211,714,332 28,345,352 2,862,410,015 36.9
AK4 2 x 101 51,143,140 5,165,457,140 27,636,273 2,790,800,158 35.9
AK5 2 x 101 52,630,854 5,315,716,254 28,317,447 2,859,606,849 36.8
AK6 2 x 101 51,324,294 5,183,753,761 26,733,252 2,699,610,066 34.8
AK7 2 x 101 51,032,760 5,154,308,760 27,011,183 2,727,650,181 35.1
AK14 2 x 101 52,862,496 5,339,112,096 26,717,457 2,698,007,226 34.7
AK20 2 x 101 51,586,996 5,210,286,596 27,373,384 2,764,293,063 35.6
<8AK_Ns Transcriptome Seq. Run1>
Individual Read length reads bases Aligned reads Aligned bases Coverage
AK_N1 2 x 78 77,298,628 6,029,292,984 59,347,587 4,606,250,876 59.3
AK_N2 2 x 78 82,591,894 6,442,167,732 63,596,426 4,940,288,891 63.6
AK_N5 2 x 78 90,704,150 7,074,923,700 64,991,714 5,059,103,355 65.1
AK_N6 2 x 78 86,655,342 6,759,116,676 61,578,609 4,797,048,436 61.8
AK_N7 2 x 78 87,381,024 6,815,719,872 61,237,205 4,775,180,801 61.5
AK_N9 2 x 78 81,470,204 6,354,675,912 61,697,228 4,790,953,276 61.7
AK_N14 2 x 78 88,447,086 6,898,872,708 67,369,082 5,228,014,109 67.3
AK_N15 2 x 78 78,300,992 6,107,477,376 61,677,972 4,787,852,561 61.6
Supplementary Table 15. Aligning RNA short reads to cDNA sequences decreases
misalignment of short reads to pseudogenes.
Chr. Gene RPKM_GenomeBased RPKM_cDNABased
1 SUMO1P3 19.8012 0.4284
1 HSP90B3P 44.8018 0.0409
1 TOP1P1 11.3555 0.5103
1 AURKAPS1 12.0098 0.3908
3 PA2G4P4 77.8963 0.3407
6 LYPLA2P1 20.3936 0.4135
7 CLK2P 11.374 0
7 RPL23P8 239.9053 0.1288
8 RNF5P1 30.8248 0.3391
8 NACAP1 34.0562 1.2033
8 PTTG3P 29.723 0
9 PTENP1 13.8665 1.1005
9 ANXA2P2 107.289 1.462
10 PIPSL 37.2138 0.0875
11 CSNK2A1P 45.2763 1.8908
12 NME2P1 360.4471 0.0972
13 ATP5EP2 251.3407 0.0897
16 UBE2MP1 50.9908 0.2103
16 CSDAP1 11.0325 0
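The values compared above are RPKM (reads per kilobase of transcript per million mapped reads), as defined by Mortazavi et al. A minimal sketch of the standard formula follows; the function name `rpkm` and the example numbers are illustrative and are not taken from the study's data.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 1e9 * C / (N * L), where C is the number of reads mapped to the
    transcript, N the total number of mapped reads, and L its length in bp.
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Illustrative numbers only: 500 reads on a 2,000 bp transcript
# in a library of 25 million mapped reads.
print(rpkm(500, 2000, 25_000_000))
```

Under this definition, reads that are misassigned to a pseudogene inflate its read count and therefore its RPKM, which is the effect the genome-based column illustrates.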
Supplementary Table 16. Expression map represented in RPKM value for all Refseq genes
SuppTable16_Gene_Expression_Map_in_RPKM.xls
Depicted below is a preview of the full version.
AK3 AK5 AK7 AK_N1 AK_N2 AK_N5 AK_N6 AK_N7 AK_N15 AK4 AK6 AK14 AK20 AK_N9 AK_N14
1 29751 LOC100288778 4225 7502 1220 NR_028269 10.54964 12.185 14.002 14.091 10.209 8.219 5.938 7.566 11.588 7.651 14.359 9.887 11.921 11.827 9.663 9.138
1 19700 WASH5P 4225 19233 1769 NR_024540 10.25641 12.048 13.978 14.1 10.128 7.507 5.65 6.886 10.883 7.547 13.713 9.297 11.317 11.607 10.063 9.125
1 2924 FAM138F 24475 25944 1129 NR_026820 0.003707 0 0 0 0 0.036 0 0.019 0 0 0 0 0 0 0 0
1 18978 FAM138A 24475 25944 1129 NR_026818 0.003707 0 0 0 0 0.036 0 0.019 0 0 0 0 0 0 0 0
1 27700 FAM138C 24475 25944 1129 NR_026822 0.003707 0 0 0 0 0.036 0 0.019 0 0 0 0 0 0 0 0
1 33158 OR4F5 58954 59871 918 NM_0010054840.014827 0.041 0 0.077 0 0 0 0 0 0 0 0 0.059 0 0.045 0
1 29456 LOC100132062 313755 318443 4369 NR_028325 2.633713 4.93 3.59 4.024 2.811 1.914 1.522 1.832 1.704 1.547 3.974 2.052 2.756 2.447 2.915 1.489
1 29775 LOC100133331 313755 318443 4272 NR_028327 2.693387 5.041 3.671 4.116 2.874 1.958 1.557 1.874 1.743 1.582 4.064 2.099 2.819 2.502 2.981 1.522
1 29784 LOC100132287 313755 318443 4369 NR_028322 2.633713 4.93 3.59 4.024 2.811 1.914 1.522 1.832 1.704 1.547 3.974 2.052 2.756 2.447 2.915 1.489
1 23957 OR4F29 357522 358460 939 NM_001005221 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 28748 OR4F3 357522 358460 939 NM_001005224 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 33188 OR4F16 357522 358460 939 NM_001005277 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 31657 MIR1977 556051 556128 78 NR_031741 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 13392 OR4F3 610959 611897 939 NM_001005224 0.00568 0 0.04 0 0.045 0 0 0 0 0 0 0 0 0 0 0
1 23958 OR4F29 610959 611897 939 NM_001005221 0.00568 0 0.04 0 0.045 0 0 0 0 0 0 0 0 0 0 0
1 33189 OR4F16 610959 611897 939 NM_001005277 0.00568 0 0.04 0 0.045 0 0 0 0 0 0 0 0 0 0 0
1 29782 LOC100133331 651003 655594 4272 NR_028327 2.89888 5.379 3.886 4.029 3.307 1.912 1.558 1.886 1.985 1.898 4.063 3.323 3.074 2.684 2.786 1.715
1 26608 NCRNA00115 751450 752765 1316 NR_024321 1.847667 1.928 2.428 1.245 2.615 2.005 2.675 1.418 1.37 2.024 2.014 2.132 1.042 1.692 1.703 1.425
1 22531 LOC643837 752927 779603 1543 NR_015368 5.284347 4.936 6.378 7.498 6.553 4.802 5.208 4.593 4.171 4.376 5.572 5.385 4.794 6.085 4.865 4.049
1 638 FAM41C 793320 802045 1700 NR_027055 1.025213 1.293 1.152 1.332 0.702 0.664 1.045 0.79 0.487 0.514 2.244 1.408 1.296 1.18 0.695 0.577
1 27737 FLJ39609 842818 844680 494 NR_026874 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 12523 SAMD11 850984 869824 2554 NM_152486 4.23398 5.999 5.519 6.475 3.633 2.51 1.994 3.122 4.118 2.685 4.37 4.854 5.73 6.015 2.931 3.557
1 17488 NOC2L 869446 884542 2800 NM_015658 56.4704 81.353 76.418 80.267 40.936 37.365 28.795 44.552 43.75 38.736 63.486 65.319 71.363 79.816 50.294 44.607
1 3040 KLHL17 885830 890958 2560 NM_198317 1.945547 2.619 1.797 1.753 2.343 1.742 1.03 2.025 1.794 1.605 1.983 1.935 1.983 2.277 2.246 2.051
1 11766 PLEKHN1 891740 900345 2398 NM_032129 0.248513 0.163 0.09 0.297 0.295 0.411 0.253 0.253 0.158 0.438 0.182 0.377 0.193 0.165 0.2 0.253
1 31739 PLEKHN1 891740 900345 2293 NM_001160184 0.255713 0.176 0.094 0.286 0.312 0.378 0.251 0.269 0.163 0.454 0.194 0.391 0.212 0.19 0.214 0.253
1 28805 C1orf170 900442 907336 3040 NR_027693 0.058 0.022 0.044 0.117 0.084 0.055 0.014 0.133 0.051 0.08 0.045 0.048 0 0.069 0.074 0.034
1 21904 HES4 924206 925415 961 NM_021170 0.778307 0.901 0.554 0.946 1.414 0.557 0.506 1.116 0.626 1.367 0.944 1.195 0.271 0.731 0.41 0.138
1 29552 HES4 924208 925415 1037 NM_001142467 0.742247 0.845 0.514 0.881 1.345 0.516 0.481 1.077 0.617 1.317 0.943 1.107 0.28 0.691 0.393 0.128
1 8392 ISG15 938710 939782 666 NM_005101 268.2438 186.28 419.374 380.118 259.442 99.085 136.799 241.431 326.256 185.151 190.792 427.151 289.287 314.469 441.4 126.623
1 8191 AGRN 945366 981355 7319 NM_198576 10.80492 8.352 12.267 12.034 16.947 14.3 7.813 11.017 7.653 14.786 10.835 12.316 8.24 9.068 9.438 7.007
1 5344 C1orf159 1007061 1041599 2104 NM_017891 3.982073 7.995 6.259 5.951 3.214 2.325 1.067 2.779 2.76 2.074 4.427 4.863 4.849 5.54 2.981 2.647
Columns: chr gene_id gene start stop size accession ave_rpkm, followed by Gene Expression (RPKM), one value per individual
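In the preview rows above, the ave_rpkm column appears to be the arithmetic mean of the per-individual RPKM values that follow it. The sketch below checks this for the NOC2L row. It assumes whitespace-delimited rows with eight leading annotation fields; the helper name `parse_expression_row` is hypothetical and not part of the supplementary files.

```python
from statistics import mean

def parse_expression_row(line):
    """Split one preview row into annotation fields and per-individual RPKM values."""
    fields = line.split()
    chrom, gene_id, gene, start, stop, size, accession, ave_rpkm = fields[:8]
    return {
        "chr": chrom,
        "gene_id": int(gene_id),
        "gene": gene,
        "start": int(start),
        "stop": int(stop),
        "size": int(size),
        "accession": accession,
        "ave_rpkm": float(ave_rpkm),
        # Everything after the eighth field is treated as one RPKM value per individual.
        "rpkm": [float(v) for v in fields[8:]],
    }

# NOC2L row copied from the preview above.
noc2l = parse_expression_row(
    "1 17488 NOC2L 869446 884542 2800 NM_015658 56.4704 81.353 76.418 80.267 "
    "40.936 37.365 28.795 44.552 43.75 38.736 63.486 65.319 71.363 79.816 50.294 44.607"
)

# Compare the recomputed mean with the reported ave_rpkm.
print(len(noc2l["rpkm"]), round(mean(noc2l["rpkm"]), 4), noc2l["ave_rpkm"])
```

The same check can be run on any other row of the full table to spot extraction errors such as fused columns.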
Supplementary Table 17. List of Unknown transcripts
SuppTable17_Unknown_Transcripts.xls
Depicted below is a preview of the full version.
index individuals chr size(bp) start stop # of individuals nearest_gene distance(bp)
NovelTranscript_1 AK3,AK_N5,AK7,AK_N7,AK_N1,AK_N2,AK_N6,AK_N14,AK6,AK_N15,AK20,AK4,AK5,AK_N9 chr1 268 1333087 1333354 14 MRPL20 531
NovelTranscript_2 AK_N14,AK_N5,AK_N7,AK_N1 chr1 761 1501253 1502013 4 SSU72 1128
NovelTranscript_3 AK6,AK_N5 chr1 138 2464125 2464262 2 LOC115110 6956
NovelTranscript_4 AK_N1,AK_N7 chr1 95 2477222 2477316 2 TNFRSF14 1834
NovelTranscript_5 AK_N14,AK_N9 chr1 345 2501115 2501459 2 C1orf93 6649
NovelTranscript_6 AK_N5,AK_N2 chr1 199 2501622 2501820 2 C1orf93 6288
NovelTranscript_7 AK_N15,AK_N6 chr1 227 2506009 2506235 2 C1orf93 1873
NovelTranscript_8 AK3,AK_N7,AK4,AK20,AK_N14,AK_N1,AK_N6,AK5,AK6,AK_N2,AK14,AK_N9,AK7,AK_N5,AK_N15 chr1 166 3641312 3641477 15 KIAA0495 930
NovelTranscript_9 AK4,AK14 chr1 188 4210231 4210418 2 LOC284661 161552
NovelTranscript_10 AK_N2,AK_N5,AK_N15,AK_N9 chr1 735 7897021 7897755 4 TNFRSF9 4738
NovelTranscript_11 AK_N2,AK6 chr1 256 7897756 7898011 2 TNFRSF9 4482
NovelTranscript_12 AK_N2,AK14,AK6,AK_N6,AK_N15,AK_N5,AK_N7,AK_N9,AK4,AK_N14,AK20,AK5 chr1 337 7899895 7900231 12 TNFRSF9 2262
NovelTranscript_13 AK_N5,AK3 chr1 515 7943265 7943779 2 PARK7 521
NovelTranscript_14 AK4,AK6,AK_N5,AK_N6,AK14,AK_N1,AK_N7,AK7,AK_N15,AK_N2,AK_N14,AK20 chr1 531 9018450 9018980 12 SLC2A5 613
NovelTranscript_15 AK_N14,AK_N2 chr1 217 9019361 9019577 2 SLC2A5 16
NovelTranscript_16 AK_N6,AK_N14,AK_N7,AK_N9,AK_N5,AK_N15,AK5 chr1 1011 9970337 9971347 7 NMNAT1 2194
NovelTranscript_17 AK_N7,AK_N15 chr1 326 9972031 9972356 2 NMNAT1 3888
NovelTranscript_18 AK_N14,AK_N1,AK4,AK_N7,AK_N15 chr1 485 10435206 10435690 5 APITD1 409
NovelTranscript_19 AK_N15,AK_N14 chr1 141 10437528 10437668 2 APITD1 2731
NovelTranscript_20 AK_N7,AK_N9,AK14 chr1 333 11044140 11044472 3 SRM 1462
NovelTranscript_21 AK_N5,AK_N7 chr1 254 11047958 11048211 2 EXOSC10 1051
NovelTranscript_22 AK_N5,AK14 chr1 125 11048682 11048806 2 EXOSC10 456
NovelTranscript_23 AK_N2,AK20,AK7,AK_N5,AK3,AK6,AK_N14,AK5,AK_N9,AK_N7,AK14,AK_N6,AK_N15 chr1 746 11278696 11279441 13 UBIAD1 7619
NovelTranscript_24 AK_N7,AK_N14,AK_N5,AK14,AK4,AK_N2,AK_N6,AK_N9,AK6 chr1 426 11279945 11280370 9 UBIAD1 8868
NovelTranscript_25 AK_N2,AK_N9,AK7,AK_N15 chr1 681 12037286 12037966 4 TNFRSF8 8054
NovelTranscript_26 AK_N5,AK_N6,AK_N2,AK_N1 chr1 807 12038458 12039264 4 TNFRSF8 6756
NovelTranscript_27 AK_N1,AK_N9,AK_N14,AK_N6,AK_N5,AK_N7,AK_N15 chr1 211 12045808 12046018 7 TNFRSF8 2
NovelTranscript_28 AK4,AK_N2,AK_N1,AK_N5,AK_N15,AK_N7 chr1 430 12126853 12127282 6 TNFRSF8 2
NovelTranscript_29 AK_N2,AK_N5,AK_N6,AK_N9,AK_N15,AK_N14 chr1 1050 12127341 12128390 6 TNFRSF8 490
NovelTranscript_30 AK_N6,AK_N5,AK_N2,AK_N7 chr1 1151 12128542 12129692 4 TNFRSF8 1691
NovelTranscript_31 AK_N5,AK_N2 chr1 440 12129747 12130186 2 TNFRSF8 2896
NovelTranscript_32 AK_N1,AK_N2 chr1 401 12131940 12132340 2 TNFRSF8 5089
NovelTranscript_33 AK_N2,AK_N6 chr1 179 12134081 12134259 2 TNFRSF8 7230
NovelTranscript_34 AK_N6,AK_N5,AK_N7,AK_N15,AK_N2,AK_N9,AK_N1 chr1 654 13897046 13897699 7 PRDM2 1622
NovelTranscript_35 AK_N14,AK_N9,AK_N2,AK_N6 chr1 193 13898365 13898557 4 PRDM2 764
Supplementary Table 18. 23 Genes escape X-inactivation
chr gene start stop ave_rpkm male_rpkm female_rpkm pvalue qvalue previous report*
X PRKX 3532384 3641675 9.560273 8.15844444 11.66283333 0.0027 0.01924878 9/9
X HDHD1A 6976961 7076231 22.92633 18.3414444 29.804 0.0018 0.01668228 8/9
X PNPLA4 7826804 7855475 3.217813 2.287 4.614166667 0.0018 0.01668228 9/9
X MSL3 11686199 11703791 35.78468 29.5732222 45.10183333 0.0027 0.01924878 3/9
X TMSB4X 12903147 12905267 6894.269 6306.36333 7776.126667 0.0392 0.1513762 not analyzed
X TRAPPC2 13640282 13662675 9.250967 7.61022222 11.71216667 0.0018 0.01668228 9/9
X GEMIN8 13934766 13957956 6.921387 6.29111111 7.866833333 0.0018 0.01668228 9/9
X CA5BP 15602960 15631395 5.98988 4.71577778 7.901166667 0.0027 0.01924878 9/9
X ZRSR2 15718495 15751303 10.67819 9.06488889 13.09816667 0.0018 0.01668228 not analyzed
X SYAP1 16647628 16690727 17.53308 16.3571111 19.29716667 0.0392 0.1513762 9/9
X CXorf15 16714476 16772561 10.94431 9.56222222 13.0175 0.0113 0.06981842 9/9
X EIF1AX 20052557 20069887 45.26513 37.8303333 56.41733333 0.0018 0.01668228 9/9
X EIF2S3 23982986 24006851 133.5755 123.438667 148.781 0.0216 0.1177573 9/9
X GPR34 41433170 41441474 1.32278 1.01244444 1.788666667 0.0216 0.1177573 0/9
X CDK16 46962472 46974336 17.27303 15.7872222 19.502 0.0292 0.1503465 7/7
X KDM5C 53237229 53271329 23.22696 17.8846667 31.24066667 0.0018 0.01668228 9/9
X SMC1A 53417795 53466343 32.10438 27.8975556 38.41466667 0.008 0.05295961 7/9
X FOXO4 70232724 70240109 1.87224 1.74288889 2.0665 0.0392 0.1513762 3/9
X RPS4X 71409178 71413866 1781.525 1419.05078 2325.235333 0.0018 0.01668228 9/9
X TSIX 72928765 72965791 3.771327 0.175 9.166 0.0018 0.01668228 not analyzed
X XIST 72957220 72989313 12.37481 0.12877778 30.74416667 0.0018 0.01668228 9/9
X NGFRAP1 102517924 102519657 3.75198 2.45233333 5.701666667 0.0392 0.1513762 0/9
X ALG13 110811002 110820279 5.858393 5.22677778 6.805833333 0.0392 0.1513762 5/9
* Carrel, L. & Willard, H.F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature (2005).
The value is the fraction of hybrids that expressed the gene from the inactivated X chromosome in that report (out of 9 or fewer tested).
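The male_rpkm and female_rpkm columns can be compared directly as a female-to-male expression ratio. The sketch below does this for three preview rows and also checks that ave_rpkm is reproduced by a weighted mean of the two group means. The group sizes of 9 males and 6 females are an assumption inferred from the averages themselves; they are not stated in the table. The function names are hypothetical.

```python
# Assumption: 9 male and 6 female samples, inferred from the averages; not stated in the table.
N_MALE, N_FEMALE = 9, 6

# (gene, ave_rpkm, male_rpkm, female_rpkm) copied from the preview rows above.
ROWS = [
    ("PRKX", 9.560273, 8.15844444, 11.66283333),
    ("XIST", 12.37481, 0.12877778, 30.74416667),
    ("FOXO4", 1.87224, 1.74288889, 2.0665),
]

def female_to_male_ratio(male_rpkm, female_rpkm):
    """Ratio of mean female expression to mean male expression."""
    return female_rpkm / male_rpkm

def pooled_mean(male_rpkm, female_rpkm, n_male=N_MALE, n_female=N_FEMALE):
    """Mean over all samples, recovered from the two group means and group sizes."""
    return (n_male * male_rpkm + n_female * female_rpkm) / (n_male + n_female)

for gene, ave, male, female in ROWS:
    ratio = female_to_male_ratio(male, female)
    print(gene, round(ratio, 2), round(pooled_mean(male, female), 4), ave)
```

A ratio above 1 only restates that female expression is higher on average; the pvalue and qvalue columns of the table remain the basis for calling a gene an escapee.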
Supplementary Table 19. 1,809 TBM sites
SuppTable19_TBM_List.xls
Depicted below is a preview of the full version.
Chr position wt snp rna_wt rna_snp rna_var% dna_wt dna_RD rna_A rna_C rna_G rna_T is_dbsnp
chr1 5636 T C 8 7 0.4667 0 11 0 7 0 8 rs2691318
chr1 5636 T C 5 6 0.5455 0 26 0 6 0 5 rs2691318
chr1 5636 T C 4 6 0.6 1 17 0 6 0 4 rs2691318
chr1 6837 C T 50 21 0.2958 0 26 0 50 0 21 rs1045474
chr1 6837 C T 59 18 0.2338 1 30 0 59 0 18 rs1045474
chr1 6837 C T 44 11 0.2 0 23 0 44 0 11 rs1045474
chr1 8231 T C 2 14 0.875 0 7 0 14 0 2 rs4849248
chr1 8231 T C 5 4 0.4444 0 9 0 4 0 5 rs4849248
chr1 8231 T C 2 14 0.875 1 25 0 14 0 2 rs4849248
chr1 8231 T C 4 13 0.7647 0 12 0 13 0 4 rs4849248
chr1 8231 T C 2 9 0.8182 0 13 0 9 0 2 rs4849248
utr_gene utr_strand allele_change individual is_validated_by_2ndRNAseq
||uc009viv.1,uc009viw.1,uc009vix.1 -,-,- AG AK5 yes
||uc009viv.1,uc009viw.1,uc009vix.1 -,-,- AG AK7 yes
||uc009viv.1,uc009viw.1,uc009vix.1 -,-,- AG AK14 yes
||LOC100288778,WASH7P,uc009vit.1,uc009viu.1,uc001aae.2,uc001aab.2,uc009viq.1,uc009vir.1,uc001aac.2,uc009viv.1,uc009viw.1,ENSG00000146556,uc001aaf.1,uc009vix.1,uc001aag.1,uc009viy.1,uc009viz.1,uc009vjc.1,uc009vjd.1,uc001aai.1,uc001aah.2,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,- GA AK20 yes
||LOC100288778,WASH7P,uc009vit.1,uc009viu.1,uc001aae.2,uc001aab.2,uc009viq.1,uc009vir.1,uc001aac.2,uc009viv.1,uc009viw.1,ENSG00000146556,uc001aaf.1,uc009vix.1,uc001aag.1,uc009viy.1,uc009viz.1,uc009vjc.1,uc009vjd.1,uc001aai.1,uc001aah.2,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,- GA AK4 yes
||LOC100288778,WASH7P,uc009vit.1,uc009viu.1,uc001aae.2,uc001aab.2,uc009viq.1,uc009vir.1,uc001aac.2,uc009viv.1,uc009viw.1,ENSG00000146556,uc001aaf.1,uc009vix.1,uc001aag.1,uc009viy.1,uc009viz.1,uc009vjc.1,uc009vjd.1,uc001aai.1,uc001aah.2,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,- GA AK6 yes
||uc009viu.1,uc001aab.2,uc001aac.2,uc009viz.1,uc001aai.1,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,- AG AK3 yes
||uc009viu.1,uc001aab.2,uc001aac.2,uc009viz.1,uc001aai.1,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,- AG AK5 yes
||uc009viu.1,uc001aab.2,uc001aac.2,uc009viz.1,uc001aai.1,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,- AG AK7 yes
||uc009viu.1,uc001aab.2,uc001aac.2,uc009viz.1,uc001aai.1,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,- AG AK20 yes
||uc009viu.1,uc001aab.2,uc001aac.2,uc009viz.1,uc001aai.1,uc009vja.1,uc009vjb.1 -,-,-,-,-,-,- AG AK6 yes
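In the preview rows above, the rna_var% column is consistent with the fraction of RNA reads carrying the variant allele, rna_snp / (rna_wt + rna_snp). The sketch below recomputes it for a few rows. The formula is inferred from the preview values rather than stated in the table, and the function name `rna_variant_fraction` is hypothetical.

```python
def rna_variant_fraction(rna_wt, rna_snp):
    """Fraction of RNA reads supporting the variant allele; None if there are no reads."""
    total = rna_wt + rna_snp
    if total == 0:
        return None
    return rna_snp / total

# (rna_wt, rna_snp, reported rna_var%) copied from the preview rows above.
PREVIEW = [
    (8, 7, 0.4667),
    (5, 6, 0.5455),
    (50, 21, 0.2958),
    (2, 14, 0.875),
]

for rna_wt, rna_snp, reported in PREVIEW:
    computed = rna_variant_fraction(rna_wt, rna_snp)
    print(rna_wt, rna_snp, round(computed, 4), reported)
```

Despite the "%" in the column name, the reported values are fractions between 0 and 1 rather than percentages.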
Supplementary Table 20. 580 Allele Specific Expression sites
SuppTable20_Allele_Specific_Expression_Sites.xls
Depicted below is a preview of the full version.
Genome Read-Counts
chr pos wt snp dbsnp annotation wtAA snpAA blosum is_nssnp AK3 AK5 AK7 AK4 AK6 AK14 AK20 AK_N1 AK_N2 AK_N5 AK_N6 AK_N7 AK_N9 AK_N14 AK_N15
1 1411854 G A rs860213 ATAD3B R Q 1 nsSNP 1:0 1:5 3:0 5:5 1:1 4:1 0:3 51:18 45:22 32:12 51:15 37:34 49:0 46:0 18:30
1 1421028 C T rs72468211 ATAD3B P S -1 nsSNP 3:3 3:3 3:4 0:4 3:1 6:6 2:6 10:15 5:2 8:7 7:2 9:4 8:4 9:4 11:4
1 1640647 T C rs17845218 CDK11B,CDK11A H R 0 nsSNP 20:40 26:22 34:29 21:28 30:13 33:18 18:22 1:0 0:0 0:0 0:0 0:1 2:1 3:0 0:0
1 1640657 A G rs1059830 CDK11B,CDK11A C R -3 nsSNP 18:38 27:22 29:23 23:24 29:10 37:17 19:23 1:0 0:0 0:0 0:0 0:1 2:1 2:0 0:0
1 7832324 C T rs2890565 UTS2 S N 1 nsSNP 31:0 6:13 31:0 21:8 15:0 23:0 18:0 65:0 64:0 0:62 69:0 30:32 42:24 86:0 77:0
1 7836017 G A rs228648 UTS2 T M -1 nsSNP 9:17 0:33 22:16 16:11 33:0 13:15 16:11 56:62 55:72 138:0 60:61 110:0 52:35 56:49 113:0
1 9755922 A T novel CLSTN1 F Y 3 nsSNP 22:0 10:14 27:0 17:0 11:0 12:0 7:0 31:0 36:0 19:32 20:16 31:0 11:14 39:0 37:0
1 11773514 C T rs2274976 MTHFR R Q 1 nsSNP 13:0 18:0 19:0 12:0 9:4 6:9 13:0 118:0 105:0 64:62 101:0 102:0 96:0 52:47 84:0
1 19285848 T A rs12584 UBR4 M L 2 nsSNP 0:37 18:21 0:32 0:14 0:19 6:6 0:14 34:18 0:53 33:24 28:33 0:63 27:17 0:46 0:45

Transcriptome Read-counts (rows correspond to the sites listed above, in the same order)
Trailing columns: # of heterozygote individuals, # of informative individuals, # of suggestive individuals, AS_score, FisherPvalue
AK3 AK5 AK7 AK4 AK6 AK14 AK20 AK_N1 AK_N2 AK_N5 AK_N6 AK_N7 AK_N9 AK_N14 AK_N15
0::61 0::49 48::0 0::29 0::40 21::15 0::57 10::11 14::18 5::5 22::21 0::29 23::25 26::0 7::28 6 6 5 0.298 3.61E-10
52::20 64::11 78::11 29::8 46::7 47::6 37::13 27::11 16::5 6::5 28::11 28::8 38::14 33::9 15::10 7 7 3 -0.387 1.83E-06
24::96 31::72 26::86 13::49 19::64 15::61 14::61 19::31 7::34 13::24 28::35 19::34 7::52 15::48 9::29 7 7 7 0.289 2.72E-20
48::100 48::78 47::93 36::64 52::68 37::71 46::59 37::38 11::43 27::31 49::51 34::40 20::56 51::48 32::29 7 7 5 0.231 5.39E-10
0::0 0::0 0::0 0::0 28::0 21::0 14::0 0::0 31::0 0::57 43::0 0::12 7::24 0::0 32::0 4 2 2 0.4 2.56E-06
0::0 0::0 0::0 0::0 20::0 3::0 17::0 0::0 15::0 40::0 16::8 8::1 8::4 1::0 20::0 10 4 3 -0.381 4.78E-08
51::0 32::24 27::0 57::0 38::0 48::0 38::0 73::0 36::1 19::16 22::12 55::0 45::0 61::0 42::0 4 4 3 -0.306 1.69E-06
13::0 17::0 11::0 18::0 0::8 7::7 11::0 19::0 23::0 6::10 20::1 13::0 29::1 5::23 16::0 4 4 2 0.406 9.57E-06
0::113 53::65 0::102 0::131 0::99 64::57 0::114 66::74 1::144 58::74 52::45 2::98 0::129 0::98 0::124 6 6 4 0.217 4.03E-08
Supplementary Table 21. Contig list generated by de novo assembly
SuppTable21_Denovo_Contigs_List.xls
Depicted below is a preview of the full version.
Supplementary Table 22. Alignment results of de novo assembled contigs
SuppTable22_Denovo_Alignment_Result.xls
Depicted below is a preview of the full version.
Contig ID: 855418

Category | Whole genome: AK3 AK4 AK5 AK6 AK7 AK9 AK14 AK20 | Transcriptome: AK3 AK4 AK5 AK6 AK7 AK14 AK20
# of reads 220 277 261 245 270 63 239 228 0 0 0 0 0 0 0
# of bases 16720 23927 16996 20220 20520 9513 22014 16188 0 0 0 0 0 0 0
# of paired matches 116 190 138 160 130 36 158 162 0 0 0 0 0 0 0
# of exact matched reads 220 277 261 245 270 63 239 228 0 0 0 0 0 0 0
# of exact matched bases 16720 23927 16996 20220 20520 9513 22014 16188 0 0 0 0 0 0 0
coverage of exact matches 16.33 23.37 16.6 19.75 20.04 9.29 21.5 15.81 0 0 0 0 0 0 0
# of exact paired-end matched reads 116 190 138 160 130 36 158 162 0 0 0 0 0 0 0
# of exact paired-end matched bases 8816 16490 8648 13110 9880 5436 14558 11352 0 0 0 0 0 0 0
coverage of exact paired-end matches 8.61 16.1 8.45 12.8 9.65 5.31 14.22 11.09 0 0 0 0 0 0 0

Category | Nomatch paired-end reads: AK3 AK4 AK5 AK6 AK7 AK9 AK14 AK20 | Publicly available human genomes: NA10851 NA12878 NA18507 NA19240 ABT KB1 Eskimo YH
# of reads 225 224 232 240 269 60 122 194 580 1959 188 201 0 166 102 4906
# of bases 17100 18699 15312 19275 20444 9060 10797 14159 27070 70524 6768 7236 0 11056 7156 158178
# of paired matches 190 194 206 198 244 56 102 178 364 186 0 0 0 100 0 202
# of exact matched reads 169 198 181 221 184 45 111 178 449 199 0 1 0 103 78 656
# of exact matched bases 12844 16423 11636 17671 13984 6795 9736 12733 20414 7164 0 36 0 7828 5476 23214
coverage of exact matches 12.54 16.04 11.36 17.26 13.66 6.64 9.51 12.43 19.94 7 0 0.04 0 7.64 5.35 22.67
# of exact paired-end matched reads 118 162 130 170 132 32 84 156 272 142 0 0 0 70 0 146
# of exact paired-end matched bases 8968 13562 8040 13680 10032 4832 7284 10906 11972 5112 0 0 0 5320 0 5110
coverage of exact paired-end matches 8.76 13.24 7.85 13.36 9.8 4.72 7.11 10.65 11.69 4.99 0 0 0 5.2 0 4.99