![Page 1: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/1.jpg)
Analysis of Exon Arrays
Slides provided by Dr. Yi Xing
![Page 2: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/2.jpg)
Outline
– Design of exon arrays
– Background correction
– Probe selection, expression index computation
– Evaluation of gene level index
– Exon level analysis
– Conclusion
![Page 3: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/3.jpg)
1. Basic design of Exon Array
| 3’ Arrays | Exon Arrays |
|---|---|
| 1 gene: 1 or 2 probesets | 1 gene: many probesets |
| Probes from 600 bps near 3’ end | Probes from each putative exon |
| Probeset has 11 PM, 11 MM probes | Probeset has 4 PM probes |
| 54,000 probesets | 1.4 million probesets, 6 M features |
| Average 16 probes per RefSeq gene | Average 147 probes per RefSeq gene |
![Page 4: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/4.jpg)
Exon Array Probesets Classified by Annotational Confidence
Core 21%
Extended 38%
Full 41%
• Core probesets target exons supported by RefSeq mRNAs.
• Extended probesets target exons supported by ESTs or partial mRNAs.
• Full probesets target exons supported purely by computational predictions.
![Page 5: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/5.jpg)
2. Background modeling: predict non-specific hybridization from probe sequence
• Wu and Irizarry (2005) use probe effect modeling to obtain more accurate expression index on 3’ arrays
• Johnson et al (2006) use probe effect modeling to detect ChIP peaks for Tiling arrays
• Kapur et al (2007) use probe effect modeling to correct background for Exon array
![Page 6: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/6.jpg)
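Background modeling in Exon Arrays
• $\log B_i = \alpha\, n_{iT} + \sum_{j=1}^{25} \sum_{k \in \{A,C,G\}} \beta_{jk} I_{ijk} + \sum_{k \in \{A,C,G\}} \gamma_k n_{ik}^2 + \varepsilon_i$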
• Estimate parameters from either
– Background probes (n = 37,687)
– Full probes (n = 400,000)
• Test on a different array (with single scaling constant)
Train on Background Probes, Test on Background Probes (R²)

| Train / Test | Cerebellum | Heart | Liver |
|---|---|---|---|
| Cerebellum | - | 0.64 | 0.67 |
| Heart | 0.64 | - | 0.65 |
| Liver | 0.66 | 0.64 | - |

Train on Full Probes, Test on Background Probes (R²)

| Train / Test | Cerebellum | Heart | Liver |
|---|---|---|---|
| Cerebellum | - | 0.61 | 0.63 |
| Heart | 0.61 | - | 0.63 |
| Liver | 0.64 | 0.63 | - |
• Full probes useful for modeling background
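The probe-sequence model on this slide can be made concrete by building one regression row per 25-mer probe. The following is a minimal Python sketch, assuming the covariates are the T count, position-by-nucleotide indicators for A, C, G, and squared nucleotide counts for A, C, G (the index set shown in the formula on the MADS slide). The function names are hypothetical, and the fit itself (least squares on log intensities of background or full probes) is left out.

```python
NUCS = ("A", "C", "G")   # T acts as the reference level for the position terms
PROBE_LEN = 25


def probe_features(seq):
    """Return the covariate row for one 25-mer probe sequence."""
    seq = seq.upper()
    if len(seq) != PROBE_LEN or set(seq) - set("ACGT"):
        raise ValueError("expected a 25-mer over A, C, G, T")
    row = [float(seq.count("T"))]                        # n_iT
    for base in seq:                                     # I_ijk for j = 1..25
        row.extend(1.0 if base == k else 0.0 for k in NUCS)
    row.extend(float(seq.count(k)) ** 2 for k in NUCS)   # n_ik squared
    return row


def predict_log_background(seq, coef):
    """Linear predictor: covariate row times an already fitted coefficient vector."""
    row = probe_features(seq)
    if len(coef) != len(row):
        raise ValueError("coefficient vector has the wrong length")
    return sum(c * x for c, x in zip(coef, row))


row = probe_features("ACGT" * 6 + "A")
print(len(row))  # 1 + 25*3 + 3 = 79 covariates
```

Stacking these rows for all training probes gives the design matrix; any least-squares routine can then estimate the coefficients, and the predicted background can be compared with held-out probes to obtain R² values like those in the tables above.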
![Page 7: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/7.jpg)
| Array | Stem cell mPromoter R² | Exon array R² |
|---|---|---|
| H9-38-3B | 0.60644 | 0.632071 |
| H9-38-3C | 0.597005 | 0.623045 |
| H9-38-3CM_8 | 0.589289 | 0.603118 |
| H9-38-7B | 0.580949 | 0.596331 |
| H9-39-7B | 0.542581 | 0.555235 |
| H9-41-7B | 0.603742 | 0.631153 |
| H9-43-3B | 0.612422 | 0.634044 |
| H9-43-7B | 0.594246 | 0.61426 |
Promoter array may be used to train exon array background
![Page 8: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/8.jpg)
Preliminary conclusions
• Background correction based on background probe effect modeling can greatly reduce background noise
• Model parameters are similar for different ChIP-DNA samples, or for different RNA samples, but not across DNA and RNA.
• The data may be rich enough to support learning of more complex models with even better predictive power.
![Page 9: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/9.jpg)
3. Probe selection and expression index computation
![Page 10: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/10.jpg)
Gene-level visualization: Heatmap of Intensities
[Figure: heatmap of probe intensities (axes: probes, samples; core probes marked) for major histocompatibility complex, class II, DM beta]
![Page 11: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/11.jpg)
Heatmap of Pairwise Correlations
[Figure: probe-by-probe correlation heatmap for HLA_DMB]
![Page 12: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/12.jpg)
First observations
• Heatmap of correlations is a useful complement to heatmap of intensities
• Core probes have higher intensity than extended and full probes
![Page 13: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/13.jpg)
Probe selection for gene-level expression
• Most full and extended probes are not suitable for estimating gene-level expression
– Probes may target false exon predictions
• Even some core probes may not be suitable
– Bad probes with low affinity or cross-hybridization
– Probes targeting differentially spliced exons
• Probe selection
– Select a suitably large subset of good probes targeting constitutively spliced regions of the gene
– Use only the selected probes to estimate gene expression
![Page 14: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/14.jpg)
Heatmap of CD44 core probes (Ordered By Genomic Locations)
[Figure: heatmap with probe regions labeled in genomic order: constitutive, alternatively spliced, constitutive]
![Page 15: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/15.jpg)
ataxin 2-binding protein 1
![Page 16: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/16.jpg)
These examples motivated our Probe Selection Strategy
• Probe selection procedure (on core probes)
– Hierarchical clustering of the probe intensities across 11 tissues (33 samples), and cut the tree at various heights (0.1, 0.2, …, 1.0).
– Choose a height cutoff to strike a balance between the size of the largest sub-group and the correlation within the sub-group.
– Iteratively remove probes if they do not correlate well with current expression index
– At least 11 core probes need to be chosen.
– If the total number of core probes is less than 11 for the entire transcript cluster, we skip probe selection.
(Xing Y, Kapur K, Wong WH. PLoS ONE. 2006 20;1:e88)
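The iterative-removal step of the procedure above can be sketched as follows. This is a simplified Python illustration, not the GeneBASE implementation: the expression index is approximated here by the per-sample mean of the retained probes (the slides use a dChip type model), and `min_corr=0.7` is a hypothetical threshold. Only the minimum of 11 probes comes from the slide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_probes(intensities, min_corr=0.7, min_probes=11):
    """intensities: dict mapping probe id -> list of per-sample intensities.

    Repeatedly drops the probe that correlates worst with the current
    expression index until every probe passes min_corr or only
    min_probes remain. Returns the sorted ids of the retained probes.
    """
    kept = dict(intensities)
    if len(kept) < min_probes:
        return sorted(kept)              # too few probes: skip selection
    n_samples = len(next(iter(kept.values())))
    while len(kept) > min_probes:
        # Assumed stand-in for the expression index: mean over retained probes.
        index = [sum(v[s] for v in kept.values()) / len(kept)
                 for s in range(n_samples)]
        corr = {p: pearson(v, index) for p, v in kept.items()}
        worst = min(corr, key=corr.get)
        if corr[worst] >= min_corr:
            break
        del kept[worst]
    return sorted(kept)
```

Recomputing the index after every removal matters: a probe targeting an alternatively spliced exon distorts the index while it is still included, so correlations of the remaining probes can change once it is gone.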
![Page 17: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/17.jpg)
Hierarchical Clustering of CD44 Core Probes (distance=1-corr, average linkage)
h=0.144 (42%) probes
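The clustering named in this slide title (distance = 1 - correlation, average linkage, tree cut at height h) can be sketched in plain Python as below. This is an illustrative implementation written for small probe sets, assuming each probe is given as a list of per-sample intensities; the function names are hypothetical, and a library routine would normally be used instead.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def average_linkage_cut(profiles, height):
    """Cluster probe profiles and cut the tree at the given height.

    Returns clusters as sorted lists of probe indices, largest first.
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[ia], clusters[ib]) > height:
            break                        # remaining merges lie above the cut
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted((sorted(c) for c in clusters), key=len, reverse=True)
```

Running this for each height in 0.1, 0.2, …, 1.0 and recording the size of the first (largest) cluster gives the quantities that the height cutoff has to balance.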
![Page 18: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/18.jpg)
Computation of gene level expression index
• Background correction
• Normalization (linear scaling or none)
• Probe selection
• Computation of Overall Gene Expression Indexes (dChip type model)
• Gene level quantile normalization (optional)
GeneBASE: Gene-level Background Adjusted Selected probe Expression
Download: http://biogibbs.stanford.edu/~kkapur/GeneBASE/
Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
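The slide only says "dChip type model" for the expression index step. As a rough illustration, the sketch below fits the multiplicative model commonly associated with dChip, y_ij ≈ θ_i φ_j (array i, probe j, with the probe effects scaled so that Σ_j φ_j² equals the number of probes), by alternating least squares. It is an assumption that GeneBASE uses this exact form; outlier handling and standard errors are omitted, and the function name is hypothetical.

```python
from math import sqrt


def fit_multiplicative_model(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y[i][j] is the background-corrected intensity of selected probe j on
    array i. Returns (theta, phi), with sum(phi_j ** 2) == number of probes.
    Assumes the data are not all zero.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes

    def update_theta(phi):
        denom = sum(p * p for p in phi)
        return [sum(y[i][j] * phi[j] for j in range(n_probes)) / denom
                for i in range(n_arrays)]

    for _ in range(n_iter):
        theta = update_theta(phi)
        denom = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Rescale probe effects to satisfy the identifiability constraint.
        scale = sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    return update_theta(phi), phi
```

Here `theta[i]` plays the role of the gene-level expression index on array i, computed only from the probes that survived selection.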
![Page 19: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/19.jpg)
In most cases selection does not affect fold changes
![Page 20: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/20.jpg)
spectrin, beta, non-erythrocytic 4 (SPTBN4)
Sometimes, selections change fold-change significantly
BetaIV spectrins are essential for membrane stability and the molecular organization of nodes of Ranvier along neuronal axons
![Page 21: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/21.jpg)
4. Evaluations of gene level index
![Page 22: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/22.jpg)
1st evaluation: tissue fold change
[Figure: scatter plot, axes: before selection, after selection]
Fold-change of liver over muscle, in 438 genes with high fold-change in 3’ expression array data
![Page 23: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/23.jpg)
Probe selection allows more sensitive detection of fold-changes
[Figure: zoom-in of the scatter plot, axes: before selection, after selection]
![Page 24: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/24.jpg)
[Figure: scatter plot, axes: before selection, after selection]
FC of muscle over liver, in 500 genes detected to be overexpressed in muscle over liver by 3’ array
![Page 25: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/25.jpg)
[Figure: zoom-in of the scatter plot, axes: before selection, after selection]
FC of muscle over liver
![Page 26: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/26.jpg)
2nd evaluation: Presence/Absence calls
• Use SAGE data to construct gold-standard
• Presence in tissue if 100 tags per million
• Absence if no tags in given tissue but >100 tpm in at least one other tissue
• Exon array A/P calls: use sum of z-scores for core probes (z-score is computed based on background model)
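The gold-standard rules and the z-score statistic above can be written down directly. The sketch below is a minimal Python illustration under stated assumptions: the presence rule is read as "at least 100 tags per million" (the slide does not give the comparison operator), the z-score is taken as (observed log intensity minus predicted background) divided by a residual standard deviation, and all function names are hypothetical.

```python
def sage_label(tpm_by_tissue, tissue, cutoff=100.0):
    """Gold-standard label for one gene in one tissue from SAGE counts.

    Returns "present", "absent", or None when the gene is not informative.
    """
    value = tpm_by_tissue[tissue]
    if value >= cutoff:                  # assumed reading of "100 tags per million"
        return "present"
    others = [v for t, v in tpm_by_tissue.items() if t != tissue]
    if value == 0 and any(v > cutoff for v in others):
        return "absent"
    return None


def zscore_sum(log_intensities, predicted_background, residual_sd):
    """Sum of per-probe z-scores against the background model prediction."""
    return sum((obs - bg) / residual_sd
               for obs, bg in zip(log_intensities, predicted_background))


def call_present(log_intensities, predicted_background, residual_sd, threshold):
    """Presence call: True when the summed z-score exceeds the threshold."""
    return zscore_sum(log_intensities, predicted_background, residual_sd) > threshold
```

Sweeping `threshold` over a range and comparing the calls with the `sage_label` output yields one point per threshold on an ROC curve.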
![Page 27: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/27.jpg)
[Figure: ROC curves in three panels: (a) Cerebellum, (b) Heart, (c) Kidney]
ROC curves show that background correction improves A/P calls.
Red: Exon, Z-score call
Blue: Exon Affy call
Brown: 3’ Affy call, max probeset
Purple: 3’ Affy call, min probeset
![Page 28: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/28.jpg)
3rd evaluation: Cross-species conservation
• 3’ and Exon array data for six adult tissues in both human and mouse
• Expression computed for about 10,000 human-mouse ortholog pairs
![Page 29: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/29.jpg)
[Figure: two panels, 3’ arrays and Exon arrays]
Similarity of gene expression profiles in six human tissues and six corresponding mouse tissues. For each ortholog pair we calculated the Pearson correlation coefficient (PCC) of expression indexes across six tissues (solid line). We also permuted ortholog relationships and calculated the PCC for random human-mouse gene pairs (dashed line).
(Xing Y, Ouyang Z, Kapur K, Scott MP, Wong WH. Mol Biol Evol. April 2007)
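The comparison described in this caption, PCC for true ortholog pairs against PCC for permuted pairs, can be sketched as below. This is an illustrative Python version, assuming each gene is represented by its six-tissue expression profile and that the two input lists are aligned by ortholog pair; the function names and the fixed random seed are hypothetical choices.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def ortholog_correlations(human, mouse, seed=0):
    """Return (observed, permuted) lists of PCCs.

    human[g] and mouse[g] are the tissue profiles of ortholog pair g.
    The permuted list pairs each human gene with a randomly chosen mouse gene.
    """
    observed = [pearson(h, m) for h, m in zip(human, mouse)]
    shuffled = list(mouse)
    random.Random(seed).shuffle(shuffled)
    permuted = [pearson(h, m) for h, m in zip(human, shuffled)]
    return observed, permuted
```

Plotting the two lists as density curves gives the solid (observed) and dashed (permuted) lines; conservation shows up as the observed distribution shifting toward 1 relative to the permuted one.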
![Page 30: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/30.jpg)
[Figure: four panels: 3’ arrays correlations, Exon arrays correlations, 3’ arrays scatter plot, Exon arrays scatter plot]
Exon arrays also reveal conservation of absolute abundance of transcripts in individual tissues!
![Page 31: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/31.jpg)
4th evaluation: q-PCR
[Figure: scatter plot, both axes ranging from -2 to 2]
On log scale, exon array fold change estimate is correlated with qPCR fold change (corr = 0.9)
![Page 32: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/32.jpg)
5. Issues in exon level analysis
![Page 33: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/33.jpg)
Challenges
• The experimental validation rates in several published exon array studies are highly variable.
– Gardina et al. BMC Genomics 7:325, 21%
– Kwan et al. Genome Res 17:1210, 45%
– Hung et al. RNA 14:284, 22%-56%
– Clark et al. Genome Biol 8:R64, 84%.
• Most exons are targeted by no more than four probes. No probes for splice junctions.
• Noise in observed probe intensities (due to background, cross-hybridization) can make the inferred splicing pattern unreliable.
![Page 34: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/34.jpg)
MADS: Microarray Analysis of Differential Splicing
1. Correction for background (non-specific hybridization)
2. Probe selection and expression index calculation
3. Correction for cross-hybridization
4. Detection of differential splicing

$$\log PM_i = \alpha\, n_{iT} + \sum_{j=1}^{25} \sum_{k \in \{A,C,G\}} \beta_{jk} I_{ijk} + \sum_{k \in \{A,C,G\}} \gamma_k n_{ik}^2 + \varepsilon_i$$

References:
1. Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
2. Xing, Kapur, Wong WH. PLoS ONE. 2006 20;1:e88
3. Xing et al., RNA, 2008, 14(8): 1470-1479
![Page 35: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/35.jpg)
$$\text{Splicing Index} = \frac{\text{Corrected Probe Intensity}}{\text{Estimated Gene Expression Level}}$$
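A short numeric example may help read the splicing index. The Python sketch below computes the ratio defined on this slide and, as one common way to compare two conditions, the log2 ratio of the two indices. The comparison function is an illustrative assumption (the slide does not state which test statistic MADS uses), and the intensities are made-up numbers.

```python
from math import log2


def splicing_index(corrected_probe_intensity, gene_expression_level):
    """Corrected probe intensity divided by the estimated gene expression level."""
    if gene_expression_level <= 0:
        raise ValueError("gene expression level must be positive")
    return corrected_probe_intensity / gene_expression_level


def log2_si_change(si_control, si_treated):
    """Log2 ratio of splicing indices (treated over control)."""
    return log2(si_treated / si_control)


# Hypothetical exon probe: gene level unchanged, exon signal doubles on knockdown.
si_control = splicing_index(200.0, 1000.0)
si_knockdown = splicing_index(400.0, 1000.0)
print(si_control, si_knockdown, log2_si_change(si_control, si_knockdown))
```

Dividing by the gene-level estimate is what separates a splicing change from a change in overall transcription: if both the probe intensity and the gene level double, the index stays the same.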
![Page 36: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/36.jpg)
Analysis of “gold-standard” alternative splicing data via PTB knockdown experiments
• Our “gold-standard”: a list of exons with pre-determined inclusion/exclusion profiles in response to PTB depletion (Boutz P, et al. Genes Dev. 2007, 21(13):1636-52.)
• We used shRNA to knock down PTB, generated Exon array data, and analyzed data on “gold-standard” exons.
• MADS detected all exons with large changes (>25%) in transcript inclusion levels, and offered improvement over Affymetrix’s analysis procedure.
Collaboration with Douglas Black (UCLA)
Boutz P, et.al. Genes Dev. 2007, 21(13):1636-52.
![Page 37: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/37.jpg)
MADS sensitivity correlates with the magnitude of change in exon inclusion levels of “gold-standard exons”
Xing et al., RNA, 2008, 14(8): 1470-1479
![Page 38: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/38.jpg)
Exon array detection of novel PTB-dependent splicing events
control
shRNA knockdown of splicing repressor PTB
![Page 39: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/39.jpg)
Detection of alternative 3’-UTR and Poly-A sites of Ncam1
30 differentially spliced exons were tested; 27 were validated.
Validation rate: 27/30=90%
![Page 40: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/40.jpg)
Cross-Hybridization
• Probes are designed to hybridize to their target transcripts
• Often probes have 0, 1, 2, or 3 base pair mismatches to non-target transcripts
• Cross-hyb seriously complicates exon-level analysis.
![Page 41: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/41.jpg)
Mapping mismatches to probes
• 6,000,000 probes
• Each 25bp long
• 3,000,000,000bp genome sequence
• For 1-bp mismatch, a naïve search needs O(6M x 3G x 25) ~ years of CPU time
• Fast matching algorithm (by Hui Jiang) makes this feasible in hours
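To put numbers on the naïve estimate: 6 × 10^6 probes × 3 × 10^9 genome positions × 25 bases is about 4.5 × 10^17 character comparisons, which is roughly 14 years at an assumed 10^9 comparisons per second. The slides do not describe the fast algorithm, so the Python sketch below shows a generic seed-and-verify technique based on the pigeonhole principle instead: a probe with at most one mismatch must contain at least one of two disjoint seeds exactly, so only positions hit by an indexed seed need full verification. All function names are hypothetical.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every k-mer of the genome to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def hamming_at_most(a, b, limit):
    """True if equal-length strings a and b differ in at most `limit` positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return False
    return True


def one_mismatch_hits(probe, genome, index, k):
    """Start positions where probe matches genome with at most one mismatch.

    Requires 2 * k <= len(probe) and an index built with the same k.
    """
    n = len(probe)
    if 2 * k > n:
        raise ValueError("seeds must be disjoint: need 2 * k <= len(probe)")
    hits = set()
    for offset in (0, k):                # two disjoint seeds
        seed = probe[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + n > len(genome):
                continue
            if hamming_at_most(probe, genome[start:start + n], 1):
                hits.add(start)
    return sorted(hits)
```

For 25-mer probes, `k = 12` gives two seeds covering positions 0 to 23. The index is built once over the genome, and each probe then costs two lookups plus a few verifications rather than a full scan.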
![Page 42: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/42.jpg)
Distribution of Number of Cross-hyb Transcripts
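Full Probes (% of probes)

| Mismatches | 0 Trans. | 1 Trans. | 2 Trans. | 3 Trans. | ≥ 4 Trans. |
|---|---|---|---|---|---|
| 0 bp | 97.05 | 2.14 | 0.40 | 0.18 | 0.23 |
| 0-1 bp | 96.14 | 2.60 | 0.59 | 0.27 | 0.40 |
| 0-2 bp | 95.59 | 2.79 | 0.69 | 0.33 | 0.60 |
| 0-3 bp | 92.36 | 5.06 | 1.12 | 0.48 | 0.98 |
| 0-4 bp | 80.50 | 13.37 | 3.10 | 1.09 | 1.93 |

Core Probes (% of probes)

| Mismatches | 0 Trans. | 1 Trans. | 2 Trans. | 3 Trans. | ≥ 4 Trans. |
|---|---|---|---|---|---|
| 0 bp | 99.52 | 0.40 | 0.05 | 0.01 | 0.02 |
| 0-1 bp | 99.21 | 0.62 | 0.10 | 0.03 | 0.03 |
| 0-2 bp | 98.90 | 0.84 | 0.15 | 0.05 | 0.06 |
| 0-3 bp | 97.49 | 1.98 | 0.29 | 0.10 | 0.13 |
| 0-4 bp | 88.25 | 9.67 | 1.36 | 0.35 | 0.36 |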
![Page 43: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/43.jpg)
Correction of sequence-specific cross-hybridization to off-target transcripts
PAN3
Estimated expression levels of off-target transcripts of EEF1A1
Intensities of four probes of the target exon of PAN3
![Page 44: Analysis of Exon Arrays Slides provided by Dr. Yi Xing](https://reader030.vdocuments.us/reader030/viewer/2022033023/56649ece5503460f94bdbf63/html5/thumbnails/44.jpg)
Conclusion
• Gene level index is accurate and reflects absolute abundance
• We show that sequence-specific modeling of microarray noise (background and cross-hybridization) improves the precision of exon-level analysis of exon array data.
• Overall, our data demonstrate that exon array design is an effective approach to study gene expression and differential splicing.
• Development of future “probe rich” exon arrays, with increased probe density on exons and inclusion of splice junction probes, will offer more powerful tools for global or targeted analysis of alternative splicing.