lecture 9,10: dna microarrays: novel applications
TRANSCRIPT
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
1
10.555Bioinformatics: Principles, Methods and Applications
MIT, Spring term, 2003
Lecture 9,10:DNA Microarrays: Novel
Applications
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
2
Probing cellular function
DNA microarrays
Environment
DNA
mRNAFluxes
Proteins
Proteomics Metabolic Flux Analysis
Signal Transduction
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
3
DNA Micro-Array Methodology
DNA
Protein
mRNA
ReverseTranscription
Sample
Control
ReverseTranscription
MixEqually Apply to
Microarray
Sample Control
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
4
DNA Micro-Arraying Technology
<100 µm
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
5
Mountains of Biological Data
19,200 Human Gene Array
from TIGR
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
6
Beyond transcriptional studies: New applications of microarrays
1. Genome-wide screening for genes conferring specific cellular traits (Gill et al., PNAS, May 2002)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
7
Fragment genomic DNA and gel purify to average size of
0.5-3 kbp
Ligate into plasmid toget Genomic Library [pTAGL]
Purify plasmids from time points throughout growth in selective
conditions
Fragment plasmids and label with fluorescent nucleotide
Heterogeneous Selective Growth of overexpression
library
33%
68%
88%
Plasmid DNA
Identify Genes byDNA Micro-Array
Genomic DNA
Cy5
Cy3
Growth AdvantageGrowth Disadvantage
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
8
(5.000)
(4.000)
(3.000)
(2.000)
(1.000)
-
1.000
2.000
0 5 10 15 20 25
Region of selective advantage for overexpression library
hours
ln(O
D60
0)
0.0% Pine-Sol
0.25% Pine-Sol
>1% Pine-Sol
Wild-Type Cell
Library Transformed Wild-Type Cell
Determining Selective Conditions
Which plasmidsare enriched?
Which plasmids are enriched?
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
9
0.4% Pine-Solmid-exponential phase
0.4% Pine-Sol early stationary phase
Control early stationary phase
Same culture+3 hours
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
10
-5
-4
-3
-2
-1
0
1
2
0 5 10 15 20 25 30hours
ln(O
D60
0) 0.0% Pine-Sol
0.4% Pine-Sol
Dynamic Screening of Plasmid Populations
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
11
0.4% Exponential 0.4% Early Stationary 0.0% Early Stationary
Plasmids: Resistance GenesGenomic DNA: Control
Discovering Antibiotic Resistance and Susceptibility Genes
Susceptibility Gene?
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
12
0% Pine-Sol
Cy3
Sig
nal
14000ygcA
sucDgabD
icdA
phePputA
fdoG
metR pheA
putA
cysH yhgBglnA
lpxD
livG, livJ, fdhF, proA, kdsB, trpE
0.4% Pine-SolC
y3 S
igna
l
14000
0Cy5 Signal0 2000Cy5 Signal0 2000
Quantifying Plasmid Levels to Identify Resistance Genes
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
13
Gene FunctionygcA UnknowngabD Succinate-semialdehyde dehydrogenasepheP Phenylalanine specific permeasesucD Succinyl-CoA synthetaseputA Proline dehydrogenaseicdA Isocitrate dehydrogenasefdoG Formate dehydrogenasemetR Positive regulator for methionine geneslpxD Glucosamine-N-acyltransferase (lipid A)pheA Phenylalanine dehydrogenasecysH Adenylsulfate reductaseglnA Glutamine synthetaseyhgB UnknownlivG High affinity branched chain amino acid transporterlivJ High affinity branched chain amino acid transporter
fdhF Formate dehydrogenasetrpE Anthranilate synthase component (tryptophan)
kdsBCMP-3-deoxy-D-manno-octulosonatecytidylytransferase (lipid A)
proA Glutamyl P reductase
Antibiotic Resistance Genes
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
14
0.112
0.080
0.000
0.065
0.065
0.165
0.000
0
CV4
0.040
0.015
0.029
0.135
0.135
0.020
0*7
NA
p-value5
9
4
4
4
4
4
4
29
N6
1.19RespBAD-livJ
1.14RespBAD-trpE
Res
Null
Null
Res
Res
Control
Trait
1.06 pBAD-hybC
1.06pBAD-leuC
1.18pBAD-pheP
1.40Enriched Library
1.20Library
1.03pUC19 Control
MIC1 Ratio2Transformant
Table 2
1MIC: Minimum Inhibitory Concentration. 2Ratio of MIC of the corresponding strain by the MIC value of the pUC19 control. 3MIC for pUC control was 0.6 +/- 0.063 %(v/v) Pine-Sol in LBA. 4Coefficient of Variation = SD/mean. Each strain was tested four to nine times with the pUC control tested 29 times. 5p-value is the one-tailed probability that the mean MIC between the transformant and the pUC19 control were equal using a students means t-test for two samples of unequal variances. Therefore the null hypothesis that the MIC means of the control and the strains reported are the same is rejected with 95% confidence for all strains except of those of the null genes. 6Number of separate MIC assays. 7Variance in library MIC assays was zero
Validation Experiments
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
15
Beyond transcriptional studies: New applications of microarrays
1. Genome-wide screening for genes conferring specific cellular traits (Gill et al., 2002)
2. Genome-wide location and function of DNA binding proteins (Ren et al., Science, December 22, 2000)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
16
Research Goal
• Understand how DNA binding proteins regulate global gene expression
• Study genes whose expression is directly controlled by Gal4 and Ste12
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
17
ChIP Chip
• ChIP: Chromatin Immuno-Precipitation• Fix cells with formaldehyde and harvest• Use an antibody to precipitate DNA
fragments bound to protein of interest• Remove cross-links, amplify, label (Cy5)• Mix with unenriched sample (Cy3)• Bind to DNA microarray
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
18
Example of results
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
19
Why?
• Microarrays alone cannot distinguish primary effects from secondary effects
• Identification of where protein binds gives exact interaction
g3g2g1
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
20
Example: Gal4 in yeast
• Gal4 activates the genes necessary for galactose metabolism
• Find genes which are upregulated in galactose and bound by Gal4
• 10 targets found with 0.001 ≥ P value
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
21
Results
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
22
Results, Part II
• 7 genes known to be bound by Gal4– GAL1, GAL2, GAL3, GAL7, GAL10, GAL80, GCY1
• 3 others (MTH1, PCL10, FUR4)
• The consensus binding sequence (CGGN11CCG) occurs in many other places“…sequence alone is not sufficient to account for
specificity of Gal4 binding in vivo.”
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
23
Confirmation
+/- Strains with or without tagged Gal4I/P Unenriched or Enriched DNA
Standard Chromatin IP
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
24
Additional confirmationRT-PCR to quantify expression of 3 unexpected genes
Wild type vs gal4-
“Galactose-induced expression of FUR4, MTH1, and PCL10 is Gal-4 dependent”
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
25
Results, Part IIIInduce transport and conversion
Reduce other transporters
Maximize energy from galactose
Increase pyrimadines for UDP (?)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
26
Beyond transcriptional studies: New applications of microarrays
1. Genome-wide screening for genes conferring specific cellular traits (Gill et al., PNAS)
2. Genome-wide location and function of DNA binding proteins (Ren et al., Science, December 22, 2000)
3. Experimental annotation of human genome (Shoemaker et al., Nature 409:922-927, 2001; Rosetta Inpharmatics)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
27
WHAT THE PAPER CLAIMS...
Microarray technology:• High-throughput microarray based experimental method
to validate predicted exons• Group the exons into genes by co-regulated expression• Define full-length mRNA transcriptsApproaches:1 Exon array approach - high-throughput method2 Tiling array approach - higher-resolution method
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
28
EXON ARRAY APPROACH
Analysis of human chromosome 22q
• Exon arrays consisting of long oligonucleotide probes derived from the 8,183 predicted exons annotated on chromosome 22q were fabricated
• Hybridization with flourescently labeled cDNAs derived from 69 experiments conducted on pairs of conditions(e.g., kidney vs. lung, testes vs uterus, etc.) using Cy3/Cy5 labels
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
29
FABRICATION OF EXON ARRAYS
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
30
COMPENDIUM OF EXPERIMENTS
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
31
VALIDATION OF GENE BOUNDARIES
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
32
SUMMARY OF RESULTS
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
33
TILING ARRAY APPROACH
• Higher-resolution view of a genomic region of interest• Can potentially reveal exons not identified by current gene
prediction algorithms• Provides information about alternative splicing• Can be applied specifically on initial and terminal exons
where gene prediction programs are not very accurate• Does not need any a priori information on exon content of
genomic sequence
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
34
TESTES TRANSCRIPT REFINED USING TILING ARRAYS
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
35
HUMAN GENOME SCAN USING EXON ARRAYS
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
36
DISCUSSION
• A microarray approach to the simultaneous validation of gene predictions and study of the transcriptome under any number of medically interestingly conditions
• Exon-based approach provides high-throughput screening of diverse cell types, growth conditions and disease states
• Method could be used for rapid annotation of sequence information from the Human Genome project
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
37
Beyond transcriptional studies: New applications of microarrays
1. Genome-wide screening for genes conferring specific cellular traits
2. Genome-wide location and function of DNA binding proteins (Ren et al., Science, December 22, 2000)
3. Experimental annotation of human genome4. Large scale identification of secreted and
membrane associated gene products (Diehn et al., Nature Genetics, 25:58-62 2000)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
38
Motivation
• Importance of membrane-associated and secreted proteins– Receptors– Transporters– Adhesion molecules– Hormones– Cytokines
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
39
Current Methods
• Computational– Potential amino-terminal membrane-targeting
signals– Trans-membrane domains
• Require knowledge of entire coding sequence
• Experimental– No large-scale genomic methods
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
40
Method for isolating membrane-bound polysomes
Cy5Cy3
Equilibrium density centrifugation in aSucrose gradient
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
41
Evaluation for genes with known localization
Distinctly membrane-bound
Distinctly cytosolic
Overlap:• Confirmed in other organisms
and from literature• No simple classification,
biological explanation
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
42
Characterization of unclassified genes
Moving Average Methodology•Sort genes by the ratio value•Compute fraction of the genes known to be membrane-associated in a given window of specified size•Plot this fraction against the expression of the central gene in window
Compare with the already characterized genes, in the following graphs.Fraction = probability
Human Cells (Jurkat)Window size: 151 genes
Yeast CellsWindow size: 175 genes
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
43
Comparison with computational predictions
Genes enriched in themembrane fraction
Only for Yeast genes, since fully sequenced
Also identify several genes that are membrane-associated but were not computationally predicted
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
44
Misclassifications/Exceptions• Cytosolic protein in expt. membrane fraction
– HAC1: spliced in cytoplasm by interaction with Ire1p, an ER transmembrane protein
• Cytosolic protein with membrane binding domain– Calcineurin B: associated to cytoplasmic side of plasma
membrane. Ribosomes may be recruited to PM
• Alternative splice sites– The present microarray cannot distinguish between the
various forms
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
45
Analysis of Microarray Data
1. Internal Validation• Background• Normalization• Confidence intervals
2. Analysis of Static expression data• PCA, Decision trees
3. Analysis of Dynamic expression data• Clustering (Hierarchical, SOM’s)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
46
Analysis of Microarray Data
1. Internal ValidationRNA Diagnostics (yield, purity)Average Yield: 3.5 +/- 0.8 ug RNA/108 cells (Cyanobacteria)Average Purity: A260/280 = 1.85 +/- 0.1
Data Filtering (Removes precipitants, ghost spots, weak signals)Filter 1: (Backgroundlocal-Backgroundavg) > xSDbackgroundFilter 2: (Signallocal - Backgroundlocal ) < ySDB’local
Filter1 Filter2 S/N Cy3 S/N Cy5 % Spotsx(SDavg) y(SDB’local) Retained
1 0 4.1 2.2 98%3 0.5 5.3 2.8 91%1 1 7.8 3.8 75%2 2 10.3 4.5 57%0.5 2 6.7 3.4 69%
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
47
Filtering
Cy3 Cy5Labeling 112 NT/Cy3 293 NT/Cy5Brightness (Ex*QY) 57,000 70,000
Precipitation Ghost Spots
Adjustment
Signal Background
Internal Validation of Micro-Arraying Methodology
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
48
Signal/Noise: S/N= (Signali - Backgroundi )/σback
For S/N > 1.96 the signal is diferent than the background with 95% confidence interval (CI)
Labeling DiagnosticsExtinction Quantum Brightness Avg. Incorp’nCoeff. (EC) Yield (QY) (EC*QY) (NT/Cy)
Cy3: 150,000 0.38 57,000 112 +/- 47Cy5: 250,000 0.28 70,000 293 +/-154
Ratio NTCy5/NTCy3Molar 2.62Brightness 3.21
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
49
(2.0
)
0.4
2.7
5.1
7.5
9.9
12.2
14.6
0
50
100
150
200
250
How Repeatable are Our Results?
CV = 30-45%
95% Confidence Interval = 1.6-1.9 = Intensity SampleIntensity Control
How Confident Are We That Signal is not Noise?
BackgroundSignal
Confidence
SDbackground
Internal Validation of Micro-Arraying Methodology
# Sp
ots
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
50
Normalization:1. By the average or total signal for each fluorophore (accounts for variations in brightness and total RNA)
• Values now are reported as fractions of total RNA. This may change under certain conditions (partial arrays)
2. By the average of rRNA genes. They can change too
3. Backgrounds: BCy3 / BCy5
4. Specific brightness: BRCy3 / BRCy5
5. Total signal: Scy3 / SCy5
6. Total signal minus background: ((S-B)3 / (S-B)5)Is class discovery dependent upon filtering and normalization?
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
51
-5.9
3
-4.4
3
-3.6
8
-2.9
9
-2.4
3
-1.8
1
-1.2
3
-0.6
5
-0.0
7
0.57
1.12
1.80
2.32
3.00
3.56
4.56
5.53
0
2040
60
80
100
120140
160
180
200
Filtering
Filtered then Adjusted by Average SignalFiltered then Adjusted by Background and BrightnessNo Filter, Adjusted by Background and Brightness
Avg. InducedAvg. Repressed
Log2 (Sample/Control)
# of
Gen
es
Adjustment
Internal Validation of Micro-Arraying Methodology
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
52
Analysis of Microarray Data
1. Internal Validation• Background• Normalization• Confidence intervals
2. Analysis of Static expression data• PCA, Decision trees
Key Questions• Sample classification-Diagnosis• Identification of discriminatory genes
3. Analysis of Dynamic expression data• Clustering (Hierarchical, SOM’s)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
53
Data Analysis and Pattern Classification
Problem-1: Consider N samples and M genes with theircorresponding expression levels, ei , where i = 1, …, M. M1of these tissues are characterized as “Healthy”, while theother M2 are labeled as “Pathological”. Find the set of discriminatory genes whose expression levels can diagnose the state, i.e. healthy or pathological, of a new sample tissue.Feature Space: The space of expression levels for the Mgenes, i.e. FS = {e1 , e2 , e3 , …, eM-1 , eM }Class: A set of genes characterized by the same label, e.g.C1 = “Healthy” and C2 = “Pathological”.Pattern: The specific M-tuple of expression levels, whichcharacterizes a tissue as belonging to a specific class, i.e.p(2) = {e(2)
1 , e(2)2 , e(2)
3 , …, e(2)M-1 , e(2)
M }, Pattern for“Pathological” Tissues.
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
54
Data Analysis and Pattern Classification
Pattern Classification: The process through which the feature space, FS, is partitioned into K exclusive regions,FSi i = 1, 2, …, K. Thus,
FS(i) ∩ FS(j) = 0 and ∪i=1-K FS(i) = FSDiscriminant Functions: d (p) = d (e1 , e2 , e3 , …, eM-1 , eM)define the partition of the feature space into the K regions.
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
55
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
56
Data Analysis and Pattern Discovery
• Problem-2: Consider N samples and M genes with theircorresponding expression levels, ei , where i = 1, …, M.“Discover” the patterns in gene expression levels which arecommon in a number of samples, i.e. find the groups ofsamples, each of which is characterized by a common patternin gene expression and define this common pattern of geneexpression levels for each group of samples.
• Problem-3: Consider one type of sample and the geneexpression levels for M genes over a period of L timepoints. “Discover” the patterns in gene expression levels,which are common for a particular group of genes, andcluster the genes with similar patterns into the same group.
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
57
Data Analysis and Pattern Classification
Training: The process through which one determines thediscriminant functions, using past examples of “pattern” -“class” associations, i.e. associations betweenpattern p(i) = {e(i)
1 , e(i)2 , e(i)
3 , …, e(i)M-1 , e(i)
M } and Class C(i)
Types of Problems:• Static: when the gene expression levels represent theexpression at a single time.• Dynamic, or Time-Dependent: when the expression levelsare measured over a period of time at various time intervals.
• Equal sampling intervals.• Unequal sampling intervals.
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
58
Data Analysis and Pattern Classification
• Issues to Resolve:– Labeling the various samples– Representation:Selecting the distinguishing features for classification; particularly important for time-dependent data, e.g. do you use the values, or the time derivatives of expression levels for classification?– Selecting the form of the discriminant function– Do you have statistically “enough” data for training?– Do you have enough data for testing?– What is the “noise” in your measurements?– What is the sensitivity of the generated discriminantfunction?– What is the robustness of the resulting classification scheme?
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
59
Information Theory:Decision Trees in Pattern Classification
Let N be the total number of examples (e.g. samples) and Mithe number of samples in each of the K classes.The Shannon entropy provides a measure of the information contentin the data set,
I(M1, M2, …, MK) = Σi=1-K (Mi/M) log2 (Mi/M)
• If all examples belong in the same class then I = 0.• The smaller the entropy the less variety of classes (more order) in the data set.
Split the data into two groups G1 and G2 with M(1) and M(2) examples (samples) in each group. Compute the information content for each group and for the whole set of examples.
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
60
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
61
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
62
0 0.05 0.1 0.15 0.2 0.25-0.1
-0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
Principal Component 1
Prin
cipa
l Com
pone
nt 2
Loading plot of the Genes
2.5 3 3.5 4 4.5 5 5.5 6 6.5 7
x 104
-4
-2
0
2
4
6
8
10x 10
4
Principal Component 1
Prin
cipa
l Com
pone
nt 2
Score Plot of the Tissue Samples
Colon
Esoph
Lu-4
Lu-3 S tom
Blood
VU-3 Cerv
ix
Pros t
Endo-1
Endo-2P lc-1 Plc-
2
VU-1
VU-2
Lu-1
Lu-2
Mu-2
Mu-1
Mu-3
Tes tes
Spleen
Myo-1
Myo-2
Kid-1
Kid-2
Kid-3
Kid-4
Br-Ag
Br4
Br-MC
Br-HC
Br-CP
Ovary1
Ovary2
Li-1
Li-2
Bre-1
Bre-2
Br-Ce
2.5 3 3.5 4 4.5 5 5.5 6 6.5 7
x 104
-4
-2
0
2
4
6
8
10x 10
4
Principal Component 1
Prin
cipa
l Com
pone
nt 2
Score Plot of the Tis sue Samples
Colon
Esoph
Lu-4
Lu-3
Stom
Blood
VU-3
Cervix
Prost
Endo-1
Endo-2
Plc-1
Plc-2
VU-1
VU-2
Lu-1
Lu-2
Mu-2
Mu-1
Mu-3
Tes te
s
Spleen
Myo-1
Myo-2 Kid-
1
Kid-2
Kid-3
Kid-4
Br-Ag
Br4
Br-MC
Br-HC
Br-CP
Ovary1
Ovary2
Li-1
Li-2
Bre-1
Bre-2 Br-C
e
Figure 1: Selection of relevant genes using the loadings on the principal components.
(a) (b)
(c)(d)
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
63
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2-0.35
-0.3
-0.25
-0.2
-0.15
-0.1
-0.05
0
0.05
0.1
0.15
Principal Component 1
Prin
cipa
l Com
pone
nt 2
Loading Plot of the Genes
0.5 1 1.5 2 2.5 3
x 104
-1
-0.5
0
0.5
1
1.5
2x 10
4
Spleen
BR-AG
PE1
VU2
BR-HC
Liver
CER2 Blood
PE2
BR-CA
Kid-3 Lung
Ovary
Sk Mus
CER1
Kid-2
Myo-1
P lac
Kid-4
BR-L4
VU1
Kid-1
Myo-2
BR-MOT
Score P lot of the Samples
Principal Component 1
Prin
cipa
l Com
pone
nt 2
0 1 2 3 4 5 6 7
x 104
-3000
-2500
-2000
-1500
-1000
-500
0
500
Spleen
BR-AG
PE1
VU2
BR-HC
Liver
CER2
Blood
PE2
BR-CA
Kid-3
Lung
Ovary
Sk Mus
CER1
Kid-2
Myo-1
P lac
Kid-4
BR-L4
VU1
Kid-1
Myo-2
BR-MOT
Score Plot of the Samples
Principal Component 1
Prin
cipa
l Com
pone
nt 2
0 1 2 3 4 5 6 7
x 104
-6000
-5000
-4000
-3000
-2000
-1000
0
1000
SpleenBR-A
G
PE1 VU2
BR-H
C Liver
CER2 Blood
PE2 BR-C
A
Kid-3
Lung
Ovary
Sk Mus
CER1 Kid-
2
Myo-1
P lac
Kid-4
BR-L4
VU1
Kid-1
Myo-2
BR-MOT
Score Plot of the Samples
Principal Component 1
Prin
cipa
l Com
pone
nt 2
Figure 2: Projection of the samples using the genes in the specific structures observed in the Loading plot.(a) The projection using genes in structure A separated the skeletal muscle tissue. The genes inthis structure contained several troponins and skeletal muscle related genes. (b)The projection using genesin structure B separated the liver and lung. These genes contained albumins and apolipoproteins, among others (c) The genes in structure C separated the brain samples from the remaining tissues. The structure contained severalbrain specific genes and ribosome-related genes.
(a)
(b) (c)
A
B
C
10.555-Bioinformatics-MIT-2003
L 09-10: Microarrays 1-Novel applications
64
0 0.5 1 1.50
5
10
15
20
25
30His togram of the angles of the genes in the loading plot
Angle
Num
ber o
f Gen
es
Structure A
Structure BStructure C
0 0.5 1 1.5 2 2.5 3
x 104
-1
-0.5
0
0.5
1
x 104
P rincipal Component 1
Prin
cipa
l Com
pone
nt 2
Li-1
Li-2
NLi-1
NLi-3
NLi-2
0 1 2 3 4 5 6
x 104
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5x 10
4
Mu-2
Mu-1
Mu-3
P rincipal Component 1
Prin
cipa
l Com
pone
nt 2
NMu-
1NM
u-2
NMu-
3
0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2
x 104
-1
-0.5
0
0.5
1
1.5
2
2.5x 10
4
P rincipal Component 1
Prin
cipa
l Com
pone
nt 2