Analysis of Microarray Data using Monte Carlo Neural Networks
Jeff Knisley, Lloyd Lee Glenn,Karl Joplin, and Patricia Carey
The Institute for Quantitative BiologyEast Tennessee State University
Outline of Talk
- Microarray Data
- Neural Networks
- A Simple Perceptron Example
- Neural Networks for Data Mining
- A Monte Carlo Approach
- Incorporating Known Genes
- Models of the Neuron and Neural Networks
Microarray Data

Goal: Identify genes that are up- or down-regulated when an organism is in a certain state.

Examples:
- What genes cause certain insects to enter diapause (similar to hibernation)?
- In cystic fibrosis, what non-CFTR genes are up- or down-regulated?
cDNA Microarrays

- Obtain mRNA from a population/tissue in the given state (the sample) and from a population/tissue not in that state (the reference).
- Synthesize cDNAs from the mRNAs in the cell. cDNA is long (500–2,000 bases), but not necessarily the entire gene. The reference is labeled green, the sample red.
- Hybridize onto "spots." Each spot is often (but not necessarily) a gene. cDNAs bind to each spot in proportion to their concentrations.
cDNA Microarray Data

- R_i, G_i = intensities of the i-th spot.
- Absolute intensities often cannot be compared; the same reference may be used for all samples.
- There are many sources of bias: significant spot-to-spot intensity variations may have nothing to do with the biology.
- Normalize so that R_i = G_i on average. Most genes are unchanged, all else equal; but rarely is "all else equal."
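The "most genes are unchanged" assumption can be sketched as a single global rescaling of the green channel so the two channel means agree. This is a minimal illustration with our own function name and toy values, not the full normalization pipeline used in practice.

```python
import numpy as np

def normalize_global(R, G):
    """Rescale the green channel so that, on average, G matches R.

    Encodes the 'most genes are unchanged' assumption: after
    normalization the mean intensities of the two channels agree.
    (Illustrative sketch; names and values are ours, not the talk's.)
    """
    R = np.asarray(R, dtype=float)
    G = np.asarray(G, dtype=float)
    scale = R.mean() / G.mean()   # single global scale factor
    return R, G * scale

R, G = normalize_global([100, 400, 250], [50, 200, 125])
```

Real pipelines use more refined schemes (e.g. intensity-dependent normalization), but this captures the idea of forcing R_i = G_i on average.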
Microarray Data

- Several samples (and references): a time series of microarrays, or a comparison of several different samples.
- Data is in the form of a table: the j-th microarray's intensities are R_{j,i}, G_{j,i}.
- Background intensity has often been subtracted.
- Question: How can we use the R_{j,i}, G_{j,i} for n samples to predict which genes are up- or down-regulated for a given condition?
Microarray Data

- Work with the log-ratios M_{j,i} = log_2(R_{j,i} / G_{j,i}).
- A large |M_{j,i}| (in comparison to the other |M_{j,i}|) indicates obvious up- or down-regulation, but it must be large across all n microarrays; otherwise it is hard to draw conclusions from M_{j,i}.
- It is often difficult to manage the per-gene average (1/n) Σ_j M_{j,i}.
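The log-ratio table can be sketched concretely; the values below are toy numbers of our own choosing, just to show the arithmetic.

```python
import numpy as np

# Log-ratios M[j, i] = log2(R[j, i] / G[j, i]) for n arrays x m genes.
# A gene is an obvious candidate only if |M| is large on *every* array,
# which is why the per-gene average (1/n) * sum_j M[j, i] is hard to
# interpret on its own: large positives and negatives can cancel.
R = np.array([[400.0, 100.0], [380.0,  50.0]])   # 2 arrays, 2 genes (toy)
G = np.array([[100.0, 400.0], [ 95.0, 200.0]])
M = np.log2(R / G)
mean_M = M.mean(axis=0)   # per-gene average across arrays
```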
Microarray Data

[MA plot: M_i = log_2(R_i/G_i) on the vertical axis (range −6 to 6) against log_2(R_i G_i) on the horizontal axis (range 0 to 16).]
Microarray Analysis
Microarray analysis is a classification problem.
- Clustering: classify genes into a few identifiable groups.
- Principal Component Analysis: choose directions (i.e., axes, the principal components) that reveal the greatest variation in the data, then find clusters.
- Neural networks and support vector machines: trained with positive and negative examples; classify unknowns as positive or negative.
Artificial Neural Network (ANN)
An ANN is made of artificial neurons, each of which sums its inputs from other neurons, compares the sum to a threshold, and sends a signal to other neurons if the sum is above threshold. The synapses have weights, which model relative ion collections and the efficacy (strength) of each synapse.
Artificial Neuron
Inputs x_1, x_2, …, x_n arrive over synapses with weights w_{i1}, w_{i2}, …, w_{in}, where

- w_{ij} = synaptic weight between the i-th and j-th neurons
- θ_j = threshold of the j-th neuron
- σ = nonlinear "firing" function that maps state to output

The i-th neuron's state and output are

    s_i = Σ_j w_{ij} x_j,    x_i = σ(s_i)
Firing Functions are Sigmoidal
[Graphs of three sigmoidal firing functions, each centered at its threshold θ_j.]
Possible Firing Functions
Discrete:

    σ(s_j) = 0 if s_j < θ_j,  1 if s_j ≥ θ_j

Continuous:

    σ(s_j) = 1 / (1 + e^{−(s_j − θ_j)})
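The two firing functions translate directly into code; a minimal sketch with our own function names:

```python
import math

def discrete_fire(s, theta):
    """Step firing function: 1 if the summed input reaches threshold."""
    return 1.0 if s >= theta else 0.0

def sigmoid_fire(s, theta):
    """Continuous sigmoidal firing function 1 / (1 + e^-(s - theta))."""
    return 1.0 / (1.0 + math.exp(-(s - theta)))
```

The continuous version matters for training: unlike the step function, it is differentiable, which back propagation relies on.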
3 Layer Neural Network
Input layer → hidden layer (usually much larger) → output layer

The output layer may consist of a single neuron.
ANN as Classifiers
- Each neuron acts as a "linear classifier."
- Competition among neurons via the nonlinear firing function = "local linear classifying."
- Method for genes: train the network until it can classify between references and samples. Eliminating weights sufficiently close to 0 does not change the local classification scheme.
Multilayer Network
Inputs x_1, x_2, …, x_n feed N hidden neurons; hidden neuron j computes σ(w_j^t x − θ_j), and the output combines them:

    out = Σ_{j=1}^{N} β_j σ(w_j^t x − θ_j)
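The forward pass of such a network is a few lines; a minimal sketch (variable names ours) with toy weights:

```python
import numpy as np

def forward(x, W, theta, beta):
    """Forward pass of the 3-layer network:
        out = sum_j beta_j * sigma(w_j . x - theta_j)
    W has one row w_j per hidden neuron."""
    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigma(W @ x - theta)    # one activation per hidden neuron
    return float(beta @ hidden)

x = np.array([1.0, -1.0])
W = np.array([[0.0, 0.0], [1.0, 1.0]])   # two hidden neurons
theta = np.array([0.0, 0.0])
beta = np.array([2.0, 4.0])
out = forward(x, W, theta, beta)
```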
How do we select the w's?

Define an energy function

    E = (1/2) Σ_{i=1}^{n} (t_i − out_i)²

The t vectors are the information to be "learned." Neural networks minimize energy: the "information" in the network is equivalent to the minima of the total squared energy function.
Back Propagation
Minimize the energy function: choose the w_j and θ_j so that

    ∂E/∂w_{ij} = 0,    ∂E/∂θ_j = 0

In practice this is hard. Back propagation with a continuous sigmoidal:

1. Feed forward and calculate E
2. Modify the weights and thresholds with a gradient rule of the form

       w_j^{new} = w_j − η ∂E/∂w_j,    θ_j^{new} = θ_j − η ∂E/∂θ_j

3. Repeat until E is sufficiently close to 0
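The slide's exact update rule did not survive extraction, so the sketch below uses plain gradient descent on a single sigmoidal unit (names and learning rate η = 0.5 are our own choices):

```python
import numpy as np

def sgd_step(w, theta, x, t, eta=0.5):
    """One gradient-descent step on E = 0.5 * (t - sigma(w.x - theta))^2
    for a single sigmoidal unit."""
    s = w @ x - theta
    out = 1.0 / (1.0 + np.exp(-s))
    delta = (out - t) * out * (1.0 - out)   # dE/ds via the chain rule
    w_new = w - eta * delta * x             # dE/dw      =  delta * x
    theta_new = theta + eta * delta         # dE/dtheta  = -delta
    return w_new, theta_new

def energy(w, theta, x, t):
    out = 1.0 / (1.0 + np.exp(-(w @ x - theta)))
    return 0.5 * (t - out) ** 2

w, theta = np.array([0.1, -0.2]), 0.0
x, t = np.array([1.0, 2.0]), 1.0
E0 = energy(w, theta, x, t)
w, theta = sgd_step(w, theta, x, t)
E1 = energy(w, theta, x, t)
```

One step moves the output toward the target, so the energy decreases, which is exactly the "repeat until E is close to 0" loop.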
ANN as Classifier
1. Remove a percentage of the genes whose synaptic weights are close to 0
2. Create an ANN classifier on the reduced arrays
3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain

The remaining genes are the most important in classifying references versus samples.
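The prune-and-repeat loop can be sketched as follows. This is a skeleton only: `prune_genes` and its parameters are our own names, and the retraining step (which would refresh the weights each pass) is left as a comment.

```python
import numpy as np

def prune_genes(weights, genes, frac=0.5, keep=2):
    """Repeatedly drop the fraction of genes whose weights are nearest 0
    until only `keep` genes remain."""
    weights = np.asarray(weights, dtype=float)
    genes = list(genes)
    while len(genes) > keep:
        n_drop = max(1, int(len(genes) * frac))
        order = np.argsort(np.abs(weights))      # smallest |w| first
        survivors = sorted(order[n_drop:])
        weights = weights[survivors]
        genes = [genes[i] for i in survivors]
        # (in the real method, the ANN would be retrained here on the
        #  reduced arrays before the next pruning pass)
    return genes

kept = prune_genes([0.01, -2.0, 0.1, 1.5], ["g1", "g2", "g3", "g4"])
```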
Simple Perceptron Model
Inputs gene 1, gene 2, …, gene m, with weights w_1, w_2, …, w_m, feed a single output neuron:

    Output = 1 if from a sample,  0 if from a reference

The w_i can be interpreted as measures of how important the i-th gene is in determining the output.
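The perceptron above can be trained with the classic perceptron rule; a minimal sketch on toy, linearly separable data (all names and values ours), where only "gene 1" carries the class signal:

```python
import numpy as np

def train_perceptron(X, y, epochs=50, eta=0.1):
    """Classic perceptron rule: target 1 for samples, 0 for references.
    Returns the learned weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            out = 1 if xi @ w + b >= 0 else 0
            w += eta * (ti - out) * xi       # adjust only on mistakes
            b += eta * (ti - out)
    return w, b

# toy data: gene 1 separates the classes, gene 2 is noise
X = np.array([[2.0, 0.1], [1.8, -0.2], [-2.1, 0.0], [-1.9, 0.3]])
y = np.array([1, 1, 0, 0])
w, b = train_perceptron(X, y)
```

After training, |w_1| exceeds |w_2|, illustrating how the weights rank gene importance.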
Simple Perceptron Model
Features:
- The w_i can be used in place of the M_{j,i}
- Detects genes across n samples and references
- Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan et al., 2004

Drawbacks:
- The perceptron is a linear classifier (i.e., it only classifies linearly separable data)
- How to incorporate known genes?
Linearly Separable Data
Separation using Hyperplanes
Data That Cannot Be Separated Linearly
Functional Viewpoint
An ANN is a mapping f: R^n → R. Can we train a perceptron so that f(x_1,…,x_n) = 1 if the x vector is from a sample and f(x_1,…,x_n) = 0 if x is from a reference?

Answer: Yes if the data can be linearly separated, but no otherwise.

So can we design such a mapping for a more general ANN?
Hilbert’s Thirteenth Problem
Original: “Are there continuous functions of 3 variables that are not representable by a superposition of composition of functions of 2 variables?”
Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov’s Theorem
Modified version: any continuous function f of n variables can be written

    f(s_1, …, s_n) = Σ_{i=1}^{2n+1} h( Σ_{j=1}^{n} g_{ij}(s_j) )

where only h depends on f.
Cybenko (1989)
Let σ be any continuous sigmoidal function, and let x = (x_1,…,x_n). If f is absolutely integrable over the n-dimensional unit cube, then for all ε > 0 there exist a (possibly very large) integer N and vectors w_1,…,w_N such that

    | f(x) − Σ_{j=1}^{N} β_j σ(w_j^T x − θ_j) | < ε

where β_1,…,β_N and θ_1,…,θ_N are fixed parameters.
Recall: Multilayer Network
Inputs x_1, x_2, …, x_n feed N hidden neurons; hidden neuron j computes σ(w_j^t x − θ_j), and the output combines them:

    out = Σ_{j=1}^{N} β_j σ(w_j^t x − θ_j)
ANN as Classifier

Answer (Cybenko): for any ε > 0, the function f with f(x_1,…,x_n) = 1 if the x vector is from a sample and f(x_1,…,x_n) = 0 if x is from a reference can be approximated to within ε by a multilayer neural network.
But the weights no longer have the one-to-one correspondence to genes.
ANN and Monte Carlo Methods
- Monte Carlo methods have been a big success story with ANNs: they provide error estimates with network predictions, and ANNs are very fast in the forward direction.
- Example: ANN + Monte Carlo methods implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas et al., 2000).
Recall: Multilayer Network
N genes feed an N-node hidden layer, and

    out = Σ_{j=1}^{N} β_j σ(w_j^t x − θ_j)

The β_j correspond to genes, but do not directly depend on a single gene.
Naïve Monte Carlo ANN Method
1. Randomly choose subset S of genes
2. Train using Back Propagation
3. Prune based on values of wj (or j , or both)
4. Repeat 2-3 until a small subset of S remains
5. Increase “count” of genes in small subset
6. Repeat 1-5 until each gene has 95% probability of appearing at least some minimum number of times in a subset
7. The most frequent genes are the predicted up- or down-regulated genes
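Steps 1-7 can be sketched as a subset-sampling loop. The train-and-prune step (2-4) is replaced here by a hypothetical `survives` placeholder of our own invention; a real run would train an ANN on each subset.

```python
import random

def monte_carlo_counts(genes, n_rounds=200, subset_size=4, seed=0):
    """Naive Monte Carlo sketch: draw random gene subsets, score each
    with a stand-in for the train-and-prune step, and count how often
    each gene survives."""
    rng = random.Random(seed)
    counts = {g: 0 for g in genes}

    def survives(subset):
        # placeholder for steps 2-4: pretend genes named "up*" always
        # survive pruning; a real run would train an ANN here
        return [g for g in subset if g.startswith("up")]

    for _ in range(n_rounds):
        subset = rng.sample(genes, subset_size)
        for g in survives(subset):
            counts[g] += 1
    return counts

genes = ["up1", "up2", "flat1", "flat2", "flat3", "flat4"]
counts = monte_carlo_counts(genes)
```

Because the subsets are drawn independently, this loop parallelizes exactly as the next slide describes: draw subsets up front, score them on different machines, then merge the counts.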
Additional Considerations
If a gene is up-regulated or down-regulated for a certain condition, then put it into a subset in step 1 with probability 1.
This is a simple-minded Bayesian method. Bayesian analysis can make it much better.
The algorithm distributes naturally across a multi-processor cluster or machine:
- Choose the subsets first
- Distribute the subsets to different machines
- Tabulate the results from all the machines
What Next…
- Cybenko is not the "final answer": real neurons are much more complicated, and ANNs abstract only a few of their features.
- We are only at the beginning of learning how to separate noise and bias from the classification problem.
- Many are now looking at neurons themselves for answers.
Components of a Neuron
Dendrites, soma (containing the nucleus), axon, myelin sheaths, and synaptic terminals.
Signals Propagate to Soma
Signals decay at the soma if they are below a certain threshold.
Signals May Arrive Close Together
If the threshold is exceeded, then the neuron "fires," sending a signal along its axon.
Signal Propagation along the Axon

- The signal is electrical: membrane depolarization from the resting potential of −70 mV; myelin acts as an insulator.
- Propagation is electro-chemical: sodium channels open at breaks in the myelin, causing rapid depolarization at those breaks; the signal travels faster than if it were only electrical.
- Neurons send "spike trains" from one to another.
Hodgkin-Huxley Model
- 1963 Nobel Prize in Medicine
- Cable equation plus ionic currents (I_syn)
- Can only be solved numerically
- Produces action potentials
- Ionic channels: n = potassium activation variable, m = sodium activation variable, h = sodium inactivation variable
Hodgkin-Huxley Equations
    C_m ∂V/∂t = (d / (4 R_i)) ∂²V/∂x² − g_l (V − V_l) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³ h (V − V_Na)

    ∂n/∂t = α_n (1 − n) − β_n n
    ∂m/∂t = α_m (1 − m) − β_m m
    ∂h/∂t = α_h (1 − h) − β_h h

where any V with a subscript is constant, any g with a bar is constant, and each of the α's and β's has a similar form, e.g.

    α_n(V) = (10 − V) / (100 (e^{(10−V)/10} − 1)),    β_n(V) = 0.125 e^{−V/80}
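"Can only be solved numerically" can be illustrated on the smallest piece of the system: forward-Euler integration of the potassium gating variable n at a clamped voltage, using the standard rate functions above (function names ours; this is a sketch, not a full Hodgkin-Huxley solver).

```python
import math

def alpha_n(V):
    # standard Hodgkin-Huxley potassium activation rate (V in mV)
    if abs(V - 10.0) < 1e-9:
        return 0.1          # limit of the 0/0 form at V = 10
    return 0.01 * (10.0 - V) / (math.exp((10.0 - V) / 10.0) - 1.0)

def beta_n(V):
    return 0.125 * math.exp(-V / 80.0)

def integrate_n(V, n0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dn/dt = alpha_n(V)(1-n) - beta_n(V) n."""
    n = n0
    for _ in range(steps):
        n += dt * (alpha_n(V) * (1.0 - n) - beta_n(V) * n)
    return n

# at a fixed voltage, n relaxes to the steady state alpha / (alpha + beta)
n_inf = alpha_n(0.0) / (alpha_n(0.0) + beta_n(0.0))
n_end = integrate_n(0.0)
```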
Hodgkin-Huxley nearly intractable
So researchers began developing artificial models to better understand what neurons are all about
A New Approach
Poznanski (2001): Synaptic effects are isolated into hot spots
[Diagram: synapses as localized hot spots along a dendrite leading to the soma.]
Tapered Equivalent Cylinder
Rall’s theorem (modified for taper) allows us to collapse to an equivalent cylinder
Tapered Equivalent Cylinder
Assume "hot spots" at x_0, x_1, …, x_m along the cylinder [0, l], with the soma at x = 0.
Ion Channel Hot Spots
I_j is the ionic current at the j-th hot spot:

    (d / (4 R_i)) ∂²V/∂x² = C_m ∂V/∂t + V/R_m + Σ_j I_j(t) δ(x − x_j)

The Green's function G(x, x_j, t) is the solution to the hot-spot equation with I_j as a point source and the others = 0 (plus boundary conditions).
Convolution Theorem
The solution to the original equation is of the form

    V(x, t) = V_initial + Σ_j ∫_0^t G(x, x_j, t − τ) I_j(τ) dτ

The voltage at the soma is

    V(0, t) = V_initial + Σ_j ∫_0^t G(0, x_j, t − τ) I_j(τ) dτ
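The sum-of-convolutions form can be evaluated numerically once G and the I_j are sampled on a time grid; a minimal discrete sketch (names ours, and a real G would come from the cable equation's Green's function):

```python
import numpy as np

def soma_voltage(G_rows, I_rows, dt):
    """Discrete version of V(0,t) = sum_j int_0^t G(0,x_j,t-tau) I_j(tau) dtau.
    G_rows[j] samples G(0, x_j, .) and I_rows[j] samples I_j on a common
    time grid with spacing dt."""
    total = None
    for Gj, Ij in zip(G_rows, I_rows):
        conv = np.convolve(Gj, Ij)[: len(Gj)] * dt   # causal convolution
        total = conv if total is None else total + conv
    return total

# sanity check: with G a discrete delta of weight 1 (height 1/dt),
# the convolution just reproduces the input current
V = soma_voltage([[10.0, 0.0, 0.0, 0.0]], [[1.0, 2.0, 3.0, 4.0]], dt=0.1)
```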
Ion Channel Currents
At a hot spot, the "voltage" V satisfies an ODE of the form

    C ∂V/∂t = −g_l (V − V_l) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³ h (V − V_Na)

Assume that the α's and β's are polynomials of large degree, and introduce a new family of functions

    U_{p,q,r} = n^p m^q h^r

Then "embed" the original equation into a system of ODEs for the U_{p,q,r}.
Linear Embedding: Simple Example

To embed

    dV/dt = A_0 + A_1 V + … + A_n V^n

let U_j = V^j. Then

    dU_j/dt = j V^{j−1} dV/dt = j A_0 V^{j−1} + … + j A_n V^{n+j−1}
Linear Embedding: Simple Example
110 njnjj UjAUjA
dt
dU
The result is
The result is an infinite dimensional linear system which is often as unmanageable as the original nonlinear equation.
However, linear embeddings do often produce good numericalapproximations.
Moreover, linear embedding implies that each Ij is given by a linear transformation of the vector of U’s
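For the linear case dV/dt = A_0 + A_1 V the embedded system actually closes (dU_j/dt = j A_0 U_{j−1} + j A_1 U_j involves no higher U's), so the embedding can be checked numerically against direct integration; a minimal sketch, all names ours:

```python
import numpy as np

def embed_euler(A0, A1, v0, dt=1e-3, steps=1000, jmax=3):
    """Integrate the linearly embedded system U_j = V^j for
    dV/dt = A0 + A1*V, using dU_j/dt = j*A0*U_{j-1} + j*A1*U_j
    with U_0 = 1.  Returns the final vector (U_0, ..., U_jmax)."""
    U = np.array([v0 ** j for j in range(jmax + 1)], dtype=float)
    for _ in range(steps):
        dU = np.zeros_like(U)
        for j in range(1, jmax + 1):
            dU[j] = j * A0 * U[j - 1] + j * A1 * U[j]
        U += dt * dU
    return U

def direct_euler(A0, A1, v0, dt=1e-3, steps=1000):
    """Forward-Euler integration of dV/dt = A0 + A1*V itself."""
    v = v0
    for _ in range(steps):
        v += dt * (A0 + A1 * v)
    return v

U = embed_euler(1.0, -0.5, 0.0)
v = direct_euler(1.0, -0.5, 0.0)
```

U_1 tracks V exactly, and U_2 tracks V² up to discretization error, illustrating the "good numerical approximations" claim on a case where truncation is exact.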
The Hot-Spot Model “Qualitatively”
    V(0, t) = Σ_j ∫_0^t G(0, x_j, t − τ) I_j(τ) dτ

The sum of convolutions of weighted sums of functions of one variable: an echo of Kolmogorov's theorem (given that convolutions are related to composition).
Any Questions?
References
Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals, and Systems, 2(4), 1989, pp. 303-314.

De Freitas, J. F. G., et al. Sequential Monte Carlo Methods to Train Neural Network Models. Neural Computation, 12(4), April 2000, pp. 955-993.

Glenn, L. and J. Knisley. Solutions for Transients in Arbitrarily Branching and Tapering Cables. In Modeling in the Neurosciences: From Biological Systems to Neuromimetic Robotics, ed. R. Lindsay, R. Poznanski, G. N. Reeke, J. R. Rosenberg, and O. Sporns. CRC Press, London, 2004.

Narayanan, A., et al. Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data. Neurocomputing, 2004.