![Page 1: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/1.jpg)
Random Walks
Presented By Cindy Xiaotong Lin
![Page 2: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/2.jpg)
Why Random Walks?
A random walk (RW) is a useful model in understanding stochastic processes across a variety of scientific disciplines.
Random walk theory supplies the basic probability theory behind BLAST (the most widely used sequence alignment method).
![Page 3: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/3.jpg)
What is a Random Walk?
An intuitive understanding: a series of movements whose direction and size are randomly decided (e.g., the path a drunk person leaves behind).
Formal definition: Let $X_0$ be a fixed vector in the d-dimensional Euclidean space $\mathbb{R}^d$ and $\{X_n, n \ge 1\}$ a sequence of independent, identically distributed (i.i.d.) random vectors taking values in $\mathbb{R}^d$. The discrete-time stochastic process $S = \{S_n : n \ge 1\}$ defined by
$$S_n = X_0 + X_1 + \cdots + X_n$$
is called a d-dimensional random walk.
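The formal definition, $S_n = X_0 + X_1 + \cdots + X_n$ with i.i.d. steps, can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the Gaussian step distribution are assumptions chosen for the example, not part of the slides.

```python
import random

def random_walk(x0, step_sampler, n_steps, rng):
    """Return the path [S_0, S_1, ..., S_n], where S_k = X_0 + X_1 + ... + X_k."""
    position = list(x0)
    path = [tuple(position)]          # S_0 = X_0, the fixed starting vector
    for _ in range(n_steps):
        step = step_sampler(rng)      # one i.i.d. draw of X_k in R^d
        position = [s + x for s, x in zip(position, step)]
        path.append(tuple(position))
    return path

def gaussian_step_2d(rng):
    # Hypothetical step law for illustration: independent N(0, 1) coordinates.
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

path = random_walk((0.0, 0.0), gaussian_step_2d, n_steps=5, rng=random.Random(0))
print(path[-1])                       # S_5, the position after five steps
```

Passing the random generator in explicitly keeps the walk reproducible for a given seed.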
![Page 4: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/4.jpg)
Definitions (cont.)
If $X_0 \in I^d$ (the d-dimensional integer lattice) and the RVs $X_n$ take values in $I^d$, then $\{S_n, n \ge 1\}$ is called a d-dimensional lattice random walk.
In the lattice walk case, if we only allow the jump from $X = (x_1, \dots, x_d)$ to $Y = (x_1 + \epsilon_1, \dots, x_d + \epsilon_d)$, where $\epsilon_k = -1$ or $1$, then the process is called a d-dimensional simple random walk.
![Page 5: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/5.jpg)
Definitions (cont.)
A random walk is called a restricted walk if the walk is limited to the interval [a, b].
The endpoints a and b are called absorbing barriers if the walk, once it reaches them, stays there forever;
or reflecting barriers if the walk bounces back when it reaches them.
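The difference between the two barrier types can be made concrete for a walk with steps of +1 or -1. The sketch below is one possible convention (a reflected step is simply reversed); the function name and the `kind` argument are assumptions for illustration.

```python
def step_with_barriers(position, step, a, b, kind):
    """Advance a walk on [a, b] (with a < b) by one step of +1 or -1.

    kind = "absorbing":  once at a or b, the walk stays there forever.
    kind = "reflecting": a step that would leave [a, b] is reversed.
    """
    if kind == "absorbing":
        if position in (a, b):
            return position           # absorbed: no further movement
        return position + step
    if kind == "reflecting":
        candidate = position + step
        if candidate < a or candidate > b:
            return position - step    # bounce back into the interval
        return candidate
    raise ValueError(f"unknown barrier kind: {kind!r}")
```

For example, a walk sitting at b = 3 that draws a +1 step stays at 3 under absorbing barriers and moves to 2 under reflecting barriers.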
![Page 6: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/6.jpg)
Example: sequence alignment modeled as RW
    ggagactgtagacagctaatgctata
    | |  |  ||| ||        |||
    gaacgccctagccacgagcccttatc
Simple scoring scheme at each position: +1 if the nucleotides are the same, -1 if they differ.
![Page 7: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/7.jpg)
Example (cont.): simple RW
Ladder point (LP): a point in the walk lower than any previously reached point.
Excursion: the part of the walk from an LP until the highest point attained before the next LP.
Excursion heights in the figure: 1, 1, 4, 0, 0, 0, 3.
BLAST theory focuses on the maximum heights achieved by these excursions.
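The excursion heights quoted for the figure can be recomputed from the two aligned sequences on the previous slide (26 nucleotides each) with the +1/-1 scoring scheme. The sketch below treats the starting level 0 as the first ladder point, which is an assumption about the convention, and the function names are illustrative.

```python
SEQ1 = "ggagactgtagacagctaatgctata"
SEQ2 = "gaacgccctagccacgagcccttatc"

def accumulated_scores(s1, s2, match=1, mismatch=-1):
    """Running alignment score: +1 for identical nucleotides, -1 otherwise."""
    scores, total = [], 0
    for x, y in zip(s1, s2):
        total += match if x == y else mismatch
        scores.append(total)
    return scores

def excursion_heights(scores):
    """Height reached above each ladder point before the next ladder point."""
    heights = []
    ladder = 0                        # level of the current ladder point
    peak = 0                          # highest level seen since that ladder point
    for s in scores:
        if s < ladder:                # strictly lower than anything before: new LP
            heights.append(peak - ladder)
            ladder = peak = s
        else:
            peak = max(peak, s)
    heights.append(peak - ladder)     # excursion after the last ladder point
    return heights

walk = accumulated_scores(SEQ1, SEQ2)
print(excursion_heights(walk))        # [1, 1, 4, 0, 0, 0, 3]
```

The largest excursion height, 4, is the quantity BLAST theory is concerned with for this walk.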
![Page 8: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/8.jpg)
Example (cont.): General RW
Consider an arbitrary scoring scheme (e.g., a substitution matrix).
![Page 9: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/9.jpg)
Primary Study of RW: 1-d simple RW
RW: Consider a 1-d simple RW starting at $h$, restricted to the interval $[a, b]$, where $a$ and $b$ are absorbing barriers, and $\Pr(X_n = 1) = p$, $\Pr(X_n = -1) = q$.
Problems: I. (Absorption probabilities) What is the probability that the walk eventually finishes at $b$ rather than $a$, i.e., $w_h$ (or at $a$ rather than $b$, i.e., $u_h$)?
II. What is the mean number of steps taken until the walk stops ($m_h$)?
![Page 10: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/10.jpg)
Methods
The Difference Equation Approach: classical.
The Moment-Generating Function Approach: readily generalizes to more complicated walks.
![Page 11: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/11.jpg)
Difference Equation Approach (M1)
Let $w_h$ be the probability that the simple random walk eventually finishes (is absorbed) at $b$.
The difference equation is obtained by comparing the situation just before and just after the first step of the walk:
$$w_h = p\,w_{h+1} + q\,w_{h-1} \qquad (7.4)$$
Initial conditions:
$$w_a = 0, \quad w_b = 1 \qquad (7.5)$$
![Page 12: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/12.jpg)
M1 (cont.): solutions
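Solve Eq. (7.4) using the theory of homogeneous difference equations. When $p \ne q$:
$$w_h = \frac{e^{\theta^* h} - e^{\theta^* a}}{e^{\theta^* b} - e^{\theta^* a}}$$
The same procedure can be used to obtain the probability that the walk ends at $a$:
$$u_h = \frac{e^{\theta^* b} - e^{\theta^* h}}{e^{\theta^* b} - e^{\theta^* a}}$$
where $\theta^* = \log(q/p)$.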
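As a sanity check, the closed form $w_h = (e^{\theta^* h} - e^{\theta^* a}) / (e^{\theta^* b} - e^{\theta^* a})$ with $\theta^* = \log(q/p)$ can be tested numerically against the difference equation $w_h = p\,w_{h+1} + q\,w_{h-1}$ and the conditions $w_a = 0$, $w_b = 1$. The values of a, b, and p below are arbitrary choices for illustration, and q is taken to be 1 - p.

```python
import math

def prob_absorbed_at_b(h, a, b, p):
    """Closed form for w_h, valid when p != q (with q = 1 - p)."""
    q = 1.0 - p
    theta_star = math.log(q / p)
    num = math.exp(theta_star * h) - math.exp(theta_star * a)
    den = math.exp(theta_star * b) - math.exp(theta_star * a)
    return num / den

a, b, p = 0, 10, 0.4                  # illustrative values only
q = 1.0 - p
w = {h: prob_absorbed_at_b(h, a, b, p) for h in range(a, b + 1)}

# Largest violation of w_h = p*w_{h+1} + q*w_{h-1} over the interior points.
max_residual = max(
    abs(w[h] - (p * w[h + 1] + q * w[h - 1])) for h in range(a + 1, b)
)
print(w[a], w[b], max_residual)
```

The residual is zero up to floating-point rounding, which is what one expects from an exact solution of the recurrence.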
![Page 13: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/13.jpg)
M1 (cont.): mean number of steps
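$m_h$: the mean number of steps taken until the walk stops, starting from $h$.
Difference equation:
$$m_h = p\,m_{h+1} + q\,m_{h-1} + 1$$
Initial conditions:
$$m_a = m_b = 0$$
Solution:
$$m_h = \frac{h - a}{q - p} - \frac{b - a}{q - p} \cdot \frac{e^{\theta^* h} - e^{\theta^* a}}{e^{\theta^* b} - e^{\theta^* a}}$$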
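The mean-step solution $m_h = \frac{h-a}{q-p} - \frac{b-a}{q-p}\,w_h$ can be checked the same way against $m_h = 1 + p\,m_{h+1} + q\,m_{h-1}$ with $m_a = m_b = 0$. The parameter values are again arbitrary, and q = 1 - p is assumed.

```python
import math

def mean_steps(h, a, b, p):
    """Closed form for m_h in the simple walk on [a, b], valid when p != q."""
    q = 1.0 - p
    theta_star = math.log(q / p)
    w_h = ((math.exp(theta_star * h) - math.exp(theta_star * a))
           / (math.exp(theta_star * b) - math.exp(theta_star * a)))
    return (h - a) / (q - p) - (b - a) / (q - p) * w_h

a, b, p = 0, 10, 0.4                  # illustrative values only
q = 1.0 - p
m = {h: mean_steps(h, a, b, p) for h in range(a, b + 1)}

# Largest violation of m_h = 1 + p*m_{h+1} + q*m_{h-1} over the interior points.
max_residual = max(
    abs(m[h] - (1.0 + p * m[h + 1] + q * m[h - 1])) for h in range(a + 1, b)
)
print(m[a], m[b], max_residual)
```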
![Page 14: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/14.jpg)
Moment-Generating function Approach (M2)
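Recall the definition of the mgf of a random variable $Y$:
$$m(\theta) = E(e^{\theta Y}) = \sum_y e^{\theta y} P_Y(y)$$
In our case, the mgf of the random variable $X_n$ is:
$$m(\theta) = q e^{-\theta} + p e^{\theta}$$
According to Theorem 1.1, there exists a unique nonzero value $\theta^*$ of $\theta$ such that
$$m(\theta^*) = 1 \qquad (7.12)$$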
![Page 15: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/15.jpg)
M2 (cont.)
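The mgf of the total displacement after $N$ steps is $m(\theta)^N$, from (2.17). At $\theta = \theta^*$, for any number of steps $N$:
$$\left(q e^{-\theta^*} + p e^{\theta^*}\right)^N = 1$$
When the walk has just finished, the total displacement is either $b - h$ or $a - h$, with probabilities $w_h$ or $u_h = 1 - w_h$ respectively:
$$w_h e^{\theta^*(b - h)} + (1 - w_h) e^{\theta^*(a - h)} = 1$$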
![Page 16: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/16.jpg)
M2 (cont.)
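Therefore, we have
$$w_h e^{\theta^*(b - h)} + (1 - w_h) e^{\theta^*(a - h)} = 1$$
Thus,
$$w_h = \frac{e^{\theta^* h} - e^{\theta^* a}}{e^{\theta^* b} - e^{\theta^* a}}$$
which is identical to (7.9), the solution from the difference equation approach.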
![Page 17: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/17.jpg)
M2(cont.): Mean number of steps until the walk stops
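Assume the total displacement after $N$ steps is
$$T_N = \sum_{j=1}^{N} S_j$$
Theorem 7.1 (Wald’s Identity) states:
$$E\left(m(\theta)^{-N} e^{\theta T_N}\right) = 1$$
Differentiating both sides with respect to $\theta$ and setting $\theta = 0$, we obtain
$$E(T_N) = E(S)\, m_h$$
where $m_h = E(N)$.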
![Page 18: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/18.jpg)
M2(cont.)
In
$$E(T_N) = E(S)\, m_h \qquad (7.24)$$
the mean displacement over the $N$ steps is
$$E(T_N) = w_h (b - h) + u_h (a - h)$$
and the mean step size is
$$E(S) = p - q$$
Equation (7.24) states that the mean value of the final total displacement of the walk is the mean size of each step multiplied by the mean number of steps taken until the walk stops.
![Page 19: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/19.jpg)
M2(cont.)
The mean number of steps until the walk stops is
$$m_h = \frac{w_h (b - h) + u_h (a - h)}{p - q}$$
which agrees with the result from the difference equation approach.
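The agreement between the two approaches can be checked numerically. In the sketch below, the first function solves $w_h e^{\theta^*(b-h)} + (1-w_h) e^{\theta^*(a-h)} = 1$ for $w_h$ and then applies $m_h = [w_h(b-h) + u_h(a-h)]/(p-q)$; the second uses the closed forms from the difference equation approach. Function names and parameter values are assumptions for illustration, with q = 1 - p.

```python
import math

def wald_solution(h, a, b, p):
    """(w_h, m_h) obtained from Wald's identity for the simple walk."""
    q = 1.0 - p
    theta_star = math.log(q / p)
    up = math.exp(theta_star * (b - h))       # e^{theta*(b-h)}
    down = math.exp(theta_star * (a - h))     # e^{theta*(a-h)}
    w_h = (1.0 - down) / (up - down)          # solves w*up + (1-w)*down = 1
    u_h = 1.0 - w_h
    m_h = (w_h * (b - h) + u_h * (a - h)) / (p - q)
    return w_h, m_h

def difference_equation_solution(h, a, b, p):
    """(w_h, m_h) from the closed forms of the difference equation approach."""
    q = 1.0 - p
    theta_star = math.log(q / p)
    w_h = ((math.exp(theta_star * h) - math.exp(theta_star * a))
           / (math.exp(theta_star * b) - math.exp(theta_star * a)))
    m_h = (h - a) / (q - p) - (b - a) / (q - p) * w_h
    return w_h, m_h

a, b, p = 0, 10, 0.4                          # illustrative values only
for h in range(a + 1, b):
    print(h, wald_solution(h, a, b, p), difference_equation_solution(h, a, b, p))
```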
![Page 20: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/20.jpg)
An Asymptotic case: a walk BLAST concerns
The walks BLAST is concerned with are walks without an upper boundary that end at -1.
Applying the previous results with $h = 0$, $a = -1$, and $b \to \infty$ (with $q > p$), we get the following asymptotic results:
The probability distribution of the maximum value that the walk ever achieves before reaching -1 has the form of a geometric-like distribution.
The mean number of steps until the walk stops is
$$m_0 = \frac{1}{q - p}$$
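The asymptotic mean $m_0 = 1/(q - p)$ for a walk that starts at 0, has no upper boundary, and stops at -1 can be compared with a quick Monte Carlo estimate. This is only a rough check: the values p = 0.3 and q = 0.7, the seed, and the number of trials are arbitrary choices.

```python
import random

def steps_until_minus_one(p, rng):
    """Run a simple walk from 0 (up with prob. p, down otherwise) until it hits -1."""
    position, steps = 0, 0
    while position > -1:
        position += 1 if rng.random() < p else -1
        steps += 1
    return steps

p, q = 0.3, 0.7                       # q > p, so the walk drifts downward
rng = random.Random(42)
trials = 20000
estimate = sum(steps_until_minus_one(p, rng) for _ in range(trials)) / trials
print(estimate, 1.0 / (q - p))        # simulated mean vs. the asymptotic formula
```

With q > p the walk reaches -1 with probability 1, so each simulated run terminates.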
![Page 21: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/21.jpg)
General Walk
Suppose generally the possible step sizes are $-c, -c+1, \dots, 0, \dots, d-1, d$, and their respective probabilities are $p_{-c}, p_{-c+1}, \dots, p_d$. The mean step size is negative, i.e.,
$$E(S) = \sum_{j=-c}^{d} j\, p_j < 0$$
The mgf of $S$ (the step size) is
$$m(\theta) = \sum_{j=-c}^{d} p_j e^{j\theta}$$
![Page 22: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/22.jpg)
General Walk (cont.)
According to Theorem 1.1, there exists a unique positive $\theta^*$ such that
$$\sum_{j=-c}^{d} p_j e^{j\theta^*} = 1$$
To consider the walk that starts at 0, with a stopping boundary at -1 and without an upper boundary, impose an artificial barrier at $y > 0$.
The possible stopping points are $-c, \dots, -1, y, \dots, y+d-1$.
Wald’s Identity states
$$E\left(e^{\theta^* T_N}\right) = 1$$
where $T_N$ is the total displacement when the walk stops.
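For a general step distribution, $\theta^*$ usually has no closed form, but the positive root of $\sum_j p_j e^{j\theta} = 1$ can be found numerically. The sketch below uses bisection, relying on the mgf being convex with $m(0) = 1$ and negative slope at 0 when $E(S) < 0$. The function names and the example step distribution are hypothetical.

```python
import math

def mgf(theta, probs):
    """m(theta) = sum_j p_j * exp(j * theta); probs maps step size j -> p_j."""
    return sum(p * math.exp(j * theta) for j, p in probs.items())

def theta_star(probs, iterations=200):
    """Unique positive root of m(theta) = 1, assuming E(S) < 0."""
    mean_step = sum(j * p for j, p in probs.items())
    has_positive_step = any(j > 0 and p > 0 for j, p in probs.items())
    if mean_step >= 0 or not has_positive_step:
        raise ValueError("need E(S) < 0 and a positive step with nonzero probability")
    lo, hi = 0.0, 1.0
    while mgf(hi, probs) <= 1.0:      # widen until m(hi) > 1, so theta* < hi
        hi *= 2.0
    for _ in range(iterations):       # m < 1 on (0, theta*), m > 1 beyond it
        mid = 0.5 * (lo + hi)
        if mgf(mid, probs) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

simple = {-1: 0.7, 1: 0.3}                            # simple walk: theta* = log(q/p)
general = {-2: 0.3, -1: 0.3, 0: 0.1, 1: 0.2, 2: 0.1}  # hypothetical example, E(S) = -0.5
print(theta_star(simple), math.log(0.7 / 0.3))
print(theta_star(general))
```

For the simple walk the numerical root matches $\log(q/p)$, the value used in the earlier slides.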
![Page 23: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/23.jpg)
General Walk
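Thus,
$$\sum_{k=-c}^{-1} P_k e^{k\theta^*} + \sum_{k=y}^{y+d-1} P_k e^{k\theta^*} = 1$$
where $P_k$ is the probability that the walk finishes at the point $k$.
The mean number of steps until the walk stops, $A$ or $m_0$, would be
$$A = \frac{E(T_N)}{E(S)} = \frac{-\sum_{j=1}^{c} j\, R_{-j}}{\sum_{j=-c}^{d} j\, p_j}$$
where $R_{-j}$ is the probability that the walk with no upper barrier stops at $-j$.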
![Page 24: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/24.jpg)
General Walk: unrestricted
Objective: Find the probability distribution of the maximum value that the walk ever achieves before reaching -1 or lower.
Define: $F_{Y_{unr}}(y)$ is the probability that, in the unrestricted walk, the maximum upward excursion is $y$ or less; $Q_k$ is the probability that the walk visits the positive value $k$ before reaching any other positive value.
![Page 25: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/25.jpg)
General Walk: unrestricted
The event that in the unrestricted walk the maximum upward excursion is $y$ or less is the union of the event that the maximum excursion never reaches positive values and the events that the first positive value achieved by the excursion is $k$, $k = 1, 2, \dots, y$, and the walk then never achieves a further height exceeding $y - k$.
Therefore,
$$F_{Y_{unr}}(y) = \bar{Q} + \sum_{k=1}^{y} Q_k F_{Y_{unr}}(y - k), \qquad \bar{Q} = 1 - Q_1 - Q_2 - \cdots - Q_d$$
Applying the Renewal Theorem, we have
$$\lim_{y \to \infty} \left(1 - F_{Y_{unr}}(y)\right) e^{\theta^* y} = V, \qquad V = f(\bar{Q}, Q_k, \theta^*)$$
![Page 26: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/26.jpg)
General Walk: restricted
Consider the general walk starting at 0, with a lower barrier at -1. The size of an excursion of the unrestricted walk can exceed the value $y$ either before or after reaching a negative value, i.e.,
$$F^*_{Y_{unr}}(y) = F^*_Y(y) + \sum_{j=1}^{c} R_{-j} F^*_{Y_{unr}}(y + j)$$
where $F^*_{Y_{unr}}(y) = 1 - F_{Y_{unr}}(y)$, $F^*_Y(y)$ is the probability that the size of an excursion in the restricted walk exceeds the value $y$, and $R_{-j}$ is the probability that the first negative value reached by the walk is $-j$.
![Page 27: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/27.jpg)
General Walk: restricted
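Then,
$$F^*_Y(y - 1) = \Pr(Y \ge y) \sim C e^{-\theta^* y}$$
$$C = \frac{\bar{Q}\left(1 - \sum_{j=1}^{c} R_{-j} e^{-j\theta^*}\right)}{\left(1 - e^{-\theta^*}\right) \sum_{k=1}^{d} k\, Q_k e^{k\theta^*}}$$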
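The constant in the geometric-like tail $\Pr(Y \ge y) \sim C e^{-\theta^* y}$, read here as $C = \bar{Q}\,(1 - \sum_j R_{-j} e^{-j\theta^*}) / [(1 - e^{-\theta^*}) \sum_k k\,Q_k e^{k\theta^*}]$, can be sanity-checked on the simple walk, where everything is known in closed form. The sketch assumes $q > p$, so that $Q_1 = p/q$ and $R_{-1} = 1$, and compares C with the exact probability of reaching y before -1 (the earlier $w_0$ with $a = -1$, $b = y$). The function names are illustrative.

```python
import math

def geometric_like_constant(Q, R, theta_star):
    """C from Q = {k: Q_k}, R = {j: R_{-j}} and theta*, as in the formula above."""
    q_bar = 1.0 - sum(Q.values())
    numerator = q_bar * (
        1.0 - sum(r * math.exp(-j * theta_star) for j, r in R.items())
    )
    denominator = (1.0 - math.exp(-theta_star)) * sum(
        k * qk * math.exp(k * theta_star) for k, qk in Q.items()
    )
    return numerator / denominator

p, q = 0.3, 0.7                       # simple walk with q > p
theta = math.log(q / p)
C = geometric_like_constant({1: p / q}, {1: 1.0}, theta)

def prob_reach_y_before_minus_one(y):
    # Exact Pr(Y >= y) for the simple walk: w_0 with a = -1 and b = y.
    return (1.0 - math.exp(-theta)) / (math.exp(theta * y) - math.exp(-theta))

ratio = prob_reach_y_before_minus_one(40) * math.exp(theta * 40)
print(C, ratio, 1.0 - p / q)          # the three values should nearly coincide
```

In this special case the formula reduces to $C = 1 - e^{-\theta^*} = 1 - p/q$.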
![Page 28: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/28.jpg)
Application: BLAST
BLAST is the most frequently used method for assessing which DNA or protein sequences in a large database have significant similarity to a given query sequence;
a procedure that searches for high-scoring local alignments between sequences and then tests for significance of the scores found via P-value.
The null hypothesis to be tested is that, for each aligned pair of amino acids, the two amino acids were generated by independent mechanisms.
![Page 29: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/29.jpg)
BLAST (cont.) : modeling
The positions in the alignment are numbered from left to right as 1, 2,…, N. A score S(j, k) is allocated to each position where the aligned amino acid pair (j,k) is observed, where S(j,k) is the (j,k) element in the substitution matrix chosen.
An accumulated score at position i is calculated as the sum of the scores for the amino acid comparisons at positions 1, 2,…, i. As i increases, the accumulated score undergoes a random walk.
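A minimal sketch of this accumulation step is shown below. The three-letter alphabet and the score values in `TOY_MATRIX` are illustrative assumptions only, and the sequences are made up; an actual search would use a full substitution matrix.

```python
# Illustrative symmetric scores for a three-letter alphabet (assumed values).
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "R"): -1, ("A", "N"): -2,
    ("R", "R"): 5, ("R", "N"): 0,
    ("N", "N"): 6,
}

def score(j, k):
    """S(j, k): look up the pair in either order, since the matrix is symmetric."""
    if (j, k) in TOY_MATRIX:
        return TOY_MATRIX[(j, k)]
    return TOY_MATRIX[(k, j)]

def accumulated_score_walk(seq1, seq2):
    """Accumulated score at positions 1, 2, ..., N of an ungapped alignment."""
    total, walk = 0, []
    for j, k in zip(seq1, seq2):
        total += score(j, k)
        walk.append(total)
    return walk

print(accumulated_score_walk("ARNA", "ANNR"))   # [4, 4, 10, 9]
```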
![Page 30: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/30.jpg)
BLAST (cont.) : calculating parameters
Let Y1, Y2,… be the respective maximum heights of the excursions of this walk after leaving one ladder point and before arriving at the next, and let Ymax be the maximum of these maxima. Ymax is in effect the test statistic used in BLAST, so it is necessary to find its null hypothesis distribution.
The asymptotic probability distribution of any Yi is shown to be the geometric-like distribution. The values of C and $\theta^*$ in this distribution depend on the substitution matrix used and the amino acid frequencies {pj} and {pj’}. The probability distribution of Ymax also depends on n, the mean number of ladder points in the walk.
![Page 31: Random Walks Presented By Cindy Xiaotong Lin. Why Random Walks? A random walk (RW) is a useful model in understanding stochastic processes across a](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649d015503460f949d3973/html5/thumbnails/31.jpg)
Discussion
???