Estimation & Inference for Point Processes
Posted on 05-Jan-2016
1. MLE
2. K-function & variants
3. Residual methods
4. Separable estimation
5. Separability tests
Estimation & Inference for
Point Processes
Maximum Likelihood Estimation:
For a space-time point process N, the log-likelihood function is given by
log L = ∫0^T ∫S log λ(t,x) dN − ∫0^T ∫S λ(t,x) dt dx.

Why?

0 ----x--x--------x--------x-------x---x---- T
      t1 t2       t3       t4      t5  t6

Consider the case where N is a Poisson process observed in time only.

L = P(points at t1, t2, …, tn, and no others in [0,T])
= P(pt at t1) × P(pt at t2) × … × P(pt at tn) × P{no others in [0,T]}
= λ(t1) × λ(t2) × … × λ(tn) × P{no others in [0,t1)} × … × P{no others in (tn,T]}
= λ(t1) × … × λ(tn) × exp{−∫0^t1 λ(u) du} × … × exp{−∫tn^T λ(u) du}
= ∏ λ(ti) × exp{−∫0^T λ(u) du}.

So log L = ∑ log λ(ti) − ∫0^T λ(u) du.
log L = ∫0^T ∫S log λ(t,x) dN − ∫0^T ∫S λ(t,x) dt dx.

Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity p(t,x) is used instead is called the pseudo-likelihood.

When λ depends on parameters θ, so does L:

log L(θ) = ∫0^T ∫S log λ(t,x; θ) dN − ∫0^T ∫S λ(t,x; θ) dt dx.
Maximum Likelihood Estimation (MLE):
Find the value of θ that maximizes L(θ).
(In practice, by finding the value that minimizes −log L(θ).)
• Example: stationary Poisson process with rate λ(t,x) = μ.
log L(μ) = ∫0^T ∫S log λ(t,x) dN − ∫0^T ∫S λ(t,x) dt dx
= n log(μ) − μ|S|T,
d log L(μ)/dμ = n/μ − |S|T,
which = 0 when μ = n/(|S|T).
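As a quick numerical sanity check on the stationary Poisson example, the sketch below simulates a count over a space-time window and verifies that n/(|S|T) minimizes −log L. The window dimensions, rate, and seed are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical window [0, T] x S and true rate (arbitrary values)
T, S_area = 10.0, 2.0
true_rate = 3.0

# For a homogeneous Poisson process, n ~ Poisson(rate * |S| * T)
n = rng.poisson(true_rate * S_area * T)

def neg_log_lik(rate):
    # -log L(rate) = -(n log rate - rate * |S| * T) for a stationary Poisson
    return -(n * np.log(rate) - rate * S_area * T)

rate_hat = n / (S_area * T)          # closed-form MLE from the slide

# Numerical check: rate_hat minimizes -log L over a fine grid
grid = np.linspace(0.5, 6.0, 2000)
numerical = grid[np.argmin([neg_log_lik(r) for r in grid])]
print(rate_hat, numerical)
```

The grid search stands in for the general-purpose optimizer one would use for a non-trivial parametric intensity.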
Under somewhat general conditions, the MLE θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).
Important counter-examples:
λ(t) = α + βt, λ(t) = exp{α + βt} (for β < 0).
Other problems with MLE:
• Bias can be substantial. E.g. Matérn I, where the hard-core distance is estimated by min_{i≠j} ||(xi, yi) − (xj, yj)||.
• Optimization is tricky: requires initial parameter estimate and a tolerance threshold; can fail to converge; can converge to local maximum, etc.
Nevertheless, MLE and pseudo-MLE are the only commonly-used methods for fitting point process models.
K-function & Variations:
Usual K-function, for spatial processes only (Ripley 1978):
• Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
• K(h) = (1/λ) E[# of pts within distance h of a given pt].
• Estimated via (1/λ̂) [∑∑i≠j I(|(xi,yi) − (xj,yj)| ≤ h) / n], where λ̂ = n/|S|.
• Under the null hypothesis, K(h) = (1/λ) E[λπh²] = πh².
• Higher K indicates more clustering; lower K indicates inhibition.
• Centered version: L(h) = √[K(h)/π] − h.
L > 0 indicates clustering, L < 0 indicates inhibition.
• Version based on nearest neighbors only (J-function):
J(h) ~ (1/λ) Pr{nearest neighbor of a given point is within distance h}.
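A minimal sketch of the K- and L-function estimates above, on a simulated stationary Poisson pattern. Edge correction is ignored here, though it matters in practice near the boundary; the rate and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Homogeneous Poisson on the unit square
area = 1.0
n = rng.poisson(200 * area)
pts = rng.uniform(0, 1, size=(n, 2))

lam_hat = n / area                      # rate estimate n/|S|

def K_hat(h):
    # (1/lam_hat) * [sum_{i != j} I(d_ij <= h) / n]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    close = (d <= h).sum() - n          # drop the i == j diagonal
    return (close / n) / lam_hat

def L_hat(h):
    return np.sqrt(K_hat(h) / np.pi) - h

# For a stationary Poisson process, K(h) is close to pi h^2, so L(h) is near 0
for h in (0.02, 0.05, 0.1):
    print(h, K_hat(h), np.pi * h**2, L_hat(h))
```

For clustered data L_hat would come out clearly positive, and for inhibited data clearly negative.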
K-function & Variations:
Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
• Null hypothesis is a general conditional intensity λ(x,y).
• Weight each point (xi, yi) by a factor of wi = λ(xi,yi)^-1.
• The ordinary estimated K-function is K̂(h) = |S| ∑∑i≠j I(|(xi,yi) − (xj,yj)| ≤ h) / n²; the weighted version is
K̂w(h) = |S| ∑∑i≠j wi wj I(|(xi,yi) − (xj,yj)| ≤ h) / n², where wi = λ(xi,yi)^-1.
• Asymptotically normal, under certain regularity conditions (Veen 2006).
• Centered version: L̂w(h) = √[K̂w(h)/π] − h, for R².
• L̂w(h) > 0 indicates more weight in pairs within distance h than expected under the model for λ(x,y), i.e. λ(x,y) is too low in clusters: the model does not adequately capture the clustering in the data.
• L̂w(h) < 0 indicates less weight in pairs within distance h than expected under the model for λ(x,y), i.e. λ(x,y) is too high for points within distance h: the model over-estimates the clustering in the data (or under-estimates inhibition).
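The weighted K-function can be sketched as follows. The intensity, window, and seed are hypothetical; the weights here are normalized by λ̂ = n/|S| (the slides' wi = λ(xi,yi)^-1 up to that constant) so that the Poisson benchmark πh² applies directly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical null model: inhomogeneous Poisson intensity on the unit square
def lam(x, y):
    return 100.0 + 200.0 * x

# Simulate from the model by thinning a dominating homogeneous process
lam_max = 300.0
m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
keep = rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max
pts = cand[keep]
n = len(pts)
S = 1.0                                  # window area |S|

def Kw_hat(h):
    # |S| * sum_{i != j} w_i w_j I(d_ij <= h) / n^2, with w_i proportional
    # to lam(x_i, y_i)^-1, normalized so E[Kw_hat(h)] is roughly pi h^2
    # when the model is correct
    w = (n / S) / lam(pts[:, 0], pts[:, 1])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    ind = (d <= h) & ~np.eye(n, dtype=bool)
    return S * (np.outer(w, w) * ind).sum() / n**2

def Lw_hat(h):
    return np.sqrt(Kw_hat(h) / np.pi) - h

# Since the data were simulated from the null model, Lw_hat should be near 0
for h in (0.05, 0.1):
    print(h, Kw_hat(h), np.pi * h**2, Lw_hat(h))
```

Fitting the wrong λ(x,y), e.g. a constant rate, would push weight into the high-intensity region and move L̂w away from zero.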
These statistics can be used for estimation as well as testing:
Given a class of models with parameter θ to be estimated, choose the value of θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) [Guan 2007].
Similarly for other statistics such as K̂w(h) [Veen 2006].
Model: λ(x,y; θ) = θν(x,y) + (1 − θ).
[Figure: estimated L-function plotted against h (km).]
3) How else can we tell how well a given point process model fits?

a) Likelihood statistics (LR, AIC, BIC). [For instance, AIC = −2 log L(θ̂) + 2p.] Overly simplistic; not graphical.

b) Other tests: TTT, Khmaladze (Andersen et al. 1993); Cramér-von Mises and K-S tests (Heinrich 1991); higher moment and spectral tests (Davies 1977).

c) Integrated residual plots (Baddeley et al. 2005): plot N(Ai) − C(Ai), where C(Ai) = ∫_Ai λ̂ dμ, over various areas Ai. Useful for the mean, but questionable power; fine-scale interactions are not inspected.

d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003).
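A minimal sketch of the integrated residuals N(Ai) − C(Ai) from (c), computed on a grid of cells for a hypothetical inhomogeneous Poisson model; intensity, grid size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate from lam(x, y) = 50 + 100 y on the unit square (hypothetical model)
lam_max = 150.0
def lam(x, y):
    return 50.0 + 100.0 * y

m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
pts = cand[rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max]

# Integrated residuals N(A_i) - C(A_i) on a k x k grid of cells A_i,
# where C(A_i) is the integral of the fitted intensity over A_i
k = 2
residuals = np.empty((k, k))
for i in range(k):
    for j in range(k):
        x0, x1 = i / k, (i + 1) / k
        y0, y1 = j / k, (j + 1) / k
        N = np.sum((pts[:, 0] >= x0) & (pts[:, 0] < x1) &
                   (pts[:, 1] >= y0) & (pts[:, 1] < y1))
        # For lam = 50 + 100 y the cell integral is available in closed form
        C = 50 * (x1 - x0) * (y1 - y0) + 100 * (x1 - x0) * (y1**2 - y0**2) / 2
        residuals[i, j] = N - C
print(residuals)
```

With a correctly specified model, each residual should be on the order of √C(Ai); large systematic residuals flag lack of fit in the mean, but, as noted above, this says little about fine-scale interaction.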
For multi-dimensional point processes:
Stretch/compress one dimension according to λ̂, keeping the others fixed.
The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
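A sketch of rescaling in the purely temporal case, under a hypothetical intensity λ(t) = 2t: after the time change τ = Λ(t), the interevent times should look like independent Exp(1) draws when the model is correct.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical temporal intensity lam(t) = 2t on [0, 10]
T = 10.0
def lam(t):
    return 2.0 * t

def Lam(t):
    return t ** 2            # compensator: integral of lam from 0 to t

# Simulate the inhomogeneous Poisson process by thinning
lam_max = lam(T)
m = rng.poisson(lam_max * T)
cand = np.sort(rng.uniform(0, T, m))
t = cand[rng.uniform(0, 1, m) < lam(cand) / lam_max]

# Rescale the time axis: tau_i = Lam(t_i). If lam is the true intensity,
# the rescaled points form a rate-1 Poisson process, so the interevent
# times should behave like Exp(1) draws.
gaps = np.diff(Lam(t))
print(len(t), gaps.mean())   # mean gap should be near 1
```

In higher dimensions the same idea stretches one coordinate by the integrated conditional intensity while the others are held fixed, which is exactly where the boundary and interpretability problems below arise.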
Problems with multi-dimensional residual analysis:
* Irregular boundary, plotting.
* Points in the transformed space can be hard to interpret.
* For highly clustered processes: boundary effects, loss of power.
Possible solutions: truncation, horizontal rescaling.

Thinning: Suppose inf λ̂(xi, yi) = b.
Keep each point (xi, yi) in the original dataset with probability b / λ̂(xi, yi).
One obtains a different residual process, on the same scale as the data.
Can repeat many times --> many Poisson processes (but not quite independent!)
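Thinned residuals can be sketched as follows, again with a hypothetical intensity and seed; if the fitted intensity is correct, the retained points form a homogeneous Poisson process with rate b.

```python
import numpy as np

rng = np.random.default_rng(5)

# Inhomogeneous Poisson with lam(x, y) = 100 + 400 x on the unit square
def lam(x, y):
    return 100.0 + 400.0 * x

lam_max, b = 500.0, 100.0        # b = inf of lam over the window

m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
pts = cand[rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max]

# Thin: keep each point with probability b / lam(x_i, y_i). Under the
# fitted model, the retained points are homogeneous Poisson with rate b.
keep = rng.uniform(0, 1, len(pts)) < b / lam(pts[:, 0], pts[:, 1])
thinned = pts[keep]
print(len(pts), len(thinned))    # E[len(thinned)] = b * area
```

Repeating the thinning with fresh uniforms gives many approximately Poisson residual patterns, though, as noted above, they are not quite independent of one another.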
Conditional intensity λ(t, x1, …, xk; θ): [e.g. x1 = location, x2 = size.]

Separability for Point Processes:
• Say λ is multiplicative in mark xj if
λ(t, x1, …, xk; θ) = λ0 λj(t, xj; θj) λ-j(t, x-j; θ-j),
where x-j = (x1, …, xj-1, xj+1, …, xk), and similarly for λ-j and θ-j.
• If λ is multiplicative in xj and one of the following holds, then θ̂j, the partial MLE, equals θ̃j, the MLE:
• ∫S λ-j(t, x-j; θ-j) dμ-j = γ, for all θ-j;
• ∫S λj(t, xj; θj) dμj = γ, for all θj;
• ∫S λ(t, x; θ) dμ = ∫S λj(t, xj; θj) dμj = γ, for all θ;
where γ is a constant.
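The multiplicative case can be illustrated numerically. For a hypothetical Poisson intensity λ(t,x) = exp(θ0 + θ1 t + θ2 x) on the unit square, profiling out θ0 decouples θ1 from θ2, so a one-dimensional partial fit of θ1 agrees with the joint MLE; all parameter values and the seed are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(6)

def A(th):
    # integral_0^1 exp(th * u) du, with the th -> 0 limit handled
    if abs(th) < 1e-8:
        return 1.0
    return (np.exp(th) - 1.0) / th

# Hypothetical multiplicative intensity lam(t, x) = exp(b0 + b1 t + b2 x)
b0, b1, b2 = 5.0, 1.0, -0.5
def lam(t, x):
    return np.exp(b0 + b1 * t + b2 * x)

# Simulate on the unit square by thinning (b1 > 0 > b2: max is at (1, 0))
lam_max = np.exp(b0 + b1)
m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
keep = rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max
t, x = cand[keep, 0], cand[keep, 1]
n = len(t)

# Joint MLE: log L = sum_i log lam(t_i, x_i) - integral of lam
def nll(p):
    p0, p1, p2 = p
    return -(n * p0 + p1 * t.sum() + p2 * x.sum()
             - np.exp(p0) * A(p1) * A(p2))

joint = minimize(nll, x0=[4.0, 0.1, 0.1], method='Nelder-Mead',
                 options={'xatol': 1e-6, 'fatol': 1e-6, 'maxiter': 2000}).x

# Partial estimate of b1: profiling out p0 leaves
#   profile(p1) = p1 * sum(t_i) - n * log A(p1)   (+ terms free of p1),
# which involves the t-coordinates only
partial = minimize_scalar(lambda p1: -(p1 * t.sum() - n * np.log(A(p1))),
                          bounds=(-5.0, 5.0), method='bounded').x
print(joint[1], partial)         # the two estimates of b1 coincide
```

The decoupling in the profile likelihood is this toy model's version of the constant-integral conditions above.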
Individual Covariates:
• Suppose λ is multiplicative, and λj(t, xj; θj) = f1[X(t,xj); θ1] f2[Y(t,xj); θ2].
If H(x,y) = H1(x) H2(y), where H, H1, H2 are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. θ1, then the partial MLE of θ1 = MLE of θ1.
(Note: not true for additive models!)
• Suppose λ is multiplicative and the jth component is additive: λj(t, xj; θj) = f1[X(t,xj); θ1] + f2[Y(t,xj); θ2].
If f1 and f2 are continuous and f2 is small [∫S f2(Y; θ2)² / f1(X; θ̃1) dμ →p 0], then the partial MLE θ̂1 is consistent.
Impact
• Model building.
• Model evaluation / dimension reduction.
• Excluded variables.
Model Construction
For example, for Los Angeles County wildfires:
• Relative Humidity, Windspeed, Precipitation, Aggregated rainfall over previous 60 days, Temperature, Date
• Tapered Pareto size distribution f, smooth spatial background μ.
λ(t, x, a) = β1 exp{β2 R(t) + β3 W(t) + β4 P(t) + β5 A(t;60) + β6 T(t) + β7[β8 − D(t)]²} μ(x) g(a).
Estimating each of these components separately might be somewhat reasonable, as a first attempt at least, if the interactions are not too extreme.
[Figure: r = 0.16 (sq m).]
Testing separability in marked point processes:
Construct non-separable and separable kernel estimates of λ, by smoothing over all coordinates simultaneously or separately, and then compare the two estimates (Schoenberg 2004).
May also consider:
S5 = mean absolute difference at the observed points.
S6 = maximum absolute difference at observed points.
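A sketch of the comparison, using Gaussian kernel estimates (scipy's gaussian_kde with its default bandwidths) on hypothetical data whose coordinates are independent; S5 and S6 follow the definitions above.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)

# A separable process: coordinates t and x generated independently
n = 300
t = rng.uniform(0, 1, n)
x = rng.beta(2, 2, n)
pts = np.vstack([t, x])

# Non-separable estimate: smooth over both coordinates simultaneously
lam_joint = n * gaussian_kde(pts)(pts)

# Separable estimate: smooth each coordinate separately and multiply
lam_sep = n * gaussian_kde(t)(t) * gaussian_kde(x)(x)

# Discrepancy statistics evaluated at the observed points
S5 = np.mean(np.abs(lam_sep - lam_joint))   # mean absolute difference
S6 = np.max(np.abs(lam_sep - lam_joint))    # maximum absolute difference
print(S5, S6)
```

For strongly non-separable data the two estimates diverge and both statistics grow; in practice their null distributions would be calibrated by simulation from the separable fit.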
S3 seems to be most powerful for large-scale non-separability:
However, S3 may not be ideal for Hawkes processes, and all these statistics are terrible for inhibition processes:
For Hawkes & inhibition processes, rescaling according to the separable estimate and then looking at the L-function seems much more powerful:
Los Angeles County Wildfire Example:
Statistics like S3 indicate separability, but the L-function after rescaling shows some clustering:
Summary:
1) MLE: maximize log L(θ) = ∫0^T ∫S log λ(t,x; θ) dN − ∫0^T ∫S λ(t,x; θ) dt dx.
2) Estimated K-function: K̂(h) = |S| ∑∑i≠j I(|(xi,yi) − (xj,yj)| ≤ h) / n²;
L̂(h) = √[K̂(h)/π] − h;
K̂w(h) = |S| ∑∑i≠j wi wj I(|(xi,yi) − (xj,yj)| ≤ h) / n², where wi = λ(xi,yi)^-1.
3) Residuals:
Integrated residuals: [N(Ai) − C(Ai)].
Rescaled residuals: [stretch one coordinate according to ∫ λ̂(x,y) dy].
Thinned residuals: [keep each pt with prob. b / λ̂(xi,yi)].
4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to global MLEs.
5) A separability test is S3; an alternative is L(h) after rescaling according to the separable kernel intensity estimate.
Next time: applications to models for earthquakes and wildfires.