Estimation & Inference for Point Processes
Posted on 05-Jan-2016
1. MLE
2. K-function & variants
3. Residual methods
4. Separable estimation
5. Separability tests
Estimation & Inference for
Point Processes
Maximum Likelihood Estimation:
For a space-time point process N, the log-likelihood function is given by
log L = ∫0^T ∫S log λ(t,x) dN − ∫0^T ∫S λ(t,x) dt dx.

Why?

0 ----x--x--------x--------x-------x---x---- T
      t1 t2       t3       t4      t5  t6

Consider the case where N is a Poisson process observed in time only.

L = P(points at t1, t2, …, tn, and no others in [0,T])
= P(pt at t1) × P(pt at t2) × … × P(pt at tn) × P{no others in [0,T]}
= λ(t1) × λ(t2) × … × λ(tn) × P{no others in [0,t1)} × … × P{no others in (tn,T]}
= λ(t1) × … × λ(tn) × exp{−∫0^t1 λ(u) du} × … × exp{−∫tn^T λ(u) du}
= ∏ λ(ti) × exp{−∫0^T λ(u) du}.

So log L = ∑ log λ(ti) − ∫0^T λ(u) du.
log L = ∫0^T ∫S log λ(t,x) dN − ∫0^T ∫S λ(t,x) dt dx.

Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity p(t,x) is used instead is called the pseudo-likelihood.

When λ depends on parameters θ, so does L:

log L(θ) = ∫0^T ∫S log λ(t,x; θ) dN − ∫0^T ∫S λ(t,x; θ) dt dx.
Maximum Likelihood Estimation (MLE):
Find the value of θ that maximizes L(θ).
(In practice, by finding the value that minimizes −log L(θ).)
• Example: stationary Poisson process with rate λ(t,x) = μ.
log L(μ) = ∫0^T ∫S log λ(t,x) dN − ∫0^T ∫S λ(t,x) dt dx
= n log(μ) − μ|S|T,
d log L(μ)/dμ = n/μ − |S|T,
which = 0 when μ = n/(|S|T).
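As a quick numerical sanity check on the stationary Poisson example, the sketch below simulates a count over a space-time window and verifies that n/(|S|T) minimizes −log L. The window dimensions, rate, and seed are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical window [0, T] x S and true rate (arbitrary values)
T, S_area = 10.0, 2.0
true_rate = 3.0

# For a homogeneous Poisson process, n ~ Poisson(rate * |S| * T)
n = rng.poisson(true_rate * S_area * T)

def neg_log_lik(rate):
    # -log L(rate) = -(n log rate - rate * |S| * T) for a stationary Poisson
    return -(n * np.log(rate) - rate * S_area * T)

rate_hat = n / (S_area * T)          # closed-form MLE from the slide

# Numerical check: rate_hat minimizes -log L over a fine grid
grid = np.linspace(0.5, 6.0, 2000)
numerical = grid[np.argmin([neg_log_lik(r) for r in grid])]
print(rate_hat, numerical)
```

The grid search stands in for the general-purpose optimizer one would use for a non-trivial parametric intensity.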
Under somewhat general conditions, the MLE θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).
Important counter-examples:
λ(t) = α + βt, λ(t) = exp{α + βt} (for β < 0).
Other problems with MLE:
• Bias can be substantial. E.g. Matérn I, where the hard-core distance is estimated by min_{i≠j} ||(xi, yi) − (xj, yj)||.
• Optimization is tricky: requires initial parameter estimate and a tolerance threshold; can fail to converge; can converge to local maximum, etc.
Nevertheless, MLE and pseudo-MLE are the only commonly-used methods for fitting point process models.
K-function & Variations:
Usual K-function, for spatial processes only (Ripley 1978):
• Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
• K(h) = (1/λ) E[# of pts within distance h of a given pt].
• Estimated via (1/λ̂) [∑∑i≠j I(|(xi,yi) − (xj,yj)| ≤ h) / n], where λ̂ = n/|S|.
• Under the null hypothesis, K(h) = (1/λ) E[λπh²] = πh².
• Higher K indicates more clustering; lower K indicates inhibition.
• Centered version: L(h) = √[K(h)/π] − h.
L > 0 indicates clustering, L < 0 indicates inhibition.
• Version based on nearest neighbors only (J-function):
J(h) ~ (1/λ) Pr{nearest neighbor of a given point is within distance h}.
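A minimal sketch of the K- and L-function estimates above, on a simulated stationary Poisson pattern. Edge correction is ignored here, though it matters in practice near the boundary; the rate and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Homogeneous Poisson on the unit square
area = 1.0
n = rng.poisson(200 * area)
pts = rng.uniform(0, 1, size=(n, 2))

lam_hat = n / area                      # rate estimate n/|S|

def K_hat(h):
    # (1/lam_hat) * [sum_{i != j} I(d_ij <= h) / n]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    close = (d <= h).sum() - n          # drop the i == j diagonal
    return (close / n) / lam_hat

def L_hat(h):
    return np.sqrt(K_hat(h) / np.pi) - h

# For a stationary Poisson process, K(h) is close to pi h^2, so L(h) is near 0
for h in (0.02, 0.05, 0.1):
    print(h, K_hat(h), np.pi * h**2, L_hat(h))
```

For clustered data L_hat would come out clearly positive, and for inhibited data clearly negative.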
K-function & Variations:
Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
• Null hypothesis is a general conditional intensity λ(x,y).
• Weight each point (xi, yi) by a factor of wi = λ(xi,yi)^-1.
• The ordinary estimated K-function is K̂(h) = |S| ∑∑i≠j I(|(xi,yi) − (xj,yj)| ≤ h) / n²; the weighted version is
K̂w(h) = |S| ∑∑i≠j wi wj I(|(xi,yi) − (xj,yj)| ≤ h) / n², where wi = λ(xi,yi)^-1.
• Asymptotically normal, under certain regularity conditions (Veen 2006).
• Centered version: L̂w(h) = √[K̂w(h)/π] − h, for R².
• L̂w(h) > 0 indicates more weight in pairs within distance h than expected under the model for λ(x,y), i.e. λ(x,y) is too low in clusters: the model does not adequately capture the clustering in the data.
• L̂w(h) < 0 indicates less weight in pairs within distance h than expected under the model for λ(x,y), i.e. λ(x,y) is too high for points within distance h: the model over-estimates the clustering in the data (or under-estimates inhibition).
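The weighted K-function can be sketched as follows. The intensity, window, and seed are hypothetical; the weights here are normalized by λ̂ = n/|S| (the slides' wi = λ(xi,yi)^-1 up to that constant) so that the Poisson benchmark πh² applies directly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical null model: inhomogeneous Poisson intensity on the unit square
def lam(x, y):
    return 100.0 + 200.0 * x

# Simulate from the model by thinning a dominating homogeneous process
lam_max = 300.0
m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
keep = rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max
pts = cand[keep]
n = len(pts)
S = 1.0                                  # window area |S|

def Kw_hat(h):
    # |S| * sum_{i != j} w_i w_j I(d_ij <= h) / n^2, with w_i proportional
    # to lam(x_i, y_i)^-1, normalized so E[Kw_hat(h)] is roughly pi h^2
    # when the model is correct
    w = (n / S) / lam(pts[:, 0], pts[:, 1])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    ind = (d <= h) & ~np.eye(n, dtype=bool)
    return S * (np.outer(w, w) * ind).sum() / n**2

def Lw_hat(h):
    return np.sqrt(Kw_hat(h) / np.pi) - h

# Since the data were simulated from the null model, Lw_hat should be near 0
for h in (0.05, 0.1):
    print(h, Kw_hat(h), np.pi * h**2, Lw_hat(h))
```

Fitting the wrong λ(x,y), e.g. a constant rate, would push weight into the high-intensity region and move L̂w away from zero.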
These statistics can be used for estimation as well as testing:
Given a class of models with parameter θ to be estimated, choose the value of θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) [Guan 2007].
Similarly for other statistics such as K̂w(h) [Veen 2006].
Model: λ(x,y; θ) = θν(x,y) + (1 − θ).
[Figure: estimated L-function plotted against h (km).]
3) How else can we tell how well a given point process model fits?

a) Likelihood statistics (LR, AIC, BIC). [For instance, AIC = −2 log L(θ̂) + 2p.] Overly simplistic; not graphical.

b) Other tests: TTT, Khmaladze (Andersen et al. 1993); Cramér-von Mises and K-S tests (Heinrich 1991); higher moment and spectral tests (Davies 1977).

c) Integrated residual plots (Baddeley et al. 2005): plot N(Ai) − C(Ai), where C(Ai) = ∫_Ai λ̂ dμ, over various areas Ai. Useful for the mean, but questionable power; fine-scale interactions are not inspected.

d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003).
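A minimal sketch of the integrated residuals N(Ai) − C(Ai) from (c), computed on a grid of cells for a hypothetical inhomogeneous Poisson model; intensity, grid size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate from lam(x, y) = 50 + 100 y on the unit square (hypothetical model)
lam_max = 150.0
def lam(x, y):
    return 50.0 + 100.0 * y

m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
pts = cand[rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max]

# Integrated residuals N(A_i) - C(A_i) on a k x k grid of cells A_i,
# where C(A_i) is the integral of the fitted intensity over A_i
k = 2
residuals = np.empty((k, k))
for i in range(k):
    for j in range(k):
        x0, x1 = i / k, (i + 1) / k
        y0, y1 = j / k, (j + 1) / k
        N = np.sum((pts[:, 0] >= x0) & (pts[:, 0] < x1) &
                   (pts[:, 1] >= y0) & (pts[:, 1] < y1))
        # For lam = 50 + 100 y the cell integral is available in closed form
        C = 50 * (x1 - x0) * (y1 - y0) + 100 * (x1 - x0) * (y1**2 - y0**2) / 2
        residuals[i, j] = N - C
print(residuals)
```

With a correctly specified model, each residual should be on the order of √C(Ai); large systematic residuals flag lack of fit in the mean, but, as noted above, this says little about fine-scale interaction.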
For multi-dimensional point processes:
Stretch/compress one dimension according to λ̂, keeping the others fixed.
The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
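A sketch of rescaling in the purely temporal case, under a hypothetical intensity λ(t) = 2t: after the time change τ = Λ(t), the interevent times should look like independent Exp(1) draws when the model is correct.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical temporal intensity lam(t) = 2t on [0, 10]
T = 10.0
def lam(t):
    return 2.0 * t

def Lam(t):
    return t ** 2            # compensator: integral of lam from 0 to t

# Simulate the inhomogeneous Poisson process by thinning
lam_max = lam(T)
m = rng.poisson(lam_max * T)
cand = np.sort(rng.uniform(0, T, m))
t = cand[rng.uniform(0, 1, m) < lam(cand) / lam_max]

# Rescale the time axis: tau_i = Lam(t_i). If lam is the true intensity,
# the rescaled points form a rate-1 Poisson process, so the interevent
# times should behave like Exp(1) draws.
gaps = np.diff(Lam(t))
print(len(t), gaps.mean())   # mean gap should be near 1
```

In higher dimensions the same idea stretches one coordinate by the integrated conditional intensity while the others are held fixed, which is exactly where the boundary and interpretability problems below arise.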
Problems with multi-dimensional residual analysis:
* Irregular boundary, plotting.
* Points in the transformed space can be hard to interpret.
* For highly clustered processes: boundary effects, loss of power.
Possible solutions: truncation, horizontal rescaling.

Thinning: Suppose inf λ̂(xi, yi) = b.
Keep each point (xi, yi) in the original dataset with probability b / λ̂(xi, yi).
One obtains a different residual process, on the same scale as the data.
Can repeat many times --> many Poisson processes (but not quite independent!)
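Thinned residuals can be sketched as follows, again with a hypothetical intensity and seed; if the fitted intensity is correct, the retained points form a homogeneous Poisson process with rate b.

```python
import numpy as np

rng = np.random.default_rng(5)

# Inhomogeneous Poisson with lam(x, y) = 100 + 400 x on the unit square
def lam(x, y):
    return 100.0 + 400.0 * x

lam_max, b = 500.0, 100.0        # b = inf of lam over the window

m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
pts = cand[rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max]

# Thin: keep each point with probability b / lam(x_i, y_i). Under the
# fitted model, the retained points are homogeneous Poisson with rate b.
keep = rng.uniform(0, 1, len(pts)) < b / lam(pts[:, 0], pts[:, 1])
thinned = pts[keep]
print(len(pts), len(thinned))    # E[len(thinned)] = b * area
```

Repeating the thinning with fresh uniforms gives many approximately Poisson residual patterns, though, as noted above, they are not quite independent of one another.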
Conditional intensity λ(t, x1, …, xk; θ): [e.g. x1 = location, x2 = size.]

Separability for Point Processes:
• Say λ is multiplicative in mark xj if
λ(t, x1, …, xk; θ) = λ0 λj(t, xj; θj) λ-j(t, x-j; θ-j),
where x-j = (x1, …, xj-1, xj+1, …, xk), and similarly for λ-j and θ-j.
• If λ is multiplicative in xj and one of the following holds, then θ̂j, the partial MLE, equals θ̃j, the MLE:
• ∫S λ-j(t, x-j; θ-j) dμ-j = γ, for all θ-j;
• ∫S λj(t, xj; θj) dμj = γ, for all θj;
• ∫S λ(t, x; θ) dμ = ∫S λj(t, xj; θj) dμj = γ, for all θ;
where γ is a constant.
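The multiplicative case can be illustrated numerically. For a hypothetical Poisson intensity λ(t,x) = exp(θ0 + θ1 t + θ2 x) on the unit square, profiling out θ0 decouples θ1 from θ2, so a one-dimensional partial fit of θ1 agrees with the joint MLE; all parameter values and the seed are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(6)

def A(th):
    # integral_0^1 exp(th * u) du, with the th -> 0 limit handled
    if abs(th) < 1e-8:
        return 1.0
    return (np.exp(th) - 1.0) / th

# Hypothetical multiplicative intensity lam(t, x) = exp(b0 + b1 t + b2 x)
b0, b1, b2 = 5.0, 1.0, -0.5
def lam(t, x):
    return np.exp(b0 + b1 * t + b2 * x)

# Simulate on the unit square by thinning (b1 > 0 > b2: max is at (1, 0))
lam_max = np.exp(b0 + b1)
m = rng.poisson(lam_max)
cand = rng.uniform(0, 1, size=(m, 2))
keep = rng.uniform(0, 1, m) < lam(cand[:, 0], cand[:, 1]) / lam_max
t, x = cand[keep, 0], cand[keep, 1]
n = len(t)

# Joint MLE: log L = sum_i log lam(t_i, x_i) - integral of lam
def nll(p):
    p0, p1, p2 = p
    return -(n * p0 + p1 * t.sum() + p2 * x.sum()
             - np.exp(p0) * A(p1) * A(p2))

joint = minimize(nll, x0=[4.0, 0.1, 0.1], method='Nelder-Mead',
                 options={'xatol': 1e-6, 'fatol': 1e-6, 'maxiter': 2000}).x

# Partial estimate of b1: profiling out p0 leaves
#   profile(p1) = p1 * sum(t_i) - n * log A(p1)   (+ terms free of p1),
# which involves the t-coordinates only
partial = minimize_scalar(lambda p1: -(p1 * t.sum() - n * np.log(A(p1))),
                          bounds=(-5.0, 5.0), method='bounded').x
print(joint[1], partial)         # the two estimates of b1 coincide
```

The decoupling in the profile likelihood is this toy model's version of the constant-integral conditions above.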
Individual Covariates:
• Suppose λ is multiplicative, and λj(t, xj; θj) = f1[X(t,xj); θ1] f2[Y(t,xj); θ2].
If H(x,y) = H1(x) H2(y), where H, H1, H2 are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. θ1, then the partial MLE of θ1 = MLE of θ1.
(Note: not true for additive models!)
• Suppose λ is multiplicative and the jth component is additive: λj(t, xj; θj) = f1[X(t,xj); θ1] + f2[Y(t,xj); θ2].
If f1 and f2 are continuous and f2 is small [∫S f2(Y; θ2)² / f1(X; θ̃1) dμ →p 0], then the partial MLE θ̂1 is consistent.
Impact
• Model building.
• Model evaluation / dimension reduction.
• Excluded variables.
Model Construction
For example, for Los Angeles County wildfires:
• Relative Humidity, Windspeed, Precipitation, Aggregated rainfall over previous 60 days, Temperature, Date
• Tapered Pareto size distribution f, smooth spatial background μ.
λ(t, x, a) = β1 exp{β2 R(t) + β3 W(t) + β4 P(t) + β5 A(t;60) + β6 T(t) + β7[β8 − D(t)]²} μ(x) g(a).
Estimating each of these components separately might be somewhat reasonable, as a first attempt at least, if the interactions are not too extreme.
[Figure: r = 0.16 (sq m).]
Testing separability in marked point processes:
Construct non-separable and separable kernel estimates of λ, by smoothing over all coordinates simultaneously or separately, and then compare the two estimates (Schoenberg 2004).
May also consider:
S5 = mean absolute difference at the observed points.
S6 = maximum absolute difference at observed points.
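A sketch of the comparison, using Gaussian kernel estimates (scipy's gaussian_kde with its default bandwidths) on hypothetical data whose coordinates are independent; S5 and S6 follow the definitions above.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)

# A separable process: coordinates t and x generated independently
n = 300
t = rng.uniform(0, 1, n)
x = rng.beta(2, 2, n)
pts = np.vstack([t, x])

# Non-separable estimate: smooth over both coordinates simultaneously
lam_joint = n * gaussian_kde(pts)(pts)

# Separable estimate: smooth each coordinate separately and multiply
lam_sep = n * gaussian_kde(t)(t) * gaussian_kde(x)(x)

# Discrepancy statistics evaluated at the observed points
S5 = np.mean(np.abs(lam_sep - lam_joint))   # mean absolute difference
S6 = np.max(np.abs(lam_sep - lam_joint))    # maximum absolute difference
print(S5, S6)
```

For strongly non-separable data the two estimates diverge and both statistics grow; in practice their null distributions would be calibrated by simulation from the separable fit.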
S3 seems to be most powerful for large-scale non-separability:
However, S3 may not be ideal for Hawkes processes, and all these statistics are terrible for inhibition processes:
For Hawkes & inhibition processes, rescaling according to the separable estimate and then looking at the L-function seems much more powerful:
Los Angeles County Wildfire Example:
Statistics like S3 indicate separability, but the L-function after rescaling shows some clustering:
Summary:
1) MLE: maximize log L(θ) = ∫0^T ∫S log λ(t,x; θ) dN − ∫0^T ∫S λ(t,x; θ) dt dx.
2) Estimated K-function: K̂(h) = |S| ∑∑i≠j I(|(xi,yi) − (xj,yj)| ≤ h) / n²;
L̂(h) = √[K̂(h)/π] − h;
K̂w(h) = |S| ∑∑i≠j wi wj I(|(xi,yi) − (xj,yj)| ≤ h) / n², where wi = λ(xi,yi)^-1.
3) Residuals:
Integrated residuals: [N(Ai) − C(Ai)].
Rescaled residuals: [stretch one coordinate according to ∫ λ̂(x,y) dy].
Thinned residuals: [keep each pt with prob. b / λ̂(xi,yi)].
4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to global MLEs.
5) A separability test is S3; an alternative is L(h) after rescaling according to the separable kernel intensity estimate.
Next time: applications to models for earthquakes and wildfires.