
Combining Direct Spectral Estimators: I

• tapering introduced as means of creating SDF estimator S(d)(·) with potentially better bias properties than periodogram S(p)(·)

• both S(d)(·) & S(p)(·) inherently noisy – smoothing across frequencies leads to lag window estimator S(lw)(·)

• large-sample approximation to variance of S(lw)(f) points out price of tapering, namely, inflation of variance by Ch > 1:

$$\operatorname{var}\{\hat{S}^{(\mathrm{lw})}(f)\} \approx \frac{C_h\, S^2(f)}{B_W N\,\Delta t}$$

− Ch = 1 if and only if rectangular taper used (i.e., no real tapering done, leading to periodogram & use of prewhitening)

− Ch ≈ 2 for Hanning data taper

SAPA2e–371 VIII–1

Combining Direct Spectral Estimators: II

• Q: can we compensate for increase in variance due to tapering?

• A: yes, by combining together different S(d)(·)’s formed using same time series

• two schemes

− multitapering: use multiple orthogonal tapers on {Xt}

− Welch’s overlapped segment averaging (WOSA): use a single taper multiple times on {Xt}

• like lag window estimators, multitaper and WOSA estimators reduce variance over that given by S(d)(·), but do so without extracting price similar to Ch

• will discuss multitaper estimators first and then WOSA

Multitaper Spectral Estimation: I

• tapering useful for S(·) with large dynamic range, but increases variance of S(lw)(f) by Ch > 1

• alternatives to S(lw)(·), i.e., smoothing S(d)(·), include prewhitening (see Chapter 6), WOSA and multitapering (Thomson, 1982)

• why multitapering?

− works automatically on high dynamic range SDFs

− natural definition of resolution

− can tradeoff bias/variance/resolution easily

− for some processes, can argue

∗ superior to prewhitening (Thomson, 1990a)

∗ superior to WOSA (Bronez, 1992)

SAPA2e–375, 376, 377 VIII–3

Multitaper Spectral Estimation: II

− produces ‘S(d)(·)’ with EDOF ν = 2K, where K is the number of tapers used (2 to 10 typically; recall that width of 95% CIs shrinks considerably as ν increases from 2 to 10)

− can get internal estimate of variance (‘jackknifing’)

− can handle mixed spectra (i.e., line components)

− extends naturally to irregularly sampled processes

SAPA2e–375, 376, 377 VIII–4

Widths of 95% CIs on dB Scale versus EDOF ν

[figure: width of 95% CI (dB) versus ν, with ν running from 10^0 to 10^3 on a log axis]

Basics of Multitapering: I

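• basic multitaper estimator is average of K direct spectral estimators:

$$\hat{S}^{(\mathrm{mt})}(f) \equiv \frac{1}{K}\sum_{k=0}^{K-1} \hat{S}^{(\mathrm{mt})}_k(f),$$

where

$$\hat{S}^{(\mathrm{mt})}_k(f) \equiv \Delta t \left| \sum_{t=0}^{N-1} h_{k,t}\, X_t\, e^{-i2\pi f t\,\Delta t} \right|^2$$

is called kth eigenspectrum and uses kth taper {hk,t} normalized by $\sum_t h_{k,t}^2 = 1$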
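A minimal NumPy sketch of the definition above may help fix ideas. The function names (`eigenspectra`, `basic_multitaper`) and the choice to evaluate on the FFT frequency grid are illustrative assumptions, not part of the slides; the tapers are passed in as rows of an array and are assumed to be normalized already.

```python
import numpy as np

def eigenspectra(x, tapers, dt=1.0):
    """Return FFT-grid frequencies and the K eigenspectra of x.

    tapers: array of shape (K, N); each row is assumed to have unit sum of squares.
    """
    x = np.asarray(x, dtype=float)
    tapers = np.asarray(tapers, dtype=float)
    # J_k(f) = sum_t h_{k,t} X_t exp(-i 2 pi f t dt), evaluated at f_j = j / (N dt)
    J = np.fft.rfft(tapers * x, axis=1)
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs, dt * np.abs(J) ** 2

def basic_multitaper(x, tapers, dt=1.0):
    """Average the K eigenspectra (basic multitaper estimator)."""
    freqs, S_k = eigenspectra(x, tapers, dt)
    return freqs, S_k.mean(axis=0)

# Illustrative use: with K = 1 and a rectangular taper the estimator
# reduces to the periodogram.
rng = np.random.default_rng(0)
x = rng.standard_normal(128)
rect = np.full((1, x.size), 1.0 / np.sqrt(x.size))
freqs, S_mt = basic_multitaper(x, rect)
```

Any set of orthonormal tapers (Slepian, sinusoidal) can be substituted for `rect` without changing the code.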

Basics of Multitapering: II

• generalization of interest later on: weighted multitaper estimator

$$\hat{S}^{(\mathrm{wmt})}(f) \equiv \sum_{k=0}^{K-1} d_k\, \hat{S}^{(\mathrm{mt})}_k(f),$$

where weights dk are nonnegative and sum to unity

• basic multitaper estimator is a special case of the above: just set dk = 1/K

Basics of Multitapering: III

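• spectral window for kth eigenspectrum:

$$\mathcal{H}_k(f) \equiv \Delta t \left| \sum_{t=0}^{N-1} h_{k,t}\, e^{-i2\pi f t\,\Delta t} \right|^2$$

• an eigenspectrum is just a direct spectral estimator, so

$$E\{\hat{S}^{(\mathrm{mt})}_k(f)\} = \int_{-f_N}^{f_N} \mathcal{H}_k(f - f')\, S(f')\, df'$$

• for the basic multitaper estimator, we thus have

$$E\{\hat{S}^{(\mathrm{mt})}(f)\} = \int_{-f_N}^{f_N} \overline{\mathcal{H}}(f - f')\, S(f')\, df' \quad\text{with}\quad \overline{\mathcal{H}}(f) \equiv \frac{1}{K}\sum_{k=0}^{K-1} \mathcal{H}_k(f)$$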

SAPA2e–372, 373 VIII–8

Basics of Multitapering: IV

• corresponding result for weighted multitaper estimator is

$$E\{\hat{S}^{(\mathrm{wmt})}(f)\} = \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df' \quad\text{with}\quad \widetilde{\mathcal{H}}(f) \equiv \sum_{k=0}^{K-1} d_k\, \mathcal{H}_k(f)$$

• leakage for S(mt)(·) or S(wmt)(·) OK if Hk(·)’s all have small sidelobes

• if all K eigenspectra are approximately unbiased, then multitaper estimators will be also approximately so

Basics of Multitapering: V

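• recall standard measure of effective bandwidth for direct spectral estimator:

$$B_{\mathcal{H}_k} \equiv \operatorname{width}_a\{\mathcal{H}_k(\cdot)\} = \frac{\left(\int_{-f_N}^{f_N} \mathcal{H}_k(f)\, df\right)^2}{\int_{-f_N}^{f_N} \mathcal{H}_k^2(f)\, df} = \frac{\Delta t}{\sum_{\tau=-(N-1)}^{N-1} \left(h \star h_{k,\tau}\right)^2}$$

• for basic and weighted multitaper estimators, measures are

$$B_{\overline{\mathcal{H}}} = \frac{\Delta t}{\sum_{\tau=-(N-1)}^{N-1} \left(\frac{1}{K}\sum_{k=0}^{K-1} h \star h_{k,\tau}\right)^2}$$

$$B_{\widetilde{\mathcal{H}}} = \frac{\Delta t}{\sum_{\tau=-(N-1)}^{N-1} \left(\sum_{k=0}^{K-1} d_k\, h \star h_{k,\tau}\right)^2}$$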

SAPA2e–372, 373 VIII–10

Basics of Multitapering: VI

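• rationale for averaging eigenspectra is to produce estimator of S(f) with variance smaller than any single eigenspectrum

• making use of Exercise [2.1e] (Problem 4), have

$$\operatorname{var}\{\hat{S}^{(\mathrm{mt})}(f)\} = \frac{1}{K^2}\sum_{k=0}^{K-1} \operatorname{var}\{\hat{S}^{(\mathrm{mt})}_k(f)\} + \frac{2}{K^2}\sum_{j<k} \operatorname{cov}\{\hat{S}^{(\mathrm{mt})}_j(f), \hat{S}^{(\mathrm{mt})}_k(f)\}$$

• assume S(·) locally constant about f

• for j ≠ k, can argue $\operatorname{cov}\{\hat{S}^{(\mathrm{mt})}_j(f), \hat{S}^{(\mathrm{mt})}_k(f)\} \approx 0$ if

$$\sum_{t=0}^{N-1} h_{j,t}\, h_{k,t} = 0,$$

i.e., if tapers are orthogonal

• $\operatorname{var}\{\hat{S}^{(\mathrm{mt})}_k(f)\} \approx S^2(f) \implies \operatorname{var}\{\hat{S}^{(\mathrm{mt})}(f)\} \approx S^2(f)/K$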

SAPA2e–373, 374 VIII–11

Basics of Multitapering: VII

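• corresponding result for weighted multitaper estimator is

$$\operatorname{var}\{\hat{S}^{(\mathrm{wmt})}(f)\} \approx S^2(f) \sum_{k=0}^{K-1} d_k^2,$$

which reduces to

$$\operatorname{var}\{\hat{S}^{(\mathrm{mt})}(f)\} \approx \frac{S^2(f)}{K}$$

by setting dk = 1/K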

Basics of Multitapering: VIII

• to determine approximate distribution of S(mt)(f), note that, since each eigenspectrum is a direct spectral estimator,

$$\hat{S}^{(\mathrm{mt})}_k(f) \overset{d}{=} S(f)\,\chi^2_2/2 \quad\text{for } 0 < f < f_N$$

asymptotically as N → ∞

• factoid: if $\chi^2_{\nu_1}$ & $\chi^2_{\nu_2}$ are independent chi-square RVs with ν1 & ν2 DOFs, then $\chi^2_{\nu_1} + \chi^2_{\nu_2}$ is chi-square RV with ν1 + ν2 DOF

• assuming mutual independence, implies

$$\hat{S}^{(\mathrm{mt})}(f) \overset{d}{=} S(f)\,\chi^2_{2K}/2K \quad\text{for } 0 < f < f_N$$

asymptotically as N → ∞

SAPA2e–374, 375 VIII–13

Basics of Multitapering: IX

• corresponding result for weighted multitaper estimator:

$$\hat{S}^{(\mathrm{wmt})}(f) \overset{d}{=} S(f)\,\chi^2_{\nu}/\nu \quad\text{for } 0 < f < f_N$$

asymptotically as N → ∞, where

$$\nu = \frac{2}{\sum_{k=0}^{K-1} d_k^2}$$

(note that ν = 2K when dk = 1/K)

• two sets of orthogonal tapers in common use

− Slepian (DPSS) tapers (Thomson, 1982)

− sinusoidal tapers (Riedel and Sidorenko, 1995)

• both are termed constructive multitaper estimators (as opposed to deconstructed weighted multitaper estimators to be discussed later on)
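The EDOF expression ν = 2/Σ d_k² for the weighted estimator is easy to evaluate numerically. The sketch below is illustrative only; the helper name `weighted_multitaper_edof` and the example weights are assumptions, not values from the slides.

```python
import numpy as np

def weighted_multitaper_edof(d):
    """EDOF nu = 2 / sum_k d_k^2 for nonnegative weights summing to unity."""
    d = np.asarray(d, dtype=float)
    if np.any(d < 0) or not np.isclose(d.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to unity")
    return 2.0 / np.sum(d ** 2)

K = 5
nu_equal = weighted_multitaper_edof(np.full(K, 1.0 / K))            # basic multitaper
nu_unequal = weighted_multitaper_edof([0.4, 0.3, 0.15, 0.1, 0.05])  # hypothetical weights
```

Equal weights recover ν = 2K; any unequal weighting of the same K eigenspectra gives a smaller ν.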

Slepian Multitapers: I

• tapers minimize spectral window sidelobes

• for fixed design bandwidth 2W, measure sidelobes via

$$\beta_k^2(W) \equiv \frac{\int_{-W}^{W} \mathcal{H}_k(f)\, df}{\int_{-f_N}^{f_N} \mathcal{H}_k(f)\, df}$$

• for given W and N, {h0,t} maximizes $\beta_0^2(W)$

• {hk,t} maximizes $\beta_k^2(W)$ amongst sequences orthogonal to {h0,t}, {h1,t}, . . ., {hk−1,t}

Slepian Multitapers: II

• computation of {hk,t}’s requires solution of $A\mathbf{h}_k = \beta_k^2(W)\,\mathbf{h}_k$, where $\mathbf{h}_k^T = [h_{k,0}, \ldots, h_{k,N-1}]$ and

$$A_{t',t} = \frac{\sin(2\pi W (t' - t))}{\pi (t' - t)}$$

is (t′, t)th element of N × N matrix A

• can solve using inverse iteration (stable, but slow), numerical integration (Thomson, 1982) or tridiagonal formulation (fast!)
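For small N the eigenproblem above can be solved directly with a dense symmetric eigensolver. The sketch below takes that brute-force route (it is not one of the three methods listed on the slide, and it becomes slow and less accurate for large N); the function name `slepian_tapers` is an illustrative assumption, and ∆t = 1 is assumed so that W = NW/N.

```python
import numpy as np

def slepian_tapers(n, nw, k):
    """First k Slepian tapers of length n with half-bandwidth W = nw / n (dt = 1).

    Returns (eigenvalues, tapers) with tapers of shape (k, n), ordered by
    decreasing eigenvalue (concentration). Taper signs are arbitrary.
    """
    w = nw / n
    t = np.arange(n)
    diff = t[:, None] - t[None, :]
    # A[t', t] = sin(2 pi W (t' - t)) / (pi (t' - t)), with 2W on the diagonal;
    # np.sinc(x) = sin(pi x) / (pi x) takes care of the t' = t limit.
    A = 2.0 * w * np.sinc(2.0 * w * diff)
    eigvals, eigvecs = np.linalg.eigh(A)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order].T

lam, h = slepian_tapers(n=64, nw=4, k=12)
```

With NW = 4 the leading eigenvalues are extremely close to one and then fall off quickly once k passes 2NW, which is the behaviour the next slide uses to pick K.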

Slepian Multitapers: III

• number of {hk,t}’s with good leakage protection is 2NW ∆t − 1 or less

• strategy & considerations for picking K

− set design half-bandwidth W as multiple J of spacing 1/(N ∆t) between Fourier frequencies, where J = 2, 3, 4, . . .

− W = J/(N ∆t) expressed as NW ∆t = J (or just NW = J)

− set K < 2NW ∆t = 2J, noting that, as K increases, variance decreases, but leakage gets worse

− increasing W implies a decrease in resolution, but gain more leakage-free tapers to work with (i.e., can increase K)

• following plots use NW = 4 (i.e., 2NW = 8), for which K should be no greater than 7, but eighth taper {h7,t} also shown

SAPA2e–378, 379 VIII–17

AR(4) Series, NW = 4 Slepian {h0,t} & Tapered Series

[figures: Slepian tapers and the correspondingly tapered AR(4) series, plotted against t for 0 ≤ t ≤ 1024]

NW = 4 Slepian H0(·), S(mt)0(·), H(·) and S(mt)(·), K = 1

[figures: spectral windows plotted for 0 ≤ f ≤ 0.01 and spectral estimates plotted for 0 ≤ f ≤ 0.5]

SAPA2e–381, 383, 384, 385 VIII–21 to VIII–23

Sinusoidal Multitapers: I

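• recall notion of smoothing window bias:

$$b_W \equiv \int_{-f_N}^{f_N} W_m(f - f')\, S(f')\, df' - S(f) \approx \frac{S''(f)}{2} \int_{-f_N}^{f_N} \phi^2\, W_m(\phi)\, d\phi = \frac{S''(f)}{24}\, \beta_W^2$$

(part of criterion used to derive Papoulis lag window)

• recall notion of spectral window bias (Exer. [211]):

$$b(f) \equiv E\{\hat{S}^{(d)}(f)\} - S(f) \approx \frac{S''(f)}{2} \int_{-f_N}^{f_N} \phi^2\, \mathcal{H}(\phi)\, d\phi \equiv \frac{S''(f)}{24}\, \beta_{\mathcal{H}}^2$$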

Sinusoidal Multitapers: II

• for given N, {h0,t} minimizes $\beta^2_{\mathcal{H}_0}$

• {hk,t} minimizes $\beta^2_{\mathcal{H}_k}$ amongst sequences orthogonal to {h0,t}, {h1,t}, . . . , {hk−1,t}

• scheme due to Riedel & Sidorenko (1995), who actually worked with continuous parameter processes (similar to Papoulis, 1973)

• can approximate solutions well using

$$h_{k,t} = \left\{\frac{2}{N+1}\right\}^{1/2} \sin\left\{\frac{(k+1)\pi(t+1)}{N+1}\right\}, \quad t = 0, 1, \ldots, N-1,$$

which is very easy to compute!
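As a quick illustration of how little work the sinusoidal tapers require, the sketch below builds them from the standard closed form h_{k,t} = {2/(N+1)}^{1/2} sin{(k+1)π(t+1)/(N+1)} and checks orthonormality numerically. The function name `sinusoidal_tapers` is an assumption made for this example.

```python
import numpy as np

def sinusoidal_tapers(n, k):
    """Return the first k sinusoidal tapers of length n as rows of a (k, n) array."""
    t = np.arange(n)
    orders = np.arange(k)[:, None]
    return np.sqrt(2.0 / (n + 1)) * np.sin((orders + 1) * np.pi * (t + 1) / (n + 1))

h = sinusoidal_tapers(n=1024, k=7)
gram = h @ h.T   # should be the 7 x 7 identity if the tapers are orthonormal
```

No eigensolver or special software is needed, in contrast to the Slepian case.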

SAPA2e–414, 415 VIII–25

Sinusoidal Multitapers: III

• all {hk,t}’s give moderate leakage protection (like Hanning)

• strategy & considerations for picking K

− bandwidth BH ≈ (K + 1)/(N + 1) increases with K (dashed & solid red lines show BH/2 & (K + 1)/(2N + 2) on plots)

− leakage relatively unchanged as K increases

− can trade off variance and resolution by increasing K, which decreases variance and resolution (i.e., increases bandwidth)

• sinusoidal tapers vs. Slepian tapers

− 1 parameter (K) vs. 2 parameters (2W & K)

− moderate vs. adjustable leakage protection

− juggle resolution/variance vs. leakage/resolution/variance

− simple expression vs. need software to compute

AR(4) Series, Sinusoidal {h0,t} & Tapered Series

[figures: sinusoidal tapers and the correspondingly tapered AR(4) series, plotted against t for 0 ≤ t ≤ 1024]

Sinusoidal H0(·), S(mt)0(·), H(·) and S(mt)(·), K = 1

[figures: spectral windows plotted for 0 ≤ f ≤ 0.01 and spectral estimates plotted for 0 ≤ f ≤ 0.5]

Recovery of ‘Lost Information’: I

• Slepian tapers are solutions to $A\mathbf{h}_k = \beta_k^2(W)\,\mathbf{h}_k$

• N orthonormal solutions h0, . . . , hN−1

• can order via eigenvalues (concentration measure):

$$1 > \beta_0^2(W) > \beta_1^2(W) > \cdots > \beta_{N-1}^2(W) > 0$$

• only first K < 2NW ∆t have $\beta_k^2(W) \approx 1$

• form V = [h0, h1, . . . , hN−1]

• $V^T V = I_N$ restates orthonormality, where IN is N × N identity matrix:

$$\sum_{t=0}^{N-1} h_{j,t}\, h_{k,t} = \begin{cases} 1, & j = k; \\ 0, & j \ne k \end{cases}$$

Recovery of ‘Lost Information’: II

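• since $V^T = V^{-1}$, also have $V V^T = I_N$, yielding

$$\sum_{k=0}^{N-1} h_{k,t}\, h_{k,t'} = \begin{cases} 1, & t = t'; \\ 0, & t \ne t'. \end{cases}$$

• thus have

$$\sum_{k=0}^{N-1} \sum_{t=0}^{N-1} \left(h_{k,t} X_t\right)^2 = \sum_{t=0}^{N-1} X_t^2 \underbrace{\sum_{k=0}^{N-1} h_{k,t}^2}_{(*)} = \sum_{t=0}^{N-1} X_t^2$$

because (∗) – the energy across tapers – is unity

• following figures show relative influence of Xt’s by plotting $\sum_{k=0}^{K-1} h_{k,t}^2$ versus t for K = 1, . . . , 8 (here NW ∆t = 4)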
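The taper-energy curves can be reproduced in a few lines. This sketch assumes ∆t = 1 and a small N = 64 (rather than the N = 1024 of the figures) so that a dense eigendecomposition of the matrix A is cheap; the variable names are illustrative.

```python
import numpy as np

n, nw = 64, 4
w = nw / n
t = np.arange(n)
A = 2.0 * w * np.sinc(2.0 * w * (t[:, None] - t[None, :]))
eigvals, V = np.linalg.eigh(A)
V = V[:, ::-1]                 # columns h_0, h_1, ... by decreasing eigenvalue

# energy[K - 1, t] = sum_{k < K} h_{k,t}^2, the quantity plotted against t
energy = np.cumsum(V ** 2, axis=1).T
```

Row K − 1 of `energy` shows how heavily each X_t is weighted when K tapers are used; the last row (K = N) is flat at one, which is the sense in which the full set of tapers recovers all of the 'lost information'.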

Decomposition of Slepian Taper Energy for K = 1

[figure: taper energy versus t for 0 ≤ t ≤ 1024]

Multitapering of White Noise: I

• assume X0, . . . , XN−1 is zero mean Gaussian white noise with unknown variance s0 and SDF S(f) = s0 (here we set ∆t to unity for convenience)

• by several criteria, best estimate of s0 is $\sum_{t=0}^{N-1} X_t^2/N = \hat{s}_0^{(p)}$

• implies best estimate of S(f) is $\hat{s}_0^{(p)}$

• can obtain best estimator from S(p)(·) via

$$\int_{-1/2}^{1/2} \hat{S}^{(p)}(f)\, df = \hat{s}_0^{(p)};$$

i.e., ‘smoothing’ with Wm(f) = 1

• Equation (392) says $\operatorname{var}\{\hat{s}_0^{(p)}\} = 2s_0^2/N$

Multitapering of White Noise: II

• let S(d)(·) be direct spectral estimator based upon {h0,t}

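• smoothing S(d)(·) with Wm(f) = 1 yields

$$\int_{-1/2}^{1/2} \hat{S}^{(d)}(f)\, df = \sum_{t=0}^{N-1} h_{0,t}^2\, X_t^2 = \hat{s}_0^{(d)}.$$

• since $\operatorname{var}\{X_t^2\} = 2s_0^2$ (Isserlis – Equation (32a)), have

$$\operatorname{var}\{\hat{s}_0^{(d)}\} = \sum_{t=0}^{N-1} \operatorname{var}\{h_{0,t}^2\, X_t^2\} = 2s_0^2 \sum_{t=0}^{N-1} h_{0,t}^4 = 2s_0^2\, C_h/N,$$

where, as before, $C_h \equiv N \sum_{t=0}^{N-1} h_{0,t}^4$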

Multitapering of White Noise: III

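• use Cauchy inequality

$$\left| \sum_{t=0}^{N-1} a_t b_t \right|^2 \le \sum_{t=0}^{N-1} |a_t|^2 \sum_{t=0}^{N-1} |b_t|^2,$$

with $a_t = h_{0,t}^2$ and $b_t = 1$ to argue $\sum_{t=0}^{N-1} h_{0,t}^4 \ge 1/N$; i.e., Ch ≥ 1, with equality if and only if $h_{0,t} = 1/\sqrt{N}$

• can conclude

$$\operatorname{var}\{\hat{s}_0^{(d)}\} = 2s_0^2\, C_h/N > 2s_0^2/N = \operatorname{var}\{\hat{s}_0^{(p)}\}$$

for any nonrectangular taper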
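To make the inflation factor concrete, the sketch below evaluates C_h = N Σ h⁴ for a rectangular taper and for a Hanning taper. The particular Hanning form used here, h_t ∝ 1 − cos(2π(t+1)/(N+1)), is one common convention and is an assumption of this example; other conventions give nearly the same value.

```python
import numpy as np

def variance_inflation(h):
    """C_h = N * sum_t h_t^4 after normalizing the taper so that sum_t h_t^2 = 1."""
    h = np.asarray(h, dtype=float)
    h = h / np.sqrt(np.sum(h ** 2))
    return h.size * np.sum(h ** 4)

n = 1024
rect = np.ones(n)
hanning = 1.0 - np.cos(2.0 * np.pi * (np.arange(n) + 1) / (n + 1))

c_rect = variance_inflation(rect)     # lower bound from the Cauchy inequality
c_hann = variance_inflation(hanning)  # close to 2, as quoted earlier for Hanning
```

The rectangular taper attains the bound C_h = 1, while the Hanning value lands just under 2.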

Multitapering of White Noise: IV

• claim: multitapering reclaims best estimator

• let {h0,t}, {h1,t}, . . . , {hN−1,t} be orthonormal

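• let V be the N × N matrix given by

$$V = \begin{bmatrix} h_{0,0} & h_{1,0} & \cdots & h_{N-1,0} \\ h_{0,1} & h_{1,1} & \cdots & h_{N-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ h_{0,N-1} & h_{1,N-1} & \cdots & h_{N-1,N-1} \end{bmatrix}$$

• orthonormality says $V^T V = I_N$ & hence $V V^T = I_N$

• kth eigenspectrum:

$$\hat{S}^{(\mathrm{mt})}_k(f) \equiv \left| \sum_{t=0}^{N-1} h_{k,t}\, X_t\, e^{-i2\pi f t} \right|^2$$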

Multitapering of White Noise: V

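• form S(mt)(·) by averaging all S(mt)k(·)’s:

$$\begin{aligned}
\hat{S}^{(\mathrm{mt})}(f) &\equiv \frac{1}{N} \sum_{k=0}^{N-1} \hat{S}^{(\mathrm{mt})}_k(f) \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \left( \sum_{t=0}^{N-1} h_{k,t} X_t e^{-i2\pi f t} \right) \left( \sum_{u=0}^{N-1} h_{k,u} X_u e^{i2\pi f u} \right) \\
&= \frac{1}{N} \sum_{t=0}^{N-1} \sum_{u=0}^{N-1} X_t X_u \underbrace{\left( \sum_{k=0}^{N-1} h_{k,t} h_{k,u} \right)}_{1 \text{ if } t = u;\ 0 \text{ if } t \ne u} e^{-i2\pi f (t-u)} \\
&= \frac{1}{N} \sum_{t=0}^{N-1} X_t^2 = \hat{s}_0^{(p)}
\end{aligned}$$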
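The identity above is purely algebraic, so it can be checked numerically for any orthonormal set of tapers. In this sketch the tapers are the columns of a random orthogonal matrix obtained from a QR factorization; that choice, the test frequency and the variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 32
x = rng.standard_normal(n)

# Any orthonormal set will do: take the Q factor of a random square matrix.
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns play the role of tapers

f = 0.137                                          # arbitrary frequency, dt = 1
t = np.arange(n)
J = V.T @ (x * np.exp(-2j * np.pi * f * t))        # J_k(f) = sum_t h_{k,t} X_t e^{-i 2 pi f t}
S_mt = np.mean(np.abs(J) ** 2)                     # average of all N eigenspectra
s0_hat = np.mean(x ** 2)                           # sample variance estimator
```

Changing `f` or the random seed leaves `S_mt` equal to `s0_hat` up to rounding, which is the claim that multitapering with all N tapers reclaims the best estimator.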

Multitapering of White Noise: VI

• note: holds for any set of orthonormal tapers!

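• as K increases, can study rate of decay

$$\operatorname{var}\{\hat{S}^{(\mathrm{mt})}(f)\} = \operatorname{var}\left\{ \frac{1}{K} \sum_{k=0}^{K-1} \hat{S}^{(\mathrm{mt})}_k(f) \right\} = \frac{1}{K^2} \sum_{j=0}^{K-1} \sum_{k=0}^{K-1} \operatorname{cov}\{\hat{S}^{(\mathrm{mt})}_j(f), \hat{S}^{(\mathrm{mt})}_k(f)\}$$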

• Exercise [8.8b] indicates how to compute this for white noise

SAPA2e–394, 395 VIII–43

Multitapering of White Noise: VII

• following plot shows example for f = 1/4 using Slepian tapers

− N = 64; NW = 4; s0 = 1; S(f ) = 1

− thick curve: var {S(mt)(1/4)} vs. K

∗ K = 1: var {S(mt)(1/4)} = S²(f) = 1

∗ K = N: var {S(mt)(1/4)} = 2/N ≐ 0.03

∗ curve agrees with these values

− thin curve: computed assuming cov {S(mt)j(f), S(mt)k(f)} = 0 when j ≠ k

• dashed vertical line marks Shannon number 2NW = 8

• two curves agree closely for K ≤ 2NW

• variance decreases slowly for K > 2NW (bias then can be bad for nonwhite processes)
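One way to compute the thick curve is sketched below. It relies on the Gaussian fourth-moment (Isserlis) result cov{|J_j|², |J_k|²} = |E{J_j J_k*}|² + |E{J_j J_k}|² for zero mean Gaussian white noise with ∆t = 1; this is offered as an assumption about how such a calculation can be organized and is not necessarily the route taken in Exercise [8.8b]. Sinusoidal tapers are used only because they are easy to generate; the function name is hypothetical.

```python
import numpy as np

def white_noise_mt_variance(tapers, f, s0=1.0):
    """var{S_mt(f)} for K = 1, ..., len(tapers), assuming Gaussian white noise, dt = 1."""
    k_max, n = tapers.shape
    t = np.arange(n)
    G = tapers @ tapers.T                                    # sum_t h_{j,t} h_{k,t}
    P = (tapers * np.exp(-4j * np.pi * f * t)) @ tapers.T    # sum_t h_{j,t} h_{k,t} e^{-i 4 pi f t}
    C = s0 ** 2 * (np.abs(G) ** 2 + np.abs(P) ** 2)          # cov of eigenspectra j and k
    # Sum of the leading K x K block of C for every K via a 2-D cumulative sum.
    block_sums = np.cumsum(np.cumsum(C, axis=0), axis=1).diagonal()
    return block_sums / np.arange(1, k_max + 1) ** 2

n = 64
t = np.arange(n)
tapers = np.sqrt(2.0 / (n + 1)) * np.sin(
    (np.arange(n)[:, None] + 1) * np.pi * (t + 1) / (n + 1)
)
var_mt = white_noise_mt_variance(tapers, f=0.25)
```

Under these assumptions `var_mt[0]` is close to S²(f) = 1 and `var_mt[-1]` equals 2/N, matching the two end points quoted for the thick curve.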

SAPA2e–394, 395 VIII–44

var {S(mt)(1/4)} versus Number K of Eigenspectra

[figure: variance (log axis, 10^−2 to 10^−1 marked) versus K for 0 ≤ K ≤ 64]

Multitapering of White Noise: VIII

• following plot shows 2nd example, which differs from 1st only in that NW = 16 rather than 4

• two examples similar in some aspects, but differ in others

− NW = 4 achieves minimum only when K = 64, whereas NW = 16 achieves minimum at both K = 32 and K = 64

− NW = 4 after Shannon number has smaller values, whereas NW = 16 after Shannon has larger values (except K = 64)

• 2nd example shows that, if U0, U1, . . . are correlated RVs with common mean µ & common variance, variance of sample mean

$$\hat{\mu}_N = \frac{1}{N} \sum_{n=0}^{N-1} U_n$$

can actually increase as N increases!

SAPA2e–394, 395 VIII–46

[figure: var {S(mt)(1/4)} versus K for 0 ≤ K ≤ 64, NW = 16 Slepian tapers]

Multitapering of White Noise: IX

• following plot shows 3rd example, which differs from 1st & 2nd only in that sinusoidal tapers are used rather than Slepian

• after accounting for numerical precision, R declares that, at all K, var {S(mt)(1/4)} is identical for Slepian tapers with NW = 16 and sinusoidal tapers!

• Q: why?

• A: hmmm . . .

[figure: var {S(mt)(1/4)} versus K for 0 ≤ K ≤ 64, sinusoidal tapers]

Quadratic Spectral Estimators: I

• provides important motivation for multitapering

• let X0, . . . , XN−1 be portion of real-valued stationary process with zero mean, SDF S(·) and ACVS {sτ}

• for fixed f, define $Z_t \equiv X_t\, e^{i2\pi f t\,\Delta t}$

• Exercise [5.13a]: {Zt} stationary with

$$S_Z(f') = S(f - f') \quad\text{and}\quad s_{Z,\tau} = s_\tau\, e^{i2\pi f \tau\,\Delta t}$$

(recall that S(·) is periodic with a period of 2fN)

• note: SZ(0) = S(f), so can estimate S(f) by estimating SZ(·) at zero frequency

SAPA2e–395, 396, 177 VIII–50

Quadratic Spectral Estimators: II

• let Z be vector with tth element Zt

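• let ZH be its Hermitian transpose:

$$\mathbf{Z}^H \equiv \left[ Z_0^*, \ldots, Z_{N-1}^* \right]$$

note: if A real-valued matrix, then AH = AT

• since XtXt′ ∆t has same units as S(f), consider

$$\hat{S}^{(q)}(f) \equiv \hat{S}^{(q)}_Z(0) \equiv \Delta t \sum_{s=0}^{N-1} \sum_{t=0}^{N-1} Z_s^*\, Q_{s,t}\, Z_t = \Delta t\, \mathbf{Z}^H \mathbf{Q} \mathbf{Z},$$

where Qs,t is (s, t)th element of weight matrix Q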

• S(q)(f ) called quadratic spectral estimator

Quadratic Spectral Estimators: III

• assumptions about N ×N matrix Q:

− Qs,t is real-valued

− Q is symmetric; i.e., Qs,t = Qt,s

− Qs,t does not depend on {Zt}

• if Q positive semidefinite (PSD), then S(q)(f) ≥ 0, which is obviously a desirable property in view of S(f) ≥ 0

• three examples of quadratic estimators

Quadratic Spectral Estimators: IV

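1. lag window estimator (need not be PSD):

$$\begin{aligned}
\hat{S}^{(\mathrm{lw})}(f) &\equiv \Delta t \sum_{\tau=-(N-1)}^{N-1} w_{m,\tau}\, \hat{s}^{(d)}_\tau\, e^{-i2\pi f \tau\,\Delta t} \\
&= \Delta t \sum_{s=0}^{N-1} \sum_{t=0}^{N-1} w_{m,s-t}\, h_s X_s\, h_t X_t\, e^{-i2\pi f (s-t)\,\Delta t} \\
&= \Delta t \sum_{s=0}^{N-1} \sum_{t=0}^{N-1} Z_s^* \underbrace{h_s\, w_{m,s-t}\, h_t}_{=\, Q_{s,t}} Z_t
\end{aligned}$$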

Quadratic Spectral Estimators: V

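2. direct spectral estimator (always PSD – special case of S(lw)(f) with wm,τ = 1):

$$\hat{S}^{(d)}(f) \equiv \Delta t \left| \sum_{t=0}^{N-1} h_t X_t\, e^{-i2\pi f t\,\Delta t} \right|^2 = \Delta t \sum_{s=0}^{N-1} \sum_{t=0}^{N-1} Z_s^* \underbrace{h_s h_t}_{=\, Q_{s,t}} Z_t$$

3. basic multitaper estimator (always PSD):

$$\hat{S}^{(\mathrm{mt})}(f) \equiv \frac{\Delta t}{K} \sum_{k=0}^{K-1} \left| \sum_{t=0}^{N-1} h_{k,t} X_t\, e^{-i2\pi f t\,\Delta t} \right|^2 = \Delta t \sum_{s=0}^{N-1} \sum_{t=0}^{N-1} Z_s^* \underbrace{\left( \frac{1}{K} \sum_{k=0}^{K-1} h_{k,s} h_{k,t} \right)}_{=\, Q_{s,t}} Z_t$$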

SAPA2e–397, 372 VIII–54

Quadratic Spectral Estimators: VI

• goal: set Q so S(q)(·) unbiased and has small variance

• to get S(q)(f) ≥ 0, assume Q is PSD: let K = rank of Q, and assume 1 ≤ K ≤ N (rules out K = 0, which is not of interest)

• matrix theory: there exists an N × N orthonormal matrix HN such that $H_N^T Q H_N = D_N$, where DN is a diagonal matrix with diagonal elements

$$d_0 \ge d_1 \ge \cdots \ge d_{K-1} > 0 \quad\text{and}\quad d_K = \cdots = d_{N-1} = 0$$

with each dk being an eigenvalue of Q

• Exercise [397a]: as a result of the above, can write

$$Q = H D_K H^T,$$

where H is N × K matrix (the 1st K columns of HN), and DK is a diagonal matrix with diagonal entries d0, d1, . . . , dK−1

Quadratic Spectral Estimators: VII

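• Exercise [397b]: substituting $H D_K H^T$ for Q in

$$\hat{S}^{(q)}(f) = \Delta t\, \mathbf{Z}^H Q \mathbf{Z}$$

yields

$$\hat{S}^{(q)}(f) = \Delta t\, \mathbf{Z}^H H D_K H^T \mathbf{Z} = \Delta t \sum_{k=0}^{K-1} d_k \left| \sum_{t=0}^{N-1} h_{k,t} X_t\, e^{-i2\pi f t\,\Delta t} \right|^2,$$

where hk,t is the (t, k)th element of H (i.e., the tth entry of its kth column)

• reusing definition $\hat{S}^{(\mathrm{mt})}_k(f) = \Delta t \left| \sum_{t=0}^{N-1} h_{k,t} X_t\, e^{-i2\pi f t\,\Delta t} \right|^2$, can write

$$\hat{S}^{(q)}(f) = \sum_{k=0}^{K-1} d_k\, \hat{S}^{(\mathrm{mt})}_k(f) = \hat{S}^{(\mathrm{wmt})}(f)$$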

SAPA2e–397, 398 VIII–56

Quadratic Spectral Estimators: VIII

• conclusion: can write any PSD quadratic estimator

S(q)(f ) = ∆tZHQZ

with weight matrix Q of rank K as a weighted multitaper estimator with

− weights dk given by positive eigenvalues of Q

− tapers {hk,t} given by associated eigenvectors of Q (tapers are mutually orthonormal)

• S(q)(f) formulated as S(wmt)(f) termed a deconstructed multitaper estimator

• opposing concept is constructive multitaper estimator for which tapers are formulated explicitly (e.g., Slepian or sinusoidal)
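The deconstruction can be sketched numerically: eigendecompose a PSD weight matrix, keep the positive eigenvalues as weights and the associated eigenvectors as tapers, and confirm that the quadratic form and the weighted multitaper sum agree. The random rank-3 matrix Q, the frequency and the tolerance used to decide which eigenvalues count as positive are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rank = 16, 3
B = rng.standard_normal((n, rank))
Q = B @ B.T                        # real, symmetric, PSD, rank 3
Q /= np.trace(Q)                   # scale so that the eigenvalues sum to one

eigvals, eigvecs = np.linalg.eigh(Q)
keep = eigvals > 1e-12 * eigvals.max()
d = eigvals[keep][::-1]            # weights d_0 >= d_1 >= ... > 0
H = eigvecs[:, keep][:, ::-1]      # N x K matrix whose columns are the tapers

x = rng.standard_normal(n)
f, dt = 0.2, 1.0
Z = x * np.exp(2j * np.pi * f * np.arange(n) * dt)

S_q = dt * np.real(Z.conj() @ Q @ Z)              # quadratic form dt * Z^H Q Z
S_wmt = dt * np.sum(d * np.abs(H.T @ Z) ** 2)     # weighted sum of eigenspectra
```

The two numbers agree to rounding error, and the number of retained weights equals the rank of Q.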

Deconstructed Multitaper Spectral Estimators: I

• as an example of a deconstructed multitaper spectral estimator, consider an m = 179 Parzen lag window estimator used with a direct spectral estimator employing a Hanning data taper

• following plot shows estimate for AR(4) time series of Figure 36(e)

Parzen S(lw)(·) with Hanning S(d)(·) & AR(4) SDF

[figure: m = 179 estimate and true SDF versus f for 0 ≤ f ≤ 0.5]

Deconstructed Multitaper Spectral Estimators: II

• weight matrix Q for m = 179 Parzen lag window estimator is of full rank so K = N, but, as shown in next plot, eigenvalues decay rapidly to zero, implying that effective rank is much smaller

Eigenvalues of Q for Lag Window Estimator

[figure: eigenvalues versus index, indices 0 to 15 shown]

Deconstructed Multitaper Spectral Estimators: III

• following plots show first seven eigenvectors of Q – these serve as data tapers in weighted multitaper representation for lag window estimator

SAPA2e–435, 437, 438 VIII–62

AR(4) Series, Deconstructed {h0,t} & Tapered Series

[figures: deconstructed tapers and the correspondingly tapered AR(4) series, plotted against t for 0 ≤ t ≤ 1024]

Deconstructed Multitaper Spectral Estimators: IV

• following plots show eigenspectra based on seven data tapers, along with weighted multitaper estimates of increasing order K

• weights $\tilde{d}_k$ determined by eigenvalues dk, but renormalized for each K to sum to unity:

$$\tilde{d}_k = \frac{d_k}{\sum_{k'=0}^{K-1} d_{k'}}, \quad k = 0, \ldots, K-1$$

(renormalization required because $\sum_{k=0}^{N-1} d_k = 1$)

• plots also show spectral windows for eigenspectra and weighted multitaper estimates, with vertical red dashed lines indicating standard bandwidth measures (either BHk or BH)

SAPA2e–435, 437, 438 VIII–66

Deconstructed H0(·), S(mt)0(·), H(·) and S(wmt)(·), K = 1

[figures: spectral windows plotted for 0 ≤ f ≤ 0.01 and spectral estimates plotted for 0 ≤ f ≤ 0.5]

Deconstructed Multitaper Spectral Estimators: V

• because eigenvalues decrease to zero rapidly, K = 7 multitaper estimate is a reasonable approximation to lag window estimate, as next plot shows

SAPA2e–437, 438 VIII–70

Parzen S(lw)(·) with K = 7 S(wmt)(·) Approximation

[figure: both estimates versus f for 0 ≤ f ≤ 0.5]

SAPA2e–437, 327 VIII–71

Deconstructed Multitaper Spectral Estimators: VI

• lag window estimators are sometimes well approximated by low-order weighted multitaper estimators, suggesting that

‘lag-windowing and multiple-data-windowing are roughly equivalent for smooth spectrum estimation’

(title of article by McCloud et al., 1999)

Quadratic Spectral Estimators: XI

• Q: what conditions on Q ensure S(q)(f) has good bias and variance properties?

• let’s consider line of thought leading to Slepian tapers (Bronez, 1985)

First Moment of S(q)(·): I

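• since S(q)(·) has a weighted multitaper representation, an appeal to results stated for the latter says that

$$E\{\hat{S}^{(q)}(f)\} = \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df',$$

$$\widetilde{\mathcal{H}}(f) \equiv \sum_{k=0}^{K-1} d_k\, \mathcal{H}_k(f) \quad\text{with}\quad \mathcal{H}_k(f) \equiv \Delta t \left| \sum_{t=0}^{N-1} h_{k,t}\, e^{-i2\pi f t\,\Delta t} \right|^2$$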

First Moment of S(q)(·): II

• Exercise [8.12] gives equivalent ‘time domain’ expression:

$$E\{\hat{S}^{(q)}(f)\} = \Delta t\, \operatorname{tr}\{Q \Sigma_Z\} = \Delta t\, \operatorname{tr}\{H D_K H^T \Sigma_Z\} = \Delta t\, \operatorname{tr}\{D_K H^T \Sigma_Z H\},$$

where tr = trace and ΣZ = covariance matrix for Zt’s

− last equation above uses following result: if A and B have dimensions M × N and N × M, then tr {AB} = tr {BA}

• equating frequency and time domain expressions yields

$$\int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df' = \Delta t\, \operatorname{tr}\{D_K H^T \Sigma_Z H\}$$

First Moment of S(q)(·): III

• for general {Xt}, can get handle on first moment by incorporating notion of resolution (key idea!)

• given resolution bandwidth 2W > 0, seek Q’s so

$$E\{\hat{S}^{(q)}(f)\} \approx \frac{1}{2W} \int_{f-W}^{f+W} S(f')\, df' \equiv \overline{S}(f),$$

i.e., no longer seek E{S(q)(f)} ≈ S(f)

• rationale

− ‘regularizes’ SDF estimation problem: $\overline{S}(\cdot)$ smooth to some degree, whereas S(·) need not be

− incorporates filter bandwidth in filtering interpretation of S(·) (Section 5.6)

SAPA2e–398, 399 VIII–76

First Moment of S(q)(·): IV

• strategy

− set resolution bandwidth 2W appropriately

− optimize bias/variance within limitations imposed by choice of 2W

• basically we are giving up finest possible resolution of 1/(N ∆t) to get handle on bias/variance

First Moment of S(q)(·): V

• definition for bias that incorporates notion of resolution is

$$b\{\hat{S}^{(q)}(f)\} = E\{\hat{S}^{(q)}(f)\} - \overline{S}(f) = E\{\hat{S}^{(q)}(f)\} - \frac{1}{2W} \int_{f-W}^{f+W} S(f')\, df'$$

• since

$$E\{\hat{S}^{(q)}(f)\} = \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df',$$

insisting upon b{S(q)(f)} = 0 would require

$$\widetilde{\mathcal{H}}(f') = \begin{cases} 1/(2W), & |f'| \le W; \\ 0, & W < |f'| \le f_N \end{cases}$$

• H(·) is sum of functions Hk(·) that are based on Fourier transforms of index-limited sequences – hence H(·) cannot be exactly band-limited

First Moment of S(q)(·): VI

• game plan

− insist that S(q)(f ) be unbiased for white noise

− break up bias into two components (yet to be defined, but we’ll end up calling them ‘broad-band bias’ and ‘local bias’)

− develop bounds for both components for colored noise

Unbiasedness for White Noise: I

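• for white noise S(f) = s0 ∆t for all f, so

$$\overline{S}(f) = \frac{1}{2W} \int_{f-W}^{f+W} S(f')\, df' = \frac{1}{2W} \int_{f-W}^{f+W} s_0\,\Delta t\, df' = s_0\,\Delta t$$

• since

$$E\{\hat{S}^{(q)}(f)\} = \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df' = s_0\,\Delta t \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, df',$$

have unbiasedness if

$$\int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f')\, df' = 1$$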

Unbiasedness for White Noise: II

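• noting that

$$\int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f')\, df' = \int_{-f_N}^{f_N} \sum_{k=0}^{K-1} d_k\, \mathcal{H}_k(f')\, df' = \sum_{k=0}^{K-1} d_k$$

because

$$\int_{-f_N}^{f_N} \mathcal{H}_k(f')\, df' = \sum_{t=0}^{N-1} h_{k,t}^2 = 1,$$

requirement for unbiasedness is

$$\sum_{k=0}^{K-1} d_k = 1$$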

SAPA2e–399, 400 VIII–81

Unbiasedness for White Noise: III

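• alternative approach: since ΣZ = s0 IN for white noise, can use

$$E\{\hat{S}^{(q)}(f)\} = \Delta t\, \operatorname{tr}\{D_K H^T \Sigma_Z H\} = s_0\,\Delta t\, \operatorname{tr}\{D_K H^T H\} = s_0\,\Delta t\, \operatorname{tr}\{D_K\}$$

because columns {hk,t} of H are orthonormal

• use of the fact that

$$\operatorname{tr}\{D_K\} = \sum_{k=0}^{K-1} d_k$$

leads to the same requirement for white noise unbiasedness:

$$\sum_{k=0}^{K-1} d_k = 1$$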

Broad-Band & Local Bias: I

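• using new notion of bias, have

$$\begin{aligned}
b\{\hat{S}^{(q)}(f)\} &\equiv E\{\hat{S}^{(q)}(f)\} - \overline{S}(f) \\
&= \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df' - \frac{1}{2W} \int_{f-W}^{f+W} S(f')\, df' \\
&= \int_{f-W}^{f+W} \left[ \widetilde{\mathcal{H}}(f - f') - \frac{1}{2W} \right] S(f')\, df' + \int_{f' \notin [f-W,\, f+W]} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df' \\
&\equiv \underbrace{b^{(l)}\{\hat{S}^{(q)}(f)\}}_{\text{local bias}} + \underbrace{b^{(b)}\{\hat{S}^{(q)}(f)\}}_{\text{broad-band bias}}
\end{aligned}$$

• to bound bias terms, assume S(·) bounded by Smax; i.e., S(f) ≤ Smax < ∞ for all f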

Broad-Band and Local Bias: II

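• bound on magnitude of local bias:

$$\begin{aligned}
\left| b^{(l)}\{\hat{S}^{(q)}(f)\} \right| &= \left| \int_{f-W}^{f+W} \left[ \widetilde{\mathcal{H}}(f - f') - \frac{1}{2W} \right] S(f')\, df' \right| \\
&\le \int_{f-W}^{f+W} \left| \widetilde{\mathcal{H}}(f - f') - \frac{1}{2W} \right| S(f')\, df' \\
&\le S_{\max} \int_{-W}^{W} \left| \widetilde{\mathcal{H}}(f'') - \frac{1}{2W} \right| df'';
\end{aligned}$$

integral gives useful measure of local bias

• local bias small if H(f) ≈ 1/(2W) over [−W, W]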

Broad-Band and Local Bias: III

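• bound on broad-band bias (must be positive!):

$$\begin{aligned}
b^{(b)}\{\hat{S}^{(q)}(f)\} &= \int_{f' \notin [f-W,\, f+W]} \widetilde{\mathcal{H}}(f - f')\, S(f')\, df' \\
&\le S_{\max} \int_{f' \notin [f-W,\, f+W]} \widetilde{\mathcal{H}}(f - f')\, df' \\
&= S_{\max} \int_{f'' \notin [-W,\, W]} \widetilde{\mathcal{H}}(f'')\, df'' \\
&= S_{\max} \left( \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(f'')\, df'' - \int_{-W}^{W} \widetilde{\mathcal{H}}(f'')\, df'' \right) \\
&= S_{\max} \left( \operatorname{tr}\{D_K\} - \operatorname{tr}\{D_K H^T \Sigma^{(\mathrm{bl})} H\} \right),
\end{aligned}$$

where Σ(bl) arises from the following argument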

Broad-Band and Local Bias: IV

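• let {Xt} be band-limited white noise; i.e., has SDF and ACVS

$$S^{(\mathrm{bl})}(f) \equiv \begin{cases} \Delta t, & |f| \le W; \\ 0, & W < |f| \le f_N, \end{cases} \quad\text{and}\quad s^{(\mathrm{bl})}_\tau \equiv \begin{cases} 2W\,\Delta t, & \tau = 0; \\ \dfrac{\sin(2\pi W \tau\,\Delta t)}{\pi \tau}, & \tau \ne 0. \end{cases}$$

• for this SDF (and letting f = 0 so Zt = Xt), have

$$E\{\hat{S}^{(q)}(0)\} = \int_{-f_N}^{f_N} \widetilde{\mathcal{H}}(0 - f')\, S^{(\mathrm{bl})}(f')\, df' = \Delta t \int_{-W}^{W} \widetilde{\mathcal{H}}(f'')\, df'' = \Delta t\, \operatorname{tr}\{D_K H^T \Sigma^{(\mathrm{bl})} H\},$$

where (j, k)th element of Σ(bl) is $s^{(\mathrm{bl})}_{j-k}$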

Minimizing Broad-Band Bias Measure: I

• measure of broad-band bias (similar concept to leakage) is thus

tr {DK} − tr {DKHTΣ(bl)H}

• insisting on tr {DK} = 1 ensures unbiasedness for white noise

• to minimize broad-band bias under this restriction,

maximize tr {DKHTΣ(bl)H} subject to tr {DK} = 1

SAPA2e–401, 402 VIII–87

Minimizing Broad-Band Bias Measure: II

• Exercise [8.13] gives solution:

− set K = 1

− H = h0 is normalized eigenvector associated with largest eigenvalue λ0(N,W) of Σ(bl)

− eigenvector is zeroth order Slepian sequence (technically: finite subsequence of DPSS)

− broad-band bias measure = 1 − λ0(N,W), where λ0(N,W) is the concentration ratio (Exercise [402])

• solution conflicts with variance in white noise case: as K increases, variance decreases

• reasonable balance: use K orthonormal Slepian tapers

SAPA2e–402, 403 VIII–88

Managing Bias and Variance

• broad-band bias: Exercise [8.14] says this can be measured by

$$1 - \frac{1}{K} \sum_{k=0}^{K-1} \lambda_k(N, W);$$

recall that λk(N,W) is close to unity as long as K < 2NW ∆t

• variance: previous argument says

$$\hat{S}^{(\mathrm{mt})}(f) \overset{d}{=} \frac{S(f)}{2K}\, \chi^2_{2K}$$

approximately if S(·) not rapidly varying over [f − W, f + W]; thus have var {S(mt)(f)} ≈ S²(f)/K

• local bias: small if H(f) ≈ 1/(2W) over [−W, W]; as following figures show, decreases as K increases; here NW = 4 with N = 1024 for which 1/(2W) = N/8 = 128 ≐ 21 dB

[figures: spectral windows plotted for 0 ≤ f ≤ 0.01 and spectral estimates plotted for 0 ≤ f ≤ 0.5]

SAPA2e–381, 383, 384, 385 VIII–90 to VIII–92

Adaptive Multitaper Estimation: I

• Section 8.5 discusses data-adaptive refinement to basic multitapering (developed for Slepian tapers)

• idea: weight eigenspectra adaptively according to need for leakage suppression at each f

− if S(f) relatively large, leakage not a concern, so can make K large

− if S(f) relatively small, leakage might be a concern, so should make K small

Adaptive Multitaper Estimation: II

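• adaptive multitaper estimator given by

$$\hat{S}^{(\mathrm{amt})}(f) \equiv \frac{\sum_{k=0}^{K-1} b_k^2(f)\, \lambda_k\, \hat{S}^{(\mathrm{mt})}_k(f)}{\sum_{k=0}^{K-1} b_k^2(f)\, \lambda_k}$$

where $\lambda_k \approx 1 - 1/10^j$ (with j ↓ as k ↑) &

$$b_k(f) = \frac{1}{\lambda_k + (1 - \lambda_k)\, \dfrac{s_0\,\Delta t}{S(f)}} \approx \frac{1}{1 + \dfrac{s_0\,\Delta t}{10^j S(f)}}$$

− λk’s downweight higher eigenspectra (slightly)

− s0 ∆t = average value of S(·)

− bk(f) small if $10^j S(f) \ll s_0\,\Delta t$ & large if $10^j S(f) \gg s_0\,\Delta t$

• determine bk(f) using preliminary estimate of S(·); can iterate to refine bk(f)’s if desired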
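The iteration described in the last bullet can be sketched as follows. The starting value (average of the first two eigenspectra), the fixed iteration count and the function name `adaptive_multitaper` are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def adaptive_multitaper(S_k, lam, s0_dt, n_iter=5):
    """Adaptive multitaper estimate from eigenspectra.

    S_k   : (K, F) array of eigenspectra (assumed strictly positive)
    lam   : (K,) eigenvalues lambda_k of the Slepian tapers
    s0_dt : s_0 * dt, the average value of the SDF
    """
    S_k = np.asarray(S_k, dtype=float)
    lam = np.asarray(lam, dtype=float)[:, None]
    S = S_k[:2].mean(axis=0)                        # preliminary estimate (assumed choice)
    for _ in range(n_iter):
        # b_k(f) = 1 / (lam_k + (1 - lam_k) s0 dt / S(f)), written without the inner division
        b = S / (lam * S + (1.0 - lam) * s0_dt)
        w = b ** 2 * lam
        S = np.sum(w * S_k, axis=0) / np.sum(w, axis=0)
    return S, b

# Toy input: four identical eigenspectra at three frequencies.
lam = np.array([0.9999, 0.999, 0.99, 0.9])
S_k = np.tile(np.array([1.0, 10.0, 0.1]), (4, 1))
S_amt, b = adaptive_multitaper(S_k, lam, s0_dt=1.0)
```

With identical eigenspectra the estimate is unchanged, but the weights already show the intended behaviour: where S(f) is small relative to s0 ∆t, the higher-order (smaller λ_k) eigenspectra receive noticeably smaller b_k(f).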

Adaptive Multitaper Estimation: III

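• assume

− $\hat{S}^{(\mathrm{mt})}_k(f) \overset{d}{=} S(f)\,\chi^2_2/2$ for each eigenspectrum

− S(mt)k(f)’s are pairwise uncorrelated

• as before, assume $\hat{S}^{(\mathrm{amt})}(f) \overset{d}{=} a\chi^2_\nu$

• EDOF argument similar to S(lw)(·) and S(WOSA)(·) yields

$$\nu = \frac{2 \left( E\{\hat{S}^{(\mathrm{amt})}(f)\} \right)^2}{\operatorname{var}\{\hat{S}^{(\mathrm{amt})}(f)\}} \approx \frac{2 \left( \sum_{k=0}^{K-1} b_k^2(f)\, \lambda_k \right)^2}{\sum_{k=0}^{K-1} b_k^4(f)\, \lambda_k^2}$$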

Motivation for WOSA: I

• let X0, . . . , XN−1 be sample of zero mean Gaussian white noise

• partition into NB blocks of size NS = N/NB:

X0, . . . , XNS−1; XNS, . . . , X2NS−1; . . . ; X(NB−1)NS, . . . , XN−1

[figure: series of length N = 1024 split into blocks b = 0, 1, 2, 3]

Motivation for WOSA: II

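• form periodograms for blocks b = 0, . . . , NB − 1:

$$\hat{S}^{(p)}_b(f_k) \equiv \frac{\Delta t}{N_S} \left| \sum_{t=0}^{N_S-1} X_{bN_S+t}\, e^{-i2\pi f_k t\,\Delta t} \right|^2$$

[figure: block periodograms versus f for 0 ≤ f ≤ 0.5]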

Motivation for WOSA: III

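• form average of NB periodograms:

$$\overline{S}^{(p)}(f_k) \equiv \frac{1}{N_B} \sum_{b=0}^{N_B-1} \hat{S}^{(p)}_b(f_k)$$

[figure: averaged periodogram versus f for 0 ≤ f ≤ 0.5]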

Motivation for WOSA: IV

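• $\hat{S}^{(p)}_b(f_k) \overset{d}{=} S(f_k)\,\chi^2_2/2$ for 0 < fk < fN

• $\hat{S}^{(p)}_b(f_k)$ independent of $\hat{S}^{(p)}_{b'}(f_k)$ for b′ ≠ b

• $\chi^2_\nu = Y_0^2 + Y_1^2 + \cdots + Y_{\nu-1}^2$ for IID N(0, 1) RVs Yj implies $\chi^2_{\nu_1} + \chi^2_{\nu_2} = \chi^2_{\nu_1+\nu_2}$ if $\chi^2_{\nu_1}$ and $\chi^2_{\nu_2}$ are independent

• with ν ≡ 2NB, have

$$\sum_{b=0}^{N_B-1} \hat{S}^{(p)}_b(f_k) \overset{d}{=} \frac{S(f_k)}{2}\, \chi^2_{2N_B} \implies \overline{S}^{(p)}(f_k) = \frac{1}{N_B} \sum_{b=0}^{N_B-1} \hat{S}^{(p)}_b(f_k) \overset{d}{=} \frac{S(f_k)}{\nu}\, \chi^2_\nu$$

• note similarity to $\hat{S}^{(\mathrm{lw})}(f) \overset{d}{=} S(f)\,\chi^2_\nu/\nu$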

Definition of WOSA: I

• Welch’s overlapped segment averaging (WOSA)

− generalization applicable to other {Xt}’s

− break time series into NB blocks

∗ each block has NS points

∗ blocks now allowed to overlap

− apply data taper {h0, . . . , hNS−1} to each block

− form direct spectral estimator for each block

− average NB estimators together

SAPA2e–427, 428 VIII–100

Definition of WOSA: II

• for 0 ≤ l ≤ N − NS, let

$$\hat{S}^{(d)}_l(f) \equiv \Delta t \left| \sum_{t=0}^{N_S-1} h_t\, X_{t+l}\, e^{-i2\pi f t\,\Delta t} \right|^2$$

• WOSA spectral estimator given by

$$\hat{S}^{(\mathrm{WOSA})}(f) \equiv \frac{1}{N_B} \sum_{j=0}^{N_B-1} \hat{S}^{(d)}_{jn}(f),$$

where n is integer such that n(NB − 1) = N − NS

• overlapping recovers ‘information’ lost in tapering (not obviously useful if $h_t = 1/\sqrt{N_S}$, i.e., $\hat{S}^{(d)}_l(f) = \hat{S}^{(p)}_l(f)$)

• plots show example N = 100, NS = 32, NB = 5 and n = 17
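The WOSA definition translates almost directly into NumPy. The sketch below is a minimal illustration rather than a production implementation: the function name `wosa`, the restriction to the FFT frequency grid of a single block and the specific Hanning form are assumptions. The example reuses the block layout quoted above (N = 100, NS = 32, NB = 5, n = 17).

```python
import numpy as np

def wosa(x, taper, n_blocks, dt=1.0):
    """WOSA estimate: average of direct spectral estimates over overlapping blocks."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(taper, dtype=float)
    h = h / np.sqrt(np.sum(h ** 2))                 # enforce sum_t h_t^2 = 1
    n, n_s = x.size, h.size
    shift, rem = divmod(n - n_s, n_blocks - 1)      # n such that n (N_B - 1) = N - N_S
    if n_blocks < 2 or rem != 0:
        raise ValueError("need n_blocks >= 2 and an integer block shift")
    starts = shift * np.arange(n_blocks)
    blocks = np.stack([x[l:l + n_s] for l in starts])
    S_blocks = dt * np.abs(np.fft.rfft(h * blocks, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_s, d=dt)
    return freqs, S_blocks.mean(axis=0), starts

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
n_s = 32
hanning = 1.0 - np.cos(2.0 * np.pi * (np.arange(n_s) + 1) / (n_s + 1))
freqs, S_wosa, starts = wosa(x, hanning, n_blocks=5)
```

The returned `starts` are the block offsets l = jn, here 0, 17, 34, 51 and 68.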

SAPA2e–427, 428 VIII–101

Block j = 4 Before and After Tapering

[figure: block of the series versus t for 0 ≤ t ≤ 100]

First Moment Properties of WOSA: I

• first moment:

$$E\{\hat{S}^{(\mathrm{WOSA})}(f)\} = \frac{1}{N_B} \sum_{j=0}^{N_B-1} E\{\hat{S}^{(d)}_{jn}(f)\};$$

however,

$$E\{\hat{S}^{(d)}_{jn}(f)\} = \int_{-f_N}^{f_N} \mathcal{H}(f - f')\, S(f')\, df'$$

is the same for all j, where $\mathcal{H}(f) \equiv |H(f)|^2$ is the spectral window associated with

{h0, . . . , hNS−1} ←→ H(·)

SAPA2e–428, 429 VIII–103

First Moment Properties of WOSA: II

• thus have

$$E\{\hat{S}^{(\mathrm{WOSA})}(f)\} = \int_{-f_N}^{f_N} \mathcal{H}(f - f')\, S(f')\, df'$$

• note that this expected value

− depends on just NS and data taper

− does not depend on N , NB or n

− can be biased if we make NS too small

SAPA2e–428, 429 VIII–104

Variance of WOSA Estimator: I

• assume E{S (WOSA)(f )} ≈ S(f )

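• variance is given by

$$\begin{aligned}
\operatorname{var}\{\hat{S}^{(\mathrm{WOSA})}(f)\} &= \operatorname{cov}\left\{ \frac{1}{N_B} \sum_{j=0}^{N_B-1} \hat{S}^{(d)}_{jn}(f),\ \frac{1}{N_B} \sum_{k=0}^{N_B-1} \hat{S}^{(d)}_{kn}(f) \right\} \\
&= \frac{1}{N_B^2} \sum_{j=0}^{N_B-1} \operatorname{var}\{\hat{S}^{(d)}_{jn}(f)\} + \frac{2}{N_B^2} \sum_{j<k} \operatorname{cov}\{\hat{S}^{(d)}_{jn}(f), \hat{S}^{(d)}_{kn}(f)\}
\end{aligned}$$

• for 0 < f < fN can use $\operatorname{var}\{\hat{S}^{(d)}_{jn}(f)\} \approx S^2(f)$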

Variance of WOSA Estimator: II

• assume:

− S(·) is locally constant

− f is not too close to 0 or fN

− ht = 0 for t ≥ NS

• can argue (Exercise [8.25]):

$$\operatorname{cov}\{\hat{S}^{(d)}_{jn}(f), \hat{S}^{(d)}_{kn}(f)\} \approx S^2(f) \left| \sum_{t=0}^{N_S-1} h_t\, h_{t+|k-j|n} \right|^2$$

i.e., depends on autocorrelation of {ht}

Variance of WOSA Estimator: III

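• leads to the useful expression

$$\begin{aligned}
\operatorname{var}\{\hat{S}^{(\mathrm{WOSA})}(f)\} &\approx \frac{S^2(f)}{N_B} \left( 1 + \frac{2}{N_B} \sum_{j<k} \left| \sum_{t=0}^{N_S-1} h_t\, h_{t+|k-j|n} \right|^2 \right) \\
&= \frac{S^2(f)}{N_B} \left( 1 + 2 \sum_{m=1}^{N_B-1} \left( 1 - \frac{m}{N_B} \right) \left| \sum_{t=0}^{N_S-1} h_t\, h_{t+mn} \right|^2 \right)
\end{aligned}$$

via a ‘diagonalization’ argument (there are NB − m pairs j < k with k − j = m)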

Distribution of WOSA Estimator

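• assuming $\hat{S}^{(\mathrm{WOSA})}(f) \overset{d}{=} a\chi^2_\nu$, usual EDOF argument yields

$$\nu = \frac{2 \left( E\{\hat{S}^{(\mathrm{WOSA})}(f)\} \right)^2}{\operatorname{var}\{\hat{S}^{(\mathrm{WOSA})}(f)\}} \approx \frac{2N_B}{1 + 2 \sum_{m=1}^{N_B-1} \left( 1 - \frac{m}{N_B} \right) \left| \sum_{t=0}^{N_S-1} h_t\, h_{t+mn} \right|^2}$$

• specialize to Hanning data taper

− following plot shows EDOF versus percentage overlap

− 50% overlap gets close to maximum EDOF

− for Hanning + 50% overlap (i.e., n = NS/2):

$$\nu \approx \frac{2N_B}{1 + 2\left(1 - 1/N_B\right) \left| \sum_{t=0}^{N_S/2-1} h_t\, h_{t+N_S/2} \right|^2} \approx \frac{36 N_B^2}{19 N_B - 1} \approx \frac{3.79 N}{N_S}$$

• need 65% overlap for Slepian with NW = 4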
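The EDOF formula can be evaluated directly from the taper autocorrelation. The sketch below does so for a Hanning taper with 50% overlap and compares the result with the closed-form approximation 36 NB²/(19 NB − 1); the function name `wosa_edof` and the particular Hanning form are assumptions of this example.

```python
import numpy as np

def wosa_edof(taper, shift, n_blocks):
    """EDOF of a WOSA estimator with block shift n = shift and N_B = n_blocks blocks."""
    h = np.asarray(taper, dtype=float)
    h = h / np.sqrt(np.sum(h ** 2))
    n_s = h.size
    total = 0.0
    for m in range(1, n_blocks):
        lag = m * shift
        if lag >= n_s:                  # h_t = 0 for t >= N_S, so no further overlap
            break
        r = np.dot(h[:n_s - lag], h[lag:])          # sum_t h_t h_{t + m n}
        total += (1.0 - m / n_blocks) * r ** 2
    return 2.0 * n_blocks / (1.0 + 2.0 * total)

n, n_s = 1024, 256
shift = n_s // 2                        # 50% overlap
n_blocks = (n - n_s) // shift + 1       # n (N_B - 1) = N - N_S
hanning = 1.0 - np.cos(2.0 * np.pi * (np.arange(n_s) + 1) / (n_s + 1))

nu = wosa_edof(hanning, shift, n_blocks)
nu_approx = 36 * n_blocks ** 2 / (19 * n_blocks - 1)
```

For this configuration (seven blocks) the exact sum and the approximation agree closely, and both fall a little short of the 2 NB that fully independent blocks would give.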

SAPA2e–429, 430 VIII–108

DOFs ν versus % of Block Overlap for N = 1024

[figure: ν versus percentage of overlap (0 to 100), with curves for NS = 256, NS = 128 and NS = 64]

S(WOSA)(·) with Hanning & 50% Overlap & AR(2) SDF

[figures: estimates versus f for 0 ≤ f ≤ 0.5, panels for NS = 4, 16, 32 and 64, followed by a second set of panels for NS = 64, 128, 256 and 512]

Advantages/Disadvantages of WOSA

• widely used in spectrum analyzers

• advantages

− computationally efficient

− can handle large N

− can handle ‘locally stationary’ processes

− can be ‘robustified’ (Chave et al., 1987)

• disadvantages

− leakage if NS too small

− loss of resolution if NS too small

Deconstructed Multitaper Spectral Estimators: VII

• as 2nd example of deconstructed multitaper estimator, consider WOSA estimator using a Hanning data taper for N = 1024 with block size NS = 256 and 50% overlap, yielding NB = 7 blocks

• following plot redisplays estimate for AR(4) time series of Figure 36(e)

SAPA2e–435, 438 VIII–119

[figure: NS = 256 WOSA estimate versus f for 0 ≤ f ≤ 0.5]

Deconstructed Multitaper Spectral Estimators: VIII

• weight matrix Q for WOSA estimator has rank K = NB = 7, as following plot demonstrates because there are only 7 nonzero eigenvalues

SAPA2e–435, 438 VIII–121

Eigenvalues of Q for WOSA Estimator

[figure: eigenvalues versus index, indices 0 to 15 shown]

Deconstructed Multitaper Spectral Estimators: IX

• following plots show eigenvectors associated with 7 nonzeroeigenvalues – these serve as data tapers in weighted multita-per representation for WOSA estimator

SAPA2e–435, 438 VIII–123

[Figure: eigenvectors of Q associated with the 7 nonzero eigenvalues; horizontal axis t from 0 to 1024]

Deconstructed Multitaper Spectral Estimators: X

• following plots show eigenspectra based on seven data tapers, along with weighted multitaper estimates of increasing order K

• because Q is of rank 7, weighted multitaper estimate of order K = 7 is exactly the same as WOSA estimate

• weights $\tilde d_k$ determined by eigenvalues $d_k$, but renormalized for K < 7 to sum to unity:

$$\tilde d_k = \frac{d_k}{\sum_{k'=0}^{K-1} d_{k'}}, \quad k = 0, \ldots, K - 1$$

(renormalization required because $\sum_{k=0}^{6} d_k = 1$, so the sum over only the first K < 7 eigenvalues is less than unity)

• plots also show spectral windows for eigenspectra and weighted multitaper estimates, with vertical red dashed lines indicating standard bandwidth measures (either BHk or BH)
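The renormalization of the weights can be sketched as follows. The nonzero eigenvalues of Q are obtained here from the small NB × NB Gram matrix of the shifted tapers, which has the same nonzero eigenvalues as the N × N matrix. Two things are assumptions made for illustration: the Hanning definition h_t ∝ 1 − cos(2π(t + 1)/(NS + 1)), and the ordering of the eigenvalues from largest to smallest. The helper name `renormalized_weights` is hypothetical.

```python
import numpy as np

N, NS, n = 1024, 256, 128
NB = (N - NS) // n + 1

# Hanning taper with unit energy (assumed definition).
t = np.arange(NS)
h = 1.0 - np.cos(2.0 * np.pi * (t + 1) / (NS + 1))
h /= np.sqrt(np.sum(h ** 2))

# Gram matrix of the shifted tapers, scaled by 1/NB.
G = np.zeros((NB, NB))
for i in range(NB):
    for j in range(NB):
        lag = abs(i - j) * n
        if lag < NS:                        # blocks overlap only for small lags
            G[i, j] = np.sum(h[:NS - lag] * h[lag:]) / NB

d = np.sort(np.linalg.eigvalsh(G))[::-1]    # eigenvalues d_0 >= ... >= d_6

def renormalized_weights(d, K):
    """Weights for the order-K estimate: first K eigenvalues rescaled to sum to 1."""
    return d[:K] / np.sum(d[:K])

weights = {K: renormalized_weights(d, K) for K in range(1, NB + 1)}
for K, w in weights.items():
    print(K, np.round(w, 4))
```

For K = 7 the rescaling leaves the eigenvalues unchanged, since they already sum to unity.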

SAPA2e–435, 437, 438 VIII–127


Ocean Wave Data: I

• sea level time series for which N = 1024, ∆t = 1/4 second and fN = 2 cycles per second

[Figure: ocean wave time series; horizontal axis t (seconds), 0 to 250]

Ocean Wave Data: II

• arguably best SDF estimate seen so far is Gaussian S(lw)(·) based on S(d)(·) using NW = 2 Slepian data taper

• effective bandwidth of S(lw)(·) given by BU ≐ 0.135 Hz

• following plot shows basic multitaper estimate S(mt)(·)

− set NW = 4 (resolution not main concern)

− maximum of 2NW − 1 = 7 possible reasonable tapers, but eigenspectrum S(mt)_6(·) poor at high frequencies

− set K = 6, yielding ν = 12 EDOF
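A basic multitaper estimate with these settings can be sketched in NumPy. Several choices below are assumptions made for illustration: the Slepian tapers are computed from the standard tridiagonal eigenproblem rather than by whatever routine produced the slides, the estimate is a simple unweighted average of the K eigenspectra, and white noise stands in for the ocean wave series, which is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def slepian_tapers(N, NW, K):
    """First K Slepian tapers (rows), via the tridiagonal eigenproblem."""
    W = NW / N
    t = np.arange(N)
    diag = ((N - 1 - 2 * t) / 2.0) ** 2 * np.cos(2.0 * np.pi * W)
    off = t[1:] * (N - t[1:]) / 2.0
    A = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(A)             # eigenvalues in ascending order
    return vecs[:, ::-1][:, :K].T           # largest eigenvalues give orders 0..K-1

def basic_multitaper(x, dt, NW, K):
    """Average of K direct spectral estimates on the one-sided Fourier grid."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # center the series
    tapers = slepian_tapers(len(x), NW, K)  # each row has unit energy
    eigenspectra = dt * np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs, eigenspectra.mean(axis=0), tapers

rng = np.random.default_rng(0)
N, dt, NW, K = 1024, 0.25, 4, 6
x = rng.standard_normal(N)                  # hypothetical stand-in data
freqs, S_mt, tapers = basic_multitaper(x, dt, NW, K)
nu = 2 * K                                  # EDOF for the basic estimate
print(freqs[-1], nu, S_mt.mean())
```

For unit-variance white noise the true SDF is the constant σ²∆t = 0.25, so the printed mean of the estimate should be near that value.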

SAPA2e–439, 441, 443 VIII–132

[Figure: S(mt)(·), NW = 4, K = 6 Slepian Tapers; horizontal axis f from 0.0 to 2.0]

Ocean Wave Data: III

• next plots compare another S(mt)(·) estimate with Parzen lag window estimate S(lw)(·)

− set NW = 6 and K = 10 so ν = 20

− bandwidth of S(lw)(·) is 0.049 Hz ≈ 2W ≐ 0.047 Hz

− good agreement between S(mt)(·) and S(lw)(·)
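The bandwidth and EDOF figures quoted for the two multitaper settings follow from simple arithmetic. The sketch below assumes the usual convention that "NW = 6" is shorthand for W = 6/(N ∆t); the function name `slepian_bandwidth` is illustrative.

```python
def slepian_bandwidth(NW, N, dt):
    """Full bandwidth 2W in Hz, assuming W = NW / (N * dt)."""
    return 2.0 * NW / (N * dt)

N, dt = 1024, 0.25
settings = [(4, 6), (6, 10)]                # (NW, K) pairs used on the slides
results = {(NW, K): (2 * K, slepian_bandwidth(NW, N, dt)) for NW, K in settings}
for (NW, K), (nu, two_W) in results.items():
    print(f"NW={NW}, K={K}: EDOF={nu}, 2W={two_W:.4f} Hz")
```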

[Figures: S(mt)(·) estimate, and Parzen S(lw)(·), m = 150, using NW = 2 Slepian S(d)(·); horizontal axis f from 0.0 to 2.0]

Ocean Wave Data: IV

• next two plots show

1. adaptive multitaper estimate S(amt)(·) with NW = 4 and K = 7, along with 95% confidence intervals (CIs)

2. EDOFs ν versus f

• wider CIs for f > 1 Hz due to decreasing degrees of freedom

• final plot compares S(amt)(·) with Parzen S(lw)(·) – good overall agreement between estimates

[Figure: S(amt)(·), NW = 4, K = 7 Slepian Tapers; horizontal axis f from 0.0 to 2.0]

[Figure: Equivalent Degrees of Freedom ν for S(amt)(·); horizontal axis f from 0.0 to 2.0]

[Figure: S(amt)(·) and Parzen S(lw)(·); horizontal axis f from 0.0 to 2.0]

Ocean Wave Data: V

• recall that effective bandwidth of Gaussian-based S(lw)(·) is BU ≐ 0.135 Hz

• for WOSA estimator, effective bandwidth is BH, which depends upon selected data taper {ht} and block size NS

• following plot shows BH versus NS for Hanning data taper, along with dashed line showing BU for Gaussian lag window estimate
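The dependence of BH on NS can be sketched numerically. The code assumes the autocorrelation-width form BH = 1 / (∆t Σ_τ (h⋆h_τ)²) for a unit-energy taper, and a Hanning taper h_t ∝ 1 − cos(2π(t + 1)/(NS + 1)); both are stated here as assumptions rather than quoted from the slides, and the function names are illustrative.

```python
import numpy as np

def hanning_taper(NS):
    """Hanning taper with sum(h**2) == 1 (assumed definition)."""
    t = np.arange(NS)
    h = 1.0 - np.cos(2.0 * np.pi * (t + 1) / (NS + 1))
    return h / np.sqrt(np.sum(h ** 2))

def bandwidth_BH(h, dt):
    """Effective bandwidth from the taper's autocorrelation (assumed form)."""
    acs = np.correlate(h, h, mode="full")   # h*h_tau for tau = -(NS-1), ..., NS-1
    return 1.0 / (dt * np.sum(acs ** 2))

dt = 0.25
block_sizes = np.arange(40, 121)
BH = np.array([bandwidth_BH(hanning_taper(NS), dt) for NS in block_sizes])
BH_61 = BH[block_sizes == 61][0]
print(f"BH for NS = 61: {BH_61:.4f} Hz")
```

As a sanity check on the assumed formula, a rectangular taper of length N gives a value close to 1.5/(N ∆t).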

[Figure: BH versus NS for Hanning Data Taper (∆t = 0.25 sec); horizontal axis: block size, 40 to 120]

Ocean Wave Data: VI

• NS = 61 yields BH ≐ 0.134 Hz (closest to BU ≐ 0.135 Hz)

• setting n = 30 gives blocks that overlap by ≈ 50%

• let $X'_t = X_t - \bar X$, i.e., centered time series

− block 1: $X'_0, X'_1, \ldots, X'_{60}$

− block 2: $X'_{30}, X'_{31}, \ldots, X'_{90}$

− block 3: $X'_{60}, X'_{61}, \ldots, X'_{120}$

− ...

− block 33: $X'_{960}, X'_{961}, \ldots, X'_{1020}$

• redefining last block to be $X'_{963}, X'_{964}, \ldots, X'_{1023}$ allows use of entire time series
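The block layout above, including the redefined last block, can be generated programmatically and fed into a simple WOSA estimate. This is a sketch only: the Hanning definition h_t ∝ 1 − cos(2π(t + 1)/(NS + 1)), the absence of zero padding, and the white-noise stand-in for the ocean wave series are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def block_starts(N, NS, n):
    """Start indices of blocks of size NS shifted by n; last block moved to the end."""
    starts = list(range(0, N - NS + 1, n))
    if starts[-1] + NS < N:
        starts[-1] = N - NS                 # redefine last block to use final samples
    return starts

def wosa(x, dt, NS, n):
    """WOSA estimate on the one-sided Fourier grid for block size NS."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                       # centered series X'_t
    t = np.arange(NS)
    h = 1.0 - np.cos(2.0 * np.pi * (t + 1) / (NS + 1))
    h /= np.sqrt(np.sum(h ** 2))            # unit-energy Hanning taper (assumed form)
    starts = block_starts(len(x), NS, n)
    blocks = np.array([xc[s : s + NS] for s in starts])
    direct = dt * np.abs(np.fft.rfft(blocks * h, axis=1)) ** 2
    return np.fft.rfftfreq(NS, d=dt), direct.mean(axis=0), starts

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)               # hypothetical stand-in data
freqs, S_wosa, starts = wosa(x, dt=0.25, NS=61, n=30)
print(len(starts), starts[:3], starts[-1])
```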

[Figure: Hanning-based S(WOSA)(·) and Gaussian S(lw)(·); horizontal axis f from 0.0 to 2.0]

Ocean Wave Data: VII

• S(WOSA)(f) has EDOF ν ≐ 62.6, whereas S(lw)(f) has ν ≐ 34.4

• recall that variance is inversely proportional to ν

• 95% CIs based on S(WOSA)(f) are tighter

• greater variability is an explanation for S(lw)(·) being somewhat more wobbly in appearance than S(WOSA)(f)
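The quoted WOSA EDOF appears consistent with the Hanning, 50% overlap approximation ν ≈ 36 NB² / (19 NB − 1) evaluated at NB = 33 blocks. The short sketch below carries out that arithmetic and the implied variance ratio; the variable names are illustrative, and the lag window value 34.4 is simply copied from the slide.

```python
NB = 33                                     # number of blocks for NS = 61, n = 30
nu_wosa = 36.0 * NB ** 2 / (19.0 * NB - 1.0)
nu_lw = 34.4                                # EDOF quoted for the lag window estimate

# Variance is inversely proportional to EDOF, so this ratio approximates
# var{WOSA estimate} / var{lag window estimate} at a given frequency.
variance_ratio = nu_lw / nu_wosa
print(round(nu_wosa, 1), round(variance_ratio, 3))
```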

SAPA2e–440, 441 VIII–146
