nonparametric density estimation (one dimension) › tine › nonpar › nonparametric...derivation...

23
Nonparametric Density Estimation (one dimension) ardle, M¨ uller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction Nonparametric kernel density estimation Tine Buch-Kromann February 7, 2007 Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

Upload: others

Post on 29-Jan-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

  • Nonparametric Density Estimation(one dimension)

    Härdle, Müller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction

    Nonparametric kernel density estimation

    Tine Buch-Kromann

    February 7, 2007

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Derivation

    Idea of the histogram:

    1

    n · interval length#{obs that fall into a small interval containing x}

    Idea of the kernel density estimator:

    1

    n · interval length#{obs that fall into a small interval around x}

    Note: That the estimator does not depends on a origin of a bin grid.

    When we want to estimate the density in x , we consider theinterval [x − h, x + h):

    f̂h(x) =1

    2hn#{Xi ∈ [x − h, x + h)}

    Note: Interval length is 2h.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Derivation

    f̂h(x) =1

    2hn#{Xi ∈ [x − h, x + h)}

    =1

    2hn

    n∑

    i=1

    I (|x − Xi | ≤ h)

    =1

    hn

    n∑

    i=1

    1

    2I

    (∣∣∣∣x − Xi

    h

    ∣∣∣∣ ≤ 1)

    =1

    hn

    n∑

    i=1

    K

    (x − Xi

    h

    )

    where K (u) = 12 I (|u| ≤ 1) is the uniform kernel function, whichassigns weight 1/2 to each observation in the interval around x .Points outside the interval assigns the weight 0.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Derivation

    Improvement:More weight to observations very close to x and less weight toobservations farther away x (The Epanechnikov kernel function)

    K (u) =3

    4(1− u2)I (|u| ≤ 1)

    Kernel K (u)

    Uniform 12 I (|u| ≤ 1)Triangle (1− |u|)I (|u| ≤ 1)Epanechnikov 34(1− u2)I (|u| ≤ 1)Quartic 1516(1− u2)2I (|u| ≤ 1)Triweight 3532(1− u2)3I (|u| ≤ 1)Gaussian 1√

    2πexp(−12u2)

    Cosine π4 cos(

    π2 u

    )I (|u| ≤ 1)

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Derivation

    Kernel Density EstimatorGeneral form of the kernel density estimator of a probabilitydensity f , based on a sample X1, ...,Xn

    f̂h(x) =1

    n

    n∑

    i=1

    Kh(x − Xi )

    where Kh(·) = 1hK (·/h) and h is the bandwidth.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Bandwidth

    Bandwidth, h:h controls the smoothness of the estimate (similar to thehistogram) and the choice of h is a crucial problem.

    The problem of how to determine the value of h is a reasonableway is handled later.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • The Kernel Function

    Properties:Kernel functions are usually probability density function:

    ∫K (u) du = 1

    K (u) ≥ 0 ∀u in the domain of K

    Consequences:The kernel density estimator is a pdf

    ∫f̂h(x) dx = 1

    Moreover, f̂h(x) inherits all the continuity and differentiabilityproperties of K : If K is ν times continuously differentiable thenalso f̂h(x) is ν times cont. diff.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • The Kernel Function

    The smoothness of f̂h(x) (for the same value of h) depends ofkernel function.

    ...even if both kernel functions are continuous

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Kernel Density Estimator as a Sum of Bumps

    An alternative view of the kernel density estimation.Rescaled kernel function

    1

    nhK

    (x − Xi

    h

    )=

    1

    nKh (x − Xi )

    Note: The area under the rescaled kernel function is∫

    1

    nhK

    (x − Xi

    h

    )dx =

    1

    nh

    ∫K (u)h du

    =1

    nhh

    ∫K (u) du

    =1

    n

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Rescaled kernel function

    Rewrite the kernel density function as the sum of rescaledkernel functions

    f̂h(x) =1

    n

    n∑

    i=1

    Kh(x − Xi ) =n∑

    i=1

    1

    nhK

    (x − Xi

    h

    )

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: Bias

    Bias:

    Bias(f̂h(x)) =h2

    2f ′′(x)µ2(K ) + o(h2), h → 0

    where µ2(K ) =∫

    s2K (s) ds.

    The bias is proportional to h2:Choose a small h to reduce the bias.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: Bias

    Bias depends on f ′′(x) (curvature):In peaks of f : The bias < 0 since f ′′ < 0 around a local maximumof f .In ”valleys” of f : The bias > 0 since f ′′ > 0 around a local mini-mum of f .The magnitude of the bias depends of the absolute value of f ′′.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: Variance

    Variance:

    V(f̂h(x)) =1

    nh||K ||22f (x) + o

    (1

    nh

    ), nh →∞

    where ||K ||22 =∫

    K 2(s) ds, the squared L2 norm of K .

    The variance is proportional to (nh)−1:

    I Choose large h to reduce the variance.I Increase n to reduce the variance.I The variance increases in ||K ||22: Flat kernels reduce the

    variance.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: MSE

    Trade-off between bias and variance:Increasing h will lower the variance but increases the bias and viceversa.

    Minimizing the MSE is a compromise between over- andundersmoothing.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: MSE

    Mean Squared Error:

    MSE(f̂h(x)) =h4

    4f ′′(x)2µ2(K )2 +

    1

    nh||K ||22f (x) + o(h4) + o

    (1

    nh

    )

    Note: MSE → 0 as h → 0 and nh →∞,ie. the kernel density estimator is a consistent estimator.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: MISE and AMISE

    MISE: Mean Integrated Squared Error

    MISE(f̂h) =1

    nh||K ||22 +

    h4

    4µ2(K )

    2||f ′′||22 + o(

    1

    nh

    )+ o(h4),

    as h → 0, nh →∞.AMISE: Approx. formula for MISE, ignoring higher order terms

    AMISE(f̂h) =1

    nh||K ||22 +

    h4

    4µ22(K )||f ′′||22

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: Optimal bandwidth

    Bandwidth: Optimal wrt. AMISE

    hopt =

    ( ||K ||22||f ′′||22 µ2(K )2 n

    )1/5∼ n−1/5

    Depends on ||f ′′||22 - unknown.Convergence of AMISE:

    AMISE(f̂hopt ) =5

    4

    (||K ||22)4/5 (

    µ2(K )||f ′′||2)2/5

    n−4/5 ∼ n−4/5

    Note: AMISE converges at the rate n−4/5.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Statistical Properties: Comparison with the histogram

    AMISE: Histogram

    AMISE(f̂h) =1

    nh+

    h2

    12||f ′||22

    Bandwidth: Optimal wrt. AMISE (histogram)

    h0 =

    (6

    n||f ′||22

    )1/3∼ n−1/3

    AMISE converges at the rate n−2/3 in the histogram.

    Slower rate of convergence in the histogram compared to thekernel density estimator.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Bandwidth selection

    The two most frequently used method of banwidth selection:

    I The plug-in method,I Cross-validation methods.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Bandwidth selection: Silverman’s Rule of Thumb

    Plug-in methods:Replace unknown parameters with estimates.

    AMISE optimal bandwidth

    hopt =

    ( ||K ||22||f ′′||22 µ2(K )2 n

    )1/5

    Unknown parameter is ||f ′′||22.Assume f is a normal(µ, σ2)-distribution, then

    ||f ′′||22 = σ−53

    8√

    π≈ 0.212σ−5

    Replace the σ with σ̂.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Bandwidth selection: Silverman’s Rule of Thumb

    Choose kernel function: Gaussian kernel.

    Then the ”Rule-of-Thumb” bandwidth

    ĥrot =

    (||ϕ||22

    ||f̂ ′′||22 µ22(ϕ) n

    )1/5

    =

    (4σ̂5

    3n

    )1/5

    ≈ 1.06σ̂n−1/5

    Applicable formula for bandwidth selection.If X is normally distributed, then ĥrot gives the optimal bandwidth.If X is not normally distributed, then ĥrot will give a bandwidth nottoo far from the optimal bandwidth, if the distribution of X is nottoo different from the normal distribution.

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Bandwidth selection: Silverman’s Rule of Thumb

    Practical problem:The Rule-of-Thumb bandwidths is sensitive to outliers:A single outlier may cause a too large estimator of σ and hence atoo large bandwidth.

    Robust estimator:Calculate

    R = X[0.75n]︸ ︷︷ ︸75%−quantile

    − X[0.25n]︸ ︷︷ ︸25%−quantile

    We assume X ∼ N(µ, σ2) and Z ∼ N(0, 1), thereforeR = X[0.75n] − X[0.25n]

    = (µ + σZ[0.75n])− (µ + σZ[0.25n])= σ(Z[0.75n] − Z[0.25n])≈ σ(0.67− (−0.67))= 1.34σ

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)

  • Bandwidth selection: Silverman’s Rule of Thumb

    Therefore

    σ̂ =R

    1.34

    Plug it into the ”Rule-of-Thumb” bandwidth

    ĥrot ≈ = 1.06σ̂n−1/5

    = 1.06R

    1.34n−1/5

    ≈ 0.79R̂n−1/n

    ”Better-Rule-of-Thumb”:Combine the fist and the robust Rule-of-Thumb

    ĥrot = 1.06 min

    (σ̂,

    R

    1.34

    )n−1/5

    Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)