Point estimation
PUBH 7401: Fundamentals of Biostatistical Inference
Eric F. Lock, UMN Division of Biostatistics, SPH
10/23/2018
PUBH 7401: Fundamentals of Biostatistical Inference Point estimation
General Framework of Statistical Inference
Experimental set-up is usually as follows:
1. Identify the population of interest.
2. Take a sample from that population.
3. Calculate a statistic.
4. Use that statistic to infer something about the population (often, a parameter).
If I repeated steps 2-4, I would obtain a different sample and calculate a different value for the statistic, which may affect my inferences in step 4.
Parameters
- Any summary measure of a population of interest.
- We have seen parameters before, in the context of probability mass functions and probability density functions.
- In that context, a parameter was some unknown quantity in the pmf/pdf, for example µ and σ² for the normal distribution, or α and β for the gamma distribution.
- Functions of those unknown quantities in a pmf/pdf are also parameters, e.g. E(X) = αβ if X follows a gamma distribution with shape α and scale β.
- We usually use Greek letters to denote parameters.
Point Estimator versus Point Estimate
Definition of Point Estimator: A point estimator is a procedure or method for obtaining an estimate of the parameter (θ) from sample data.
- The point estimate is the estimate computed from a particular sample (a number).
- Since any function of the sample data is a statistic, a point estimator is just a statistic.
- Use θ̂ to denote the estimate of a parameter θ: θ̂ = h(X).
- Use the sample mean to estimate the population mean µ:
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Example - Point Estimator versus Point Estimate
- A nutritionist is interested in estimating µ, the average number of calories an undergraduate student consumes through alcohol in a week during the school year.
- Let X be the number of calories consumed via alcohol by a randomly chosen student. The observed sample $X_1, \ldots, X_6$ is:
  780, 100, 250, 1080, 0, 300
- Use the sample mean as the point estimator: $\hat{\mu} = h(X) = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- In this case, the point estimate is µ̂ = 418.3 calories.
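To make the estimator/estimate distinction concrete, here is a minimal Python sketch. The function `sample_mean` (a hypothetical name) plays the role of the estimator h: a rule that works on any sample. Applying it to the six observed values produces the estimate.

```python
def sample_mean(xs):
    """The estimator h(X): a rule that maps any sample to a number."""
    return sum(xs) / len(xs)

# Observed weekly alcohol calories for the six sampled students
calories = [780, 100, 250, 1080, 0, 300]

# The estimate: the number the rule produces for this particular sample
mu_hat = sample_mean(calories)
print(round(mu_hat, 1))  # 418.3
```

A different random sample of six students would give a different estimate from the same estimator.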
Example - Different Possible Estimators
We tend to think that there is one estimation procedure for each type of data scenario, but really there are many possibilities. Consider the accompanying 20 observations on dielectric breakdown voltage for pieces of epoxy resin. We are interested in estimating the population mean dielectric breakdown voltage (µ). Consider the following estimators and resulting estimates for µ.

24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94
27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88

1. Estimator: Sample mean $\bar{X}$; Estimate: 27.793
2. Estimator: Sample median $\tilde{X}$; Estimate: 27.960
3. Estimator: The midrange $X_e = \{\max(X_i) + \min(X_i)\}/2$; Estimate: 27.670
4. Estimator: The 10% trimmed mean (discard the smallest and largest 10% of the sample and average the rest); Estimate: 27.838
5. Estimator: The number 27; Estimate: 27
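The five estimates above can be reproduced with a short Python sketch. It assumes the midrange is the average of the largest and smallest observations, (max + min)/2, which gives the listed value 27.670, and that "10% trimmed" means dropping 2 of the 20 observations from each end. The helper names are hypothetical.

```python
import statistics

# 20 dielectric breakdown voltage observations (already sorted)
voltage = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
           27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]

def midrange(xs):
    # Average of the smallest and largest observations
    return (max(xs) + min(xs)) / 2

def trimmed_mean(xs, prop=0.10):
    # Drop the smallest and largest `prop` fraction, then average what remains
    xs = sorted(xs)
    k = round(len(xs) * prop)  # number of observations removed from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

estimates = {
    "sample mean": statistics.fmean(voltage),
    "sample median": statistics.median(voltage),
    "midrange": midrange(voltage),
    "10% trimmed mean": trimmed_mean(voltage, 0.10),
    "the number 27": 27.0,  # ignores the data entirely
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

All five are legitimate estimators of µ; they differ only in how they turn the same 20 numbers into one number.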
Different Possible Estimators
Consider the linear regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (1)$$
- When estimating the parameters of a linear regression model, we typically use the values of the parameters that minimize the sum of squared errors:
$$\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
- We could have chosen some other error to minimize, or used an entirely different procedure.
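For model (1), the minimizer of the sum of squared errors has the standard closed form $\hat\beta_1 = \sum_i (X_i - \bar X)(Y_i - \bar Y) / \sum_i (X_i - \bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. A minimal Python sketch of that estimator, using hypothetical function names and made-up illustrative data (not from the lecture):

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared errors for a candidate line b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Illustrative toy data (made up for this sketch)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse(x, y, b0, b1):.4f}")
```

Minimizing a different loss, such as the sum of absolute errors, would generally produce a different estimator of the same parameters.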
Need Some Tools to Evaluate Possible Estimators
- We want a way to compare the performance of an estimation method over many different applications of that method (i.e., many different samples).
- We don't care how well the method does in one particular sample.
- For example, suppose that we are comparing different procedures for shooting a free throw in basketball (e.g., overhand, underhand, backwards). You would not judge the performance of the METHOD based on the result of one shot.
- Just because you miss once does not mean that the METHOD is bad.
General Framework of Statistical Inference
Experimental set-up is usually as follows:
1. Identify the population of interest.
2. Take a sample from that population.
3. Calculate a statistic (an estimate).
4. Use that statistic to infer something about the population (the parameter).
When we compare different statistics/estimators, we are comparing different properties of the sampling distribution of each statistic/estimator!
Three General Metrics for Comparing Different Estimators
- Bias: $E(\hat\theta - \theta) = E(\hat\theta) - \theta$ → How much does the center of the sampling distribution of θ̂ differ from the true value of the parameter?
- Variance: $V(\hat\theta)$ → How variable is the sampling distribution of θ̂?
- Mean squared error (MSE): $E\{(\hat\theta - \theta)^2\}$ → A combination of bias and variance.
Note: MSE = Bias² + Var
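The identity follows by adding and subtracting $E(\hat\theta)$ inside the square; the cross term vanishes because $E\{\hat\theta - E(\hat\theta)\} = 0$:

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\{(\hat\theta - \theta)^2\}
   = E\Big[\big\{(\hat\theta - E\hat\theta) + (E\hat\theta - \theta)\big\}^2\Big] \\
  &= E\{(\hat\theta - E\hat\theta)^2\}
     + 2\,(E\hat\theta - \theta)\,E(\hat\theta - E\hat\theta)
     + (E\hat\theta - \theta)^2 \\
  &= V(\hat\theta) + 0 + \{\mathrm{Bias}(\hat\theta)\}^2 .
\end{aligned}
```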
Example: Estimating Health Care Costs
- Recall that we were interested in estimating the average healthcare costs for workers at a large company.
- The population distribution was Gamma(2, 1000).
- This company planned to take a sample of 100 people. Let's consider two possible estimators of E(X) = µ:
  - $\hat\mu_1 = \frac{1}{100}\sum_{i=1}^{100} X_i$: the sample mean (what we considered last time)
  - $\hat\mu_2 = \frac{1}{90}\sum_{i \in D} X_i$, where D is the set of observations in the middle 90% of the sample (trimmed mean)
Example: Estimating Health Care Costs
[Figure: Histogram of µ̂1 over 1000 different samples. x-axis: Mean Cost in Different Samples (about 1500 to 2500); y-axis: Density]
Example: Estimating Health Care Costs
[Figure: Histogram of µ̂2 over 1000 different samples. x-axis: Mean Cost in Different Samples (about 1500 to 2500); y-axis: Density]
Example: Estimating Health Care Costs
Mean of sampling distribution of µ̂1: 2000.584
SD of sampling distribution of µ̂1: 141.3011
Mean of sampling distribution of µ̂2: 1887.656
SD of sampling distribution of µ̂2: 137.8121
Example: Estimating Health Care Costs
MSE of sampling distribution of µ̂1: 19946.38
MSE of sampling distribution of µ̂2: 31594.4
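A simulation of this kind can be sketched with the Python standard library. This sketch assumes Gamma(2, 1000) means shape 2 and scale 1000 (so µ = 2000), and that the trimmed mean drops 5 of the 100 observations from each end. The seed and helper names are hypothetical, so the printed values will be close to, but not identical to, the values reported above.

```python
import random
import statistics

def trimmed_mean(xs, prop=0.05):
    # Drop the smallest and largest `prop` fraction of the sample, average the rest
    xs = sorted(xs)
    k = round(len(xs) * prop)  # 5 of 100 removed from each end -> middle 90%
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def summarize(estimates, theta):
    # Mean, SD, and MSE of a simulated sampling distribution
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
    return mean, sd, mse

SHAPE, SCALE = 2.0, 1000.0   # assumed: Gamma(shape=2, scale=1000)
N, REPS = 100, 1000          # sample size and number of simulated samples
MU = SHAPE * SCALE           # true mean, 2000

rng = random.Random(7401)    # arbitrary seed
samples = [[rng.gammavariate(SHAPE, SCALE) for _ in range(N)] for _ in range(REPS)]

mu_hat1 = [statistics.fmean(s) for s in samples]    # sample mean
mu_hat2 = [trimmed_mean(s, 0.05) for s in samples]  # trimmed mean

for label, est in [("mu_hat1", mu_hat1), ("mu_hat2", mu_hat2)]:
    mean, sd, mse = summarize(est, MU)
    print(f"{label}: mean={mean:.1f}  sd={sd:.1f}  MSE={mse:.1f}")
```

Both estimators are applied to the same simulated samples, so the comparison reflects the estimators rather than different random draws. Because the gamma population is right-skewed, trimming removes more large values than the mean can afford, which shows up as negative bias for µ̂2.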
Note on comparing properties of estimators
- The properties of various estimators can depend on the true value of the parameter and on the distribution of the population.
- The properties we simulated in the previous example hold if the population follows a gamma distribution with α = 2 and β = 1000.
- In other examples we may want to derive the properties of our estimator under more general conditions.
Example: Estimating a Proportion
Suppose we want to estimate θ, the proportion of fifth grade students in Minnesota who read at grade level. Having a limited budget, we randomly sample only n students in Minnesota. Let $X_i$ equal 1 if the ith student reads at grade level and 0 otherwise. We consider two different estimators for θ:
$$\hat\theta_1(X_1, \ldots, X_n) = \frac{X_1 + \cdots + X_n}{n}$$
$$\hat\theta_2(X_1, \ldots, X_n) = \frac{X_1 + \cdots + X_n + 1}{n + 2}$$
Find the bias of each estimator and compare. Find the variance of each estimator and compare. Find the MSE of each estimator and compare.
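One worked answer, assuming the students are sampled independently so that $S = X_1 + \cdots + X_n \sim \text{Binomial}(n, \theta)$ with $E(S) = n\theta$ and $V(S) = n\theta(1-\theta)$: θ̂1 is unbiased with variance θ(1 − θ)/n, while θ̂2 has bias (nθ + 1)/(n + 2) − θ = (1 − 2θ)/(n + 2) and variance nθ(1 − θ)/(n + 2)². The hypothetical helper below evaluates these formulas; plotting them over θ for n = 20 should give curves of the kind shown in the following plots.

```python
def proportion_estimator_properties(theta, n):
    """Exact (bias, variance, MSE) of theta_hat1 = S/n and theta_hat2 = (S + 1)/(n + 2),
    assuming S = X1 + ... + Xn ~ Binomial(n, theta)."""
    p = theta * (1 - theta)
    bias1, var1 = 0.0, p / n
    bias2 = (1 - 2 * theta) / (n + 2)   # (n*theta + 1)/(n + 2) - theta
    var2 = n * p / (n + 2) ** 2
    return {
        "estimator 1": (bias1, var1, bias1 ** 2 + var1),
        "estimator 2": (bias2, var2, bias2 ** 2 + var2),
    }

for theta in (0.05, 0.5):
    props = proportion_estimator_properties(theta, n=20)
    mse1 = props["estimator 1"][2]
    mse2 = props["estimator 2"][2]
    print(f"theta={theta}: MSE1={mse1:.5f}  MSE2={mse2:.5f}  MSE1/MSE2={mse1 / mse2:.2f}")
```

Estimator 2 trades a small bias (pulling the estimate toward 1/2) for a smaller variance; whether that trade pays off depends on the true θ.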
Example #1 Plot of Bias (n=20)
[Figure: Comparison of Bias. x-axis: True Proportion (0 to 1); y-axis: Bias (about −0.04 to 0.04); legend: Estimator 1, Estimator 2]
Example #1 Plot of Bias Squared (n=20)
[Figure: Comparison of Bias Squared. x-axis: True Proportion (0 to 1); y-axis: Squared Bias; legend: Estimator 1, Estimator 2]
Example #1 Plot of Variance (n=20)
[Figure: Comparison of Variance. x-axis: True Proportion (0 to 1); y-axis: Variance (0 to about 0.012); legend: Estimator 1, Estimator 2]
Example #1 Plot of MSE (n=20)
[Figure: Comparison of MSE. x-axis: True Proportion (0 to 1); y-axis: MSE (0 to about 0.012); legend: Estimator 1, Estimator 2]
Example #1 Plot of Ratio of MSE (n=20)
[Figure: Ratio of MSE. x-axis: True Proportion (0 to 1); y-axis: MSE1/MSE2 (0 to 1.2)]
Example: Estimating a Proportion Part 2
Find the bias, variance, and MSE of the two estimators above for a general n. How does the relationship between the two estimators change as n increases?
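One way to see the general-n answer, assuming the same independent Bernoulli sampling: with p = θ(1 − θ), MSE1 = p/n and MSE2 = {np + (1 − 2θ)²}/(n + 2)². Since (1 − 2θ)² = 1 − 4p, some algebra gives MSE2 < MSE1 exactly when θ(1 − θ) > n/(8n + 4). As n → ∞ this threshold approaches 1/8, so the region where θ̂2 wins settles near (0.146, 0.854), while the ratio MSE2/MSE1 tends to 1 for every fixed θ in (0, 1). The helper names in this sketch are hypothetical.

```python
import math

def crossover_interval(n):
    """Range of theta where (S + 1)/(n + 2) has smaller MSE than S/n.

    MSE2 < MSE1  <=>  theta * (1 - theta) > n / (8n + 4).
    """
    c = n / (8 * n + 4)
    half_width = math.sqrt(1 - 4 * c) / 2   # roots of theta^2 - theta + c = 0
    return 0.5 - half_width, 0.5 + half_width

def mse_ratio(theta, n):
    """MSE(theta_hat2) / MSE(theta_hat1) for 0 < theta < 1."""
    p = theta * (1 - theta)
    mse1 = p / n
    mse2 = (n * p + (1 - 2 * theta) ** 2) / (n + 2) ** 2
    return mse2 / mse1

for n in (20, 100, 1000):
    lo, hi = crossover_interval(n)
    print(f"n={n}: theta_hat2 better for {lo:.3f} < theta < {hi:.3f}; "
          f"MSE2/MSE1 at theta=0.5 is {mse_ratio(0.5, n):.3f}")
```

As n grows, both estimators have MSE shrinking like 1/n and the advantage of either one becomes negligible.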
Example #2 Plot of MSE (n=100)
[Figure: Comparison of MSE. x-axis: True Proportion (0 to 1); y-axis: MSE (0 to about 0.0025); legend: Estimator 1, Estimator 2]
Example #2 Plot of Ratio of MSE (n=100)
[Figure: Ratio of MSE. x-axis: True Proportion (0 to 1); y-axis: MSE2/MSE1 (0 to 1.0)]
A Note on Unbiasedness
- A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ.
- If θ̂ is not unbiased, the difference E(θ̂) − θ is called the bias of θ̂.
- Is the sample mean X̄ an unbiased estimator for µ?
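Yes, whenever each $X_i$ has mean µ. Linearity of expectation gives the result directly (independence is not needed for this step):

```latex
E(\bar{X}) = E\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n}\cdot n\mu
           = \mu .
```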
Robustness
- We have seen that the sampling distribution of an estimator θ̂ depends on the distribution of the population.
- Note: by the distribution of the population, I mean the distribution of each $X_i$ when we collect data $X_1, \ldots, X_n$.
- Therefore, the bias, variance, and MSE of θ̂ will depend on the distribution of the population.
- Of course, in real data analysis we do not know the distribution of $X_i$.
- An estimator that gives reasonably "good" results under a variety of population distributions is said to be robust.
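A small simulation sketch of the idea. The populations are hypothetical choices, not from the lecture: a standard normal, and a "contaminated" normal where 10% of observations come from N(0, 10²) as a stand-in for a heavy-tailed population. Both are centered at 0. The sample size n = 25, replication count, and seed are also arbitrary. The sample mean should win under the normal population, while the sample median should be far less affected by the contamination.

```python
import random
import statistics

def simulate_mse(draw, estimator, n=25, reps=2000, seed=1):
    """Monte Carlo MSE of `estimator` for samples of size n from `draw` (true center 0)."""
    rng = random.Random(seed)
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.fmean([e ** 2 for e in estimates])  # (estimate - 0)^2

def normal(rng):
    # Standard normal population
    return rng.gauss(0.0, 1.0)

def contaminated_normal(rng):
    # Hypothetical heavy-tailed population: 90% N(0, 1), 10% N(0, 10^2)
    return rng.gauss(0.0, 10.0) if rng.random() < 0.10 else rng.gauss(0.0, 1.0)

results = {}
for pop_name, draw in [("normal", normal), ("contaminated", contaminated_normal)]:
    for est_name, est in [("mean", statistics.fmean), ("median", statistics.median)]:
        results[(pop_name, est_name)] = simulate_mse(draw, est)
        print(f"{pop_name:>12} population, sample {est_name}: "
              f"MSE = {results[(pop_name, est_name)]:.4f}")
```

Within each population, the same seed gives both estimators identical samples, so the MSE differences come from the estimators themselves.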
Standard Error and Estimated Standard Error
Definition of Standard Error: The standard error of an estimator θ̂ is the standard deviation of the sampling distribution of θ̂, $\sigma_{\hat\theta} = \sqrt{V(\hat\theta)}$.
Definition of Estimated Standard Error: If the standard error itself involves unknown parameters whose values can be estimated, substituting these estimates into $\sigma_{\hat\theta}$ yields the estimated standard error.
Will come back to this...
Standard error example – proportion
- We want to estimate θ, the proportion of fifth grade students in Minnesota who read at grade level. We sample n students in Minnesota. Let $X_i$ equal 1 if the ith student reads at grade level and 0 otherwise. Consider the sample mean estimate for θ:
$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n} X_i$$
- What is the standard error of θ̂?
- What is a reasonable estimated standard error for θ̂?
- For n = 50 students, 30 read at grade level. What is θ̂, and what is its estimated standard error?
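A worked sketch, assuming independent sampling: each $X_i$ has variance θ(1 − θ), so the standard error is $\sqrt{\theta(1-\theta)/n}$. That depends on the unknown θ, so a natural estimated standard error plugs in θ̂. The helper name is hypothetical.

```python
import math

def proportion_se(theta, n):
    """Standard error of the sample proportion: sqrt(theta * (1 - theta) / n)."""
    return math.sqrt(theta * (1 - theta) / n)

n, successes = 50, 30
theta_hat = successes / n              # point estimate
se_hat = proportion_se(theta_hat, n)   # plug theta_hat in for the unknown theta
print(f"theta_hat = {theta_hat:.2f}, estimated SE = {se_hat:.4f}")
# theta_hat = 0.60, estimated SE = 0.0693
```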
Consistency
- An estimator is consistent if it converges to the true parameter as n → ∞.
- If $X_1, X_2, X_3, \ldots, X_n$ are independent draws from a distribution with parameter θ, then $\hat\theta = h(X_1, \ldots, X_n)$ is a consistent estimator of θ if
$$\mathrm{MSE}(\hat\theta) = E\{(\hat\theta - \theta)^2\} \to 0 \quad \text{as } n \to \infty.$$
- The sample mean X̄ is a consistent estimator for µ.
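For independent data with variance σ², X̄ is unbiased, so MSE(X̄) = V(X̄) = σ²/n → 0. A short Monte Carlo check of that rate follows. It assumes Exponential(rate 1) data (µ = σ² = 1); the distribution, replication count, seed, and function name are arbitrary choices for this sketch.

```python
import random
import statistics

def empirical_mse_of_mean(n, reps=500, seed=2018):
    """Monte Carlo estimate of E{(X_bar - mu)^2} for Exponential(rate 1) data (mu = 1)."""
    rng = random.Random(seed)
    mu = 1.0
    sq_errors = []
    for _ in range(reps):
        x_bar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        sq_errors.append((x_bar - mu) ** 2)
    return statistics.fmean(sq_errors)

for n in (10, 100, 1000):
    # Theory: MSE(X_bar) = sigma^2 / n = 1 / n for this population
    print(f"n={n:>5}: simulated MSE = {empirical_mse_of_mean(n):.5f}, theory = {1 / n:.5f}")
```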