topic 7 sampling and sampling distributions. the term population represents everything we want to...
Post on 20-Dec-2015
214 views
TRANSCRIPT
• The term Population represents everything we want to study, bearing in mind that the population is ever changing and hence a dynamic concept.
• A Census is a snapshot of the population at any single point of time.
• For example, the last UK census was taken on the 29th of April 2001.
• The Office of National Statistics (ONS) attempted to get a picture of everything that is relevant on that specific day!
• A sample is usually a part or a fraction of the population and not the whole of the latter. The act of collecting samples is called sampling.
• Descriptive Statistics: Using the sample data to describe and draw conclusions about the sample only
• Inferential Statistics: Using the sample data to draw conclusions about the population
The statistician uses the information out of sample(s) to estimate population characteristics.
Getting access to the entire population may be prohibitively costly and a sample is therefore taken.
• Errors therefore occur naturally.
• A parameter is a defining characteristic of a population that can be quantified.
• The mean and standard deviationof the normal distribution are examples of parameters of the distribution.
•The difference between the actual population characteristic and the corresponding estimate is called a sampling error.
The process can be visualised as below:
Learning Objectives
• Determine when to use sampling instead of a census.
• Distinguish between random and nonrandom sampling.
• Be aware of the different types of error that can occur in a study.
• Understand the impact of the Central Limit Theorem on statistical analysis.
• Use the sampling distributions of xMEAN the sample mean and p (the sample proportion).
Reasons for Sampling
• Sampling can save money.
• Sampling can save time.
• For given resources, sampling can broaden the scope of the data set.
• Because the research process is sometimes destructive, the sample can save product.
• If accessing the population is impossible; sampling is the only option.
Reasons for Taking a Census
• Eliminate the possibility that a random sample is not representative of the population.
• The person authorizing the study is uncomfortable with sample information.
Random vs Nonrandom SamplingRandom vs Nonrandom Sampling
• Random sampling
• Every unit of the population has the same probability of being included in the sample.
• A chance mechanism is used in the selection process.
• Eliminates bias in the selection process
• Also known as probability sampling
Random vs Nonrandom SamplingRandom vs Nonrandom Sampling
• Nonrandom Sampling
• Every unit of the population does not have the same probability of being included in the sample.
• Open the selection bias
• Not appropriate data collection methods for most statistical methods
• Also known as nonprobability sampling
Data from nonrandom samples are not appropriate for analysis by inferential statistical methods.
Sampling Error occurs when the sample is not representative of the population
Errors
Nonsampling Errors
•Missing Data, Recording, Data Entry, and Analysis Errors
•Poorly conceived concepts , unclear definitions, and defective questionnaires
•Response errors occur when people do not know, will not say, or overstate in their answers
The Central Limit Theorem (CLT)
Whatever the population distribution, the distribution of the sample mean is normal( 2/n )as long as n is ‘large’
Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.
Sampling Distribution of the Sample MeanSampling Distribution of the Sample Mean
Population
(parameter)
Sample
x
(statistic)
Calculate x
to estimate
Select a
random sample
Process ofInferential Statistics
Distribution of Sample Means for Various Sample Sizes
Distribution of Sample Means for Various Sample Sizes
U ShapedPopulation
n = 2 n = 5 n = 30
NormalPopulation
n = 2 n = 5 n = 30
A parameter is a defining characteristicof a population that can be quantified.
For example, the mean and standarddeviationof the normal distributionare parameters of the distribution
.Parameter to beestimated
.A Point EstimateFor the Parameter
[ ]A Confidence IntervalFor the Parameter
Parameter
.. .
...
. .
Estimator
Although each estimator is way off target, together they may well give a good estimation
. .. ... . .. .ParameterReal Line
++ + +
+- - --
(Unknown)
This method of estimation isUnbiased if and onlyif the algebraic sum of all‘errors’ is zero.
Each deviation from the parameter is called an error
And the ‘average’ of these errors isCalled standard error of estimation
.... ..Question: Given the five piece dataset,which point represents thesummary statistic ?
Answer: The sample mean is the bestof all available options
The Sampling Distribution of theSample Mean (xMEAN)
Suppose that the population mean = 20 and consider the following statistical process
Sample Number Value of xMEAN
1 18 2 24 3 21 - -
100 22
.. ... ... .. .... ...
. .. .. ..
. Location of Parameter
.Negatively biased estimator
..
...
Positively biased estimatorUnbiased estimator
Estimate Number Error 1 +6 2 +8 3 -10 4 +2 5 -6
Error 0 0 0 -1 0
Although the first set of estimates (in red)have an average of zero, it is probably notas good as the second one (in green)
. .. ... . .. .
..
.... ... ..
..
... ... ..
Available Resources: R1
Available Resources: R2
Available Resources: R3
This estimator is consistent if R1< R2< R3.
Formally, an estimator b of a parameter is unbiased if and only if the average of the b values is exactly That is, E(b) =
If E(b) then the estimator is biased and the difference E(b) is the biasof estimation
An estimator b of a parameter is efficient if and only if it has the smallest standard error of all unbiased estimators
The standard error of estimation for estimator b (seb) is given by
(seb) = E-b)2
An estimator b of a parameter is consistent if and only if its standard error gets smaller as n gets larger
Distribution of Sample Means for Various Sample Sizes
Distribution of Sample Means for Various Sample Sizes
U ShapedPopulation
n = 2 n = 5 n = 30
NormalPopulation
n = 2 n = 5 n = 30
The standard error (s.e.) of estimation for xMEAN is given by
s.e. = /n where is the population standard deviation and n is the sample size
The distribution of the sample mean(xMEAN ) for three sample sizes: n1 < n2 < n3
xMEAN
Density
Sample Size: n2
Sample Size: n1
Sample Size: n3
Summary
1. XMEAN is an unbiased estimator of the population mean
E(XMEAN ) =
2. Standard error of XMEAN (s.e.) is given by s.e. = /n
3. XMEAN is an efficient estimator of the population mean . It has the smallest of all s.e values
4. XMEAN is a consistent estimator of the population mean . The s.e. value becomes smaller as the sample gets larger
The Central Limit Theorem (CLT)
Whatever the population distribution, the distribution of the sample mean is normal( 2/n )as long as n is ‘large’
All three are unbiased. E1 is the mostEfficient, E3 is the least
FrequencyDensity
Estimator E2
Estimator E3
Estimator E1
.E1,E2,E3Describe these estimators
Both of E2 and E3 are unbiased but less efficient than E1. E1 is the most efficient, but it is positively biased.
FrequencyDensity
Estimator E2
Estimator E3
Estimator E1
.E1,E2,E3Describe these estimators
Each is a negatively biased estimator. . E1 is the most efficient of the three and E3 the least.
FrequencyDensity
Estimator E2
Estimator E3
Estimator E1
.E1,E2,E3Describe these estimators
Confidence Interval (CI)Sometimes, it is possible and
convenient to predict, with a certain amount of confidence in the prediction, that the true value of the parameter lies within a specified interval.
Such an interval is called a Confidence Interval (CI)
The statement ‘ [L, H] is the 95%
CI of ’ is to be interpreted that with 95% chance the population mean lies within the specified interval and with 5% chance it lies outside.
Example1 (Confidence Interval for the sample mean): Suppose that the result of sampling yields the following:
xMEAN = 25 ; n = 36. Use this information to construct a 95% CI for , given that = 16
Since n >24, we can say that xMEAN is approximately N(, 2/36).
Standardisation means that (xMEAN - )/(/6) is approximately z.
Now find the two symmetric points around 0 in the z table such that the area is 0.95. The answer is
z = 1.96.