estimation and hypothesis testing 1 (graduate statistics2)
DESCRIPTION
TRANSCRIPT
Parametric Statistical Inference: Estimation
Two Areas of Statistical Inference
I. Estimation of Parameters
II. Hypothesis Testing
I. Estimation of Parameters – involves the estimation of unknown population values (parameters) by the known sample values (statistic).
Remark: Statistic are used to estimate the unknown parameters.
Parametric Statistical Inference: Estimation
Types of Estimation
I. Point Estimate – consists of a single value used to estimate population parameters.
Example: can be used to estimate
s can be used to estimate
can be used to estimate
X
2s2
Parametric Statistical Inference: Estimation
II. A confidence – Interval Estimate – consists of an interval of numbers obtained from a point estimate, together with a percentage specifying how confident we are that the parameters lies in the interval.
Example: Consider the following statement.A 95% confidence interval for the mean grade of
graduate students is (1.0, 1.5).
Remark: The number 95% or 0.95 is called the confidence coefficient or the degree of confidence. The end points (1.0, 1.5) that is 1.50 and 1.0 are respectively called the lower and upper confidence limits.
Parametric Statistical Inference: Estimation
Remark: In general, we can always construct a 100% confidence interval. The Greek letter is referred as the level of significance and a fraction is called the confidence coefficient which is interpreted as the probability that the interval encloses the true value of the parameter.
The following table presents the most commonly used confidence coefficients and the corresponding z – values.
Confidence Coefficient
90% 0.10 1.282 .05 1.645
95% 0.05 1.645 0.025 1.960
99% 0.01 2.326 0.005 2.576
1
1
Z 2 2Z
Estimating the Population MeanEstimating the Population Mean
I. The Point Estimator of is .I. The Point Estimator of is .II. The Interval Estimator of is the II. The Interval Estimator of is the confidence interval given by:confidence interval given by:
1. when is known.1. when is known.
2. 2. when is unknown, when is unknown,
where is the t-value with where is the t-value with degrees of freedom.degrees of freedom.
X %1001
nZX
nZX
22
,
n
stX
n
stX
22
,
2
t 1nv
Estimating the Population MeanEstimating the Population Mean
Remark:Remark: If is unknown but for as long If is unknown but for as long as , we still use (1) instead of (2). as , we still use (1) instead of (2). This explains the notion that the This explains the notion that the t t is used is used only for small sample cases .only for small sample cases .
30n
30n
The Nature of The Nature of tt-Distribution-Distribution
develop by William Sealy Gosset (1896 – 1937), develop by William Sealy Gosset (1896 – 1937), an employee of the Guinness Brewery in Dublin, an employee of the Guinness Brewery in Dublin, where he interpreted data and planned barley where he interpreted data and planned barley experiments.experiments.
his findings was under the pseudonym “student” his findings was under the pseudonym “student” because of the Guinness Company’s restrictive because of the Guinness Company’s restrictive policy on publication by its employees.policy on publication by its employees.
The Nature of The Nature of tt-Distribution-Distribution
The Sampling Distribution William Sealy Gosset The Sampling Distribution William Sealy Gosset studied are then called Student’s studied are then called Student’s tt-distributions -distributions which is given bywhich is given by
..
n
sy
t
The Nature of The Nature of tt-Distribution-DistributionProperties of the Properties of the tt-distribution-distribution
1. unimodal;1. unimodal;
2. asymptotic to the horizontal axis;2. asymptotic to the horizontal axis;
3. symmetrical about zero;3. symmetrical about zero;
4. dependent on 4. dependent on vv, the degrees of freedom (for the statistic under , the degrees of freedom (for the statistic under discussion, ).discussion, ).
5. more variable than the standard normal distribution, 5. more variable than the standard normal distribution,
for ;for ;
6. approximately standard normal if 6. approximately standard normal if vv is large. is large.
Definition:Definition: Degrees of freedomDegrees of freedom – the number of values that are free – the number of values that are free to vary after a sample statistic has been computed, and they tell to vary after a sample statistic has been computed, and they tell the researcher which specific curve to use when a distribution the researcher which specific curve to use when a distribution consists of a family of curves.consists of a family of curves.
1nv
2
v
vtV 2n
Estimating the Population MeanEstimating the Population Mean
Example 1: Example 1: From a random sample of 16 applicants From a random sample of 16 applicants for certain graduate fellowships, the following for certain graduate fellowships, the following statistics are obtained about their GRE scores statistics are obtained about their GRE scores
, ,, ,
a. Give the best point estimate of the population mean.a. Give the best point estimate of the population mean.
b. Estimate the standard error of this estimate.b. Estimate the standard error of this estimate.
c. Find a 95% confidence interval on this population c. Find a 95% confidence interval on this population mean.mean.
000,16y 000,000,2562 y 000,400,182y
Estimating the Population MeanEstimating the Population Mean
Note: Note:
Example2: Example2: A study found that the average time it took a A study found that the average time it took a person to find a new job was 5.9 months. If a sample of person to find a new job was 5.9 months. If a sample of 36 job seekers was surveyed, find the 99% confidence 36 job seekers was surveyed, find the 99% confidence interval of the true population mean. Assume that the interval of the true population mean. Assume that the standard deviation is 0.8 month.standard deviation is 0.8 month.
1
22
nn
xxns
Estimating the Population MeanEstimating the Population Mean
Doing with Excel:Doing with Excel:
The expression in the confidence interval for The expression in the confidence interval for the population mean can be obtained in excel the population mean can be obtained in excel software by enteringsoftware by entering
= CONFIDENCE= CONFIDENCE
= - CONFIDENCE = - CONFIDENCE
= + CONFIDENCE = + CONFIDENCE
nZ
2
n,,
X n,,
X n,,
Estimating the Population ProportionEstimating the Population Proportion
In a binomial experiment, the point estimator of In a binomial experiment, the point estimator of thethe
population proportion population proportion p p is where is where xx represents the represents the
number of successes in number of successes in n n trials. On the other hand, a trials. On the other hand, a
confidence interval for population proportion confidence interval for population proportion p p is given is given
by:by:
where and .where and .
n
xp ^
n
qpzp
n
qpzp
^^
2
^^^
2
^
,
n
xp ^ ^
1 pq
Estimating the Population ProportionEstimating the Population Proportion
Example 1: Example 1: In a certain state, a survey of In a certain state, a survey of 500 workers showed that 45% belong to a 500 workers showed that 45% belong to a union. Find the 90% confidence interval of union. Find the 90% confidence interval of the true proportion of workers who belong the true proportion of workers who belong to a union.to a union.
Estimating the Population ProportionEstimating the Population Proportion
Example 2: Example 2: A survey of 120 female A survey of 120 female freshmen showed that 18 did not wish to freshmen showed that 18 did not wish to work after marriage. Find the 95% work after marriage. Find the 95% confidence interval of the true proportion confidence interval of the true proportion of females who do not work after marriage.of females who do not work after marriage.
Parametric Statistical Inference: Parametric Statistical Inference: Hypothesis TestingHypothesis Testing
Statistical hypothesis testingStatistical hypothesis testing – used in – used in making decisions in the face of uncertainty in the making decisions in the face of uncertainty in the context of choosing between two competing context of choosing between two competing statements about a population parameter of statements about a population parameter of interest.interest.
Remark:Remark: Statistical hypothesis testing involves Statistical hypothesis testing involves two competing claims, that is, statements two competing claims, that is, statements regarding a population parameter, and making a regarding a population parameter, and making a decision to accept one of these claims on the decision to accept one of these claims on the basis of evidence (and uncertainty in the basis of evidence (and uncertainty in the evidence).evidence).
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Definition:Definition: A statistical hypothesis is any A statistical hypothesis is any statement or conjecture about the population.statement or conjecture about the population.
Two Types of Hypotheses Involved in a Hypothesis Two Types of Hypotheses Involved in a Hypothesis Testing ProcedureTesting Procedure
I. The Null Hypothesis - I. The Null Hypothesis - a statement that will involve a statement that will involve
specifying an educated guess about the value of the specifying an educated guess about the value of the population parameter.population parameter. - the hypothesis of no effect and - the hypothesis of no effect and
non-significance in which the researcher wants to non-significance in which the researcher wants to reject.reject.
0H
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
II. The Alternative Hypothesis - II. The Alternative Hypothesis - the the
statement to be accepted, in case, we statement to be accepted, in case, we reject the null hypothesis.reject the null hypothesis. - the contradiction of the null hypothesis - the contradiction of the null hypothesis
..
Example: Example: If the null hypothesis says that the If the null hypothesis says that the average grade of the graduate students is average grade of the graduate students is 50, then we write, .50, then we write, .
aH
50
0H
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Example: Example: There are three possible alternative There are three possible alternative hypothesis which may be formulated from the hypothesis which may be formulated from the preceding null hypothesispreceding null hypothesis . .
a. (the average grade of the graduate a. (the average grade of the graduate students is less than 50)students is less than 50)
b. (the average grade of the graduate b. (the average grade of the graduate students is greater than 50)students is greater than 50)
c. (the average grade of the graduate c. (the average grade of the graduate students is not equal to 50) students is not equal to 50)
50: aH
50: aH
50: aH
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Remarks: Remarks: 1. The first two alternative 1. The first two alternative hypotheses are called one-tailed or a directional hypotheses are called one-tailed or a directional
test.test.
2. The third alternative hypothesis is called two-2. The third alternative hypothesis is called two-tailed or a non-directional test.tailed or a non-directional test.
Decision RuleDecision Rule – a criterion that specifies – a criterion that specifies whether or not the null hypothesis should be whether or not the null hypothesis should be rejected in favor of the alternative hypothesis.rejected in favor of the alternative hypothesis.
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Remark:Remark: The decision is based on the value of a The decision is based on the value of a test statistic, the value of which is determined test statistic, the value of which is determined from sample measurements.from sample measurements.
Critical RegionCritical Region – the area under sampling – the area under sampling distribution that includes unlikely sample distribution that includes unlikely sample
outcomes which is also known as the rejection outcomes which is also known as the rejection region.region.
-the area where the null hypothesis is rejected.-the area where the null hypothesis is rejected.
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Acceptance RegionAcceptance Region –the region where –the region where the null hypothesis is accepted.the null hypothesis is accepted.
Critical Value –Critical Value – the value between the the value between the critical region and the acceptance regioncritical region and the acceptance region
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Two Types of ErrorsTwo Types of Errors that may be committed in that may be committed in rejecting or accepting the null hypothesisrejecting or accepting the null hypothesis
I. Type I Error – occurs when we reject the null I. Type I Error – occurs when we reject the null hypothesis when it is true.hypothesis when it is true.
- denoted by .- denoted by .
II. Type II Error – occurs when we accept the null II. Type II Error – occurs when we accept the null hypothesis when it is false.hypothesis when it is false.
- denoted by .- denoted by .
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
The following table displays the possible consequences in the decision to accept or reject the null hypothesis.
Decision Null Hypothesis
True False
Reject Type I Correct Decision
Accept Correct Type I
Decision
0H
0H0H
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Remark: is called the level of significance which is interpreted as the maximum
probability that the researcher is willing to commit a Type I Error.
Remark: The acceptance of the null hypothesis does not mean that it is true but it is a result of insufficient evidence to reject it.
0H
Parametric Statistical Inference: Hypothesis Testing Parametric Statistical Inference: Hypothesis Testing
Remark:Remark: and errors are related. For a and errors are related. For a fixed sample size fixed sample size nn, an increase of , an increase of results to a decrease in and a decrease results to a decrease in and a decrease in results to an increase in . However, in results to an increase in . However, decreasing the two errors simultaneously decreasing the two errors simultaneously can only be achieved by increasing the can only be achieved by increasing the sample size sample size nn. As increases, the size of . As increases, the size of the critical region also increases. Thus, if the critical region also increases. Thus, if is rejected at , then will also be rejected is rejected at , then will also be rejected at a level of significance higher than .at a level of significance higher than .
0H
0H
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
In hypothesis testing procedure, the following steps In hypothesis testing procedure, the following steps are suggested:are suggested:
1. State the null hypothesis and the alternative hypothesis.1. State the null hypothesis and the alternative hypothesis.2. Decide on the level of significance.2. Decide on the level of significance.3. Determine the decision rule, the appropriate test statistic 3. Determine the decision rule, the appropriate test statistic
and the critical region.and the critical region.4. Gather the given data and compute the value of the test 4. Gather the given data and compute the value of the test
statistic. Check the computed statistic. Check the computed value if it falls inside the critical region or in the value if it falls inside the critical region or in the
acceptance region.acceptance region.5. Make the decision and state the conclusion in words.5. Make the decision and state the conclusion in words.
Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing
Remark:Remark: Alternatively, the Alternatively, the pp-value can also be used to -value can also be used to make decision about the population of interest.make decision about the population of interest.
Definition: Definition: The The pp-value represents the chance of -value represents the chance of
generating a value as extreme as the observed value of generating a value as extreme as the observed value of the test statistic or something more extreme if the null the test statistic or something more extreme if the null hypothesis were true.hypothesis were true.
Remark:Remark: The p-value serves to measure how much The p-value serves to measure how much evidence we have against the null hypothesis. The evidence we have against the null hypothesis. The smaller the p-value, the more evidence we have.smaller the p-value, the more evidence we have.
Remark: Remark: If the p-value is less than the level of significance, If the p-value is less than the level of significance, then the null hypothesis is rejected, otherwise, the null then the null hypothesis is rejected, otherwise, the null hypothesis is accepted.hypothesis is accepted.