estimation and hypothesis testing 1 (graduate statistics2)

Parametric Statistical Inference: Estimation

Two Areas of Statistical Inference

I. Estimation of Parameters

II. Hypothesis Testing

I. Estimation of Parameters – involves the estimation of unknown population values (parameters) by the known sample values (statistic).

Remark: Statistic are used to estimate the unknown parameters.


Types of Estimation

I. Point Estimate – consists of a single value used to estimate population parameters.

Example: can be used to estimate

s can be used to estimate

can be used to estimate

X

2s2


II. A confidence – Interval Estimate – consists of an interval of numbers obtained from a point estimate, together with a percentage specifying how confident we are that the parameters lies in the interval.

Example: Consider the following statement.A 95% confidence interval for the mean grade of

graduate students is (1.0, 1.5).

Remark: The number 95% or 0.95 is called the confidence coefficient or the degree of confidence. The end points (1.0, 1.5) that is 1.50 and 1.0 are respectively called the lower and upper confidence limits.


Remark: In general, we can always construct a 100% confidence interval. The Greek letter is referred as the level of significance and a fraction is called the confidence coefficient which is interpreted as the probability that the interval encloses the true value of the parameter.

The following table presents the most commonly used confidence coefficients and the corresponding z – values.

Confidence Coefficient

90% 0.10 1.282 .05 1.645

95% 0.05 1.645 0.025 1.960

99% 0.01 2.326 0.005 2.576

1

1

Z 2 2Z

Estimating the Population MeanEstimating the Population Mean

I. The Point Estimator of is .I. The Point Estimator of is .II. The Interval Estimator of is the II. The Interval Estimator of is the confidence interval given by:confidence interval given by:

1. when is known.1. when is known.

2. 2. when is unknown, when is unknown,

where is the t-value with where is the t-value with degrees of freedom.degrees of freedom.

X %1001

nZX

nZX

22

,

n

stX

n

stX

22

,

2

t 1nv


Remark:Remark: If is unknown but for as long If is unknown but for as long as , we still use (1) instead of (2). as , we still use (1) instead of (2). This explains the notion that the This explains the notion that the t t is used is used only for small sample cases .only for small sample cases .

30n

30n

The Nature of The Nature of tt-Distribution-Distribution

develop by William Sealy Gosset (1896 – 1937), develop by William Sealy Gosset (1896 – 1937), an employee of the Guinness Brewery in Dublin, an employee of the Guinness Brewery in Dublin, where he interpreted data and planned barley where he interpreted data and planned barley experiments.experiments.

his findings was under the pseudonym “student” his findings was under the pseudonym “student” because of the Guinness Company’s restrictive because of the Guinness Company’s restrictive policy on publication by its employees.policy on publication by its employees.

The Nature of The Nature of tt-Distribution-Distribution

The Sampling Distribution William Sealy Gosset The Sampling Distribution William Sealy Gosset studied are then called Student’s studied are then called Student’s tt-distributions -distributions which is given bywhich is given by

..

n

sy

t

The Nature of The Nature of tt-Distribution-DistributionProperties of the Properties of the tt-distribution-distribution

1. unimodal;1. unimodal;

2. asymptotic to the horizontal axis;2. asymptotic to the horizontal axis;

3. symmetrical about zero;3. symmetrical about zero;

4. dependent on 4. dependent on vv, the degrees of freedom (for the statistic under , the degrees of freedom (for the statistic under discussion, ).discussion, ).

5. more variable than the standard normal distribution, 5. more variable than the standard normal distribution,

for ;for ;

6. approximately standard normal if 6. approximately standard normal if vv is large. is large.

Definition:Definition: Degrees of freedomDegrees of freedom – the number of values that are free – the number of values that are free to vary after a sample statistic has been computed, and they tell to vary after a sample statistic has been computed, and they tell the researcher which specific curve to use when a distribution the researcher which specific curve to use when a distribution consists of a family of curves.consists of a family of curves.

1nv

2

v

vtV 2n


Example 1: Example 1: From a random sample of 16 applicants From a random sample of 16 applicants for certain graduate fellowships, the following for certain graduate fellowships, the following statistics are obtained about their GRE scores statistics are obtained about their GRE scores

, ,, ,

a. Give the best point estimate of the population mean.a. Give the best point estimate of the population mean.

b. Estimate the standard error of this estimate.b. Estimate the standard error of this estimate.

c. Find a 95% confidence interval on this population c. Find a 95% confidence interval on this population mean.mean.

000,16y 000,000,2562 y 000,400,182y


Note: Note:

Example2: Example2: A study found that the average time it took a A study found that the average time it took a person to find a new job was 5.9 months. If a sample of person to find a new job was 5.9 months. If a sample of 36 job seekers was surveyed, find the 99% confidence 36 job seekers was surveyed, find the 99% confidence interval of the true population mean. Assume that the interval of the true population mean. Assume that the standard deviation is 0.8 month.standard deviation is 0.8 month.

1

22

nn

xxns


Doing with Excel:Doing with Excel:

The expression in the confidence interval for The expression in the confidence interval for the population mean can be obtained in excel the population mean can be obtained in excel software by enteringsoftware by entering

= CONFIDENCE= CONFIDENCE

= - CONFIDENCE = - CONFIDENCE

= + CONFIDENCE = + CONFIDENCE

nZ

2

n,,

X n,,

X n,,

Estimating the Population ProportionEstimating the Population Proportion

In a binomial experiment, the point estimator of In a binomial experiment, the point estimator of thethe

population proportion population proportion p p is where is where xx represents the represents the

number of successes in number of successes in n n trials. On the other hand, a trials. On the other hand, a

confidence interval for population proportion confidence interval for population proportion p p is given is given

by:by:

where and .where and .

n

xp ^

n

qpzp

n

qpzp

^^

2

^^^

2

^

,

n

xp ^ ^

1 pq


Example 1: Example 1: In a certain state, a survey of In a certain state, a survey of 500 workers showed that 45% belong to a 500 workers showed that 45% belong to a union. Find the 90% confidence interval of union. Find the 90% confidence interval of the true proportion of workers who belong the true proportion of workers who belong to a union.to a union.


Example 2: Example 2: A survey of 120 female A survey of 120 female freshmen showed that 18 did not wish to freshmen showed that 18 did not wish to work after marriage. Find the 95% work after marriage. Find the 95% confidence interval of the true proportion confidence interval of the true proportion of females who do not work after marriage.of females who do not work after marriage.

Parametric Statistical Inference: Parametric Statistical Inference: Hypothesis TestingHypothesis Testing

Statistical hypothesis testingStatistical hypothesis testing – used in – used in making decisions in the face of uncertainty in the making decisions in the face of uncertainty in the context of choosing between two competing context of choosing between two competing statements about a population parameter of statements about a population parameter of interest.interest.

Remark:Remark: Statistical hypothesis testing involves Statistical hypothesis testing involves two competing claims, that is, statements two competing claims, that is, statements regarding a population parameter, and making a regarding a population parameter, and making a decision to accept one of these claims on the decision to accept one of these claims on the basis of evidence (and uncertainty in the basis of evidence (and uncertainty in the evidence).evidence).

Parametric Statistical Inference: Hypothesis TestingParametric Statistical Inference: Hypothesis Testing

Definition:Definition: A statistical hypothesis is any A statistical hypothesis is any statement or conjecture about the population.statement or conjecture about the population.

Two Types of Hypotheses Involved in a Hypothesis Two Types of Hypotheses Involved in a Hypothesis Testing ProcedureTesting Procedure

I. The Null Hypothesis - I. The Null Hypothesis - a statement that will involve a statement that will involve

specifying an educated guess about the value of the specifying an educated guess about the value of the population parameter.population parameter. - the hypothesis of no effect and - the hypothesis of no effect and

non-significance in which the researcher wants to non-significance in which the researcher wants to reject.reject.

0H


II. The Alternative Hypothesis - II. The Alternative Hypothesis - the the

statement to be accepted, in case, we statement to be accepted, in case, we reject the null hypothesis.reject the null hypothesis. - the contradiction of the null hypothesis - the contradiction of the null hypothesis

..

Example: Example: If the null hypothesis says that the If the null hypothesis says that the average grade of the graduate students is average grade of the graduate students is 50, then we write, .50, then we write, .

aH

50

0H


Example: Example: There are three possible alternative There are three possible alternative hypothesis which may be formulated from the hypothesis which may be formulated from the preceding null hypothesispreceding null hypothesis . .

a. (the average grade of the graduate a. (the average grade of the graduate students is less than 50)students is less than 50)

b. (the average grade of the graduate b. (the average grade of the graduate students is greater than 50)students is greater than 50)

c. (the average grade of the graduate c. (the average grade of the graduate students is not equal to 50) students is not equal to 50)

50: aH

50: aH

50: aH


Remarks: Remarks: 1. The first two alternative 1. The first two alternative hypotheses are called one-tailed or a directional hypotheses are called one-tailed or a directional

test.test.

2. The third alternative hypothesis is called two-2. The third alternative hypothesis is called two-tailed or a non-directional test.tailed or a non-directional test.

Decision RuleDecision Rule – a criterion that specifies – a criterion that specifies whether or not the null hypothesis should be whether or not the null hypothesis should be rejected in favor of the alternative hypothesis.rejected in favor of the alternative hypothesis.


Remark:Remark: The decision is based on the value of a The decision is based on the value of a test statistic, the value of which is determined test statistic, the value of which is determined from sample measurements.from sample measurements.

Critical RegionCritical Region – the area under sampling – the area under sampling distribution that includes unlikely sample distribution that includes unlikely sample

outcomes which is also known as the rejection outcomes which is also known as the rejection region.region.

-the area where the null hypothesis is rejected.-the area where the null hypothesis is rejected.


Acceptance RegionAcceptance Region –the region where –the region where the null hypothesis is accepted.the null hypothesis is accepted.

Critical Value –Critical Value – the value between the the value between the critical region and the acceptance regioncritical region and the acceptance region


Two Types of ErrorsTwo Types of Errors that may be committed in that may be committed in rejecting or accepting the null hypothesisrejecting or accepting the null hypothesis

I. Type I Error – occurs when we reject the null I. Type I Error – occurs when we reject the null hypothesis when it is true.hypothesis when it is true.

- denoted by .- denoted by .

II. Type II Error – occurs when we accept the null II. Type II Error – occurs when we accept the null hypothesis when it is false.hypothesis when it is false.

- denoted by .- denoted by .


The following table displays the possible consequences in the decision to accept or reject the null hypothesis.

Decision Null Hypothesis

True False

Reject Type I Correct Decision

Accept Correct Type I

Decision

0H

0H0H


Remark: is called the level of significance which is interpreted as the maximum

probability that the researcher is willing to commit a Type I Error.

Remark: The acceptance of the null hypothesis does not mean that it is true but it is a result of insufficient evidence to reject it.

0H

Parametric Statistical Inference: Hypothesis Testing Parametric Statistical Inference: Hypothesis Testing

Remark:Remark: and errors are related. For a and errors are related. For a fixed sample size fixed sample size nn, an increase of , an increase of results to a decrease in and a decrease results to a decrease in and a decrease in results to an increase in . However, in results to an increase in . However, decreasing the two errors simultaneously decreasing the two errors simultaneously can only be achieved by increasing the can only be achieved by increasing the sample size sample size nn. As increases, the size of . As increases, the size of the critical region also increases. Thus, if the critical region also increases. Thus, if is rejected at , then will also be rejected is rejected at , then will also be rejected at a level of significance higher than .at a level of significance higher than .

0H

0H


In hypothesis testing procedure, the following steps In hypothesis testing procedure, the following steps are suggested:are suggested:

1. State the null hypothesis and the alternative hypothesis.1. State the null hypothesis and the alternative hypothesis.2. Decide on the level of significance.2. Decide on the level of significance.3. Determine the decision rule, the appropriate test statistic 3. Determine the decision rule, the appropriate test statistic

and the critical region.and the critical region.4. Gather the given data and compute the value of the test 4. Gather the given data and compute the value of the test

statistic. Check the computed statistic. Check the computed value if it falls inside the critical region or in the value if it falls inside the critical region or in the

acceptance region.acceptance region.5. Make the decision and state the conclusion in words.5. Make the decision and state the conclusion in words.


Remark:Remark: Alternatively, the Alternatively, the pp-value can also be used to -value can also be used to make decision about the population of interest.make decision about the population of interest.

Definition: Definition: The The pp-value represents the chance of -value represents the chance of

generating a value as extreme as the observed value of generating a value as extreme as the observed value of the test statistic or something more extreme if the null the test statistic or something more extreme if the null hypothesis were true.hypothesis were true.

Remark:Remark: The p-value serves to measure how much The p-value serves to measure how much evidence we have against the null hypothesis. The evidence we have against the null hypothesis. The smaller the p-value, the more evidence we have.smaller the p-value, the more evidence we have.

Remark: Remark: If the p-value is less than the level of significance, If the p-value is less than the level of significance, then the null hypothesis is rejected, otherwise, the null then the null hypothesis is rejected, otherwise, the null hypothesis is accepted.hypothesis is accepted.

estimation and hypothesis testing 1 (graduate statistics2)

Technology

confidence interval

population parameters

true population

population meancan

degree of confidence

population proportion

used confidence coefficients

interval of numbers