Posted on 26-Dec-2015
Descriptive Statistics and Inferential Statistics
Shibin Liu, SAS Beijing R&D
Agenda
• 0. Lesson overview
• 1. Basic Statistical Concepts
• 2. Descriptive Statistics
• 3. Picturing your data
• 4. Confidence Intervals for the mean
• 5. Hypothesis Testing
• 6. Summary
Lesson overview
Descriptive Statistics
Inferential Statistics
Populations and Samples
Parameters and Statistics
Variable Classifications
Scales of Measurement
Point estimators
Distribution of sample means
Central limit theorem
Standard error of the mean
Confidence intervals
Define the null and alternative hypotheses
Specify significance level
Collect data
Reject or fail to reject the null hypothesis
What do you want to examine?
• The location, spread, and shape of the data's distribution → Summary statistics or graphics?
  – Summary statistics: SUMMARY STATISTICS (descriptive statistics)
  – Both: DISTRIBUTION ANALYSIS (descriptive statistics, histograms, normal probability plots)
• The difference between groups on one or more variables → How many groups?
  – Two: TTEST
  – Two or more: LINEAR MODELS (analysis of variance)
• The relationship between variables → Which kind of variables?
  – Continuous only: CORRELATIONS, LINEAR REGRESSION
  – Categorical response variable: ONE-WAY FREQUENCIES & TABLE ANALYSIS (frequency tables, chi-square test), LOGISTIC REGRESSION
Descriptive Statistics | Inferential Statistics
Lesson 1 | Lesson 2 | Lessons 3 & 4 | Lesson 5
Basic Statistical Concepts
Descriptive Statistics and Inferential Statistics
Populations and Samples
Parameters and Statistics
Variable Classifications
Scales of Measurement
Statistical Methods
In this topic, you will learn to do the following:
– Distinguish between descriptive and inferential statistics
– Define populations and samples
– Distinguish between parameters and statistics
– Classify variables
– Explain other statistical concepts, including scales of measurement
Descriptive Statistics
Examples of descriptive statistics: percent (%), average (μ), variability (σ)
Descriptive statistics describe and summarize your data.
Descriptive statistics include:
• Measures of location
• Measures of variability
• Frequency
Goals of descriptive statistics:
• Screen for unusual data values
• Inspect the spread and shape
• Characterize central tendency
• Draw preliminary conclusions
When screening your data, ask:
• Are there extreme values?
• Is the shape unusual?
• Do the data values cluster?
• Are there unique features?
• Are the data error-free?
Inferential Statistics
Inferential statistics generalize from a sample to the population, with attention to the reliability and precision of the results.
Populations and Samples
Population: the complete set of observations
Sample: a subset of the population
Representative sample: a sample whose characteristics are similar to the population's characteristics
Simple random sample: every possible sample of a given size has an equal chance of being selected.
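A minimal sketch of simple random sampling in Python, assuming a hypothetical population of patient IDs 1 to 5000 (chosen to mirror the hospital example in Question 2; the IDs are illustrative, not real data). `random.Random.sample` draws without replacement, so every subset of the requested size is equally likely. The "first 200" slice is shown only as a contrasting convenience sample.

```python
import random

# Hypothetical population: patient IDs 1..5000 (illustrative only).
population = list(range(1, 5001))

def simple_random_sample(items, n, seed=None):
    """Draw n items without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(items, n)

srs = simple_random_sample(population, 200, seed=42)

# Convenience sample for contrast: "the first 200 patients" is easy to get,
# but nothing guarantees it is representative.
convenience = population[:200]

print(len(srs), len(set(srs)))   # 200 200
```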
Convenience sampling: selecting the observations that are easiest to obtain.
A convenience sample may not be representative. Why? For example:
a. the first n…
b. Male/female
Question 1.
A sample from a population should be:
a) Random
b) Representative
c) Biased
d) All of the above
Answer: b
Question 2.
A medical researcher wants to verify whether patients at a hospital during a particular time frame experienced a positive effect. From the entire set of 5000 patients admitted to the hospital during that time, the researcher selects a sample of 200 patients. Match these items between the two groups:
a) Simple random sample
b) Convenience sample
c) Population
d) Sample
Answers:
• All 5000 patients: c
• The 200 selected patients: d
• The first 200 patients admitted to the hospital during this time frame: b
• 200 patients whose patient ID numbers were selected at random from the 5000: a
Parameters and Statistics
Parameters (Greek letters): μ, σ, σ², ρ
Statistics (English letters): x̄, s, s², r
Parameters are numerical values that summarize characteristics of a population. They are usually unknown.
Measure | Population parameter | Sample statistic
Mean | μ | x̄
Variance | σ² | s²
Standard deviation | σ | s

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample standard deviation: $s = \sqrt{s^2}$
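The sample-statistic formulas can be checked with a short Python sketch. As input it uses the ten SATScore values listed in the TestScores scenario (only 10 of the 80 rows, so the result is not the full-sample mean). It computes x̄, s², and s by hand and compares them with the standard library's `statistics.variance` and `statistics.stdev`, which also use the n − 1 denominator.

```python
import math
import statistics

# Ten SATScore values listed in the TestScores scenario (10 of the 80 rows).
scores = [1170, 1090, 1240, 1000, 1210, 970, 1020, 1490, 1200, 1260]

n = len(scores)
mean = sum(scores) / n                                  # x-bar = (1/n) * sum(x_i)
var = sum((x - mean) ** 2 for x in scores) / (n - 1)    # s^2 uses n - 1 in the denominator
sd = math.sqrt(var)                                      # s = sqrt(s^2)

# The standard library's sample versions use the same n - 1 definition.
assert math.isclose(var, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))

print(mean, round(var, 2), round(sd, 2))   # 1165.0 23938.89 154.72
```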
Uses of Variables: Independent and Dependent
• Independent variable (also called explanatory, predictor, or control variable): its values are changed as required.
• Dependent variable (also called response or outcome variable): its values change in response to the independent variable.
Context 1. Controlled: the amount of fertilizer (independent) affects plant growth (dependent).
Context 2. The value of the dependent variable is given: whether a customer defaults on credit or not.
Classifying Variables by Type: Quantitative and Categorical
Variables are either quantitative or categorical.
Quantitative variables are numerical (for example, 1245, 0.36, 206, 31, 8), and arithmetic operations on them are meaningful. They can be discrete or continuous.
Discrete: only separate, countable values, such as 0, 1, 2, 3, 4, 5, 6 (a value such as 2.34 is not possible).
Continuous: any value within a range, for example 200, 201, 201.13, 199.904 (ml/bottle).
Categorical variables (attribute data) place observations into groups; arithmetic operations on them are not meaningful. They can be nominal or ordinal.
Nominal: qualitative classification with no ordering, for example gender or beverage type. Numeric codes such as 1, 2 or 1, 2, 3 are only labels.
Ordinal: categories with a natural order, for example degree: low, moderate, severe.
Question 3.
Match these items between the two groups:
a) Categorical (ordinal)
b) Quantitative (discrete)
c) Categorical (nominal)
d) Quantitative (continuous)
Answers:
• The number of students in a class: b
• Hair color (black, brown, gray, red): c
• Income (low, medium, high): a
• The numbers on athletes' team shirts: c
• The height of trees in a forest: d
• Name (Adam, Becky, Christina, Dave…): c
• The length of time for a light bulb to burn out: d
• The number of phone calls a company receives: b
Classifying Variables by Scale of Measurement: Nominal, Ordinal, Interval, Ratio
Know the scale of measurement of each variable, so that you can select the statistical procedures appropriate for that variable.
Nominal and ordinal scales apply to categorical variables (ordinal example: degree: low, moderate, severe).
Interval and ratio scales apply to continuous (quantitative) variables.
Interval scale: equal distances between values, but no true zero point, so ratios are not meaningful. Examples:
1. Body temperature: there is no true 0 point.
2. A pH of 8 (seawater) is not "twice" a pH of 4 (tomato juice).
Ratio scale: has a true 0 point, so ratios are meaningful. Examples:
1. 2 kg of apples; 0 kg of apples means no apples.
2. $0 means no money; $100 = 2 × $50.
Question 4.
Match these items between the two groups:
a) Categorical (nominal)
b) Quantitative (interval)
c) Categorical (ordinal)
d) Quantitative (ratio)
Answers:
• ID: a
• BodyTemp: b
• Gender: a
• HeartRate: d
Statistical Methods
The choice of statistical method depends on how many variables are involved:
• Univariate: range, dispersion, central tendency, frequency
• Bivariate: correlation analysis, chi-square tests of independence
• Multivariable: multiple linear regression, N-way ANOVA
• Multivariate: factor analysis, clustering, multivariate ANOVA, multivariate linear regression
Scenario: exploring your data
The magnet schools of Carver County want to check whether the average SAT score is 1200. Eighty students are selected randomly, and their scores are recorded in the TestScores dataset. Sample rows:
Gender | SATScore | IDNumber
Male | 1170 | 61469897
Female | 1090 | 33081197
Male | 1240 | 68137597
Female | 1000 | 37070397
Male | 1210 | 64608797
Female | 970 | 60714297
Male | 1020 | 16907997
Female | 1490 | 9589297
Male | 1200 | 93891897
Female | 1260 | 85859397
Question 5.
Is the data for these 80 students representative of the entire population of magnet school students in Carver County?
a) Yes
b) No
Answer: a
Question 6.
Select the correct type for the following variables: Gender, SATScore, IDNumber
a) Categorical, quantitative, categorical
b) Categorical, quantitative, quantitative
c) Quantitative, quantitative, categorical
d) Quantitative, categorical, categorical
Answer: a
Question 7.
What types of statistical methods can be applied to this data?
a) Univariate
b) Bivariate
c) Multivariate
d) All of the above
Answer: d
Descriptive Statistics
In this topic, we will learn to do the following:
– Explain the basics of descriptive statistics
– Describe what distributions can tell you about your data
– Use the Summary Statistics task to produce descriptive statistics
Describing your data
A frequency distribution shows the number of times each value appears and can reveal outliers.
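Measures of Location
Data (ordered): 93 89 88 84 83 82 79 78 78 77 74 73 72 68 68 68 35
Mode: the most common data value (68 here)
Median: the middle value when the data are ordered (78 here). With an odd number of values, it is the middle value; with an even number, it is the average of the two middle values.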
Example: for the data 1, 1, 1, 2, 3, 10, the mode is 1, the median is 1.5, and the mean is 3. The extreme value 10 pulls the mean above the median.
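Both location examples can be reproduced with Python's standard `statistics` module; this is a quick cross-check, not how the Summary Statistics task computes them. The data are the values shown on the Measures of Location slides.

```python
import statistics

# 17 values from the "Measures of Location" slide.
data = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 68, 68, 35]
print(statistics.mode(data), statistics.median(data), round(statistics.mean(data), 2))
# 68 78 75.82

# Small example: the extreme value 10 pulls the mean (3) above the median (1.5).
small = [1, 1, 1, 2, 3, 10]
print(statistics.mode(small), statistics.median(small), float(statistics.mean(small)))
# 1 1.5 3.0
```

With an even count, `statistics.median` averages the two middle values, which matches the rule stated above.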
Question 8.
For a symmetric distribution, which of the following statistics is a good measure for the middle of the data?
a) Mean
b) Median
c) Either mean or median
Answer: c
Percentile
A percentile is the value of a variable below which a given percentage of observations fall. Percentiles locate a position in your data larger than a given proportion of data values. Quartiles break your data into quarters.
Quartiles, using the ordered data 93 89 88 84 83 82 79 78 78 77 74 73 72 68 67 63:
• 25th percentile = 72.5: first quartile, Q1, lower quartile
• 50th percentile = 78: median, Q2, middle quartile
• 75th percentile = 83.5: third quartile, Q3, upper quartile
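Different software uses different percentile rules, so quartiles can differ slightly. The sketch below implements the "empirical distribution function with averaging" rule, which is assumed here to be SAS's default (PCTLDEF=5); it reproduces the 72.5, 78, and 83.5 on the slides. Python's `statistics.quantiles` uses other rules and gives different values for this data. The function name `percentile_def5` is hypothetical.

```python
def percentile_def5(values, p):
    """p-th percentile (integer 0 < p < 100) by the averaging rule assumed to be SAS's default.

    Let n*p/100 = j + g (j = integer part, g = fractional part):
      g == 0 -> average of the j-th and (j+1)-th ordered values
      g > 0  -> the (j+1)-th ordered value
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    x = sorted(values)
    j, remainder = divmod(len(x) * p, 100)   # integer arithmetic avoids rounding error
    if remainder == 0:
        return (x[j - 1] + x[j]) / 2         # 1-based x_j and x_(j+1)
    return x[j]                              # 1-based x_(j+1)

# 16 values from the quartile slides.
data = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 67, 63]
q1, q2, q3 = (percentile_def5(data, p) for p in (25, 50, 75))
print(q1, q2, q3, q3 - q1)   # 72.5 78.0 83.5 11.0
```

The last value printed is the interquartile range (Q3 − Q1).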
Measures of Variability
Measure | Definition
Range | The difference between the maximum and minimum data values
Interquartile range (IQR) | The difference between the 75th and 25th percentiles
Variance | A measure of dispersion of the data around the mean
Standard deviation | A measure of dispersion expressed in the same units of measurement as your data (the square root of the variance)
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Coefficient of variation (C.V.): $\frac{s}{\bar{x}} \times 100$
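A small Python sketch of the remaining variability measures on the same 16 quartile-slide values (the IQR is Q3 − Q1 from the quartiles above). `statistics.variance` and `statistics.stdev` use the n − 1 denominator, matching the sample variance formula; the C.V. is expressed in percent.

```python
import math
import statistics

data = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 67, 63]

data_range = max(data) - min(data)              # maximum - minimum
variance = statistics.variance(data)            # sample variance, n - 1 denominator
std_dev = statistics.stdev(data)                # square root of the variance, in data units
cv = std_dev / statistics.mean(data) * 100      # coefficient of variation, in percent

print(data_range, round(variance, 4), round(std_dev, 4), round(cv, 2))   # 30 69.8667 8.3586 10.72
```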
Question 9.
The IQR (interquartile range) measures the middle 50% of your data. It is less likely to be affected by outliers and is therefore more robust than the overall range.
a) True
b) False
Answer: a
Scenario
Check the SAT scores of XXX school: for 80 randomly selected students, use descriptive statistics to explore whether the average is 1200.
Picturing your data: Objectives
In this topic, we will learn to do the following:
– Look at the distribution of continuous variables
– Describe the normal distribution
– Use the Distribution Analysis task to generate descriptive statistics, including histograms, box plots, and normal probability plots
Picturing your data: Normal Distributions
Characteristics of the normal distribution:
– Symmetric
– Defined by μ and σ
– Bell-shaped
– Mean = median = mode
The bean machine, a device invented by Francis Galton, can be called the first generator of normal random variables. It consists of a vertical board with interleaved rows of pins. Small balls are dropped from the top and bounce randomly left or right as they hit the pins. The balls are collected into bins at the bottom and settle into a pattern resembling the Gaussian curve.
Figure: Galton board (French: planche de Galton)
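The bean machine is easy to simulate. In the sketch below, each ball bounces right with probability 1/2 at each of `n_rows` pins, so its bin is the number of right bounces, a binomial count that looks increasingly bell-shaped. The function name, the number of rows, and the number of balls are illustrative assumptions.

```python
import random
from collections import Counter

def galton_board(n_balls, n_rows, seed=0):
    """Simulated bean machine: a ball's bin is the number of right bounces among n_rows pins."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(n_balls):
        position = sum(rng.random() < 0.5 for _ in range(n_rows))   # True counts as 1
        bins[position] += 1
    return [bins[k] for k in range(n_rows + 1)]

counts = galton_board(n_balls=10_000, n_rows=12)
for k, c in enumerate(counts):
    print(f"{k:2d} {'#' * (c // 50)}")   # crude text histogram, tallest near the middle bin
```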
Standard normal distribution: μ = 0, σ = 1
Normal density: $f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
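The density formula translates directly into code. This sketch implements f(x; μ, σ²) term by term and cross-checks it against `statistics.NormalDist.pdf` from the Python standard library; the function name `normal_pdf` is illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x; mu, sigma^2) = 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Standard normal (mu = 0, sigma = 1): the peak height is 1 / sqrt(2*pi).
print(round(normal_pdf(0.0), 4))   # 0.3989

# Cross-check against the standard library for a non-standard normal.
for x in (-1.96, 0.0, 1.0, 2.5):
    assert math.isclose(normal_pdf(x, 1.0, 2.0), NormalDist(1.0, 2.0).pdf(x))
```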
Picturing your data: Assessing normality
For a normal distribution, SAS reports skewness = 0 and kurtosis = 0. Some other software reports skewness = 0 and kurtosis = 3 for a normal distribution, because SAS reports excess kurtosis (kurtosis minus 3).
Picturing your data: Skewness
Skewness > 0 (right-skewed): typically mean > median
Skewness < 0 (left-skewed): typically mean < median
Picturing your data: Kurtosis
Kurtosis measures the tendency of your data to be distributed toward the center or toward the tails of the distribution. For a symmetric distribution (in SAS):
• Kurtosis = 0: normal
• Kurtosis < 0: platykurtic (lighter tails than normal)
• Kurtosis > 0: leptokurtic (heavier tails than normal)
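A sketch of sample skewness and excess kurtosis in Python. The bias-adjusted formulas below are assumed to match the SAS defaults, under which a normal distribution gives 0 for both; other software may use unadjusted moments. The 16 quartile-slide values happen to be exactly symmetric around 78, so their skewness is 0, and the slide's 1, 1, 1, 2, 3, 10 example is right-skewed.

```python
import math

def _mean_sd(x):
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))   # sample SD, n - 1 denominator
    return n, mean, s

def skewness(x):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum(z^3)."""
    n, mean, s = _mean_sd(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)

def kurtosis(x):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n, mean, s = _mean_sd(x)
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(((v - mean) / s) ** 4 for v in x)
    return term - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

symmetric = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 67, 63]
right_skewed = [1, 1, 1, 2, 3, 10]
print(abs(skewness(symmetric)) < 1e-9, skewness(right_skewed) > 0)   # True True
print(round(kurtosis([1, 2, 3, 4, 5]), 4))                           # -1.2 (light tails)
```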
Question 10.
Now that you understand the many different ways your data could be distributed, match each example with its appropriate shape. The blue curve on each histogram represents the normal distribution with the mean and standard deviation estimated from the sample.
(Figure: five histograms, numbered 1 to 5.)
a) Heavy-tailed
b) Light-tailed
c) Right-skewed
d) Normal
e) Left-skewed
Answers: d, c, a, e, b
Question 11.
Match each term to what it measures:
a) Location
b) Spread
c) Shape
Answers:
• Mean: a
• Kurtosis: c
• Median: a
• Standard deviation: b
• Variance: b
• Mode: a
• Skewness: c
• IQR (interquartile range): b
Picturing your data: Normal Probability Plot
A normal probability plot is a visual method for determining whether your data come from a distribution that is approximately normal. It plots the actual data values against the expected percentiles from the standard normal distribution, together with a normal reference line.
Common patterns:
1. Normal distribution (the observed data follow the reference line)
2. Skewed-to-the-right distribution
3. Skewed-to-the-left distribution
4. Light-tailed distribution
5. Heavy-tailed distribution
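The points of a normal probability plot are just the sorted data paired with standard normal quantiles. The sketch assumes Blom's plotting positions, (i − 0.375)/(n + 0.25); software may use a slightly different choice, which shifts the points a little but not the overall pattern. The function name is illustrative.

```python
from statistics import NormalDist

def normal_probability_points(values):
    """(expected standard normal quantile, observed value) pairs, using Blom's plotting positions."""
    x = sorted(values)
    n = len(x)
    std_normal = NormalDist()   # mu = 0, sigma = 1
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), v) for i, v in enumerate(x, start=1)]

data = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 67, 63]
for q, v in normal_probability_points(data):
    print(f"{q:6.3f} {v}")
```

Points close to a straight line suggest approximate normality; a curved pattern suggests skewness, and an S-shape suggests light or heavy tails.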
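Picturing your data: Box Plot
• Top of the box: the 75th percentile
• Line inside the box: the 50th percentile (median)
• Bottom of the box: the 25th percentile
• Upper whisker: extends to the largest point within 1.5 × IQR of the box
• Lower whisker: extends to the smallest point within 1.5 × IQR of the box
• Outliers: points more than 1.5 × IQR from the box
• The mean is shown with its own marker.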
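The quantities drawn in a box plot can be computed directly. This sketch applies the 1.5 × IQR rule to the 17 values from the Measures of Location slide, reusing the averaging percentile rule assumed earlier to be SAS's default (`percentile_def5` and `box_plot_summary` are hypothetical names). The value 35 falls below the lower fence and is reported as an outlier.

```python
def percentile_def5(values, p):
    """p-th percentile (integer 0 < p < 100), averaging rule assumed to match SAS's default."""
    x = sorted(values)
    j, remainder = divmod(len(x) * p, 100)
    return (x[j - 1] + x[j]) / 2 if remainder == 0 else x[j]

def box_plot_summary(values):
    """Box plot quantities, using the 1.5 * IQR rule for whiskers and outliers."""
    x = sorted(values)
    q1, median, q3 = (percentile_def5(x, p) for p in (25, 50, 75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "mean": sum(x) / len(x),
        "whisker_low": min(inside),     # smallest point within 1.5 * IQR of the box
        "whisker_high": max(inside),    # largest point within 1.5 * IQR of the box
        "outliers": [v for v in x if v < low_fence or v > high_fence],
    }

data = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 68, 68, 35]
summary = box_plot_summary(data)
print(summary["q1"], summary["median"], summary["q3"], summary["outliers"])   # 72 78 83 [35]
```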
Question 12.
Which of the following are ways to check the normality of your data?
a) Check if the mean and median are equal
b) Check if the skewness and kurtosis are close to 0
c) Create histograms and other graphical tools to visually assess the data
d) All of the above
Answer: d
Scenario: Assessing the normality
Check the SAT scores of XXX school (80 randomly selected students): is the average 1200? Examine:
• Range, frequency, and shape
• Normal distribution: mean vs. median, skewness, kurtosis
• Outliers
Demo: Task > Describe > Distribution Analysis
Confidence Intervals for the mean: Objectives
In this topic, we will learn to do the following:
– Define the distribution of sample means and the central limit theorem
– Calculate and interpret the standard error of the mean and confidence intervals for the mean
– Use the Summary Statistics task to generate the standard error of the mean and the confidence intervals for the mean
Confidence Intervals for the mean
Point Estimators
A statistic that estimates a parameter is a point estimator: x̄ estimates μ, and s estimates σ.
Variability
Different samples give different values of x̄, for example 1.0, 1.2, 1.3, and 0.8. How much does the estimator vary from sample to sample?
Standard error: the variability of a sample statistic
Standard deviation: the variability of the data
Standard error of the mean (the variability of sample means):
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$
where s is the sample standard deviation and n is the sample size.
What happens if n is larger? The larger the sample, the smaller the standard error, and the more precise the sample mean.
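A short Python sketch of the standard error of the mean, applied to the 16 quartile-slide values, followed by an illustration (with an assumed fixed s = 8.0) that quadrupling n halves the standard error because of the √n in the denominator.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

data = [93, 89, 88, 84, 83, 82, 79, 78, 78, 77, 74, 73, 72, 68, 67, 63]
print(round(standard_error(data), 4))   # 2.0897

# With s held fixed (assumed s = 8.0), quadrupling n halves the standard error.
s = 8.0
for n in (16, 64, 256):
    print(n, s / math.sqrt(n))   # 2.0, then 1.0, then 0.5
```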
Distribution of sample means
The distribution of sample means is less variable than the data.
Confidence Interval
A 95% confidence interval is a range of values constructed so that, over repeated sampling, 95% of such intervals contain the true population mean μ. One interpretation: if 100 different samples were drawn from the same population and 100 intervals were calculated, approximately 95 of them would contain the population mean.
A confidence interval:
• is a range of values that you believe contains the population parameter of interest
• is defined by an upper and a lower bound around a sample statistic
To construct a confidence interval, a confidence level (or, equivalently, a significance level) must be chosen.
Question: Why not 99%?
Answer: A 99% interval is wider, so it gives less useful (less precise) information.
Confidence Interval for the mean
$\bar{x} \pm t \cdot s_{\bar{x}}$, or equivalently $\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$, where:
• $\bar{x}$ is the sample mean
• t is the t value corresponding to the confidence level and n − 1 degrees of freedom, where n is the sample size
• $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ is the standard error of the mean
Student's t-distribution (or simply the t-distribution) is a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.
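The interval formula can be checked outside SAS. Python's standard library has no t quantile function, so this sketch (an assumption, not how SAS computes it) integrates the t density numerically with Simpson's rule and inverts the CDF by bisection. Fed the SAT example's summary values (x̄ = 1190.625, standard error 16.4416, n = 80, so 79 degrees of freedom), it reproduces the interval (1157.90, 1223.35) quoted in Question 13.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_confidence_interval(mean, std_error, df, level=0.95):
    """x-bar +/- t * s_xbar, with t from the t distribution with df = n - 1."""
    t = t_quantile(1 - (1 - level) / 2, df)
    return mean - t * std_error, mean + t * std_error

# SAT example: x-bar = 1190.625, s_xbar = 16.4416, n = 80, so df = 79.
lo, hi = mean_confidence_interval(1190.625, 16.4416, df=79)
print(round(lo, 2), round(hi, 2))   # 1157.9 1223.35
```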
Question: How can you decrease the width of the interval?
Answer: Way 1: increase the sample size n. Way 2: decrease the confidence level.
Confidence Intervals Sometimes Miss
Not every interval computed from a sample contains μ.
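A simulation makes the "sometimes miss" idea concrete. Under an assumed setup (a normal population with μ = 100 and σ = 15, samples of n = 20, and the tabulated value t(0.975, 19) = 2.093), the sketch builds many 95% intervals and counts how often they contain μ. The fraction should land near 0.95; the rest are the misses.

```python
import math
import random
import statistics

# Assumed setup for illustration only.
MU, SIGMA, N, T_CRIT = 100.0, 15.0, 20, 2.093   # T_CRIT = t(0.975, df = 19) from a t table

def simulate_coverage(reps=2000, seed=1):
    """Fraction of 95% t intervals, one per simulated sample, that contain the true mean MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(N)
        if xbar - T_CRIT * se <= MU <= xbar + T_CRIT * se:
            hits += 1
    return hits / reps

print(simulate_coverage())   # close to 0.95
```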
Normality and the Central Limit Theorem
To satisfy the assumption of normality, you can do one of the following:
• Verify that the population distribution is approximately normal
• Apply the central limit theorem
– The central limit theorem states that the distribution of sample means is approximately normal, regardless of the population's shape, if the sample size is large enough.
– "Large enough" is usually about 30 observations. It is more if the data are heavily skewed, and fewer if the data are symmetric.
(Figure panels: data from an exponential distribution; means of 1000 samples of size 5; means of 1000 samples of size 10; means of 1000 samples of size 30.)
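The figure's experiment can be re-run in a few lines of Python. The sketch draws 1000 samples of size 5, 10, and 30 from an exponential distribution with mean 1 (so σ = 1), then reports the standard deviation of the sample means next to σ/√n and a simple moment skewness. The skewness should shrink toward 0 as n grows. The seeds and function names are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(n, reps=1000, seed=0):
    """Means of `reps` samples of size n from an exponential distribution with mean 1."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Simple moment skewness (no bias adjustment); enough to see the trend."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

for n in (5, 10, 30):
    means = sample_means(n)
    # SD of the sample means should be near sigma / sqrt(n) = 1 / sqrt(n).
    print(n, round(statistics.stdev(means), 3), round(1 / math.sqrt(n), 3), round(skewness(means), 2))
```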
Question 13.
A 95% confidence interval for SAT scores is (1157.90, 1223.35). What can you conclude from this information?
a) The true average SAT score is significantly different from 1200.
b) The true average SAT score is not significantly different from 1200.
c) The true average SAT score is less than 1200.
d) None of the above. You cannot determine statistical significance from a confidence interval.
Answer: b
Confidence Intervals for the mean: Demo
Task > Summary Statistics > Statistics > Additional > Confidence Intervals of the mean, 95%
Hypothesis Testing
Hypothesis Test
Use sample statistics to decide about a population parameter: is μ = μ0?
In this topic, we will learn to do the following:
– Design and conduct a hypothesis test
– Use the p-value to determine statistical significance
– Use the Distribution Analysis task to perform a statistical hypothesis test
– Perform a one-sample, two-sided t-test to determine whether the population mean is significantly different from a known value
Hypothesis Testing: Decision making process
1. Define the null and alternative hypotheses
2. Specify the significance level
3. Collect data
4. Reject or fail to reject the null hypothesis
Use a decision rule to make a judgment. If the evidence contradicting the null hypothesis is:
– sufficiently strong to meet the burden of proof (significance level), then reject the null hypothesis
– not strong enough to meet the burden of proof, then fail to reject the null hypothesis. Be aware that failing to prove guilt does not mean that the defendant is proven innocent. It could mean that the prosecuting attorney did not build a strong enough case to meet the burden of proof.
Hypothesis Testing: Coin example
Step 1. Define the null and alternative hypotheses
H0 (null hypothesis): the coin is fair
Ha or H1 (alternative hypothesis): the coin is not fair
Step 2. Specify the significance level: the amount of evidence needed to reject the null hypothesis
Significance level α = 0.05
Step 3. Collect data: for example, five flips give two heads and three tails.
Question: Is it possible to get 100 tails?
Step 4. Reject or fail to reject the null hypothesis (H0: the coin is fair)
• Two heads, three tails: do not reject H0
• 100 tails: reject H0
Hypothesis Testing: Type of error and Power
Decision | H0 is true (actual) | H0 is false (actual)
Fail to reject H0 | Correct | Type II error
Reject H0 | Type I error | Correct
• Probability of Type I error = α. Example: H0 says the coin is fair, but you conclude that the coin is not fair.
• Probability of Type II error = β. Example: an unfair coin is regarded as fair.
• Probability of correct rejection = 1 − β = power
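For the coin example, α, β, and power can be computed exactly from the binomial distribution. The sketch assumes 100 flips and an exact two-sided test at α = 0.05. It finds the rejection region, the actual Type I error rate (slightly below 0.05 because counts are discrete), and the power against a few hypothetical biased coins. The function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def rejection_cutoff(n, alpha=0.05):
    """Largest c such that rejecting H0 (fair coin) when heads <= c or heads >= n - c
    gives a Type I error probability of at most alpha."""
    c = -1
    while 2 * binom_cdf(c + 1, n, 0.5) <= alpha:
        c += 1
    return c

def power(p_true, n, c):
    """1 - beta, where beta = P(c < heads < n - c) when P(heads) = p_true."""
    beta = binom_cdf(n - c - 1, n, p_true) - binom_cdf(c, n, p_true)
    return 1 - beta

n = 100
c = rejection_cutoff(n)              # reject H0 if heads <= c or heads >= n - c
type_1 = 2 * binom_cdf(c, n, 0.5)    # actual alpha: P(reject H0 | coin is fair)
print(c, round(type_1, 4))
for p_true in (0.55, 0.60, 0.65):
    pw = power(p_true, n, c)
    print(f"P(heads) = {p_true:.2f}: beta = {1 - pw:.3f}, power = {pw:.3f}")
```

Power grows with the effect size, the same pattern the coin-experiment tables show through falling p-values.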
Question 14.
A bank manager is concerned that the percentage of loans that contain errors has increased above the acceptable amount of 1%. A significance test is conducted to test his concern (H0: p ≤ .01, Ha: p > .01). Match these items between left and right:
a) Probability the manager concludes that the rate is indeed above 1%, when in reality it is not.
b) Probability the manager concludes that there is no evidence that the rate is above 1%, when in reality it is above 1%.
Answers:
• Type I error: a
• Type II error: b
Hypothesis Testing: Coin Experiment – Effect Size Influence
The effect size refers to the magnitude of the difference between the sampled population and the null hypothesis. In this example, the null hypothesis of a fair coin suggests 50% heads and 50% tails. If the coin were actually weighted to give 55% heads, the effect size would be 5%.
Flip a coin 100 times and decide whether it is fair:
Heads | Tails | P-value
55 | 45 | .3682
40 | 60 | .0569
37 | 63 | .0120
15 | 85 | <.0001
A p-value measures the probability, assuming the null hypothesis is true, of observing a value as extreme as or more extreme than the one observed.
Hypothesis Testing: Coin Experiment – Sample Size Influence
When the effect size is held constant, the p-value decreases as the sample size increases.
Flip a coin and get 40% heads; decide whether it is fair:
Heads | Tails | P-value
4 | 6 | .7539
16 | 24 | .2682
40 | 60 | .0569
160 | 240 | <.0001
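The p-values in both coin tables are consistent with an exact two-sided binomial test of H0: P(heads) = 0.5. That the slides used exactly this method is an inference, not stated in the source. Under that assumption, the sketch below should reproduce both tables to about four decimals. Because the null distribution is symmetric, the two-sided p-value is twice the smaller tail, capped at 1.

```python
from math import comb

def fair_coin_p_value(heads, flips):
    """Exact two-sided binomial test of H0: P(heads) = 0.5 (twice the smaller tail, capped at 1)."""
    k = min(heads, flips - heads)
    tail = sum(comb(flips, i) for i in range(k + 1)) / 2**flips
    return min(1.0, 2 * tail)

# Effect size influence: 100 flips each.
for heads in (55, 40, 37, 15):
    print(f"{heads} heads / 100 flips: p = {fair_coin_p_value(heads, 100):.4f}")

# Sample size influence: 40% heads each time.
for heads, flips in ((4, 10), (16, 40), (40, 100), (160, 400)):
    print(f"{heads} heads / {flips} flips: p = {fair_coin_p_value(heads, flips):.4f}")
```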
Hypothesis Testing: Statistical Hypothesis Test
1. Define the null and alternative hypotheses: H0: equality; H1: difference
2. Specify the significance level: set α, for example 5%
3. Collect data
4. Reject or fail to reject the null hypothesis: reject if p-value < α; fail to reject if p-value ≥ α
The value of α, the probability of a Type I error, is specified by the experimenter before collecting data. The strength of the evidence is measured by the p-value, which is calculated from the collected data.
Hypothesis Testing: Performing a Hypothesis Test
To test the null hypothesis H0: μ = μ0, SAS software calculates the Student's t statistic:
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$
The null hypothesis is rejected when the calculated value is more extreme (either positive or negative) than would be expected by chance if H0 were true.
For the test score example:
$t = \frac{1190.625 - 1200}{16.4416} = -0.5702$
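The test-score calculation can be reproduced from the summary values above (x̄ = 1190.625, standard error 16.4416, n = 80 so 79 degrees of freedom). The two-sided p-value needs the t CDF, which Python's standard library lacks, so this sketch assumes a numerical Simpson's-rule integration of the t density. It is an illustration, not SAS's internal algorithm.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def one_sample_t_test(mean, std_error, mu0, df):
    """t = (x-bar - mu0) / s_xbar, with a two-sided p-value from Student's t(df)."""
    t = (mean - mu0) / std_error
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    return t, p_two_sided

t, p = one_sample_t_test(1190.625, 16.4416, mu0=1200, df=79)
print(round(t, 4))   # -0.5702
print(p > 0.05)      # True: fail to reject H0 at alpha = 0.05
```

The conclusion agrees with the confidence interval in Question 13: 1200 lies inside (1157.90, 1223.35).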
In a two-sided test at α = 5%, each tail takes 2.5%.
Why a t-test?
a. The data are normally distributed
b. σ is unknown
c. The sample is small (< 30)
t-test method 1: the Distribution Analysis task
t-test method 2: Task > ANOVA > T test
Homework: Exercise 1.1.1 Calculating Basic Statistics in the Summary Statistics Task
The data in NormTemp come from an article in the Journal of Statistics Education by Dr. Allen L. Shoemaker. The data call into question the notion that the true mean body temperature is 98.6 degrees Fahrenheit. There are 65 males and 65 females. There is also some question about whether mean body temperatures for women are the same as for men. The variables in the data set are as follows:
ID: identification number
BodyTemp: body temperature (degrees Fahrenheit)
Gender: coded (Male, Female)
HeartRate: heart rate (beats per minute)
a. Open a new process and the NormTemp dataset. Use the Summary Statistics task to answer the next two questions.
b. What is the overall mean and standard deviation of BodyTemp in the sample?
c. What is the interquartile range of BodyTemp?
Homework: Exercise 2.2.1 Producing Descriptive Statistics
Use the NormTemp dataset to answer the following:
a. What are the minimum, the maximum, the mean, and the standard deviation for BodyTemp? Does the variable appear to be normally distributed?
b. Create box plots for BodyTemp. Does the average BodyTemp seem to be 98.6 degrees?
2.2 Multiple Choice Poll
In the NormTemp dataset, the distribution of BodyTemp seemed:
c. Close to normal
d. Left skewed
e. Right skewed
f. To have high positive kurtosis
g. To have high negative kurtosis
BodyTemp: Minimum | Maximum | Mean | Standard deviation | Skewness | Kurtosis | Distribution: Normal (Yes/No)
Homework: Exercise 3.3.1 Producing Confidence Intervals
a. Generate the 95% confidence interval for the mean of BodyTemp in the NormTemp dataset.
b. Is the assumption of normality met to produce a confidence interval for this data?
c. What are the bounds of the confidence interval?
3.2 Multiple Answer Poll
The distribution of sample means is approximately normal if which of the following are true?
d. The population is normal.
e. The sample size is "large enough."
f. The sample standard deviation is small.
Homework: Exercise 4.4.1 Performing a One-Sample t-Test
a. Perform a one-sample t-test to determine whether the mean of body temperatures (the variable BodyTemp in the NormTemp dataset) is truly 98.6.
b. What is the value of the t statistic and the corresponding p-value?
c. Produce a confidence interval plot of BodyTemp.
d. Do you reject or fail to reject the null hypothesis at the .05 level that the average temperature is 98.6 degrees?
4.2 Multiple Choice Poll
A 95% confidence interval for SAT scores is (1157.90, 1223.35). From this, what can you conclude, at alpha = 0.05?
e. The true average SAT score is significantly different from 1200.
f. The true average SAT score is not significantly different from 1200.
g. The true average SAT score is less than 1200.
h. None of the above – you cannot determine statistical significance from confidence intervals.
Thank you!