7/28/2019 Introductory Statistical Concepts
1/118
Introductory Statistical Concepts
F. Michael Speed, Ph.D.
Department of Statistics
Texas A&M University
-
Disclaimer
I am not an expert SAS programmer.
Nothing that I say is confirmed or denied by Texas A&M University.
-
Why Are We Here?
Deming
To Learn
To Have Fun
Question: Who was Deming?
-
Poll: What type of organization do you work for?
Business
Government
Education
Nonprofit
Other
-
Purpose of These Lectures
A review of the statistical concepts used in most of the SAS Analytics Lecture Series.
We will look at questions such as the following:
What is the nature of statistical analyses?
Why are population parameters so important?
What is really being tested when you see a p-value?
Why does regression handle missing data so well?
What are residual analyses?
-
Descriptive Statistics
-
(Very important concepts)
Variable of Interest
The Distribution
Parameters
Mean Mode Range
Median Variance
Etc
The Population
-
Learning Outcomes
You will learn
basic statistical concepts
the definition of mean, median, mode, and standard deviation
the difference between populations and samples
the difference between parameters and estimates
about confidence intervals
how to test a statistical hypothesis
how to run a regression analysis
-
Parameters
Characteristics of the variable of interest
It is how we describe the variable of interest
Parameters are unknown
-
Parameters
(Characteristics)
Central Tendency
Mode
Median
Mean
Measures of Variability
Range
Variance
Standard Deviation
More information on mode, mean, and median: http://dist.stat.tamu.edu/pub/speed/mean_mode.htm
Applet: http://www.stat.tamu.edu/~west/ph/meanmedian.html
-
Variability
Change in the Data
-
What Is an Index?
How SUNNY is SUNNY?
The UV Index: http://www.epa.gov/sunwise/uviscale.html
-
Air Quality Index
What Does It Mean?
-
DOW JONES INDUSTRIAL AVERAGE INDEX
What does 10,971.16 really mean?
Which is better: a DJIA of 10,000 or a DJIA of 12,000?
-
Variability Index: A Simple One
Find the Largest Value
Find the Smallest Value
Let Range = R = Largest - Smallest
-
A More Complex Variation Index
The Standard Deviation
Statisticians use this index to indicate variability
You will see it written as σ or S or s
Widely available from SAS, Excel, and other statistical packages
-
Details of the More Complex Index
Example: Suppose that we observe the following three numbers:
1 4 7
The mean of these numbers is:
(1 + 4 + 7)/3 = 4
We now subtract the mean from each number, square each difference, and add the squares:
(1-4)*(1-4) + (4-4)*(4-4) + (7-4)*(7-4) = 18
We divide by the number of observations minus one (3 - 1 = 2) and take the square root:
The Standard Deviation = sqrt(18/2) = 3
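As a quick check of the arithmetic above, here is a minimal SAS sketch. The data set name three and the variable name x are made up for this illustration. PROC MEANS divides by n - 1 by default, so it should report a mean of 4, a range of 6, and a standard deviation of 3.

```sas
/* Hypothetical data set: the three numbers from the example */
data three;
  input x @@;   /* @@ reads several values from one input line */
  datalines;
1 4 7
;
run;

/* N, mean, range, and standard deviation (n - 1 divisor by default) */
proc means data=three n mean range std;
  var x;
run;
```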
-
What Does This Mean?
By itself, it may be confusing to some.
When comparing populations, we can use it to say which population varies the most.
Let us look at an applet: http://www.stat.tamu.edu/~west/ph/stddev.html
-
Using Graphs to Determine Variability
Box Plot: http://www.netmba.com/statistics/plot/box/
[Figure: box plots by State (NEW_YORK, CALIFORN), N = 35 per state; vertical axis from 0 to 400000]
-
Describe What Is Happening
You are giving the parameters of the picture
-
Example Using SAS
-
Distributions
-
Known Distribution
With a known distribution, we know the following:
the shape
the mean
the variability (standard deviation)
and/or some other information
-
Classical Distributions: Normal
-
Normal Overlay
-
Classical Distributions: Uniform
-
Uniform Overlay
-
Classical Distributions: Chi-Square
-
Survey
The following are called parameters of the population:
mean, median, mode
variance, standard deviation, range, inter-quartile range (IQR)
In general, are these known or unknown?
Known = yes (select using your seat indicator)
Unknown = no (select using your seat indicator)
-
Generate a Sample from a Known Distribution
Why? This is a simulation.
It helps us to understand a process or analyses.
It helps to see if we are getting expected results.
It is fun.
-
MPG Example
Suppose we want to simulate mpg for a car that weighs 3000 lbs.
Let us assume that the mean mpg=24.
Let us assume that the standard deviation=1 mpg.
We will generate a number from the normal distribution with mean 0 and standard deviation=1.
We will then add that number to 24 (a negative number lowers the mpg).
-
MPG Composition
Let us generate 1000 mpg.
Observed = Essential Part (24) +/- Leftovers (N(0,1))
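Written as a formula, the composition above might look like the following. The symbol $e_i$ for the leftover of the i-th simulated car is notation introduced here, not taken from the slides.

```latex
mpg_i = 24 + e_i, \qquad e_i \sim N(0, 1), \qquad i = 1, \dots, 1000
```

The essential part (24) is the same for every car; only the leftover changes from one simulated value to the next.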
-
Simulated MPG
-
MPG Histogram
Compare with true values!
-
Simulated Sample
In this example, we simulated taking a sample of size 1000 from one population of cars weighing 3000 pounds, with mpg following a normal distribution with mean=24 and standard deviation=1.
You can practice this after class.
-
After Class Practice
Simulate 1000 data points for each of the following five populations. Run and explore your data.
-
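SAS Code 4: Generate a Normal with Mean 0 and Standard Deviation of s=1.5
data mpg1;
  s = 1.5; mean = 24;   /* leftover standard deviation and population mean */
  do i = 1 to 1000;
    lo = s*normal(-1);  /* leftover ~ N(0, s**2); a nonpositive seed uses the clock */
    mpg = mean + lo;    /* observed = essential part + leftover */
    output;
  end;
run;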
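One possible way to explore the simulated data is sketched below; it assumes the data set mpg1 created by the code above. The sample mean and standard deviation should land near 24 and 1.5 but will not match them exactly, and they change from run to run because the seed comes from the clock.

```sas
/* Summary statistics for the simulated mpg values */
proc means data=mpg1 n mean std min max;
  var mpg;
run;

/* Histogram with a fitted normal curve overlaid */
proc sgplot data=mpg1;
  histogram mpg;
  density mpg / type=normal;
run;
```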
-
This demonstration illustrates how to simulate data for a given population.
Simulating Data: SAS_Code4.sas
-
View/Application Share: Demo: Simulation
-
Summary
-
Section 1.2
Populations and Samples
-
Objectives
Understand the relationships between
populations and samples
parameters and estimates.
Look at an overview of hypothesis testing.
-
Population
Mean, Variance, Median,
Mode, Distribution,
Parameters
-
Example
Mpg of American-made cars that weigh between 2000 and 3500 pounds and were built in the 1970s.
Parameters: mean, variance, and so on
In general, we do not know the parameters.
-
Purpose of Statistical Analyses
Estimate the parameters. (Make guesses.)
Example: What is the population mean?
Test hypotheses about the parameters. (Ask questions.)
Example: Is the population mean = 30 mpg?
-
Role of Samples
Taking a sample of the population enables you to
make estimates of the population parameters
answer the questions about the population parameters.
-
Population and Sample
Mean, Variance, Median,
Mode, Distribution,
Parameters
Sample mean
Sample variance
Sample
S
Inference:
Estimates
Test of hypotheses
-
Results of Summary Statistics
-
Results of Histogram
continued...
-
Results of Histogram
-
This demonstration illustrates how to estimate and plot the sampling distribution of various statistics.
Sampling Distribution Applet: sampling_dist
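The same idea can be tried in SAS without the applet. The sketch below is only an illustration under assumed settings: 500 samples of size 25 from a normal population with mean 24 and standard deviation 1. The data set names, variable names, and seed are all made up. The standard deviation of the 500 sample means should come out near 1/sqrt(25) = 0.2.

```sas
/* Draw 500 samples of size 25 from N(24, 1) (assumed settings) */
data samples;
  call streaminit(12345);            /* arbitrary seed so the run can be repeated */
  do sample = 1 to 500;
    do i = 1 to 25;
      mpg = rand('NORMAL', 24, 1);
      output;
    end;
  end;
run;

/* One mean per sample; the data are already in sample order */
proc means data=samples noprint;
  by sample;
  var mpg;
  output out=sample_means mean=mean_mpg;
run;

/* The distribution of the 500 sample means */
proc means data=sample_means n mean std;
  var mean_mpg;
run;
```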
-
View/Application Share: Demo: Sampling Distributions Applet
-
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.h...
-
Confidence Intervals on the Population Mean
Level of Comfort
50% {21.57 to 22.21}
95% {20.96 to 22.82}
99.9% {20.30 to 23.48}
What does this mean?
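Intervals like these can be requested from PROC MEANS, which computes the sample mean plus or minus a t multiplier times the standard error. The sketch below assumes a data set named cars_american with a variable named mpg; the variable name is a guess, and nothing here is claimed to reproduce the numbers above. A smaller alpha gives a higher confidence level and a wider interval.

```sas
/* 50% confidence interval for the mean (alpha = 0.50) */
proc means data=cars_american n mean clm alpha=0.50;
  var mpg;   /* hypothetical variable name */
run;

/* 95% confidence interval (alpha = 0.05) */
proc means data=cars_american n mean clm alpha=0.05;
  var mpg;
run;

/* 99.9% confidence interval (alpha = 0.001) */
proc means data=cars_american n mean clm alpha=0.001;
  var mpg;
run;
```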
-
Test That the Population Mean = 30 mpg
Use the one-sample t-test.
Requirements for running this test:
Large n (n > 35)
Or leftovers are normal
What is the p-value or sig value?
-
Testing Mean = 30
$$H_0: \mu_{mpg} = 30$$
$$H_A: \mu_{mpg} \neq 30$$
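A one-sample t-test of this hypothesis might be run as sketched below. The data set cars_american and the variable mpg are assumed names. The output reports the t statistic and the two-sided p-value for the null value given in H0=.

```sas
/* One-sample t-test of H0: mean mpg = 30 (hypothetical variable name) */
proc ttest data=cars_american h0=30 alpha=0.05;
  var mpg;
run;
```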
-
Conclusions of the Test
Choose an alpha level, usually alpha=.05.
If sig < alpha, reject the null hypothesis.
-
Sig and p-values
When you see a sig value or p-value:
You know that some hypothesis is being tested.
You know whether or not the hypothesis is being rejected.
You probably do not know what the hypothesis really is.
Ask yourself these questions:
What are the population parameters being tested?
How is what is being tested related to those parameters?
-
Requirements for Doing This Test
Large n (n > 35)
Or leftovers are normally distributed.
Use a histogram to check for normality.
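One way to look at normality in SAS is sketched below, again assuming a data set cars_american with a variable mpg. For a single population the leftovers are just mpg minus the mean, so the shape of the mpg histogram is the shape of the leftovers. The NORMAL option adds formal tests, which are a supplement to the histogram rather than something these slides require.

```sas
/* Histogram, normal Q-Q plot, and normality tests (hypothetical names) */
proc univariate data=cars_american normal;
  var mpg;
  histogram mpg / normal;                  /* overlay a fitted normal curve */
  qqplot mpg / normal(mu=est sigma=est);   /* points near the line suggest normality */
run;
```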
-
This demonstration illustrates the testing of hypotheses using the data set cars_american.
Testing Hypotheses
-
View/Application Share: Demo: Testing Hypotheses
-
Populations: Which Ones Are Similar?
-
Populations: Which Ones Are Similar?
Take samples.
-
Take Samples
Use the samples to answer this question:
Which populations are similar?
Statistical translations:
"Which populations are similar?" is the same as asking
Are the following the same:
distribution?
mean?
variance?
-
Background/Requirements
Before we jump into the analysis, we must ask the following questions:
How many populations are there?
How many population parameters are we interested in, and what are they?
What tests do we want to do, and what are the requirements for doing those?
Are we using everything we know?
-
Example
Suppose that we are interested in the mpg of American and European cars. How many populations are there?
American Cars
Mpg
Distribution
Mean
Variance
European Cars
Mpg
Distribution
Mean
Variance
-
Poll: How many populations are there?
One - MPG
Two - American and European
Depends on the sample size
-
Parameters
                       Population 1      Population 2
                       American Cars     European Cars
Variable of interest   mpg               mpg
Distribution           Normal?           Normal?
Mean                   $\mu_A$           $\mu_E$
Variance               $\sigma_A^2$      $\sigma_E^2$
-
Analyses
1. We want to look at the distributions.
2. We want to estimate the parameters.
3. We want to answer these questions:
Are the population means the same?
Are the population variances the same?
-
Example: Our Data Set car_am_eu
Suppose that we are interested in the mpg of American and European cars.
Sample
American Cars
Mpg
Distribution
Mean
Variance
European Cars
Mpg
Distribution
Mean
Variance
Sample
-
Results from the Sample
continued...
-
Box Plots
-
Box Plots
American European
-
Histograms
American
European
-
Poll: Are the populations the same?
Yes
No
-
Conclusion Based on Sample Numbers and Graphs
Easy: based on the samples, the populations are different (no statistical jargon).
But I must have a p-value for my boss, for my paper, and so on.
-
Formal Tests
The classical approach in determining whether two populations are the same is to test whether the two population means are equal:
$$H_0: \mu_A = \mu_E$$
But first we check whether the two population variances are equal:
$$H_0: \sigma_A^2 = \sigma_E^2$$
continued...
-
Formal Tests
We use the two-sample t-test.
Test 2
Test 1
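A two-sample comparison of this kind might be requested as sketched below. The data set name car_am_eu comes from these slides, but the grouping variable origin and the response variable mpg are assumed names. PROC TTEST prints an equality-of-variances (folded F) test along with both the pooled and the Satterthwaite t-tests for the means, so both hypotheses can be read from one run.

```sas
/* Two-sample t-test: origin is a hypothetical two-level grouping variable */
proc ttest data=car_am_eu;
  class origin;   /* for example, American versus European */
  var mpg;        /* hypothetical response variable name */
run;
```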
-
This demonstration shows how to compare two populations using the data set car_am_eu.
Comparing Two Populations
-
View/Application Share: Demo: Comparing Two Populations
-
Example
1. Run summary statistics.
2. Ask for histogram and box plot.
What do you get?
data temp1;
  x = 1;
  output;
run;
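For the summary statistics step, a minimal sketch using the data set temp1 from the code above could be the following. With a single observation the mean is 1, but the standard deviation is reported as missing, because its divisor n - 1 is zero.

```sas
/* Summary statistics for a data set with one observation */
proc means data=temp1 n mean std;
  var x;
run;
```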
-
Section 1.3
Simple Linear Regression
-
Objectives
Identify the following:
the population parameters
the appropriate model
number of populations sampled
the correct hypotheses
what should be tested for normality
what equal variances means.
-
MPG Example
[Figure: four populations, one for each weight (3000, 2600, 2900, and 2300), each with its own mean $\mu_i$ and variance $\sigma_i^2$, $i = 1, \dots, 4$]
Take a sample of size 1 from each population!
-
Data
We should be in deep trouble with one sample from each population.
We have eight unknown population parameters.
Can you name them?
But what do we know?
-
Survey
Name the population parameters.
-
Essential Part and Leftovers
We want to model the data as follows:
MPG = Essential Part + Leftover
or
MPG = Mean + Leftover
-
Know or Assumptions
First, we know that
$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2$$
Second, each population mean is related to weight by the following:
$$\mu_i = a + b \cdot weight_i$$
The population means fall on a straight line!!
How many unknowns are there now?
-
Poll: How many unknowns are there?
1
2
3
4
5n
-
Graph
-
Observed, Essential Part, Leftover
-
The Official Regression Model
mpg = a + b*weight + leftover
or
mpg = a + b*weight + error
or
$$mpg = a + b \cdot weight + \varepsilon$$
or
$$mpg = \beta_0 + \beta_1 \cdot weight + \varepsilon$$
The errors are known to be normal with mean 0 and variance $\sigma^2$.
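To see the model at work, one could simulate a single observation at each of the four weights, as in the sketch below. The intercept, slope, and error standard deviation are values picked only for illustration, and the data set name and seed are made up. Each mpg is a point on the line plus a normal error.

```sas
/* Simulate one mpg per weight from mpg = a + b*weight + error */
data sim_reg;
  call streaminit(2019);                 /* arbitrary seed */
  a = 44.05; b = -0.0078; sigma = 1;     /* assumed values, for illustration only */
  do weight = 2300, 2600, 2900, 3000;
    mpg = a + b*weight + rand('NORMAL', 0, sigma);
    output;
  end;
run;

proc print data=sim_reg;
run;
```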
-
This demonstration illustrates the fundamental concepts of simple linear regression.
Assumptions for Simple Linear Regression: Appendix A
-
View/Application Share: Demo: Linear.doc
-
How Can We Estimate the Unknown Parameters?
The Principle of Least Squares:
$$\text{Let } leftover_i = mpg_i - (\text{essential part})_i$$
or
$$leftover_i = mpg_i - (a + b \cdot weight_i)$$
or
$$r_i = mpg_i - (a + b \cdot weight_i)$$
Now, choose a and b so that $r_1^2 + r_2^2 + r_3^2 + r_4^2$ is as small as possible.
or
Minimize $(r_1^2 + r_2^2 + r_3^2 + r_4^2)$.
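For a straight line, this minimization has a well-known closed-form answer. The hats and bars below are notation added here: a hat marks an estimate and a bar marks a sample average.

```latex
\hat{b} = \frac{\sum_i (weight_i - \overline{weight})(mpg_i - \overline{mpg})}
               {\sum_i (weight_i - \overline{weight})^2},
\qquad
\hat{a} = \overline{mpg} - \hat{b}\,\overline{weight}
```

Statistical software does this calculation for us, so in practice we only need to read the estimates from the output.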
-
This demonstration references David Lane's applet at Rice University.
Regression Applet: Reg_by_eye
-
View/Application Share: Demo: David Lane's Applet
-
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye
-
View/Application Share: Demo: Output of SAS Regression
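Output of this kind typically comes from PROC REG. The sketch below assumes a data set named cars with variables mpg and weight; all three names are hypothetical. The output includes the estimated intercept and slope with their p-values, and the OUTPUT statement saves predicted values and leftovers (residuals) for later checks.

```sas
/* Simple linear regression of mpg on weight (hypothetical names) */
proc reg data=cars;
  model mpg = weight;
  output out=reg_out predicted=pred residual=resid;
run;
quit;
```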
OUTPUT_0
-
OUTPUT
-
OUTPUT_2
-
OUTPUT_3
-
OUTPUT_4
-
Missing Values
-
Suppose that we want to estimate the mean mpg when
weight=2500.
Predicted (Estimated) Mean MPG = 44.05 - .0078*weight
Why does this work?
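The arithmetic for weight=2500 can be checked with a short DATA step. The variable name pred_mpg is made up for this sketch. The log should show a predicted mean of about 24.55, since 0.0078*2500 = 19.5 and 44.05 - 19.5 = 24.55.

```sas
/* Predicted mean mpg at weight = 2500 from the fitted line */
data _null_;
  weight = 2500;
  pred_mpg = 44.05 - 0.0078*weight;
  put pred_mpg=;   /* writes the value to the SAS log */
run;
```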
-
Survey
Can anyone explain why this works?