![Page 1: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/1.jpg)
Stats 845
Applied Statistics
This Course will cover:
1. Regression
   – Non-Linear Regression
   – Multiple Regression
2. Analysis of Variance and Experimental Design
The Emphasis will be on:
1. Learning techniques through example.
2. Use of common statistical packages:
• SPSS
• Minitab
• SAS
• SPlus
What is Statistics?
It is the major mathematical tool of scientific inference, the art of drawing conclusions from data that is to some extent corrupted by some component of random variation (random noise).
An analogy can be drawn between data affected by random components of variation and signals corrupted by noise.
Quite often sounds that are heard or received by some radio receiver can be thought of as signals with superimposed noise.
The objective in signal theory is to extract the signal from the received sound (i.e. remove the noise to the greatest extent possible). The same is true in data analysis.
Example A:
Suppose we are comparing the effect of three different diets on weight loss.
An observation on weight loss can be thought of as being made up of two components:
1. A component due to the effect of the diet being applied to the subject (the signal).
2. A random component due to other factors affecting weight loss that are not considered, such as the initial weight, sex, and metabolic makeup of the subject (the random noise).
Note: random assignment of subjects to diets will ensure that this second component is a random effect.
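The signal-plus-noise view of Example A can be sketched as a small simulation. The diet effects, the noise spread, and the group sizes below are hypothetical values chosen only for illustration; none of them come from the course material.

```python
import random
import statistics

rng = random.Random(845)  # fixed seed so the sketch is reproducible

# Hypothetical true diet effects (the "signal"), in kg of weight lost.
diet_effect = {"A": 2.0, "B": 4.0, "C": 6.0}
noise_sd = 3.0  # hypothetical spread of the random component (the "noise")

# Random assignment of 300 subjects to the three diets.
subjects = list(range(300))
rng.shuffle(subjects)
groups = {"A": subjects[:100], "B": subjects[100:200], "C": subjects[200:]}

# Each observation = signal (diet effect) + random noise.
loss = {
    diet: [diet_effect[diet] + rng.gauss(0.0, noise_sd) for _ in members]
    for diet, members in groups.items()
}

group_means = {diet: statistics.mean(values) for diet, values in loss.items()}
for diet, mean in group_means.items():
    print(f"diet {diet}: true effect {diet_effect[diet]:.1f}, observed mean {mean:.2f}")
```

Averaging within each diet group reduces the noise, so the observed group means should land close to the assumed signal values.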
Example B:
In this example we are again comparing the effect of three diets, this time on weight gain. Subjects are randomly divided into three groups. Diets are randomly distributed amongst the groups. Measurements on weight gain are taken at the following times after commencement of the diet:
- one month
- two months
- 6 months
- 1 year
In addition to the two factors Time and Diet affecting weight gain, there are two random sources of variation (noise):
- between-subject variation and
- within-subject variation
This can be illustrated in a schematic fashion as follows:
Deterministic factors: Diet, Time
Random noise: within subject, between subject
Response: weight gain
The circle of Research
1. Questions arise about a phenomenon.
2. A decision is made to collect data.
3. A decision is made as to how to collect the data. (Statistics)
4. The data is collected.
5. The data is summarized and analyzed. (Statistics)
6. Conclusions are drawn from the analysis.
Notice the two points on the circle where statistics plays an important role:
1. The analysis of the collected data.
2. The design of a data collection procedure.
The analysis of the collected data.
• This of course is the traditional use of statistics.
• Note that if the data collection procedure is well thought out and well designed, the analysis step of the research project will be straightforward.
• Usually experimental designs are chosen with the statistical analysis already in mind.
• Thus the strategy for the analysis is usually decided upon when any study is designed.
• It is a dangerous practice to select the form of analysis after the data has been collected (the choice may favour certain pre-determined conclusions and therefore result in a considerable loss of objectivity).
• Sometimes, however, a decision to use a specific type of analysis has to be made after the data has been collected (because it was overlooked at the design stage).
The design of a data collection procedure
• The importance of statistics is quite often ignored at this stage.
• It is important that the data collection procedure will eventually result in answers to the research questions,
• and that it will result in the most accurate answers for the resources available to the research team.
• Note that the success of a research project should depend not on the answers that it comes up with but on the accuracy of those answers.
• Accuracy of this kind is usually an indicator of a valuable research project.
Some definitions important to Statistics
A population:
This is the complete collection of subjects (objects) that are of interest in the study.
There may be (and frequently is) more than one population, in which case a major objective is comparison.
A case (elementary sampling unit):
This is an individual unit (subject) of the population.
A variable:
a measurement or type of measurement that is made on each individual case in the population.
Types of variables
Some variables may be measured on a numerical scale while others are measured on a categorical scale.
The nature of the variables has a great influence on which analysis will be used.
For Variables measured on a numerical scale the measurements will be numbers.
Ex: Age, Weight, Systolic Blood Pressure
For Variables measured on a categorical scale the measurements will be categories.
Ex: Sex, Religion, Heart Disease
Types of variables
In addition some variables are labeled as dependent variables and some variables are labeled as independent variables.
This usually depends on the objectives of the analysis.
Dependent variables are output or response variables while the independent variables are the input variables or factors.
Usually one is interested in determining equations that describe how the dependent variables are affected by the independent variables
A sample:
Is a subset of the population
Types of Samples
different types of samples are determined by how the sample is selected.
Convenience Samples
In a convenience sample the subjects that are most convenient to the researcher are selected as objects in the sample.
This is not a very good procedure for inferential Statistical Analysis but is useful for exploratory preliminary work.
Quota samples
In quota samples subjects are chosen conveniently until quotas are met for different subgroups of the population.
This also is useful for exploratory preliminary work.
Random Samples
Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
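As a rough illustration of this definition, the sketch below repeatedly draws samples of size 2 from a hypothetical five-case population and tallies how often each of the 10 possible samples appears. The population labels and the number of draws are assumptions made for the example; `random.sample` is Python's standard sampling-without-replacement routine.

```python
import random
from collections import Counter
from itertools import combinations

rng = random.Random(1)  # fixed seed for reproducibility

population = ["a", "b", "c", "d", "e"]  # hypothetical population of 5 cases
n = 2                                   # sample size

# Every possible sample of size n (order ignored): 10 of them here.
all_samples = list(combinations(population, n))

draws = 20000
counts = Counter()
for _ in range(draws):
    sample = tuple(sorted(rng.sample(population, n)))
    counts[sample] += 1

# Under random sampling each possible sample should appear about 1/10 of the time.
for s in all_samples:
    print(s, round(counts[s] / draws, 3))
```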
Convenience Samples and Quota samples are useful for preliminary studies. It is however difficult to assess the accuracy of estimates based on this type of sampling scheme.
Sometimes however one has to be satisfied with a convenience sample and assume that it is equivalent to a random sampling procedure
A population statistic (parameter):
Any quantity computed from the values of variables for the entire population.
A sample statistic:
Any quantity computed from the values of variables for the cases in the sample.
Statistical Decision Making
• Almost all problems in statistics can be formulated as a problem of making a decision.
• That is, given some data observed from some phenomenon, a decision has to be made about that phenomenon.
Decisions are generally broken into two types:
• Estimation decisions
and
• Hypothesis Testing decisions.
Probability theory plays a very important role in these decisions and in the assessment of the errors made by these decisions.
Definition:
A random variable X is a numerical quantity that is determined by the outcome of a random experiment
Example :
An individual is selected at random from a population
and
X = the weight of the individual
The probability distribution of a (continuous) random variable is described by its probability density curve f(x), i.e. a curve which has the following properties:
• 1. f(x) is never negative.
• 2. The total area under the curve f(x) is one.
• 3. The area under the curve f(x) between a and b is the probability that X lies between the two values.
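These three properties can be checked numerically for a specific density. The sketch below assumes a normal density with mean 50 and standard deviation 15 (an illustrative choice) and uses a simple midpoint rule for the areas; `area` is a hypothetical helper written for this example.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=15)  # assumed example density


def area(f, a, b, steps=10000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Property 1: the density is never negative (checked on a grid of x values).
never_negative = all(dist.pdf(x) >= 0 for x in range(-100, 201))

# Property 2: the total area is one (integrating 10 standard deviations each side).
total = area(dist.pdf, -100, 200)

# Property 3: the area between a = 40 and b = 60 is P(40 < X < 60).
p_40_60 = area(dist.pdf, 40, 60)
p_from_cdf = dist.cdf(60) - dist.cdf(40)

print(never_negative, round(total, 6))
print(round(p_40_60, 4), round(p_from_cdf, 4))
```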
[Figure: a probability density curve f(x), plotted for x from 0 to 120.]
Examples of some important Univariate distributions
1. The Normal distribution
A common probability density curve is the “Normal” density curve, which is symmetric and bell shaped.
Comment: If $\mu = 0$ and $\sigma = 1$ the distribution is called the standard normal distribution.
[Figure: Normal distribution with $\mu = 50$ and $\sigma = 15$, and Normal distribution with $\mu = 70$ and $\sigma = 20$, plotted for x from 0 to 120.]
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
2. The Chi-squared distribution with $\nu$ degrees of freedom
$$f(x) = \frac{1}{\Gamma(\nu/2)\, 2^{\nu/2}}\, x^{(\nu-2)/2}\, e^{-x/2} \quad \text{if } x \ge 0$$
[Figure: chi-squared density curve, plotted for x from 0 to 14.]
Comment: If $z_1, z_2, \ldots, z_\nu$ are independent random variables each having a standard normal distribution then
$$U = z_1^2 + z_2^2 + \cdots + z_\nu^2$$
has a chi-squared distribution with $\nu$ degrees of freedom.
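This construction can be checked by simulation. The sketch below builds U as a sum of squared standard normal draws and compares its sample mean and variance with the known chi-squared values (mean equal to the degrees of freedom, variance equal to twice the degrees of freedom). The choice of 5 degrees of freedom and 20000 repetitions is arbitrary.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed for reproducibility

nu = 5        # degrees of freedom (arbitrary choice for this sketch)
reps = 20000  # number of simulated values of U

# U = z_1^2 + ... + z_nu^2 with independent standard normal z's.
u_values = [sum(rng.gauss(0, 1) ** 2 for _ in range(nu)) for _ in range(reps)]

mean_u = statistics.mean(u_values)
var_u = statistics.variance(u_values)

print(f"mean of U: {mean_u:.3f} (theory: {nu})")
print(f"variance of U: {var_u:.3f} (theory: {2 * nu})")
```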
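3. The F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator
$$f(x) = K\, x^{(\nu_1-2)/2} \left(1 + \frac{\nu_1}{\nu_2}\, x\right)^{-(\nu_1+\nu_2)/2} \quad \text{if } x \ge 0$$
where
$$K = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}$$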
[Figure: F distribution density curve, plotted for x from 0 to 6.]
Comment: If $U_1$ and $U_2$ are independent random variables having Chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom respectively then
$$F = \frac{U_1/\nu_1}{U_2/\nu_2}$$
has an F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator.
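A simulation sketch of this ratio, under assumed values of 4 and 20 degrees of freedom: each chi-squared variable is built as a sum of squared standard normals, and the average of the simulated F values is compared with the known mean of the F distribution, which is the denominator degrees of freedom divided by that number minus 2 (valid when the denominator degrees of freedom exceed 2). The helper `chi_squared` is hypothetical and exists only for this example.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed for reproducibility

nu1, nu2 = 4, 20  # assumed numerator and denominator degrees of freedom
reps = 20000


def chi_squared(nu):
    """One chi-squared draw, built as a sum of nu squared standard normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(nu))


# F = (U1 / nu1) / (U2 / nu2) with independent chi-squared U1 and U2.
f_values = [(chi_squared(nu1) / nu1) / (chi_squared(nu2) / nu2) for _ in range(reps)]

mean_f = statistics.mean(f_values)
print(f"mean of F: {mean_f:.3f} (theory: {nu2 / (nu2 - 2):.3f})")
```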
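4. The t distribution with $\nu$ degrees of freedom
$$f(x) = K \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$$
where
$$K = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)}$$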
[Figure: t distribution density curve, plotted for x from -4 to 4.]
Comment: If z and U are independent random variables, and z has a standard Normal distribution while U has a Chi-squared distribution with $\nu$ degrees of freedom then
$$t = \frac{z}{\sqrt{U/\nu}}$$
has a t distribution with $\nu$ degrees of freedom.
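The same kind of simulation check works here. The sketch below assumes 10 degrees of freedom, forms the ratio of a standard normal draw to the square root of an independent chi-squared draw divided by its degrees of freedom, and compares the sample mean and variance with the known values for the t distribution (mean 0, and variance equal to the degrees of freedom divided by that number minus 2, when the degrees of freedom exceed 2). `t_draw` is a hypothetical helper.

```python
import math
import random
import statistics

rng = random.Random(3)  # fixed seed for reproducibility

nu = 10       # assumed degrees of freedom
reps = 20000


def t_draw():
    """One draw of t = z / sqrt(U / nu) with independent z and U."""
    z = rng.gauss(0, 1)
    u = sum(rng.gauss(0, 1) ** 2 for _ in range(nu))  # chi-squared, nu d.f.
    return z / math.sqrt(u / nu)


t_values = [t_draw() for _ in range(reps)]
mean_t = statistics.mean(t_values)
var_t = statistics.variance(t_values)

print(f"mean of t: {mean_t:.3f} (theory: 0)")
print(f"variance of t: {var_t:.3f} (theory: {nu / (nu - 2):.3f})")
```

The variance above 1 reflects the heavier tails of the t distribution relative to the standard normal.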
• An Applet showing critical values and tail probabilities for various distributions
1. Standard Normal
2. T distribution
3. Chi-square distribution
4. Gamma distribution
5. F distribution
The Sampling distribution of a statistic
A random sample from a probability distribution with density function f(x) is a collection of n independent random variables, x1, x2, ..., xn, each with the probability distribution described by f(x).
If for example we collect a random sample of individuals from a population and
– measure some variable X for each of those individuals,
– the n measurements x1, x2, ..., xn will form a set of n independent random variables with a probability distribution equivalent to the distribution of X across the population.
A statistic T is any quantity computed from the random observations x1, x2, ...,xn.
• Any statistic will necessarily be also a random variable and therefore will have a probability distribution described by some probability density function fT(t).
• This distribution is called the sampling distribution of the statistic T.
• This distribution is very important if one is using this statistic in a statistical analysis.
• It is used to assess the accuracy of a statistic if it is used as an estimator.
• It is used to determine thresholds for acceptance and rejection if it is used for Hypothesis testing.
Some examples of Sampling distributions of statistics
Distribution of the sample mean for a sample from a Normal population
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$.
Let
$$\bar{x} = \frac{\sum_i x_i}{n}$$
Then
$$\bar{x} = \frac{\sum_i x_i}{n}$$
has a normal sampling distribution with mean
$$\mu_{\bar{x}} = \mu$$
and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
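A simulation can make this sampling distribution concrete. The sketch below assumes a normal population with mean 50 and standard deviation 15 and a sample size of 25 (all illustrative values), draws many samples, and looks at the mean and standard deviation of the resulting sample means.

```python
import math
import random
import statistics

rng = random.Random(5)  # fixed seed for reproducibility

mu, sigma = 50, 15  # assumed population mean and standard deviation
n = 25              # assumed sample size
reps = 5000         # number of simulated samples

# One sample mean per simulated sample of size n.
sample_means = [
    statistics.mean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.2f} (theory: {mu})")
print(f"s.d. of sample means: {sd_of_means:.2f} (theory: {sigma / math.sqrt(n):.2f})")
```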
Distribution of the z statistic
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$.
Let
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Then z has a standard normal distribution.
Comment:
Many statistics T have a normal distribution with mean $\mu_T$ and standard deviation $\sigma_T$. Then
$$z = \frac{T - \mu_T}{\sigma_T}$$
will have a standard normal distribution.
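Distribution of the $\chi^2$ statistic for sample variance
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n-1} \quad \text{(sample variance)}$$
and
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}} \quad \text{(sample standard deviation)}$$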
Let
$$\chi^2 = \frac{\sum_i (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$
Then $\chi^2$ has a chi-squared distribution with $\nu = n-1$ degrees of freedom.
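As a sketch of this result, the simulation below assumes a normal population with mean 50 and standard deviation 15 and a sample size of 10, computes the statistic (n - 1)s²/σ² for many samples, and compares its average and variance with the chi-squared values for n - 1 = 9 degrees of freedom (mean 9, variance 18). `chi_sq_statistic` is a hypothetical helper; `statistics.variance` uses the n - 1 divisor, matching the definition of s².

```python
import random
import statistics

rng = random.Random(6)  # fixed seed for reproducibility

mu, sigma = 50, 15  # assumed population mean and standard deviation
n = 10              # assumed sample size
reps = 10000


def chi_sq_statistic():
    """Compute (n - 1) * s^2 / sigma^2 for one simulated normal sample."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s2 / sigma ** 2


chi_values = [chi_sq_statistic() for _ in range(reps)]
mean_chi = statistics.mean(chi_values)
var_chi = statistics.variance(chi_values)

print(f"mean: {mean_chi:.2f} (theory: {n - 1})")
print(f"variance: {var_chi:.2f} (theory: {2 * (n - 1)})")
```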
[Figure: The chi-squared distribution, plotted for x from 0 to 24.]
Distribution of the t statistic
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
then t has Student's t distribution with $\nu = n - 1$ degrees of freedom.
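As a small illustration of the formula, the t statistic can be computed directly from a sample. The function name and the five data values below are made up for the example.

```python
from math import sqrt
from statistics import fmean, stdev


def t_statistic(sample, mu):
    """Return (xbar - mu) / (s / sqrt(n)) for the given sample."""
    n = len(sample)
    xbar = fmean(sample)
    s = stdev(sample)  # sample standard deviation, divisor n - 1
    return (xbar - mu) / (s / sqrt(n))


# Hypothetical data: five observations, hypothesised population mean 2.
t_value = t_statistic([1, 2, 3, 4, 5], mu=2.0)
print(round(t_value, 4))  # 1.4142, referred to a t distribution with 4 d.f.
```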
![Page 76: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/76.jpg)
Comment:
If an estimator T has a normal distribution with mean $\mu_T$ and standard deviation $\sigma_T$, and $s_T$ is an estimator of $\sigma_T$ based on $\nu$ degrees of freedom, then
$$t = \frac{T - \mu_T}{s_T}$$
will have Student's t distribution with $\nu$ degrees of freedom.
![Page 77: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/77.jpg)
[Figure: the t distribution compared with the standard normal distribution]
![Page 78: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/78.jpg)
Point estimation
• A statistic T is called an estimator of the parameter $\theta$ if its value is used as an estimate of the parameter $\theta$.
• The performance of an estimator T will be determined by how “close” the sampling distribution of T is to the parameter, $\theta$, being estimated.
![Page 79: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/79.jpg)
• An estimator T is called an unbiased estimator of $\theta$ if $\mu_T$, the mean of the sampling distribution of T, satisfies $\mu_T = \theta$.
• This implies that in the long run the average value of T is $\theta$.
![Page 80: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/80.jpg)
• An estimator T is called the Minimum Variance Unbiased estimator of $\theta$ if T is an unbiased estimator and it has the smallest standard error $\sigma_T$ amongst all unbiased estimators of $\theta$.
• If the sampling distribution of T is normal, the standard error of T is extremely important. It completely describes the variability of the estimator T.
![Page 81: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/81.jpg)
Interval Estimation (confidence intervals)
• Point estimators give only single values as an estimate. There is no indication of the accuracy of the estimate.
• The accuracy can sometimes be measured and shown by displaying the standard error of the estimate.
![Page 82: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/82.jpg)
• There is however a better way.
• Using the idea of confidence interval estimates
• The unknown parameter is estimated with a range of values that have a given probability of capturing the parameter being estimated.
![Page 83: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/83.jpg)
Confidence Intervals
• The interval $T_L$ to $T_U$ is called a $(1 - \alpha)100\%$ confidence interval for the parameter $\theta$, if the probability that $\theta$ lies in the range $T_L$ to $T_U$ is equal to $1 - \alpha$.
• Here, $T_L$ and $T_U$ are
– statistics
– random numerical quantities calculated from the data.
![Page 84: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/84.jpg)
Examples
Confidence interval for the mean of a Normal population (based on the z statistic).
$$T_L = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad T_U = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
is a $(1 - \alpha)100\%$ confidence interval for $\mu$, the mean of a normal population.
Here $z_{\alpha/2}$ is the upper $(\alpha/2)100\%$ percentage point of the standard normal distribution.
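A minimal sketch of the z-based interval, using only Python's standard library. The function name and the numbers in the example (sample mean 50, known $\sigma = 10$, $n = 25$) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Return (T_L, T_U) = xbar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical example: 95% interval for the mean.
lower, upper = z_confidence_interval(xbar=50.0, sigma=10.0, n=25)
print(f"{lower:.2f} to {upper:.2f}")  # 46.08 to 53.92
```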
![Page 85: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/85.jpg)
More generally if T is an unbiased estimator of the parameter $\theta$ and has a normal sampling distribution with known standard error $\sigma_T$ then
$$T_L = T - z_{\alpha/2}\,\sigma_T \quad\text{to}\quad T_U = T + z_{\alpha/2}\,\sigma_T$$
is a $(1 - \alpha)100\%$ confidence interval for $\theta$.
![Page 86: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/86.jpg)
Confidence interval for the mean of a Normal population (based on the t statistic).
$$T_L = \bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \quad\text{to}\quad T_U = \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
is a $(1 - \alpha)100\%$ confidence interval for $\mu$, the mean of a normal population.
Here $t_{\alpha/2}$ is the upper $(\alpha/2)100\%$ percentage point of the Student’s t distribution with $\nu = n - 1$ degrees of freedom.
![Page 87: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/87.jpg)
More generally if T is an unbiased estimator of the parameter $\theta$ and has a normal sampling distribution with estimated standard error $s_T$, based on $\nu$ degrees of freedom, then
$$T_L = T - t_{\alpha/2}\,s_T \quad\text{to}\quad T_U = T + t_{\alpha/2}\,s_T$$
is a $(1 - \alpha)100\%$ confidence interval for $\theta$.
![Page 88: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/88.jpg)
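Common Confidence intervals

| Situation | Confidence interval |
|---|---|
| Sample from the Normal distribution with unknown mean and known variance (Estimating $\mu$) (n large) | $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma_0}{\sqrt{n}}$ |
| Sample from the Normal distribution with unknown mean and unknown variance (Estimating $\mu$) (n small) | $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$ |
| Estimation of a binomial probability p | $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ |
| Two independent samples from the Normal distribution with unknown means and known variances (Estimating $\mu_1 - \mu_2$) (n, m large) | $\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$ |
| Two independent samples from the Normal distribution with unknown means and unknown but equal variances (Estimating $\mu_1 - \mu_2$) (n, m small) | $\bar{x} - \bar{y} \pm t_{\alpha/2}\, s_{\text{Pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$ |
| Estimation of the difference between two binomial probabilities, $p_1 - p_2$ | $\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$ |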
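As one worked instance of the intervals listed above, the binomial case can be sketched as follows. The function name and the counts (40 successes in 100 trials) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, alpha=0.05):
    """Return p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 40 successes in 100 trials, 95% interval.
p_lower, p_upper = proportion_confidence_interval(successes=40, n=100)
print(f"{p_lower:.3f} to {p_upper:.3f}")  # 0.304 to 0.496
```

This interval depends on the normal approximation to the binomial, so it is best suited to reasonably large n.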
![Page 89: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/89.jpg)
Multiple Confidence intervals
In many situations one is interested in estimating not only a single parameter, $\theta$, but a collection of parameters, $\theta_1, \theta_2, \theta_3, \ldots$.
A collection of intervals, $T_{L1}$ to $T_{U1}$, $T_{L2}$ to $T_{U2}$, $T_{L3}$ to $T_{U3}$, ... is called a set of $(1 - \alpha)100\%$ multiple confidence intervals if the probability that all the intervals capture their respective parameters is $1 - \alpha$.
![Page 90: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/90.jpg)
Hypothesis Testing
• Another important area of statistical inference is that of Hypothesis Testing.
• In this situation one has a statement (Hypothesis) about the parameter(s) of the distributions being sampled and one is interested in deciding whether the statement is true or false.
![Page 91: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/91.jpg)
• In fact there are two hypotheses
– the Null Hypothesis (H0) and
– the Alternative Hypothesis (HA).
• A decision will be made either to
– Accept H0 (Reject HA) or to
– Reject H0 (Accept HA).
![Page 92: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/92.jpg)
• The following table gives the different possibilities for the decision and the different possibilities for the correctness of the decision

| | Accept H0 | Reject H0 |
|---|---|---|
| H0 is true | Correct Decision | Type I error |
| H0 is false | Type II error | Correct Decision |
![Page 93: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/93.jpg)
• Type I error - The Null Hypothesis H0 is rejected when it is true.
• The probability that a decision procedure makes a type I error is denoted by $\alpha$, and is sometimes called the significance level of the test.
• Common significance levels that are used are $\alpha = .05$ and $\alpha = .01$
![Page 94: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/94.jpg)
• Type II error - The Null Hypothesis H0 is accepted when it is false.
• The probability that a decision procedure makes a type II error is denoted by $\beta$.
• The probability $1 - \beta$ is called the Power of the test and is the probability that the decision procedure correctly rejects a false Null Hypothesis.
![Page 95: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/95.jpg)
A statistical test is defined by
1. Choosing a statistic for making the decision to Accept or Reject H0. This statistic is called the test statistic.
2. Dividing the set of possible values of the test statistic into two regions - an Acceptance Region and a Critical Region.
![Page 96: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/96.jpg)
• If upon collection of the data and evaluation of the test statistic, its value lies in the Acceptance Region, a decision is made to accept the Null Hypothesis H0.
• If upon collection of the data and evaluation of the test statistic, its value lies in the Critical Region, a decision is made to reject the Null Hypothesis H0.
![Page 97: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/97.jpg)
• The probability of a type I error, $\alpha$, is usually set at a predefined level by choosing the critical thresholds (boundaries between the Acceptance and Critical Regions) appropriately.
![Page 98: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/98.jpg)
• The probability of a type II error, $\beta$, is decreased (and the power of the test, $1 - \beta$, is increased) by
1. Choosing the “best” test statistic.
2. Selecting the most efficient experimental design.
3. Increasing the amount of information (usually by increasing the sample sizes involved) on which the decision is based.
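The effect of sample size on power can be illustrated for one simple case. The sketch below assumes an upper-tailed z test of $H_0: \mu = \mu_0$ against $H_A: \mu > \mu_0$ with known $\sigma$, for which the power at a true mean $\mu_1$ is $1 - \Phi\!\left(z_\alpha - (\mu_1 - \mu_0)\sqrt{n}/\sigma\right)$. The function name and the numerical settings are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the upper-tailed z test when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical threshold for z
    shift = (mu1 - mu0) * sqrt(n) / sigma     # mean of z when mu = mu1
    return 1 - std_normal.cdf(z_alpha - shift)


# Hypothetical setting: mu0 = 0, true mean 0.5, sigma = 1.
power_n25 = z_test_power(mu0=0.0, mu1=0.5, sigma=1.0, n=25)
power_n100 = z_test_power(mu0=0.0, mu1=0.5, sigma=1.0, n=100)
print(round(power_n25, 3), round(power_n100, 3))  # power rises with n
```

With $\alpha$ held fixed, quadrupling the sample size here moves the power from roughly 0.80 to nearly 1, which is the third point in the list above.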
![Page 99: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/99.jpg)
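Some common Tests

| Situation | Test Statistic | $H_0$ | $H_A$ | Critical Region |
|---|---|---|---|---|
| Sample from the Normal distribution with unknown mean and known variance (Testing $\mu$) (n large) | $z = \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$ | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $\mu > \mu_0$ | $z > z_{\alpha}$ |
| | | | $\mu < \mu_0$ | $z < -z_{\alpha}$ |
| Sample from the Normal distribution with unknown mean and unknown variance (Testing $\mu$) (n small) | $t = \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$ | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| | | | $\mu > \mu_0$ | $t > t_{\alpha}$ |
| | | | $\mu < \mu_0$ | $t < -t_{\alpha}$ |
| Testing of a binomial probability p | $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$ | $p = p_0$ | $p \ne p_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $p > p_0$ | $z > z_{\alpha}$ |
| | | | $p < p_0$ | $z < -z_{\alpha}$ |
| Two independent samples from the Normal distribution with unknown means and known variances (Testing $\mu_1 - \mu_2$) (n, m large) | $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$ | $\mu_1 = \mu_2$ | $\mu_1 \ne \mu_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $\mu_1 > \mu_2$ | $z > z_{\alpha}$ |
| | | | $\mu_1 < \mu_2$ | $z < -z_{\alpha}$ |
| Two independent samples from the Normal distribution with unknown means and unknown but equal variances (Testing $\mu_1 - \mu_2$) | $t = \dfrac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$ | $\mu_1 = \mu_2$ | $\mu_1 \ne \mu_2$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| | | | $\mu_1 > \mu_2$ | $t > t_{\alpha}$ |
| | | | $\mu_1 < \mu_2$ | $t < -t_{\alpha}$ |
| Testing the difference between two binomial probabilities, $p_1 - p_2$ | $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ | $p_1 = p_2$ | $p_1 \ne p_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $p_1 > p_2$ | $z > z_{\alpha}$ |
| | | | $p_1 < p_2$ | $z < -z_{\alpha}$ |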
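To show how a test statistic and its critical region fit together, here is a rough sketch of the one-sample binomial test. The function name, the `alternative` labels and the counts (60 successes in 100 trials, $p_0 = 0.5$) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def binomial_z_test(successes, n, p0, alpha=0.05, alternative="two-sided"):
    """Return (z, reject) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "two-sided":      # HA: p != p0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":      # HA: p > p0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # HA: p < p0
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z_value, reject_h0 = binomial_z_test(60, 100, p0=0.5)
print(round(z_value, 2), reject_h0)  # 2.0 True, since 2.0 > 1.960
```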
![Page 100: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/100.jpg)
The p-value approach to Hypothesis Testing
![Page 101: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/101.jpg)
In hypothesis testing we need
1. A test statistic
2. A Critical and Acceptance region for the test statistic
The Critical Region is set up under the sampling distribution of the test statistic.
Area = $\alpha$ (0.05 or 0.01) above the critical region. The critical region may be one tailed or two tailed
![Page 102: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/102.jpg)
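The Critical region:
$$P[\text{Accept } H_0 \text{ when true}] = P[-z_{\alpha/2} \le z \le z_{\alpha/2}] = 1 - \alpha$$
$$P[\text{Reject } H_0 \text{ when true}] = P[z < -z_{\alpha/2} \text{ or } z > z_{\alpha/2}] = \alpha$$
[Figure: standard normal density centred at 0 with area $\alpha/2$ in each tail; Reject H0 below $-z_{\alpha/2}$ and above $z_{\alpha/2}$, Accept H0 in between]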
![Page 103: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/103.jpg)
The test is carried out by
1. Computing the value of the test statistic
2. Making the decision
a. Reject if the value is in the Critical region and
b. Accept if the value is in the Acceptance region.
![Page 104: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/104.jpg)
The value of the test statistic may be in the Acceptance region but close to being in the Critical region, or
it may be in the Critical region but close to being in the Acceptance region.
To measure this we compute the p-value.
![Page 105: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/105.jpg)
Definition – Once the test statistic has been computed from the data the p-value is defined to be:
p-value = P[the test statistic is as or more extreme than the observed value of the test statistic]
“More extreme” means giving stronger evidence for rejecting H0.
![Page 106: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/106.jpg)
Example – Suppose we are using the z-test for the mean $\mu$ of a normal population and $\alpha = 0.05$.
$z_{0.025} = 1.960$
Thus the critical region is to reject H0 if
z < -1.960 or z > 1.960.
Suppose z = 2.3, then we reject H0
p-value = P[the test statistic is as or more extreme than the observed value of the test statistic]
= P[z > 2.3] + P[z < -2.3]
= 0.0107 + 0.0107 = 0.0214
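The two-tailed p-value in this example can be reproduced with the standard normal distribution in Python's standard library. The function name is an assumption; the values 2.3 and 1.2 are the observed z values used in the examples on these slides.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Return P[Z > |z|] + P[Z < -|z|] for a standard normal Z."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


p_for_2_3 = two_tailed_p_value(2.3)
p_for_1_2 = two_tailed_p_value(1.2)
print(round(p_for_2_3, 4))  # about 0.0214, below alpha = 0.05
print(round(p_for_1_2, 4))  # about 0.2301, well above alpha = 0.05
```

Small differences in the last digit relative to the slides come from rounding each tail area to four decimals before adding.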
![Page 107: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/107.jpg)
[Graph: the p-value is the total area under the standard normal curve below -2.3 and above 2.3]
![Page 108: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/108.jpg)
If the value of z = 1.2, then we accept H0
p-value = P[the test statistic is as or more extreme than the observed value of the test statistic]
= P[z > 1.2] + P[z < -1.2]
= 0.1151 + 0.1151 = 0.2302
There is a 23.02% chance that the test statistic is as or more extreme than 1.2. This is fairly high, hence 1.2 is not very extreme.
![Page 109: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/109.jpg)
[Graph: the p-value is the total area under the standard normal curve below -1.2 and above 1.2]
![Page 110: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/110.jpg)
Properties of the p-value
1. If the p-value is small (<0.05 or 0.01) H0 should be rejected.
2. The p-value measures the plausibility of H0.
3. If the test is two tailed the p-value should be two tailed.
4. If the test is one tailed the p-value should be one tailed.
5. It is customary to report p-values when reporting the results. This gives the reader some idea of the strength of the evidence for rejecting H0
![Page 111: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/111.jpg)
Multiple testing
Quite often one is interested in performing a collection (family) of tests of hypotheses.
1. H0,1 versus HA,1.
2. H0,2 versus HA,2.
3. H0,3 versus HA,3.
etc.
![Page 112: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/112.jpg)
• Let $\alpha^*$ denote the probability that at least one type I error is made in the collection of tests that are performed.
• The value of $\alpha^*$, the family type I error rate, can be considerably larger than $\alpha$, the type I error rate of each individual test.
• The value of the family error rate, $\alpha^*$, can be controlled by altering the thresholds of each individual test appropriately.
• A testing procedure of this nature is called a Multiple testing procedure.
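To give a feel for how quickly the family error rate grows, the sketch below makes one strong assumption that the slides do not state: the m tests are independent and every null hypothesis is true. Under that assumption $\alpha^* = 1 - (1 - \alpha)^m$. The function names are hypothetical. Inverting the formula gives the per-test level needed for a target $\alpha^*$ (often called the Šidák adjustment), which is one of several ways to alter the individual thresholds.

```python
def family_error_rate(alpha, m):
    """P[at least one type I error] for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def per_test_alpha(family_alpha, m):
    """Per-test level giving the target family rate (independence assumed)."""
    return 1 - (1 - family_alpha) ** (1 / m)


rate_10_tests = family_error_rate(alpha=0.05, m=10)
adjusted_alpha = per_test_alpha(family_alpha=0.05, m=10)
print(round(rate_10_tests, 3))   # about 0.401, far above 0.05
print(round(adjusted_alpha, 4))  # about 0.0051 per individual test
```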
![Page 113: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/113.jpg)
A chart illustrating Statistical Procedures

| Dependent variables \ Independent variables | Categorical | Continuous | Continuous & Categorical |
|---|---|---|---|
| Categorical | Multiway frequency analysis (Log Linear Model) | Discriminant Analysis | Discriminant Analysis |
| Continuous | ANOVA (single dep var); MANOVA (mult dep var) | Multiple regression (single dependent variable); multivariate multiple regression (multiple dependent variables) | ANACOVA (single dep var); MANACOVA (mult dep var) |
| Continuous & Categorical | ?? | ?? | ?? |
![Page 114: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/114.jpg)
Comparing k Populations
Means – One way Analysis of Variance (ANOVA)
![Page 115: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/115.jpg)
The F test
![Page 116: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/116.jpg)
The F test – for comparing k means
Situation
• We have k normal populations
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i.
• i = 1, 2, 3, … k.
• Note: we assume that the standard deviation for each population is the same.
$\sigma_1 = \sigma_2 = \ldots = \sigma_k = \sigma$
![Page 117: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/117.jpg)
We want to test
$$H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k$$
against
$$H_A: \mu_i \ne \mu_j \text{ for at least one pair } i, j$$
![Page 118: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/118.jpg)
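To test $H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k$
against $H_A: \mu_i \ne \mu_j$ for at least one pair i, j
use the test statistic
$$F = \frac{s^2_{\text{Between}}}{s^2_{\text{Error}}} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k - 1)}{\sum_{i=1}^{k} (n_i - 1)s_i^2 \big/ \left(\sum_{i=1}^{k} n_i - k\right)}$$
where
$\bar{x}_i$ = mean for the i-th sample,
$s_i$ = standard deviation for the i-th sample,
$$\bar{x} = \frac{n_1\bar{x}_1 + \ldots + n_k\bar{x}_k}{n_1 + \ldots + n_k} = \text{overall mean}$$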
![Page 119: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/119.jpg)
The statistic
$$\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$$
is called the Between Sum of Squares and is denoted by $SS_{\text{Between}}$.
It measures the variability between samples.
k – 1 is known as the Between degrees of freedom and
$$\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k - 1)$$
is called the Between Mean Square and is denoted by $MS_{\text{Between}}$.
![Page 120: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/120.jpg)
The statistic
$$\sum_{i=1}^{k} (n_i - 1)s_i^2$$
is called the Error Sum of Squares and is denoted by $SS_{\text{Error}}$.
$$\sum_{i=1}^{k} n_i - k = N - k$$
is known as the Error degrees of freedom and
$$\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)$$
is called the Error Mean Square and is denoted by $MS_{\text{Error}}$.
![Page 121: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/121.jpg)
then
$$F = \frac{MS_{\text{Between}}}{MS_{\text{Error}}}$$
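The definitions of the Between and Error mean squares only need each sample's size, mean and standard deviation, so F can be sketched from summary statistics alone. The function name and the three made-up samples below are assumptions chosen so the arithmetic is easy to follow by hand.

```python
def f_from_summary(sizes, means, sds):
    """Return (MS_Between, MS_Error, F) from per-sample summary statistics."""
    k = len(sizes)
    total_n = sum(sizes)
    # Overall mean is the size-weighted average of the sample means.
    overall_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    ss_between = sum(n * (m - overall_mean) ** 2 for n, m in zip(sizes, means))
    ss_error = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    ms_between = ss_between / (k - 1)       # Between d.f. = k - 1
    ms_error = ss_error / (total_n - k)     # Error d.f. = N - k
    return ms_between, ms_error, ms_between / ms_error


# Made-up summary: three samples of size 3, means 1, 2, 3, all with s = 1.
ms_b, ms_e, f_ratio = f_from_summary([3, 3, 3], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(ms_b, ms_e, f_ratio)  # 3.0 1.0 3.0
```

Here $SS_{\text{Between}} = 3(1 + 0 + 1) = 6$ on 2 degrees of freedom and $SS_{\text{Error}} = 3 \times 2 \times 1 = 6$ on 6 degrees of freedom, giving F = 3.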
![Page 122: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/122.jpg)
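The Computing formula for F:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total
3) $N = \sum_{i=1}^{k} n_i$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$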
![Page 123: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/123.jpg)
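Then
1) $SS_{\text{Between}} = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$
2) $SS_{\text{Error}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
3) $F = \dfrac{SS_{\text{Between}} / (k - 1)}{SS_{\text{Error}} / (N - k)}$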
![Page 124: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/124.jpg)
The critical region for the F test
We reject
$$H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k$$
if $F > F_{\alpha}$
$F_{\alpha}$ is the critical point under the F distribution with $\nu_1 = k - 1$ degrees of freedom in the numerator and $\nu_2 = N - k$ degrees of freedom in the denominator
![Page 125: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/125.jpg)
Example
In the following example we are comparing weight gains resulting from the following six diets
1. Diet 1 - High Protein , Beef
2. Diet 2 - High Protein , Cereal
3. Diet 3 - High Protein , Pork
4. Diet 4 - Low protein , Beef
5. Diet 5 - Low protein , Cereal
6. Diet 6 - Low protein , Pork
![Page 126: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/126.jpg)
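Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

| Diet | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| | 73 | 98 | 94 | 90 | 107 | 49 |
| | 102 | 74 | 79 | 76 | 95 | 82 |
| | 118 | 56 | 96 | 90 | 97 | 73 |
| | 104 | 111 | 98 | 64 | 80 | 86 |
| | 81 | 95 | 102 | 86 | 98 | 81 |
| | 107 | 88 | 102 | 51 | 74 | 97 |
| | 100 | 82 | 108 | 72 | 74 | 106 |
| | 87 | 77 | 91 | 90 | 67 | 70 |
| | 117 | 86 | 120 | 95 | 89 | 61 |
| | 111 | 92 | 105 | 78 | 58 | 82 |
| Mean | 100.0 | 85.9 | 99.5 | 79.2 | 83.9 | 78.7 |
| Std. Dev. | 15.14 | 15.02 | 10.92 | 13.89 | 15.71 | 16.55 |
| $\sum x$ | 1000 | 859 | 995 | 792 | 839 | 787 |
| $\sum x^2$ | 102062 | 75819 | 100075 | 64462 | 72613 | 64401 |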
![Page 127: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/127.jpg)
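Hence
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 = 479432$$
$$N = \sum_{i=1}^{k} n_i = 60 = \text{Total sample size}$$
$$\sum_{i=1}^{k} \frac{T_i^2}{n_i} = 467846$$

| i | 1 | 2 | 3 | 4 | 5 | 6 | Total (G) |
|---|---|---|---|---|---|---|---|
| $T_i$ | 1000 | 859 | 995 | 792 | 839 | 787 | 5272 |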
![Page 128: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/128.jpg)
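Thus
$$SS_{\text{Error}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 479432 - 467846 = 11586$$
$$SS_{\text{Between}} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933$$
$$F = \frac{SS_{\text{Between}} / (k - 1)}{SS_{\text{Error}} / (N - k)} = \frac{4612.933 / 5}{11586 / 54} = \frac{922.6}{214.56} = 4.3$$
$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$
Thus since F > 2.386 we reject H0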
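The diet calculation can be checked by applying the computing formulas to the raw weight gains. This is only a sketch: the variable names are assumptions, the data are the sixty values from the table of weight gains, and the critical value 2.386 is taken from the slide rather than computed.

```python
# Weight gains (grams) for the six diets, ten rats per diet.
diets = {
    1: [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    2: [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    3: [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    4: [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    5: [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    6: [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

k = len(diets)
totals = {i: sum(x) for i, x in diets.items()}                    # T_i
grand_total = sum(totals.values())                                # G
total_n = sum(len(x) for x in diets.values())                     # N
sum_of_squares = sum(v * v for x in diets.values() for v in x)    # sum x_ij^2
sum_t2_over_n = sum(totals[i] ** 2 / len(x) for i, x in diets.items())

ss_error = sum_of_squares - sum_t2_over_n
ss_between = sum_t2_over_n - grand_total ** 2 / total_n
f_ratio = (ss_between / (k - 1)) / (ss_error / (total_n - k))

F_CRITICAL = 2.386  # F_0.05 with 5 and 54 d.f., as quoted on the slide
print(round(ss_between, 3), round(ss_error, 3), round(f_ratio, 2))
print("reject H0" if f_ratio > F_CRITICAL else "accept H0")
```

The printed sums of squares should match the hand calculation (4612.933 and 11586), with F close to 4.3.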
![Page 129: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/129.jpg)
The ANOVA Table
A convenient method for displaying the calculations for the F-test
![Page 130: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/130.jpg)
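Anova Table

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | k - 1 | $SS_{\text{Between}}$ | $MS_{\text{Between}}$ | $MS_B / MS_E$ |
| Within | N - k | $SS_{\text{Error}}$ | $MS_{\text{Error}}$ | |
| Total | N - 1 | $SS_{\text{Total}}$ | | |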
![Page 131: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/131.jpg)
The Diet Example

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | 5 | 4612.933 | 922.587 | 4.3 (p = 0.0023) |
| Within | 54 | 11586.000 | 214.556 | |
| Total | 59 | 16198.933 | | |
![Page 132: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/132.jpg)
Using SPSS
Note: The use of another statistical package such as Minitab is similar to using SPSS
![Page 133: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/133.jpg)
Assume the data is contained in an Excel file
![Page 134: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/134.jpg)
Each variable is in a column
1. Weight gain (wtgn)
2. diet
3. Source of protein (Source)
4. Level of Protein (Level)
![Page 135: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/135.jpg)
After starting the SPSS program the following dialogue box appears:
![Page 136: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/136.jpg)
If you select Opening an existing file and press OK the following dialogue box appears
![Page 137: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/137.jpg)
The following dialogue box appears:
![Page 138: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/138.jpg)
If the variable names are in the file ask it to read the names. If you do not specify the Range the program will identify the Range:
Once you “click OK”, two windows will appear
![Page 139: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/139.jpg)
One that will contain the output:
![Page 140: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/140.jpg)
The other containing the data:
![Page 141: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/141.jpg)
To perform ANOVA select Analyze->General Linear Model-> Univariate
![Page 142: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/142.jpg)
The following dialog box appears
![Page 143: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/143.jpg)
Select the dependent variable and the fixed factors
Press OK to perform the Analysis
![Page 144: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/144.jpg)
The Output

Tests of Between-Subjects Effects (Dependent Variable: wtgn)

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 4612.933(a) | 5 | 922.587 | 4.300 | .002 |
| Intercept | 463233.067 | 1 | 463233.067 | 2159.036 | .000 |
| diet | 4612.933 | 5 | 922.587 | 4.300 | .002 |
| Error | 11586.000 | 54 | 214.556 | | |
| Total | 479432.000 | 60 | | | |
| Corrected Total | 16198.933 | 59 | | | |

a. R Squared = .285 (Adjusted R Squared = .219)
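As a quick check on how the columns of the SPSS table relate to each other, the following Python sketch recomputes the mean squares, the F ratio, R Squared and Adjusted R Squared from the sums of squares and degrees of freedom in the output. The function name `anova_summary` and the dictionary keys are illustrative choices, not part of SPSS.

```python
# Sums of squares and degrees of freedom copied from the SPSS output table.
SS_DIET = 4612.933               # between-diet sum of squares
SS_ERROR = 11586.000             # within-diet (error) sum of squares
SS_CORRECTED_TOTAL = 16198.933   # corrected total sum of squares
DF_DIET, DF_ERROR = 5, 54


def anova_summary(ss_effect, df_effect, ss_error, df_error, ss_total):
    """Recompute mean squares, F, R^2 and adjusted R^2 for a one-way ANOVA."""
    ms_effect = ss_effect / df_effect      # Mean Square = Sum of Squares / df
    ms_error = ss_error / df_error
    r_squared = ss_effect / ss_total
    df_total = df_effect + df_error        # corrected total df (59 here)
    adj_r_squared = 1 - (1 - r_squared) * df_total / df_error
    return {
        "MS_effect": ms_effect,
        "MS_error": ms_error,
        "F": ms_effect / ms_error,
        "R2": r_squared,
        "adj_R2": adj_r_squared,
    }


summary = anova_summary(SS_DIET, DF_DIET, SS_ERROR, DF_ERROR, SS_CORRECTED_TOTAL)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The printed values should agree with the table entries up to rounding.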
![Page 145: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/145.jpg)
Comments
• The F-test tests $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ against $H_A$: at least one pair of means is different.
• If $H_0$ is accepted we conclude that the means are not significantly different.
• If $H_0$ is rejected we conclude that at least one pair of means is significantly different.
• The F-test gives no information as to which pairs of means are different.
• One can now use two-sample t tests to determine which pairs of means are significantly different.
![Page 146: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/146.jpg)
Fisher's LSD (least significant difference) procedure:
1. Test $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ against $H_A$: at least one pair of means is different, using the ANOVA F-test.
2. If $H_0$ is accepted we conclude that the means are not significantly different, and stop.
3. If $H_0$ is rejected we conclude that at least one pair of means is significantly different, and follow this by using two-sample t tests to determine which pairs of means are significantly different.
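The three steps of the procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' own computation: the pairwise comparisons use the pooled error mean square from the ANOVA table (the usual form of the LSD), the group means are hypothetical placeholders, the group size of 10 assumes the 60 observations are split evenly over the 6 diets, and the critical value 2.005 (roughly $t_{.025}$ with 54 df) is taken from a standard t table rather than from the slides.

```python
import math
from itertools import combinations


def lsd_threshold(mse, n_i, n_j, t_crit):
    """Least significant difference between two group means."""
    return t_crit * math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))


def fisher_lsd(f_test_rejects_h0, means, sizes, mse, t_crit):
    """Return the pairs of groups declared significantly different.

    Steps 1-2: if the overall F-test does not reject H0, stop (no pairs).
    Step 3: otherwise compare every pair of means against the LSD.
    """
    if not f_test_rejects_h0:
        return []
    different = []
    for g, h in combinations(sorted(means), 2):
        lsd = lsd_threshold(mse, sizes[g], sizes[h], t_crit)
        if abs(means[g] - means[h]) > lsd:
            different.append((g, h))
    return different


MSE = 214.556      # error mean square from the ANOVA table
T_CRIT = 2.005     # ASSUMPTION: approximate t_.025 for 54 df
# ASSUMPTION: hypothetical group means, for illustration only.
means = {"diet A": 100.0, "diet B": 85.0, "diet C": 95.0}
sizes = {name: 10 for name in means}   # ASSUMPTION: 10 animals per diet

print(round(lsd_threshold(MSE, 10, 10, T_CRIT), 2))
print(fisher_lsd(True, means, sizes, MSE, T_CRIT))
```

With these placeholder means only the first two groups differ by more than the threshold, so only that pair is reported.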
![Page 147: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/147.jpg)
Linear Regression
Hypothesis testing and Estimation
![Page 148: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/148.jpg)
Assume that we have collected data on two variables X and Y. Let
$$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \dots,\ (x_n, y_n)$$
denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).
![Page 149: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/149.jpg)
The Statistical Model
![Page 150: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/150.jpg)
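Each $y_i$ is assumed to be randomly generated from a normal distribution with mean $\mu_i = \alpha + \beta x_i$ and standard deviation $\sigma$ ($\alpha$, $\beta$ and $\sigma$ are unknown).
[Figure: the line $Y = \alpha + \beta X$ with slope $\beta$; the observation $y_i$ is scattered about the point $\alpha + \beta x_i$ on the line above $x_i$.]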
![Page 151: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/151.jpg)
The Data
The Linear Regression Model
• The data falls roughly about a straight line.
[Figure: scatter plot of the data about the unseen line $Y = \alpha + \beta X$.]
![Page 152: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/152.jpg)
The Least Squares Line
Fitting the best straight line
to “linear” data
![Page 153: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/153.jpg)
Let
$$Y = a + bX$$
denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y.
For example, if $X = x_i$ (as for the ith case) then the predicted value of Y is:
$$\hat{y}_i = a + b x_i$$
![Page 154: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/154.jpg)
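The residual
$$r_i = y_i - \hat{y}_i = y_i - (a + b x_i)$$
can be computed for each case in the sample:
$$r_1 = y_1 - \hat{y}_1,\quad r_2 = y_2 - \hat{y}_2,\quad \dots,\quad r_n = y_n - \hat{y}_n$$
The residual sum of squares (RSS)
$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
is a measure of the “goodness of fit” of the line $Y = a + bX$ to the data.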
![Page 155: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/155.jpg)
The optimal choice of a and b will result in the residual sum of squares
$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
attaining a minimum.
If this is the case then the line
$$Y = a + bX$$
is called the Least Squares Line.
![Page 156: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/156.jpg)
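The equation for the least squares line
Let
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$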
![Page 157: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/157.jpg)
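Computing Formulae:
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$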
![Page 158: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/158.jpg)
Then the slope of the least squares line can be shown to be:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
![Page 159: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/159.jpg)
and the intercept of the least squares line can be shown to be:
$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\,\bar{x}$$
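The "can be shown" steps for the slope and intercept follow from minimising RSS over a and b. A brief sketch of the standard calculus argument, which the slides do not spell out, is:

```latex
\begin{align*}
\frac{\partial\,RSS}{\partial a}
  &= -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad a = \bar{y} - b\bar{x} \\[4pt]
\frac{\partial\,RSS}{\partial b}
  &= -2\sum_{i=1}^{n} x_i\,(y_i - a - b x_i) = 0 \\[4pt]
\text{Substituting } a = \bar{y} - b\bar{x}: \quad
0 &= \sum_{i=1}^{n} x_i\,\bigl[(y_i - \bar{y}) - b\,(x_i - \bar{x})\bigr] \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})\,\bigl[(y_i - \bar{y}) - b\,(x_i - \bar{x})\bigr]
  \qquad \Bigl(\text{since } \textstyle\sum_{i}\bigl[(y_i - \bar{y}) - b\,(x_i - \bar{x})\bigr] = 0\Bigr) \\
  &= S_{xy} - b\,S_{xx}
  \quad\Longrightarrow\quad b = \frac{S_{xy}}{S_{xx}}
\end{align*}
```

The first equation gives the intercept in terms of the slope, and the second, after centring, gives the slope as the ratio $S_{xy}/S_{xx}$.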
![Page 160: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/160.jpg)
The residual sum of Squares
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
Computing formula:
$$RSS = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$$
![Page 161: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/161.jpg)
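Estimating $\sigma$, the standard deviation in the regression model:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - a - b x_i)^2}{n - 2}}$$
Computing formula:
$$s = \sqrt{\frac{1}{n - 2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]}$$
This estimate of $\sigma$ is said to be based on $n - 2$ degrees of freedom.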
![Page 162: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/162.jpg)
Sampling distributions of the estimators
![Page 163: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/163.jpg)
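The sampling distribution of the slope of the least squares line:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
It can be shown that b has a normal distribution with mean and standard deviation
$$\mu_b = \beta \quad\text{and}\quad \sigma_b = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$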
![Page 164: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/164.jpg)
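Thus
$$z = \frac{b - \mu_b}{\sigma_b} = \frac{b - \beta}{\sigma / \sqrt{S_{xx}}}$$
has a standard normal distribution, and
$$t = \frac{b - \mu_b}{s_b} = \frac{b - \beta}{s / \sqrt{S_{xx}}}$$
has a t distribution with df = $n - 2$.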
![Page 165: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/165.jpg)
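$(1 - \alpha)100\%$ Confidence Limits for slope $\beta$:
$$\hat{\beta} \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$$
$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom. (Here $\alpha$ in $1 - \alpha$ is the error rate of the interval, not the intercept.)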
![Page 166: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/166.jpg)
Testing the slope
$$H_0: \beta = \beta_0 \quad\text{vs}\quad H_A: \beta \neq \beta_0$$
The test statistic is:
$$t = \frac{b - \beta_0}{s / \sqrt{S_{xx}}}$$
It has a t distribution with df = $n - 2$ if $H_0$ is true.
![Page 167: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/167.jpg)
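The Critical Region
Reject $H_0: \beta = \beta_0$ in favour of $H_A: \beta \neq \beta_0$ if
$$t = \frac{b - \beta_0}{s / \sqrt{S_{xx}}} < -t_{\alpha/2} \quad\text{or}\quad t > t_{\alpha/2}, \qquad df = n - 2$$
This is a two-tailed test. One-tailed tests are also possible.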
![Page 168: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/168.jpg)
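The sampling distribution of the intercept of the least squares line:
$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\,\bar{x}$$
It can be shown that a has a normal distribution with mean and standard deviation
$$\mu_a = \alpha \quad\text{and}\quad \sigma_a = \sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$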
![Page 169: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/169.jpg)
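Thus
$$z = \frac{a - \mu_a}{\sigma_a} = \frac{a - \alpha}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$
has a standard normal distribution and
$$t = \frac{a - \mu_a}{s_a} = \frac{a - \alpha}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$
has a t distribution with df = $n - 2$.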
![Page 170: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/170.jpg)
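$(1 - \alpha)100\%$ Confidence Limits for intercept $\alpha$:
$$\hat{\alpha} \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom.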
![Page 171: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/171.jpg)
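Testing the intercept
$$H_0: \alpha = \alpha_0 \quad\text{vs}\quad H_A: \alpha \neq \alpha_0$$
The test statistic is:
$$t = \frac{a - \alpha_0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$
It has a t distribution with df = $n - 2$ if $H_0$ is true.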
![Page 172: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/172.jpg)
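The Critical Region
Reject $H_0: \alpha = \alpha_0$ in favour of $H_A: \alpha \neq \alpha_0$ if
$$t = \frac{a - \alpha_0}{s_a} < -t_{\alpha/2} \quad\text{or}\quad t > t_{\alpha/2}, \qquad df = n - 2$$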
![Page 173: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/173.jpg)
Example
![Page 174: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/174.jpg)
The following data showed the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men in 1950.

TABLE: Per capita consumption of cigarettes per month ($X_i$) in n = 11 countries in 1930, and the death rates, $Y_i$ (per 100,000), from lung cancer for men in 1950.

| Country (i) | $X_i$ | $Y_i$ |
|---|---|---|
| Australia | 48 | 18 |
| Canada | 50 | 15 |
| Denmark | 38 | 17 |
| Finland | 110 | 35 |
| Great Britain | 110 | 46 |
| Holland | 49 | 24 |
| Iceland | 23 | 6 |
| Norway | 25 | 9 |
| Sweden | 30 | 11 |
| Switzerland | 51 | 25 |
| USA | 130 | 20 |
![Page 175: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/175.jpg)
[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, one labelled point per country.]
![Page 176: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/176.jpg)
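Fitting the Least Squares Line
$$\sum_{i=1}^{n} x_i = 664, \qquad \sum_{i=1}^{n} y_i = 226$$
$$\sum_{i=1}^{n} x_i^2 = 54{,}404, \qquad \sum_{i=1}^{n} y_i^2 = 6{,}018, \qquad \sum_{i=1}^{n} x_i y_i = 16{,}914$$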
![Page 177: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/177.jpg)
Fitting the Least Squares Line
First compute the following three quantities:
$$S_{xx} = 54404 - \frac{(664)^2}{11} = 14322.55$$
$$S_{yy} = 6018 - \frac{(226)^2}{11} = 1374.73$$
$$S_{xy} = 16914 - \frac{(664)(226)}{11} = 3271.82$$
![Page 178: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/178.jpg)
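Computing Estimate of Slope ($\beta$), Intercept ($\alpha$) and standard deviation ($\sigma$):
$$b = \frac{S_{xy}}{S_{xx}} = \frac{3271.82}{14322.55} = 0.228$$
$$a = \bar{y} - b\bar{x} = \frac{226}{11} - 0.228\left(\frac{664}{11}\right) = 6.756$$
$$s = \sqrt{\frac{1}{n - 2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]} = 8.35$$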
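The hand computation can be reproduced with a short Python sketch. It applies the computing formulae for $S_{xx}$, $S_{yy}$ and $S_{xy}$ to the cigarette data from the table and then derives the slope, intercept and s. The function name `least_squares` and the dictionary layout are illustrative choices.

```python
import math

# (X, Y) = (cigarettes per capita per month, 1930; lung cancer death rate, 1950)
data = {
    "Australia": (48, 18), "Canada": (50, 15), "Denmark": (38, 17),
    "Finland": (110, 35), "Great Britain": (110, 46), "Holland": (49, 24),
    "Iceland": (23, 6), "Norway": (25, 9), "Sweden": (30, 11),
    "Switzerland": (51, 25), "USA": (130, 20),
}


def least_squares(xs, ys):
    """Fit Y = a + bX using the computing formulae for Sxx, Syy, Sxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = sxy / sxx                           # slope
    a = sum_y / n - b * sum_x / n           # intercept
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))   # estimate of sigma
    return {"n": n, "Sxx": sxx, "Syy": syy, "Sxy": sxy, "b": b, "a": a, "s": s}


xs = [x for x, _ in data.values()]
ys = [y for _, y in data.values()]
fit = least_squares(xs, ys)
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

The slope comes out at about 0.228, in agreement with the fitted line Y = 6.756 + (0.228)X.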
![Page 179: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/179.jpg)
95% Confidence Limits for slope $\beta$:
$$\hat{\beta} \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}} = 0.228 \pm 2.262\,\frac{8.35}{\sqrt{14322.55}}$$
0.0706 to 0.3862
$t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom.
![Page 180: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/180.jpg)
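95% Confidence Limits for intercept $\alpha$:
$$\hat{\alpha} \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 6.756 \pm 2.262\,(8.35)\sqrt{\frac{1}{11} + \frac{(664/11)^2}{14322.55}}$$
-4.34 to 17.85
$t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom.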
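Both sets of confidence limits can be checked with a small Python sketch that starts from the column sums of the cigarette example. The critical value 2.262 is the one quoted on the slide; the variable names are illustrative.

```python
import math

# Column sums for the cigarette example (n = 11 countries).
n = 11
sum_x, sum_y = 664, 226
sum_x2, sum_y2, sum_xy = 54404, 6018, 16914
T_CRIT = 2.262   # t_.025 with 9 degrees of freedom, as quoted on the slide

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n

b = sxy / sxx
a = y_bar - b * x_bar
s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

se_b = s / math.sqrt(sxx)                          # standard error of the slope
se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)     # standard error of the intercept

slope_ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)
intercept_ci = (a - T_CRIT * se_a, a + T_CRIT * se_a)

print("slope:     %.4f to %.4f" % slope_ci)
print("intercept: %.2f to %.2f" % intercept_ci)
```

The interval for the slope excludes zero while the interval for the intercept does not.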
![Page 181: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/181.jpg)
[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with the fitted least squares line.]
Y = 6.756 + (0.228)X
95% confidence limits for slope: 0.0706 to 0.3862
95% confidence limits for intercept: -4.34 to 17.85
![Page 182: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/182.jpg)
Testing the positive slope
$$H_0: \beta = 0 \quad\text{vs}\quad H_A: \beta > 0$$
The test statistic is:
$$t = \frac{b - 0}{s / \sqrt{S_{xx}}}$$
![Page 183: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/183.jpg)
The Critical Region
Reject $H_0: \beta = 0$ in favour of $H_A: \beta > 0$ if
$$t = \frac{b - 0}{s / \sqrt{S_{xx}}} > t_{0.05} = 1.833, \qquad df = 11 - 2 = 9$$
A one-tailed test.
![Page 184: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/184.jpg)
Since
$$t = \frac{b - 0}{s / \sqrt{S_{xx}}} = \frac{0.228}{8.35 / \sqrt{14322.55}} = 3.27 > 1.833$$
we reject $H_0: \beta = 0$ and conclude $H_A: \beta > 0$.
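The one-tailed test can be recomputed from the column sums of the cigarette example with the Python sketch below. It uses the slope b = 0.228 that appears in the fitted line Y = 6.756 + (0.228)X and the critical value 1.833 quoted on the slide; the variable names are illustrative.

```python
import math

# Column sums for the cigarette example (n = 11 countries).
n = 11
sum_x, sum_y = 664, 226
sum_x2, sum_y2, sum_xy = 54404, 6018, 16914
T_CRIT_ONE_SIDED = 1.833   # t_.05 with 9 degrees of freedom, from the slide

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

b = sxy / sxx
s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

# Test statistic for H0: beta = 0 against HA: beta > 0.
t_stat = (b - 0.0) / (s / math.sqrt(sxx))
reject_h0 = t_stat > T_CRIT_ONE_SIDED

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The statistic comes out at roughly 3.27, well above the critical value, so the conclusion of a positive slope stands.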
![Page 185: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/185.jpg)
Confidence Limits for Points on the Regression Line
• The intercept $\alpha$ is a specific point on the regression line.
• It is the y-coordinate of the point on the regression line when x = 0.
• It is the predicted value of y when x = 0.
• We may also be interested in other points on the regression line, e.g. when $x = x_0$.
• In this case the y-coordinate of the point on the regression line when $x = x_0$ is $\alpha + \beta x_0$.
![Page 186: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/186.jpg)
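[Figure: the regression line $y = \alpha + \beta x$, showing the point $\alpha + \beta x_0$ on the line at $x = x_0$.]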
![Page 187: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/187.jpg)
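$(1 - \alpha)100\%$ Confidence Limits for $\alpha + \beta x_0$:
$$a + b x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with $n - 2$ degrees of freedom.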
![Page 188: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/188.jpg)
Prediction Limits for new values of the Dependent variable y
• An important application of the regression line is prediction.
• Knowing the value of x ($x_0$), what is the value of y?
• The predicted value of y when $x = x_0$ is:
$$\hat{y} = \alpha + \beta x_0$$
• This in turn can be estimated by:
$$\hat{y} = \hat{\alpha} + \hat{\beta} x_0 = a + b x_0$$
![Page 189: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/189.jpg)
The predictor
$$\hat{y} = \hat{\alpha} + \hat{\beta} x_0 = a + b x_0$$
• Gives only a single value for y.
• A more appropriate piece of information would be a range of values.
• A range of values that has a fixed probability of capturing the value for y.
• A $(1 - \alpha)100\%$ prediction interval for y.
![Page 190: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/190.jpg)
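$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$:
$$a + b x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with $n - 2$ degrees of freedom.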
![Page 191: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/191.jpg)
Example
In this example we are studying building fires in a city and are interested in the relationship between:
1. X = the distance between the closest fire hall and the building that puts out the alarm
and
2. Y = cost of the damage (1000$)
The data was collected on n = 15 fires.
![Page 192: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/192.jpg)
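The Data

| Fire | Distance (miles) | Damage (1000$) |
|---|---|---|
| 1 | 3.4 | 26.2 |
| 2 | 1.8 | 17.8 |
| 3 | 4.6 | 31.3 |
| 4 | 2.3 | 23.1 |
| 5 | 3.1 | 27.5 |
| 6 | 5.5 | 36.0 |
| 7 | 0.7 | 14.1 |
| 8 | 3.0 | 22.3 |
| 9 | 2.6 | 19.6 |
| 10 | 4.3 | 31.3 |
| 11 | 2.1 | 24.0 |
| 12 | 1.1 | 17.3 |
| 13 | 6.1 | 43.2 |
| 14 | 4.8 | 36.4 |
| 15 | 3.8 | 26.1 |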
![Page 193: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/193.jpg)
[Figure: Scatter Plot of Damage (1000$) against Distance (miles).]
![Page 194: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/194.jpg)
Computations
$$\sum_{i=1}^{n} x_i = 49.2, \qquad \sum_{i=1}^{n} y_i = 396.2$$
$$\sum_{i=1}^{n} x_i^2 = 196.16, \qquad \sum_{i=1}^{n} y_i^2 = 11376.5, \qquad \sum_{i=1}^{n} x_i y_i = 1470.65$$
![Page 195: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/195.jpg)
Computations Continued
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{49.2}{15} = 3.28$$
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{396.2}{15} = 26.4133$$
![Page 196: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/196.jpg)
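Computations Continued
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 196.16 - \frac{(49.2)^2}{15} = 34.784$$
$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 11376.5 - \frac{(396.2)^2}{15} = 911.517$$
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 1470.65 - \frac{(49.2)(396.2)}{15} = 171.114$$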
![Page 197: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/197.jpg)
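Computations Continued
$$\hat{\beta} = b = \frac{S_{xy}}{S_{xx}} = \frac{171.114}{34.784} = 4.92$$
$$\hat{\alpha} = a = \bar{y} - b\bar{x} = 26.4133 - 4.919\,(3.28) = 10.28$$
$$s = \sqrt{\frac{S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}}{n - 2}} = \sqrt{\frac{911.517 - \dfrac{(171.114)^2}{34.784}}{13}} = 2.316$$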
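The computations for the fire data can be reproduced with the following Python sketch, which works directly from the 15 (Distance, Damage) pairs in the data table. The function name `fit_line` and the returned keys are illustrative.

```python
import math

# Fire data: distance from the fire hall (miles) and damage (1000$), fires 1-15.
distance = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
damage = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3,
          24.0, 17.3, 43.2, 36.4, 26.1]


def fit_line(xs, ys):
    """Least squares fit of y = a + b x, plus the estimate s of sigma."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    return {"n": n, "x_bar": x_bar, "y_bar": y_bar,
            "Sxx": sxx, "Syy": syy, "Sxy": sxy, "b": b, "a": a, "s": s}


fit = fit_line(distance, damage)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Here the sums of squares are computed from deviations about the means, which is algebraically the same as the computing formulae used on the slides.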
![Page 198: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/198.jpg)
95% Confidence Limits for slope $\beta$:
$$\hat{\beta} \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$$
4.07 to 5.77
$t_{.025} = 2.160$ is the critical value for the t-distribution with 13 degrees of freedom.
![Page 199: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/199.jpg)
95% Confidence Limits for intercept $\alpha$:
$$\hat{\alpha} \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
7.21 to 13.35
$t_{.025} = 2.160$ is the critical value for the t-distribution with 13 degrees of freedom.
![Page 200: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/200.jpg)
[Figure: scatter plot of Damage (1000$) against Distance (miles) with the Least Squares Line y = 4.92x + 10.28.]
![Page 201: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/201.jpg)
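$(1 - \alpha)100\%$ Confidence Limits for $\alpha + \beta x_0$:
$$a + b x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with $n - 2$ degrees of freedom.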
![Page 202: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/202.jpg)
95% Confidence Limits for $\alpha + \beta x_0$:

| $x_0$ | lower | upper |
|---|---|---|
| 1 | 12.87 | 17.52 |
| 2 | 18.43 | 21.80 |
| 3 | 23.72 | 26.35 |
| 4 | 28.53 | 31.38 |
| 5 | 32.93 | 36.82 |
| 6 | 37.15 | 42.44 |
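The table of confidence limits can be regenerated from the fire data with the Python sketch below. It refits the line and evaluates the confidence limits for the point on the regression line at $x_0 = 1, \dots, 6$, using the critical value 2.160 quoted on the slides. The function names are illustrative.

```python
import math

# Fire data: distance (miles) and damage (1000$) for the 15 fires.
distance = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
damage = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3,
          24.0, 17.3, 43.2, 36.4, 26.1]
T_CRIT = 2.160   # t_.025 with 13 degrees of freedom, as quoted on the slides


def fit_line(xs, ys):
    """Return n, x_bar, Sxx, a, b, s for the least squares line y = a + b x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    return n, x_bar, sxx, a, b, s


def confidence_limits(x0, n, x_bar, sxx, a, b, s, t_crit):
    """Confidence limits for the point alpha + beta*x0 on the regression line."""
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    centre = a + b * x0
    return centre - half_width, centre + half_width


params = fit_line(distance, damage)
conf_table = {x0: confidence_limits(x0, *params, T_CRIT) for x0 in range(1, 7)}
for x0, (lower, upper) in conf_table.items():
    print(f"{x0}  {lower:6.2f}  {upper:6.2f}")
```

The intervals are narrowest near the mean distance of 3.28 miles and widen as $x_0$ moves away from it.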
![Page 203: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/203.jpg)
[Figure: Damage (1000$) against Distance (miles) with the least squares line and the 95% Confidence Limits for $\alpha + \beta x_0$.]
![Page 204: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/204.jpg)
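$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$:
$$a + b x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with $n - 2$ degrees of freedom.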
![Page 205: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/205.jpg)
95% Prediction Limits for y when $x = x_0$

| $x_0$ | lower | upper |
|---|---|---|
| 1 | 9.68 | 20.71 |
| 2 | 14.84 | 25.40 |
| 3 | 19.86 | 30.21 |
| 4 | 24.75 | 35.16 |
| 5 | 29.51 | 40.24 |
| 6 | 34.13 | 45.45 |
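The prediction limits differ from the confidence limits only by the extra 1 under the square root, which accounts for the variation of a new observation about the line. The Python sketch below regenerates the table from the fire data, using the critical value 2.160 quoted on the slides. The function names are illustrative.

```python
import math

# Fire data: distance (miles) and damage (1000$) for the 15 fires.
distance = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
damage = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3,
          24.0, 17.3, 43.2, 36.4, 26.1]
T_CRIT = 2.160   # t_.025 with 13 degrees of freedom, as quoted on the slides


def fit_line(xs, ys):
    """Return n, x_bar, Sxx, a, b, s for the least squares line y = a + b x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    return n, x_bar, sxx, a, b, s


def prediction_limits(x0, n, x_bar, sxx, a, b, s, t_crit):
    """Prediction limits for a new observation y at x = x0."""
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    centre = a + b * x0
    return centre - half_width, centre + half_width


params = fit_line(distance, damage)
pred_table = {x0: prediction_limits(x0, *params, T_CRIT) for x0 in range(1, 7)}
for x0, (lower, upper) in pred_table.items():
    print(f"{x0}  {lower:6.2f}  {upper:6.2f}")
```

At every $x_0$ the prediction interval is wider than the corresponding confidence interval for the point on the line.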
![Page 206: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/206.jpg)
[Figure: Damage (1000$) against Distance (miles) with the least squares line and the 95% Prediction Limits for y when $x = x_0$.]
![Page 207: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/207.jpg)
Linear Regression
Summary
Hypothesis testing and Estimation
![Page 208: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/208.jpg)
$(1 - \alpha)100\%$ Confidence Limits for slope $\beta$:
$$\hat{\beta} \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$$
$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom.
![Page 209: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/209.jpg)
Testing the slope
$$H_0: \beta = \beta_0 \quad\text{vs}\quad H_A: \beta \neq \beta_0$$
The test statistic is:
$$t = \frac{b - \beta_0}{s / \sqrt{S_{xx}}}$$
It has a t distribution with df = $n - 2$ if $H_0$ is true.
![Page 210: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/210.jpg)
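$(1 - \alpha)100\%$ Confidence Limits for intercept $\alpha$:
$$\hat{\alpha} \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom.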
![Page 211: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/211.jpg)
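Testing the intercept
$$H_0: \alpha = \alpha_0 \quad\text{vs}\quad H_A: \alpha \neq \alpha_0$$
The test statistic is:
$$t = \frac{a - \alpha_0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$
It has a t distribution with df = $n - 2$ if $H_0$ is true.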
![Page 212: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/212.jpg)
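$(1 - \alpha)100\%$ Confidence Limits for $\alpha + \beta x_0$:
$$a + b x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with $n - 2$ degrees of freedom.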
![Page 213: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/213.jpg)
(1 – α)100% Prediction Limits for y when x = x0:
$$a + b x_0 \pm t_{\alpha/2}\, s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n – 2 degrees of freedom
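The difference between confidence limits for the mean response α + βx0 and prediction limits for a new y at x0 is the extra "1" under the square root. The sketch below computes both for hypothetical data. The function name `limits_at_x0` is made up for this example, $s^2 = \text{SSE}/(n - 2)$ is assumed, and the critical value 3.182 is the tabled $t_{0.025}$ for 3 degrees of freedom, passed in by hand rather than computed.

```python
import math

def limits_at_x0(x, y, x0, t_crit):
    """Return (fit, confidence_limits, prediction_limits) at x = x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Assumed definition: s^2 = residual sum of squares / (n - 2)
    s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    fit = a + b * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    half_conf = t_crit * s * math.sqrt(leverage)       # mean response
    half_pred = t_crit * s * math.sqrt(1 + leverage)   # single new observation
    return fit, (fit - half_conf, fit + half_conf), (fit - half_pred, fit + half_pred)

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # t_{0.025} with n - 2 = 3 degrees of freedom, from a t table
fit, conf, pred = limits_at_x0(x, y, x0=6.0, t_crit=T_CRIT)
print(f"fit = {fit:.2f}")
print(f"95% confidence limits: ({conf[0]:.2f}, {conf[1]:.2f})")
print(f"95% prediction limits: ({pred[0]:.2f}, {pred[1]:.2f})")
```

The prediction limits are always wider than the confidence limits at the same x0, and both widen as x0 moves away from $\bar{x}$.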
![Page 214: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/214.jpg)
Comparing k Populations
Proportions
The χ² test for independence
![Page 215: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/215.jpg)
The χ² test for independence
![Page 216: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/216.jpg)
Situation
• We have two categorical variables R and C.
• The number of categories of R is r.
• The number of categories of C is c.
• We observe n subjects from the population and count xij = the number of subjects for which R = i and C = j.
• R = rows, C = columns
![Page 217: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/217.jpg)
Example
Both Systolic Blood Pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects.
The categories for Blood Pressure are:
<127 127-146 147-166 167+
The categories for Cholesterol are:
<200 200-219 220-259 260+
![Page 218: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/218.jpg)
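Table: two-way frequency (rows: Serum Cholesterol, columns: Systolic Blood pressure)

| Serum Cholesterol | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 117 | 121 | 47 | 22 | 307 |
| 200-219 | 85 | 98 | 43 | 20 | 246 |
| 220-259 | 119 | 209 | 68 | 43 | 439 |
| 260+ | 67 | 99 | 46 | 33 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |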
![Page 219: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/219.jpg)
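The χ² test for independence
Define
$$R_i = \sum_{j=1}^{c} x_{ij} = i^{\text{th}} \text{ row total}$$
$$C_j = \sum_{i=1}^{r} x_{ij} = j^{\text{th}} \text{ column total}$$
$$E_{ij} = \frac{R_i C_j}{n}$$
= Expected frequency in the (i,j)th cell in the case of independence.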
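The definitions above can be sketched in a few lines of Python, applied to the blood pressure and cholesterol counts from the example. The function name `expected_frequencies` is made up for this illustration.

```python
# Observed counts x_ij: rows = Serum Cholesterol, columns = Systolic Blood pressure
observed = [
    [117, 121, 47, 22],   # <200
    [85, 98, 43, 20],     # 200-219
    [119, 209, 68, 43],   # 220-259
    [67, 99, 46, 33],     # 260+
]

def expected_frequencies(table):
    """Return (R, C, n, E) where E[i][j] = R[i] * C[j] / n."""
    row_totals = [sum(row) for row in table]          # R_i
    col_totals = [sum(col) for col in zip(*table)]    # C_j
    n = sum(row_totals)
    expected = [[r_i * c_j / n for c_j in col_totals] for r_i in row_totals]
    return row_totals, col_totals, n, expected

row_totals, col_totals, n, expected = expected_frequencies(observed)
print("Row totals:", row_totals)
print("Column totals:", col_totals)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected frequencies sums to the same row total as the observed counts, which gives a quick check on the arithmetic.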
![Page 220: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/220.jpg)
Then to test
H0: R and C are independent
against
HA: R and C are not independent
Use test statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
$x_{ij}$ = observed frequency in the (i,j)th cell
$E_{ij} = \dfrac{R_i C_j}{n}$ = expected frequency in the (i,j)th cell in the case of independence.
![Page 221: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/221.jpg)
Sampling distribution of test statistic when H0 is true
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
- χ² distribution with degrees of freedom ν = (r - 1)(c - 1)
Critical and Acceptance Region
Reject H0 if: $\chi^2 > \chi^2_{\alpha}$
Accept H0 if: $\chi^2 \le \chi^2_{\alpha}$
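A minimal sketch of the decision rule, using the blood pressure and cholesterol counts from the example. The function name `chi_square_test` is made up for this illustration, and the critical value is not computed: 16.919 is the tabled $\chi^2_{0.05}$ for 9 degrees of freedom and is supplied by hand.

```python
observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

def chi_square_test(table, critical_value):
    """Return (chi2, df, reject) for the chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, x_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected under independence
            chi2 += (x_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2 > critical_value

# Critical value for alpha = 0.05 and 9 degrees of freedom, from a chi-square table
chi2, df, reject = chi_square_test(observed, critical_value=16.919)
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject}")
```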
![Page 222: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/222.jpg)
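Table: Expected frequencies, (Observed frequencies), Standardized Residuals (rows: Serum Cholesterol, columns: Systolic Blood pressure)

| Serum Cholesterol | Entry | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|---|
| <200 | Expected | 96.29 | 130.79 | 50.63 | 29.29 | 307 |
| | (Observed) | (117) | (121) | (47) | (22) | |
| | Std. residual | 2.11 | -0.86 | -0.51 | -1.35 | |
| 200-219 | Expected | 77.16 | 104.80 | 40.57 | 23.47 | 246 |
| | (Observed) | (85) | (98) | (43) | (20) | |
| | Std. residual | 0.89 | -0.66 | 0.38 | -0.72 | |
| 220-259 | Expected | 137.70 | 187.03 | 72.40 | 41.88 | 439 |
| | (Observed) | (119) | (209) | (68) | (43) | |
| | Std. residual | -1.59 | 1.61 | -0.52 | 0.17 | |
| 260+ | Expected | 76.85 | 104.38 | 40.40 | 23.37 | 245 |
| | (Observed) | (67) | (99) | (46) | (33) | |
| | Std. residual | -1.12 | -0.53 | 0.88 | 1.99 | |
| Total | | 388 | 527 | 204 | 118 | 1237 |

χ² = 20.85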
![Page 223: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/223.jpg)
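Standardized residuals
$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
Test statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 20.85$$
degrees of freedom = (r - 1)(c - 1) = 9
$\chi^2_{0.05} = 16.919$
Reject H0 using α = 0.05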
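The identity between the test statistic and the sum of squared standardized residuals can be checked numerically. The sketch below does so for the blood pressure and cholesterol counts; the function name `standardized_residuals` is made up for this illustration.

```python
import math

observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

def standardized_residuals(table):
    """Return r_ij = (x_ij - E_ij) / sqrt(E_ij) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, x_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            res_row.append((x_ij - e_ij) / math.sqrt(e_ij))
        residuals.append(res_row)
    return residuals

residuals = standardized_residuals(observed)
for row in residuals:
    print([round(r, 2) for r in row])

# The chi-square statistic is the sum of the squared standardized residuals
chi2_from_residuals = sum(r ** 2 for row in residuals for r in row)
print(f"sum of squared residuals = {chi2_from_residuals:.2f}")
```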
![Page 224: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/224.jpg)
Another Example
This data comes from a Globe and Mail study examining the attitudes of the baby boomers. Data was collected on various age groups.

| Age group | Total |
|---|---|
| Echo (Age 20 – 29) | 398 |
| Gen X (Age 30 – 39) | 342 |
| Younger Boomers (Age 40 – 49) | 378 |
| Older Boomers (Age 50 – 59) | 286 |
| Pre Boomers (Age 60+) | 445 |
| Total | 1849 |
![Page 225: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/225.jpg)
One question with responses
In an average week, how many times would you drink alcohol?

| Age group | never | once | twice | three or four times | five or more times | Total |
|---|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 115 | 135 | 64 | 48 | 36 | 398 |
| Gen X (Age 30 – 39) | 130 | 123 | 38 | 31 | 20 | 342 |
| Younger Boomers (Age 40 – 49) | 136 | 87 | 64 | 57 | 34 | 378 |
| Older Boomers (Age 50 – 59) | 109 | 74 | 40 | 43 | 20 | 286 |
| Pre Boomers (Age 60+) | 218 | 80 | 45 | 40 | 62 | 445 |
| Total | 708 | 499 | 251 | 219 | 172 | 1849 |

Are there differences in weekly consumption of alcohol related to age?
![Page 226: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/226.jpg)
Table: Expected frequencies

| Age group | never | once | twice | three or four times | five or more times | Total |
|---|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 152.40 | 107.41 | 54.03 | 47.14 | 37.02 | 398 |
| Gen X (Age 30 – 39) | 130.96 | 92.30 | 46.43 | 40.51 | 31.81 | 342 |
| Younger Boomers (Age 40 – 49) | 144.74 | 102.01 | 51.31 | 44.77 | 35.16 | 378 |
| Older Boomers (Age 50 – 59) | 109.51 | 77.18 | 38.82 | 33.87 | 26.60 | 286 |
| Pre Boomers (Age 60+) | 170.39 | 120.09 | 60.41 | 52.71 | 41.40 | 445 |
| Total | 708 | 499 | 251 | 219 | 172 | 1849 |
![Page 227: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/227.jpg)
Table: Residuals
$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

| Age group | never | once | twice | three or four times | five or more times |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | -3.029 | 2.662 | 1.357 | 0.125 | -0.168 |
| Gen X (Age 30 – 39) | -0.083 | 3.196 | -1.237 | -1.494 | -2.095 |
| Younger Boomers (Age 40 – 49) | -0.726 | -1.486 | 1.771 | 1.828 | -0.196 |
| Older Boomers (Age 50 – 59) | -0.049 | -0.362 | 0.189 | 1.568 | -1.280 |
| Pre Boomers (Age 60+) | 3.647 | -3.659 | -1.982 | -1.750 | 3.203 |

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 93.97$$
$\chi^2_{0.05} = 26.296$ for d.f. = 4 × 4 = 16
Conclusion: There is a significant relationship between age group and weekly alcohol use
![Page 228: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/228.jpg)
Examining the Residuals allows one to identify the cells that indicate a departure from independence
• Large positive residuals indicate cells where the observed frequencies were larger than expected if independent
• Large negative residuals indicate cells where the observed frequencies were smaller than expected if independent

| Age group | never | once | twice | three or four times | five or more times |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | -3.029 | 2.662 | 1.357 | 0.125 | -0.168 |
| Gen X (Age 30 – 39) | -0.083 | 3.196 | -1.237 | -1.494 | -2.095 |
| Younger Boomers (Age 40 – 49) | -0.726 | -1.486 | 1.771 | 1.828 | -0.196 |
| Older Boomers (Age 50 – 59) | -0.049 | -0.362 | 0.189 | 1.568 | -1.280 |
| Pre Boomers (Age 60+) | 3.647 | -3.659 | -1.982 | -1.750 | 3.203 |
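One way to pick out the cells that depart from independence is to list those whose standardized residual is large in absolute value. The sketch below uses the alcohol counts from the example. The cutoff of 2 is an assumption (a common rule of thumb, not stated in the slides), and the name `THRESHOLD` is made up for this illustration.

```python
import math

age_groups = [
    "Echo (Age 20 – 29)",
    "Gen X (Age 30 – 39)",
    "Younger Boomers (Age 40 – 49)",
    "Older Boomers (Age 50 – 59)",
    "Pre Boomers (Age 60+)",
]
responses = ["never", "once", "twice", "three or four times", "five or more times"]
observed = [
    [115, 135, 64, 48, 36],
    [130, 123, 38, 31, 20],
    [136, 87, 64, 57, 34],
    [109, 74, 40, 43, 20],
    [218, 80, 45, 40, 62],
]

THRESHOLD = 2.0  # assumed cutoff for a "large" residual, not from the slides

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

flagged = []
for i, row in enumerate(observed):
    for j, x_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n
        r_ij = (x_ij - e_ij) / math.sqrt(e_ij)
        if abs(r_ij) >= THRESHOLD:
            direction = "more than expected" if r_ij > 0 else "fewer than expected"
            flagged.append((age_groups[i], responses[j], r_ij, direction))

for group, response, r_ij, direction in flagged:
    print(f"{group:32s} {response:20s} {r_ij:7.3f}  {direction}")
```

With this cutoff the flagged cells are concentrated in the youngest and oldest groups, which matches a reading of the residual table by eye.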
![Page 229: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/229.jpg)
Another question with responses
In an average week, how many times would you surf the internet?

| Age group | never | 1 to 4 times | 5 to 9 times | 10 or more times | Total |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 48 | 72 | 100 | 178 | 398 |
| Gen X (Age 30 – 39) | 51 | 82 | 92 | 117 | 342 |
| Younger Boomers (Age 40 – 49) | 79 | 128 | 76 | 95 | 378 |
| Older Boomers (Age 50 – 59) | 92 | 63 | 57 | 74 | 286 |
| Pre Boomers (Age 60+) | 276 | 71 | 67 | 31 | 445 |
| Total | 546 | 416 | 392 | 495 | 1849 |

Are there differences in weekly internet use related to age?
![Page 230: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/230.jpg)
Table: Expected frequencies

| Age group | never | 1 to 4 times | 5 to 9 times | 10 or more times | Total |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 117.53 | 89.54 | 84.38 | 106.55 | 398 |
| Gen X (Age 30 – 39) | 100.99 | 76.95 | 72.51 | 91.56 | 342 |
| Younger Boomers (Age 40 – 49) | 111.62 | 85.04 | 80.14 | 101.20 | 378 |
| Older Boomers (Age 50 – 59) | 84.45 | 64.35 | 60.63 | 76.57 | 286 |
| Pre Boomers (Age 60+) | 131.41 | 100.12 | 94.34 | 119.13 | 445 |
| Total | 546 | 416 | 392 | 495 | 1849 |
![Page 231: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/231.jpg)
Table: Residuals
$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

| Age group | never | 1 to 4 times | 5 to 9 times | 10 or more times |
|---|---|---|---|---|
| Echo (Age 20 – 29) | -6.41 | -1.85 | 1.70 | 6.92 |
| Gen X (Age 30 – 39) | -4.97 | 0.58 | 2.29 | 2.66 |
| Younger Boomers (Age 40 – 49) | -3.09 | 4.66 | -0.46 | -0.62 |
| Older Boomers (Age 50 – 59) | 0.82 | -0.17 | -0.47 | -0.29 |
| Pre Boomers (Age 60+) | 12.61 | -2.91 | -2.82 | -8.07 |

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 406.29$$
$\chi^2_{0.05} = 21.03$ for d.f. = 4 × 3 = 12
Conclusion: There is a significant relationship between age group and weekly internet use
![Page 232: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/232.jpg)
[Figure: bar chart for Echo (Age 20 – 29). Horizontal axis: never, 1 to 4 times, 5 to 9 times, 10 or more times. Vertical axis: 0.0 to 70.0]
![Page 233: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/233.jpg)
[Figure: bar chart for Gen X (Age 30 – 39). Horizontal axis: never, 1 to 4 times, 5 to 9 times, 10 or more times. Vertical axis: 0.0 to 70.0]
![Page 234: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/234.jpg)
[Figure: bar chart for Younger Boomers (Age 40 – 49). Horizontal axis: never, 1 to 4 times, 5 to 9 times, 10 or more times. Vertical axis: 0.0 to 70.0]
![Page 235: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/235.jpg)
[Figure: bar chart for Older Boomers (Age 50 – 59). Horizontal axis: never, 1 to 4 times, 5 to 9 times, 10 or more times. Vertical axis: 0.0 to 70.0]
![Page 236: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/236.jpg)
[Figure: bar chart for Pre Boomers (Age 60+). Horizontal axis: never, 1 to 4 times, 5 to 9 times, 10 or more times. Vertical axis: 0.0 to 70.0]
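The five charts above show one panel per age group over the four internet-use categories. Assuming the vertical axis is the percentage of each age group giving each response (the slides do not label it, so this is a guess), the plotted values could be recomputed from the observed counts as row percentages:

```python
responses = ["never", "1 to 4 times", "5 to 9 times", "10 or more times"]
observed = {
    "Echo (Age 20 – 29)": [48, 72, 100, 178],
    "Gen X (Age 30 – 39)": [51, 82, 92, 117],
    "Younger Boomers (Age 40 – 49)": [79, 128, 76, 95],
    "Older Boomers (Age 50 – 59)": [92, 63, 57, 74],
    "Pre Boomers (Age 60+)": [276, 71, 67, 31],
}

# Row percentages: share of each age group in each response category
row_percentages = {
    group: [100 * count / sum(counts) for count in counts]
    for group, counts in observed.items()
}

for group, percentages in row_percentages.items():
    cells = ", ".join(f"{name}: {p:.1f}%" for name, p in zip(responses, percentages))
    print(f"{group}: {cells}")
```

Comparing these profiles across age groups is an informal counterpart to the residual table: if age and internet use were independent, the five profiles would be roughly the same.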
![Page 237: Stats 845 Applied Statistics. This Course will cover: 1.Regression –Non Linear Regression –Multiple Regression 2.Analysis of Variance and Experimental](https://reader035.vdocuments.us/reader035/viewer/2022081421/56649edc5503460f94bed355/html5/thumbnails/237.jpg)
Next topic: Fitting equations to data