Lesson 4 Correlational Analysis
IBS Statistics Year 1
Dr. Ning DING
Table of contents
• Review
• Learning Goals
• Chapter 12: Simple Regression and Correlation
• Exercises
Chapter 3: Describing Data
Review
Find the interquartile range: 1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038, 2047, 2054, 2097, 2205, 2287, 2311, 2406
Interquartile Range = Q3 - Q1 = 2205 - 1721 = 484
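The quartile lookup used here can be sketched in a few lines of Python. This is a minimal sketch, assuming the location rule L = (n + 1) × p that appears in the Excel correction, with linear interpolation when L is not a whole number; the function name `quartile_by_location` is hypothetical.

```python
def quartile_by_location(values, fraction):
    """Value at location L = (n + 1) * fraction in the sorted data (assumed rule)."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * fraction          # 1-based position in the sorted data
    if location <= 1:                      # position before the first value
        return data[0]
    if location >= n:                      # position at or past the last value
        return data[-1]
    lower = int(location)                  # whole part of the position
    weight = location - lower              # fractional part used to interpolate
    return data[lower - 1] + weight * (data[lower] - data[lower - 1])

values = [1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
          2047, 2054, 2097, 2205, 2287, 2311, 2406]
q1 = quartile_by_location(values, 0.25)    # location 4  -> 1721
q3 = quartile_by_location(values, 0.75)    # location 12 -> 2205
iqr = q3 - q1
print(q1, q3, iqr)
```

With 15 values the locations are whole numbers (4 and 12), so no interpolation is needed and the result matches the 484 shown on the slide.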
Correction of EXCEL Exercise 5
L = (8+1) × 25% = 2.25, so Q1 = 133.5
L = (8+1) × 75% = 6.75, so Q3 = 274.5
Interquartile Range = 274.5 - 133.5 = 141
Boxplot
Data: 1, 2, 2, 4, 5, 7, 8, 9, 12
Median = 5
Quartile: Q1 = 2, Q3 = 8.5
Interquartile Range = Q3 - Q1
Decile: 1st D, 9th D
Percentile
http://cnx.org/content/m11192/latest/
Boxplot: How to interpret?
Minimum = € 20, Q1 = € 250, Median = € 350, Q3 = € 850, Maximum = € 2000, Mean = € 450
The distribution is skewed to the right because the mean is larger than the median.
(a) 0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0
Mean > Median: positively skewed
(b) 2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0
Mean < Median: negatively skewed
http://qudata.com/online/statcalc/
Zero skewness: mode = median = mean
This means that the data is symmetrically distributed.
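The mean-versus-median rule can be checked on the two data sets from the slide. The following is a small sketch using Python's standard `statistics` module; the helper name `skew_direction` is hypothetical, and comparing mean and median is only the rule of thumb used in this lesson, not a full skewness measure.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify skew by comparing mean and median (rule of thumb from the slide)."""
    m, med = mean(values), median(values)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "zero skewness"

data_a = [0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0]
data_b = [2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0]

result_a = skew_direction(data_a)   # mean is about 1.69, median is 1.5
result_b = skew_direction(data_b)   # mean is about 4.1, median is 4.2
print(result_a, "|", result_b)
```

In data set (a) the single large value 4.0 pulls the mean above the median; in (b) the single small value 2.0 pulls it below.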
Learning Goals
• Chapter 12:
– Learn how many business decisions depend on knowing the specific relationship between two or more variables
– Use scatter diagrams to visualize the relationship between two variables
– Use regression analysis to estimate the relationship between two variables
– Use the least-squares estimating equation to predict future values of the dependent variable
– Learn how correlation analysis describes the degree to which two variables are linearly related to each other
– Understand the coefficient of determination as a measure of the strength of the relationship between two variables
– Learn limitations of regression and correlation analyses and caveats about their use.
Chapter 12: Sim Reg & Corr
1. Introduction
Regression and Correlation Analyses:
– How to determine both the nature and the strength of a relationship between variables.
Scatter Diagram:
Describing Relationship between Two Variables – Scatter Diagram Examples
• Positive correlation
• Negative correlation
• No correlation
2. Types of Relationships
Variables:
– Independent variables: known
– Dependent variables: to predict
Correlation & Cause and Effect?
• The relationships found by regression are relationships of association, not necessarily of cause and effect.
2. Estimation Using the Regression Line
Scatter Diagrams:
• Patterns indicating that the variables are related
• If related, we can describe the relationship:
– Strong & positive correlation
– Strong & negative correlation
– Weak & positive correlation
– Weak & negative correlation
– No correlation
Simple Linear Regression:
• The dependent variable Y is determined by the independent variable X
Ŷ = a + bX
– X: independent variable
– Y: dependent variable
Slope of the Best-Fitting Regression Line:
$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
Y-intercept: Ȳ = a + bX̄, so a = Ȳ - bX̄
Example: what is the relationship between the age of a truck (X) and the annual repair expense (Y)?
X̄ = 3, Ȳ = 6
$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{78 - 4 \cdot 3 \cdot 6}{44 - 4 \cdot 9} = 0.75$$
a = Ȳ - bX̄ = 6 - 0.75 * 3 = 3.75
Ŷ = 3.75 + 0.75X
If the city has a truck that is 4 years old, the director could use the equation to predict $675 annually in repairs (6.75 in hundreds of dollars):
Ŷ = 3.75 + 0.75 * 4 = 6.75
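The truck example can be reproduced from its summary statistics alone. This is a minimal sketch; the function name `fit_line_from_sums` is hypothetical, and the inputs n = 4, ΣXY = 78 and ΣX² = 44 are read from the worked fraction on the slide, so treat them as an assumption about the underlying data.

```python
def fit_line_from_sums(n, sum_xy, sum_x2, mean_x, mean_y):
    """Return (intercept a, slope b) of Y-hat = a + b*X from summary statistics."""
    slope = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    intercept = mean_y - slope * mean_x    # the line passes through (mean_x, mean_y)
    return intercept, slope

# Summary values assumed from the truck slide
a, b = fit_line_from_sums(n=4, sum_xy=78, sum_x2=44, mean_x=3, mean_y=6)

truck_age = 4
predicted = a + b * truck_age              # repair expense in hundreds of dollars
print(a, b, predicted)
```

Working from sums rather than raw observations mirrors how the slide does the calculation by hand.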
Exercise
Example:
• To find the simple linear regression of Personal Income (X) and Auto Sales (Y)
• If X = 64, what about Y?
Step 1: Count the number of values.
N = 5
Step 2: Find XY and X² (see the table below).
Step 3: Find ΣX, ΣY, ΣXY, ΣX².
ΣX = 311, Mean = 62.2
ΣY = 18.6, Mean = 3.72
ΣXY = 1159.7
ΣX² = 19359
Step 4: Substitute in the slope formula given above.
$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{1159.7 - 5 \cdot 62.2 \cdot 3.72}{19359 - 5 \cdot 62.2 \cdot 62.2} \approx 0.19$$
Step 5: Now substitute in the intercept formula given above.
Slope(b) = 0.19
Intercept(a) = Ȳ - bX̄ = 3.72 - 0.19 * 62.2 = -8.098
Step 6: Then substitute these values in the regression equation formula.
Regression Equation: Ŷ = a + bX
Ŷ = -8.098 + 0.19X
Suppose we want to know the approximate Y value for X = 64. Then we can substitute the value in the above equation.
Ŷ = a + bX = -8.098 + 0.19(64) = -8.098 + 12.16 = 4.06
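The exercise rounds the slope to 0.19 before computing the intercept. As a cross-check, the sketch below repeats the calculation from the sums given in Step 3 without rounding in between; the variable names are illustrative only.

```python
# Sums from Step 3 of the exercise
n = 5
sum_x, sum_y = 311, 18.6
sum_xy, sum_x2 = 1159.7, 19359

mean_x = sum_x / n                         # 62.2
mean_y = sum_y / n                         # 3.72

# Slope and intercept without intermediate rounding
slope = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
intercept = mean_y - slope * mean_x

prediction = intercept + slope * 64        # estimated Y for X = 64
print(round(slope, 4), round(intercept, 3), round(prediction, 2))
```

Without rounding, the slope is about 0.188 and the intercept about -7.96 rather than -8.098, while the prediction for X = 64 still comes out at about 4.06. The intercept is sensitive to rounding the slope because it is multiplied by a large mean (62.2).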
Least Squares Method:
Minimize the sum of the squares of the errors to measure the goodness of fit of a line.
eᵢ = residual of observation i
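To illustrate the least squares idea, the sketch below fits a line with the slope and intercept formulas from this chapter and compares its sum of squared errors with a few nudged lines. The five data points are hypothetical, invented only for illustration, and the function names are not from the lesson.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for Y-hat = a + b*X using the chapter's slope formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b

def sum_of_squared_errors(xs, ys, a, b):
    """Sum of squared residuals e_i = Y_i - (a + b*X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best_sse = sum_of_squared_errors(xs, ys, a, b)

# Any other line should give a larger sum of squared errors
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]
nudged_sse = [sum_of_squared_errors(xs, ys, a + da, b + db) for da, db in nudges]
print(a, b, best_sse, nudged_sse)
```

Shifting the intercept or tilting the slope in either direction increases the sum of squared errors, which is what "best-fitting" means under this criterion.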
3. Correlation Analysis
Correlation Analysis: describes the degree to which one variable is linearly related to another.
Coefficient of Determination (r²): measures the extent, or strength, of the association that exists between two variables.
• 0 ≤ r² ≤ 1.
• The larger r², the stronger the linear relationship.
• The closer r² is to 1, the more confident we are in our prediction.
$$r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$
Coefficient of Correlation (r): the square root of the coefficient of determination, taking the sign of the slope b.
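The shortcut formula for r² can be checked against the definition "1 minus unexplained variation over total variation". The sketch below does this on a small hypothetical data set (the five points are invented for illustration); giving r the sign of the slope is the usual convention.

```python
import math

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
mean_x = sum(xs) / n
mean_y = sum_y / n

# Least squares slope and intercept
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

# Shortcut formula for the coefficient of determination
r_squared = (a * sum_y + b * sum_xy - n * mean_y ** 2) / (sum_y2 - n * mean_y ** 2)

# Cross-check: 1 - (sum of squared residuals) / (total variation of Y)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared_check = 1 - sse / sst

# Coefficient of correlation: square root of r^2 with the sign of the slope
r = math.copysign(math.sqrt(r_squared), b)
print(round(r_squared, 4), round(r_squared_check, 4), round(r, 4))
```

Both routes give r² = 0.6 for this data, so about 60% of the variation in Y is explained by the regression line.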
Review
Which value of r indicates a stronger correlation than 0.40?
A. -0.30
B. -0.50
C. +0.38
D. 0
If all the points on a scatter diagram lie on a straight line, what is the standard error of estimate?
A. -1
B. +1
C. 0
D. Infinity
In the least squares equation Ŷ = 10 + 20X, the value of 20 indicates
A. the Y intercept.
B. for each unit increase in X, Y increases by 20.
C. for each unit increase in Y, X increases by 20.
D. none of these.
Exercise
A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data were collected:
What is the Y-intercept of the linear equation?
A. -12.201
B. 2.1946
C. -2.1946
D. 12.201
Exercise (Sample Exam P.4)
Ŷ = -1.8182 + 0.1329X
Summary
• Chapter 3:
– Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean
– Explain the characteristics, uses, advantages, and disadvantages of each measure of location
– Identify the position of the mean, median, and mode for both symmetric and skewed distributions
– Compute and interpret the range, mean deviation, variance, and standard deviation
– Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion
– Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations