introduction to correlation and regression
DESCRIPTION
Introduction to Correlation and Regression. Rahul Jain. Outline. Introduction Linear Correlation Regression Simple Linear Regression Model/Formulas. Outline continued. Applications Real-life Applications Practice Problems Internet Resources Applets Data Sources. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/1.jpg)
Introduction toIntroduction to
Correlation and RegressionCorrelation and Regression
Rahul JainRahul Jain
![Page 2: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/2.jpg)
OutlineOutline Introduction Introduction
Linear CorrelationLinear Correlation
Regression Regression Simple Linear Simple Linear
Regression Regression Model/FormulasModel/Formulas
![Page 3: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/3.jpg)
Outline continuedOutline continued ApplicationsApplications
Real-life ApplicationsReal-life Applications Practice ProblemsPractice Problems
Internet Resources Internet Resources Applets Applets Data SourcesData Sources
![Page 4: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/4.jpg)
CorrelationCorrelation Correlation Correlation
A measure of association between A measure of association between two numerical variables.two numerical variables.
Example (positive correlation)Example (positive correlation) Typically, in the summer as the Typically, in the summer as the
temperature increases people are temperature increases people are thirstier.thirstier.
![Page 5: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/5.jpg)
Specific Example Specific Example
For seven For seven random summer random summer days, a person days, a person recorded the recorded the temperaturetemperature and and their their water water consumptionconsumption, , during a three-hour during a three-hour period spent period spent outside. outside.
Temperature (F)
Water Consumption
(ounces)
75 16
83 20
85 25
85 27
92 32
97 48
99 48
![Page 6: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/6.jpg)
How would you describe the graph?How would you describe the graph?
![Page 7: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/7.jpg)
How “strong” is the linear relationship?How “strong” is the linear relationship?
![Page 8: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/8.jpg)
Measuring the RelationshipMeasuring the Relationship
Pearson’s Sample Pearson’s Sample Correlation Coefficient, Correlation Coefficient, rr
measures the measures the directiondirection and the and the strengthstrength of the linear association of the linear association
between two numerical paired between two numerical paired variables.variables.
![Page 9: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/9.jpg)
Direction of AssociationDirection of Association
Positive CorrelationPositive Correlation Negative CorrelationNegative Correlation
![Page 10: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/10.jpg)
Strength of Linear AssociationStrength of Linear Association
r value Interpretation
1perfect positive linear
relationship
0 no linear relationship
-1perfect negative linear
relationship
![Page 11: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/11.jpg)
Strength of Linear AssociationStrength of Linear Association
![Page 12: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/12.jpg)
Other Strengths of AssociationOther Strengths of Association
r value Interpretation
0.9 strong association
0.5 moderate association
0.25 weak association
![Page 13: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/13.jpg)
Other Strengths of AssociationOther Strengths of Association
![Page 14: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/14.jpg)
FormulaFormula
= the sum n = number of paired items
xi = input variableyi
= output variable
x = x-bar = mean of x’s
y = y-bar = mean of y’s
sx= standard deviation of x’s
sy= standard deviation of y’s
![Page 15: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/15.jpg)
403.7 15
Coefficient of Determination: RCoefficient of Determination: R22 A quantification of the significance of
estimated model is denoted by R2.
R2 > 85% = significant model R2 < 85% = model is perceived as
inadequate Low R2 will suggest a need for additional
predictors for modeling the mean of Y
bxay ˆ
![Page 16: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/16.jpg)
RegressionRegression
RegressionRegression
Specific statistical methodsSpecific statistical methods for for finding the “line of best fit” for one finding the “line of best fit” for one response (dependent) numerical response (dependent) numerical variable based on one or more variable based on one or more explanatory (independent) explanatory (independent) variables.variables.
![Page 17: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/17.jpg)
Curve Fitting vs. RegressionCurve Fitting vs. Regression
RegressionRegression
Includes using statistical methods Includes using statistical methods to assess the "goodness of fit" of to assess the "goodness of fit" of the model. (ex. Correlation the model. (ex. Correlation Coefficient)Coefficient)
![Page 18: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/18.jpg)
Regression: 3 Main PurposesRegression: 3 Main Purposes
To describeTo describe (or model) (or model)
To predictTo predict ( (or estimate) or estimate)
To controlTo control (or administer) (or administer)
![Page 19: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/19.jpg)
Simple Linear RegressionSimple Linear Regression
Statistical method for findingStatistical method for finding the “line of best fit” the “line of best fit”
for one response (dependent) for one response (dependent) numerical variable numerical variable
based on one explanatory based on one explanatory (independent) variable. (independent) variable.
![Page 20: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/20.jpg)
Least Squares RegressionLeast Squares Regression GOAL GOAL - -
minimize the minimize the sum of the sum of the square of square of the errors of the errors of the data the data points.points.
This minimizes the This minimizes the Mean Square ErrorMean Square Error
![Page 21: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/21.jpg)
ExampleExample
Plan an outdoor party.Plan an outdoor party.
EstimateEstimate number of soft drinks to buy number of soft drinks to buy per person, based on how hot the per person, based on how hot the weather is.weather is.
Use Temperature/Water data and Use Temperature/Water data and regressionregression..
![Page 22: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/22.jpg)
Steps to Reaching a SolutionSteps to Reaching a Solution Draw a scatterplot of the data.Draw a scatterplot of the data.
![Page 23: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/23.jpg)
Steps to Reaching a SolutionSteps to Reaching a Solution Draw a scatterplot of the data.Draw a scatterplot of the data. Visually, consider the strength of the Visually, consider the strength of the
linear relationship.linear relationship.
![Page 24: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/24.jpg)
Steps to Reaching a SolutionSteps to Reaching a Solution Draw a scatterplot of the data.Draw a scatterplot of the data. Visually, consider the strength of the Visually, consider the strength of the
linear relationship.linear relationship. If the relationship appears relatively If the relationship appears relatively
strong, find the correlation coefficient strong, find the correlation coefficient as a numerical verification.as a numerical verification.
![Page 25: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/25.jpg)
Steps to Reaching a SolutionSteps to Reaching a Solution Draw a scatterplot of the data.Draw a scatterplot of the data. Visually, consider the strength of the Visually, consider the strength of the
linear relationship.linear relationship. If the relationship appears relatively If the relationship appears relatively
strong, find the correlation coefficient strong, find the correlation coefficient as a numerical verification.as a numerical verification.
If the correlation is still relatively If the correlation is still relatively strong, then find the simple linear strong, then find the simple linear regression line.regression line.
![Page 26: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/26.jpg)
Interpreting and VisualizingInterpreting and Visualizing Interpreting the result: Interpreting the result:
y = ax + by = ax + b
The valueThe value ofof aa is the is the slopeslope The value of The value of bb is the is the y-intercepty-intercept rr is the is the correlation coefficientcorrelation coefficient rr22 is the is the coefficient of determinationcoefficient of determination
![Page 27: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/27.jpg)
5. Interpreting and Visualizing5. Interpreting and Visualizing Write down the equation of the Write down the equation of the
line in slope intercept form. line in slope intercept form. Press Press Y=Y= and enter the equation and enter the equation
under Y1. (Clear all other under Y1. (Clear all other equations.) equations.)
Press Press GRAPHGRAPH and the line will and the line will be graphed through the data be graphed through the data points.points.
![Page 28: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/28.jpg)
Questions ???Questions ???
![Page 29: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/29.jpg)
Interpretation in ContextInterpretation in Context
Regression Equation: Regression Equation:
y=1.5*x - 96.9y=1.5*x - 96.9
Water Consumption = Water Consumption = 1.5*Temperature - 96.9 1.5*Temperature - 96.9
![Page 30: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/30.jpg)
Interpretation in ContextInterpretation in Context
Slope = 1.5 (ounces)/(degrees F)Slope = 1.5 (ounces)/(degrees F)
for each 1 degree F increase in for each 1 degree F increase in temperature, you expect an increase temperature, you expect an increase of 1.5 ounces of water drank.of 1.5 ounces of water drank.
![Page 31: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/31.jpg)
Interpretation in ContextInterpretation in Context
y-intercept = -96.9y-intercept = -96.9
For this example, For this example, when the temperature is 0 degrees F, when the temperature is 0 degrees F, then a person would drink about -97 then a person would drink about -97 ounces of water. ounces of water.
That does not make any sense! That does not make any sense! Our model is not applicable for x=0. Our model is not applicable for x=0.
![Page 32: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/32.jpg)
Prediction ExamplePrediction Example
Predict Predict the amount of the amount of water a person would drink when the water a person would drink when the temperature is temperature is 95 degrees F.95 degrees F.
Solution:Solution: Substitute the value of x=95 Substitute the value of x=95 (degrees F) into the regression equation (degrees F) into the regression equation and solve for y (water consumption).and solve for y (water consumption).
If x=95, y=1.5*95 - 96.9 = If x=95, y=1.5*95 - 96.9 = 45.6 ounces. 45.6 ounces.
![Page 33: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/33.jpg)
Simple Linear Regression ModelSimple Linear Regression Model
The model for The model for simple linear regression issimple linear regression is
There are mathematical assumptions There are mathematical assumptions
behind the concepts thatbehind the concepts that we are covering today.we are covering today.
![Page 34: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/34.jpg)
FormulasFormulas
Prediction Equation: Prediction Equation:
![Page 35: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/35.jpg)
Standard error of the estimateStandard error of the estimate
![Page 36: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/36.jpg)
Real Life ApplicationsReal Life Applications
Cost Estimating for Future Space Cost Estimating for Future Space Flight Vehicles (Multiple Flight Vehicles (Multiple
Regression)Regression)
![Page 37: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/37.jpg)
Real Life ApplicationsReal Life Applications Estimating Seasonal Sales for Estimating Seasonal Sales for
Department Stores (Periodic)Department Stores (Periodic)
![Page 38: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/38.jpg)
Real Life ApplicationsReal Life Applications Predicting Student Grades Based Predicting Student Grades Based
on Time Spent Studyingon Time Spent Studying
![Page 39: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/39.jpg)
Real Life ApplicationsReal Life Applications
. . .. . .
What ideas can you think of?What ideas can you think of?
What ideas can you think of that What ideas can you think of that your students will relate to?your students will relate to?
![Page 40: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/40.jpg)
Practice ProblemsPractice Problems
Is there any correlation between Is there any correlation between shoe size and height? shoe size and height?
Does gender make a difference Does gender make a difference in this analysis?in this analysis?
![Page 41: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/41.jpg)
Practice ProblemsPractice Problems Can the number of points Can the number of points
scored in a basketball game be scored in a basketball game be predicted by predicted by The time a player plays in The time a player plays in
the game?the game?
By the player’s height?By the player’s height?
Idea modified from Steven King, Aiken, Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.)SC. NCTM presentation 1997.)
![Page 42: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/42.jpg)
ResourcesResources Data Analysis and StatisticsData Analysis and Statistics. .
Curriculum and Evaluation Curriculum and Evaluation Standards for School Standards for School Mathematics. Addenda Series, Mathematics. Addenda Series, Grades 9-12. NCTM. 1992.Grades 9-12. NCTM. 1992.
Data and Story LibraryData and Story Library. . Internet Website. Internet Website. http://lib.stat.cmu.edu/DASL/ http://lib.stat.cmu.edu/DASL/ 2001. 2001.
![Page 43: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/43.jpg)
Internet ResourcesInternet Resources CorrelationCorrelation
Guessing CorrelationsGuessing Correlations - An - An interactive site that allows you to interactive site that allows you to try to match correlation coefficients try to match correlation coefficients to scatterplots. University of Illinois, to scatterplots. University of Illinois, Urbanna Champaign Statistics Urbanna Champaign Statistics Program. Program. http://www.stat.uiuc.edu/~stat100/jhttp://www.stat.uiuc.edu/~stat100/java/guess/GCApplet.htmlava/guess/GCApplet.html
![Page 44: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/44.jpg)
Internet ResourcesInternet Resources
RegressionRegression Effects of adding an Effects of adding an
OutlierOutlier. .
W. West, University of South W. West, University of South Carolina. Carolina.
http://www.stat.sc.edu/~west/http://www.stat.sc.edu/~west/javahtml/Regression.htmljavahtml/Regression.html
![Page 45: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/45.jpg)
Internet ResourcesInternet Resources RegressionRegression
Estimate the Regression LineEstimate the Regression Line. . Compare the mean square error Compare the mean square error from different regression lines. Can from different regression lines. Can you find the minimum mean square you find the minimum mean square error? Rice University Virtual error? Rice University Virtual Statistics Lab. Statistics Lab. http://www.ruf.rice.edu/~lane/stat_sihttp://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.htmlm/reg_by_eye/index.html
![Page 46: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/46.jpg)
Internet Resources: Data SetsInternet Resources: Data Sets Data and Story Library. Data and Story Library.
Excellent source for small data sets. Excellent source for small data sets. Search for specific statistical methods Search for specific statistical methods (e.g. boxplots, regression) or for data (e.g. boxplots, regression) or for data concerning a specific field of interest concerning a specific field of interest (e.g. health, environment, sports). (e.g. health, environment, sports). http://lib.stat.cmu.edu/DASL/http://lib.stat.cmu.edu/DASL/
![Page 47: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/47.jpg)
Internet ResourcesInternet Resources OtherOther
Statistics Applets. Using Web Statistics Applets. Using Web Applets to Assist in Statistics Applets to Assist in Statistics Instruction. Robin Lock, St. Instruction. Robin Lock, St. Lawrence University. Lawrence University. http://it.stlawu.edu/~rlock/maa99http://it.stlawu.edu/~rlock/maa99//
![Page 48: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/48.jpg)
Internet ResourcesInternet Resources OtherOther
Ten Websites Every Statistics Ten Websites Every Statistics Instructor Should Bookmark. Instructor Should Bookmark. Robin Lock, St. Lawrence Robin Lock, St. Lawrence University. University. http://it.stlawu.edu/~rlock/10sithttp://it.stlawu.edu/~rlock/10sites.htmles.html
![Page 49: Introduction to Correlation and Regression](https://reader033.vdocuments.us/reader033/viewer/2022051216/5681521d550346895dc06154/html5/thumbnails/49.jpg)
For More Information…For More Information…
On-line version of this presentationOn-line version of this presentationhttp://www.mtsu.edu/~statshttp://www.mtsu.edu/~stats
/corregpres/index.html/corregpres/index.html
More information about regressionMore information about regressionVisit Visit STATS @ MTSUSTATS @ MTSU web site web site
http://www.mtsu.edu/~statshttp://www.mtsu.edu/~stats