
Post on 27-Dec-2015


DESCRIBING RELATIONSHIPS …

RELATIONSHIPS BETWEEN ...

Talk to the person next to you. Think of two things that you believe may be related. For example, height and weight are generally related... The taller the person, generally, the more they weigh.

Write your two numerical categories that you believe are related on the board.

DO YOU BELIEVE THERE IS A RELATIONSHIP BETWEEN...

•TIME SPENT STUDYING AND GPA?

•# OF CIGARETTES SMOKED DAILY & LIFE EXPECTANCY?

•SALARY AND EDUCATION LEVEL?

•AGE AND HEIGHT?

•AGE OF AUTOMOBILE AND VALUE OF AUTOMOBILE?

RELATIONSHIPS

When we consider (possible) relationships between 2 (numeric) variables, the data is referred to as bi-variate data.

There may or may not exist a relationship/an association between the 2 variables.

Does one variable ‘cause’ the other? Caution!

Does one variable influence the other? Or is the relationship influenced by another variable(s) that we are unaware of?

BIVARIATE DATA

Proceed as with uni-variate distributions …

Still graph (use model to describe data; scatter plot; LSRL)

Still look at overall patterns and deviations from those patterns (DOFS; Direction, Outlier(s), Form, Strength; or Trends, Strength, Shape)

Still analyze numerical summary (descriptive statistics)

BIVARIATE DISTRIBUTIONS

Explanatory variable, x, ‘factor,’ may help predict or explain changes in response variable; usually on horizontal axis

Response variable, y, measures an outcome of a study, usually on vertical axis

BIVARIATE DATA DISTRIBUTIONS

For example ... Alcohol (explanatory) and body temperature (response). Generally, the more alcohol consumed, the lower the body temperature. Still use caution with ‘cause.’

Sometimes we don’t have variables that are clearly explanatory and response.

Sometimes there could be two ‘explanatory’ variables.

Examples: Discuss with a partner for 1 minute

EXPLANATORY & RESPONSE OR TWO EXPLANATORY VARIABLES?

ACT Score and SAT Score

Activity level and physical fitness

SAT Math and SAT Verbal Scores

GRAPHICAL MODELS…

Many graphing models display uni-variate data exclusively (review). Discuss for 30 seconds and share out.

Main graphical representation used to display bivariate data (two quantitative variables) is scatterplot.

SCATTERPLOTS

Scatterplots show relationship between two quantitative variables measured on the same individuals

Each individual in data appears as a point (x, y) on the scatterplot.

Plot explanatory variable (if there is one) on horizontal axis. If no distinction between explanatory and response, either can be plotted on horizontal axis.

Label both axes. Scale both axes with uniform intervals (but scales don’t have to match)

LABEL & SCALE SCATTERPLOT

VARIABLES: CLEARLY EXPLANATORY AND RESPONSE??

CREATING & INTERPRETING SCATTERPLOTS

Let’s collect some data

On board, write your height (in inches) and your weight (in pounds)

Input into Minitab (graph, scatterplot)

INTERPRETING SCATTERPLOTS

Look for overall patterns (DOFS) including:

•direction: up or down, + or – association?

•outliers/deviations: individual value(s) that fall outside overall pattern; no outlier rule for bi-variate data, unlike uni-variate data

•form: linear? curved? clusters? gaps?

•strength: how closely do the points follow a clear form? Strong, weak, moderate?

SCATTERPLOTS: NOTE

Might be asked to graph a scatterplot from data

Might need to sketch what’s on Minitab

Doesn’t have to be 100% exactly accurate; do your best

Scaling, labeling: a must!

MEASURING LINEAR ASSOCIATION

Scatterplots (bi-variate data) show direction, outliers/deviation(s), form, strength of relationship between two quantitative variables

Linear relationships are important; common, simple pattern

Linear relationship is strong if points are close to a straight line; weak if scattered about

Other relationships (quadratic, logarithmic, etc.)

HOW STRONG ARE THESE RELATIONSHIPS? WHICH ONE IS STRONGER?

MEASURING LINEAR ASSOCIATION: CORRELATION OR “R”

Eyes are not a good judge

Need to specify just how strong or weak a linear relationship is

Need a numeric measure

Correlation or ‘r’

MEASURING LINEAR ASSOCIATION: CORRELATION OR “R”

• Correlation (r) is a numeric measure of direction and strength of a linear relationship between two quantitative variables

• Correlation (r) is always between -1 and 1

• Correlation (r) is not resistant (look at formula; based on mean)

• r doesn’t tell us about individual data points, but rather trends in the data

• Never calculate by formula; use Minitab (dependent on having raw data)

$-1 \le r \le 1$

MEASURING LINEAR ASSOCIATION: CORRELATION OR “R”

r ≈ 0: no strong linear relationship

r close to 1: strong positive linear relationship

r close to -1: strong negative linear relationship

GUESS THE CORRELATION

WWW.ROSSMANCHANCE.COM/APPLETS

‘March Madness’ bracket-style Guess the Correlation tournament

Number off; randomly choose numbers to match up head-to-head competition/rounds

Look at a scatterplot, each write down your guess on notecards and reveal at same time

Student who is closest survives until the next round

CAUTION… INTERPRETING CORRELATION

Note: be careful when addressing form in scatterplots

Strong positive linear relationship ► correlation ≈ 1

But

Correlation ≈ 1 does not necessarily mean relationship is linear; always plot data!

R ≈ 0.816 FOR EACH OF THESE

CALCULATING CORRELATION “R”

n, $x_1$, $x_2$, etc., $\bar{x}$, $y_1$, $y_2$, etc., $\bar{y}$, $s_x$, $s_y$, …
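The symbols listed above are the ingredients of the usual sample correlation formula, r = (1 / (n - 1)) Σ ((x_i - x̄) / s_x)((y_i - ȳ) / s_y). Minitab does this for us, but a minimal Python sketch shows what the software is doing. The function names and the height and weight values here are made up for illustration; they are not the class data.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n - 1) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Made-up heights (inches) and weights (pounds), for illustration only.
heights = [62, 64, 66, 68, 70, 72]
weights = [115, 130, 128, 150, 165, 170]

r = correlation(heights, weights)
print(f"r = {r:.3f}")
```

Because every value is standardized first, the result always lands between -1 and 1.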

CALCULATING CORRELATION “R”

Let’s calculate r for our height & weight data and determine how weak or strong the linear relationship is with our data

Stat, regression, fitted line

FACTS ABOUT CORRELATION

Correlation doesn’t care which variable is considered explanatory and which is considered response

Can switch x & y

Still same correlation (r) value

CAUTION! Switching x & y WILL change your scatterplot… just not ‘r’

FACTS ABOUT CORRELATION

r is in standard units, so r doesn’t change if units are changed

If we change from yards to feet, r is not affected

+ r, positive association

- r, negative association
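Both facts (switching x & y, changing units) can be checked quickly. The sketch below uses made-up distances and times; the variable names are illustrative assumptions only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up data: distances in yards and times in seconds.
yards = [10, 25, 40, 55, 80]
seconds = [2.1, 4.0, 6.2, 7.9, 12.5]
feet = [3 * d for d in yards]  # same distances, different units

r_yards = correlation(yards, seconds)
r_feet = correlation(feet, seconds)     # units changed
r_swapped = correlation(seconds, yards) # x and y switched

print(round(r_yards, 6), round(r_feet, 6), round(r_swapped, 6))
```

All three printed values agree, even though the three scatterplots would look different.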

FACTS ABOUT CORRELATION

Correlation is always between -1 & 1

Makes no sense for r = 13 or r = -5

r = 0 means no linear relationship (r near 0: very weak linear relationship)

r = 1 or -1 means perfect linear association (all points fall exactly on a line)

FACTS ABOUT CORRELATION

Both variables must be quantitative, numerical. Doesn’t make any sense to discuss r for qualitative or categorical data

Correlation is not resistant (like mean and SD). Be careful using r when outliers are present
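To see how little it takes to move r, here is a small sketch with made-up numbers: six points that are almost perfectly linear, then the same points plus one outlier. The data and names are assumptions for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up, nearly linear data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

r_without = correlation(x, y)
# Add a single point far below the pattern.
r_with = correlation(x + [7], y + [1.0])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

One added point is enough to turn a very strong correlation into a weak one, which is why we always plot the data first.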

FACTS ABOUT CORRELATION

r isn’t enough! … mean, standard deviation, graphical representation

Correlation does not imply causation; e.g., # students who own cell phones and # students passing AP exams

ABSURD EXAMPLES… CORRELATION DOES NOT IMPLY CAUSATION…

Did you know that eating chocolate makes winning a Nobel Prize more likely? The correlation between per capita chocolate consumption and the number of Nobel laureates per 10 million people for 23 selected countries is r = 0.791

Did you know that statistics is causing global warming? As the number of statistics courses offered has grown over the years, so has the average global temperature!

LEAST SQUARES REGRESSION

Last section… scatterplots of two quantitative variables

r measures strength and direction of linear relationship of scatterplot

LEAST SQUARES REGRESSION

BETTER model to summarize overall pattern by drawing a line on scatterplot

Not any line; we want a best-fit line over scatterplot

Least Squares Regression Line (LSRL)

LEAST-SQUARES REGRESSION LINE

LEAST SQUARES REGRESSION (PREDICTS VALUES)

LSRL Model: $\hat{y} = a + bx$

$\hat{y}$ is predicted value of response variable

a is y-intercept of LSRL

b is slope of LSRL; slope is predicted (expected) rate of change

x is explanatory variable

LEAST SQUARES REGRESSION (PREDICTS VALUES)

Often will be asked to interpret slope of LSRL & y-intercept, in context

Caution: Interpret slope of LSRL as the predicted or average change or expected change in the response variable given a unit change in the explanatory variable

NOT change in y for a unit change in x; LSRL is a model; models are not perfect

LSRL: OUR DATA

Go back to whole-class data on height and weight

Now let’s put our LSRL on our scatterplot & determine the equation of the LSRL

Minitab: stat, regression, fitted line plot

LSRL: OUR DATA

Look at graph of our LSRL for our data

Look at our LSRL equation for our data

Our line fits scatterplot well (best fit) but not perfectly

Make some predictions… what if our height was … what if our weight was …

Interpret our y-intercept; does it make sense? Interpretation of our slope?

ANOTHER EXAMPLE… VALUE OF A TRUCK

TRUCK EXAMPLE…

Suppose we were given the LSRL equation for our truck data as

We want to find a more precise estimation of the value if we have driven 100,000 miles. Use the LSRL equation.

Using graph, estimate price if we have driven 40,000 miles. Then use the above LSRL equation to calculate the predicted value of the truck.

AGES & HEIGHTS…

Age (years)    Height (inches)
0              18
1              28
4              40
5              42
8              49

LET’S REVIEW FOR A MOMENT, SHALL WE …

Input into Minitab

Create scatterplot and describe scatterplot (what do we include in a description?)

Calculate r (btw, different from slope; why?), equation of LSRL; interpret equation of LSRL in context; does y-intercept make sense?

Based on this data, make a prediction as to the height of a person at age 25.
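For checking the Minitab output, the review steps can also be sketched in Python using the ages and heights from the table. The function and variable names are illustrative assumptions; the slope uses b = r(s_y / s_x) and the intercept comes from the line passing through (x̄, ȳ).

```python
from statistics import mean, stdev

ages = [0, 1, 4, 5, 8]          # years, from the table
heights = [18, 28, 40, 42, 49]  # inches, from the table

def correlation(xs, ys):
    """Sample correlation r computed from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

r = correlation(ages, heights)
b = r * stdev(heights) / stdev(ages)  # slope of the LSRL
a = mean(heights) - b * mean(ages)    # y-intercept of the LSRL
predicted_at_25 = a + b * 25          # age 25 is far outside the data (0 to 8 years)

print(f"r = {r:.3f}")
print(f"predicted height = {a:.2f} + {b:.2f} (age)")
print(f"predicted height at age 25: {predicted_at_25:.1f} inches")
```

The prediction at age 25 comes out near 115 inches (over 9 feet), which is a good reason to be suspicious of predictions made far outside the ages we actually measured.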

LSRL: OUR DATA

Extrapolation: Use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.

Friends don’t let friends extrapolate!

CALCULATING THE EQUATION OF THE LSRL: WHAT IF WE DON’T HAVE THE RAW DATA?

We still can calculate the equation for the LSRL, but a little more time consuming

Note: Every LSRL goes through the point $(\bar{x}, \bar{y})$

Formula for slope of LSRL: $b = r \frac{s_y}{s_x}$

LSRL: $\hat{y} = a + bx$

CALCULATING THE EQUATION FOR THE LSRL: WHAT IF WE DON’T HAVE THE RAW DATA?

Equation of LSRL: $\hat{y} = a + bx$

If you do not have raw data, but still need to calculate an LSRL, you will be given:

$\bar{x}$, $\bar{y}$, $s_x$, $s_y$, $r$

Remember, $(\bar{x}, \bar{y})$ is an ordered pair that is on the graph of the LSRL
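Written out as a short derivation, the two steps look like this. This is a sketch in the standard notation, which is assumed to match the symbols lost from the slides: slope first, then the intercept from the fact that the line passes through the point of means.

```latex
\begin{aligned}
b &= r\,\frac{s_y}{s_x} \\
\bar{y} &= a + b\,\bar{x} \quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x} \\
\hat{y} &= a + b\,x
\end{aligned}
```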

EXAMPLE: CREATING EQUATION OF LSRL (WITHOUT RAW DATA)

•predicted response = a + b (# of beers consumed)

(equation of LSRL in context – better than x & y)

Remember, slope formula of LSRL: $b = r \frac{s_y}{s_x}$

Givens:

Calculate slope for equation of LSRL

EXAMPLE: CREATING EQUATION OF LSRL (WITHOUT RAW DATA)

predicted response = a + b (# of beers consumed)

Givens:

$r$, $s_x$, $s_y$

So, slope = b = 0.0179

Remember, equations of all LSRLs go through $(\bar{x}, \bar{y})$ … so what’s next?

EXAMPLE: CREATING EQUATION OF LSRL (WITHOUT RAW DATA)

predicted response = a + b (# of beers consumed)

Givens:

$\bar{x} = 4.8125$, $\bar{y} = 0.07375$

Substitute $(\bar{x}, \bar{y})$ into equation

EXAMPLE: CREATING EQUATION OF LSRL(WITHOUT RAW DATA)

0.07375 = a + (0.0179)(4.8125) and solve for ‘a’

predicted response = a + b (# of beers consumed)

predicted response = -0.0124 + 0.0179 (# of beers consumed)
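The "solve for a" step is one line of arithmetic. A minimal Python sketch using the numbers from the substitution above (the variable and function names are illustrative assumptions):

```python
x_bar = 4.8125   # mean # of beers consumed, from the substitution step
y_bar = 0.07375  # mean of the response variable, from the substitution step
b = 0.0179       # slope found in the earlier step

# The LSRL passes through (x_bar, y_bar), so y_bar = a + b * x_bar.
a = y_bar - b * x_bar

def predict(beers):
    """Predicted response for a given number of beers, using the fitted line."""
    return a + b * beers

print(f"a = {a:.4f}")
print(f"prediction at the mean # of beers: {predict(x_bar):.5f}")
```

Plugging the mean number of beers back into the line returns the mean response, which is a quick check that the intercept was solved correctly.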

INTERPRETING SOFTWARE OUTPUT…

Age vs. Gesell Score

DETOUR… MEMORY MONDAY (OR WAY-BACK WEDNESDAY)…

What is r? What is r’s range?

r tells us how linear (and direction) scatterplot is. ‘r’ ranges from -1 to 1. ‘r’ describes the scatterplot only (not LSRL)

NOW…

We need a numerical measurement that tells us how well the LSRL fits

Coefficient of Determination, or $r^2$

COEFFICIENT OF DETERMINATION …

Do all the points on the scatterplot fall exactly on the LSRL?

Sometimes too high and sometimes too low

Is LSRL a good model to use for a particular data set?

How well does our model fit our data?

COEFFICIENT OF DETERMINATION OR $r^2$

“R-sq” software output

Always $0 \le r^2 \le 1$

Never calculate by hand; always use Minitab

No need to memorize formula; trust me … it’s ugly

COEFFICIENT OF DETERMINATION OR $r^2$

Remember “r” correlation, direction and strength of linear relationship of scatterplot

$r^2$, coefficient of determination, fraction of the variation in the values of y that is explained by LSRL; describes the LSRL

COEFFICIENT OF DETERMINATION OR $r^2$

Interpretation of $r^2$:

We say, “x% of variation in (y variable) is explained by LSRL relating (y variable) to (x variable).”
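To make "fraction of variation explained" concrete, here is a sketch that computes it directly for the ages and heights data from the earlier review slide: total variation in y, minus what is left over after fitting the line. The variable names are illustrative assumptions; in class we read this value from the “R-sq” line in Minitab.

```python
from statistics import mean

ages = [0, 1, 4, 5, 8]          # years
heights = [18, 28, 40, 42, 49]  # inches

x_bar, y_bar = mean(ages), mean(heights)

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, heights))
s_xx = sum((x - x_bar) ** 2 for x in ages)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Total variation in y, and variation left unexplained by the line.
sst = sum((y - y_bar) ** 2 for y in heights)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, heights))
r_squared = 1 - sse / sst

print(f"{100 * r_squared:.1f}% of variation in height is explained by "
      "the LSRL relating height to age.")
```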

FACTS TO REMEMBER ABOUT LSRL

Distinction between explanatory and response variables.

If switched, scatterplot changes and LSRL changes (but what doesn’t change?)

LSRL minimizes the sum of the squared vertical distances from data points to line (vertical distances only)
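A quick way to convince yourself of the "least squares" part: compute the sum of squared vertical distances for the LSRL, then for a few nearby lines. The sketch below reuses the ages and heights data; the function name and the amounts by which the line is nudged are arbitrary choices for illustration.

```python
from statistics import mean

xs = [0, 1, 4, 5, 8]       # ages (years)
ys = [18, 28, 40, 42, 49]  # heights (inches)

def sum_squared_vertical(a, b):
    """Sum of squared vertical distances from the points to the line y = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

best = sum_squared_vertical(a, b)

# Nudge the intercept and slope a little in every direction.
nearby = [
    sum_squared_vertical(a + da, b + db)
    for da in (-1, 0, 1)
    for db in (-0.5, 0, 0.5)
    if (da, db) != (0, 0)
]

print(f"LSRL: {best:.2f}; smallest nearby line: {min(nearby):.2f}")
```

Every nudged line gives a larger total than the LSRL.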

FACTS TO REMEMBER ABOUT LSRL

Close relationship between correlation (r) and slope of LSRL; but r and b are (often) not the same; when would r and b have the same value?

LSRL always passes through $(\bar{x}, \bar{y})$

Don’t have to have raw data to identify the equation of LSRL

FACTS TO REMEMBER ABOUT LSRL

Correlation (r) describes direction and strength of straight-line relationships in scatterplots

Coefficient of determination ($r^2$) is the fraction of variation in values of y explained by LSRL

CORRELATION & REGRESSION WISDOM

Which of the following scatterplots has the highest correlation?

CORRELATION & REGRESSION WISDOM

All r = 0.816; all have same exact LSRL equation

Lesson: Always graph your data! … because correlation and regression describe only linear relationships
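The four scatterplots are not reproduced in this transcript. Four data sets that share r = 0.816 and the same LSRL match the well-known Anscombe's quartet, so the sketch below assumes that is the data on the slide and uses the published quartet values. The function and variable names are illustrative.

```python
from statistics import mean, stdev

def fit(xs, ys):
    """Return (r, a, b): correlation, y-intercept and slope of the LSRL."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (len(xs) - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return r, a, b

# Anscombe's quartet (assumed to be the data behind the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (r, a, b) in results.items():
    print(f"{name}: r = {r:.3f}, y-hat = {a:.2f} + {b:.2f}x")
```

The numbers are nearly identical for all four sets; only a graph shows that one is linear, one is curved, and two are driven by a single unusual point.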

CORRELATION & REGRESSION WISDOM

Correlation and regression describe only linear relationships

CORRELATION & REGRESSION WISDOM

Correlation is not causation! Association does not imply causation… want a Nobel Prize? Eat some chocolate! How about Methodist ministers & rum imports?

Year    Number of Methodist Ministers in New England    Cuban Rum Imported to Boston (in # of barrels)
1860    63                                              8,376
1865    48                                              6,506
1870    53                                              7,005
1875    64                                              8,486
1890    85                                              11,265
1900    80                                              10,547
1915    140                                             18,559

BEWARE OF NONSENSE ASSOCIATIONS…

r = 0.9749, but no economic relationship between these variables

Strong association is due entirely to the fact that both imports & health spending grew rapidly in these years.

Common year is other variable.

Any two variables that both increase over time will show a strong association.

Doesn’t mean one explains the other or influences the other

CORRELATION & REGRESSION WISDOM

Correlation is not resistant; always plot data and look for unusual trends.

… what if Bill Gates walked into a bar?

CORRELATION & REGRESSION WISDOM

Extrapolation! Don’t do it… ever.

Example: Growth data from children from age 1 month to age 12 years … LSRL

What is the predicted height of a 40-year old?

OUTLIERS & INFLUENTIAL POINTS

All influential points are outliers, but not all outliers are influential points.

OUTLIERS & INFLUENTIAL POINTS

Outlier: observation lies outside overall pattern

Points that are outliers in the ‘y’ direction of scatterplot have large residuals.

Points that are outliers in the ‘x’ direction of scatterplot may not necessarily have large residuals.

OUTLIERS & INFLUENTIAL POINTS

Influential points/observations: If removed would significantly change LSRL (slope and/or y-intercept)
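A small sketch with made-up numbers shows what "significantly change" can look like: five points with an upward trend, plus one point far out in the x direction. The data and names are assumptions for illustration only.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (a, b): y-intercept and slope of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Made-up data; the last point (15, 2) sits far from the others in x.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 6, 2]

a_all, b_all = lsrl(xs, ys)               # fit with every point
a_trim, b_trim = lsrl(xs[:-1], ys[:-1])   # fit with the last point removed

print(f"all points:        y-hat = {a_all:.2f} + {b_all:.2f}x")
print(f"last point removed: y-hat = {a_trim:.2f} + {b_trim:.2f}x")
```

Removing the one point changes the slope from negative to positive, so that point is influential.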

CLASS ACTIVITY…

Groups of 2 or 3; measure each other’s head circumferences & arm spans (both in inches, rounded to the nearest ½ inch). Write data on board

Create scatterplot and describe the association between head circumference and arm span.

Is a regression line appropriate for our data? Why or why not? If so, create LSRL graph & equation, calculate the correlation and the coefficient of determination

Interpret the slope and the y-intercept of the LSRL

What does it mean if a point falls above the LSRL? Below the LSRL?
