what types of data are collected? what kinds of question can be asked of those data? do people who...




Page 1: What Types Of Data Are Collected? What Kinds Of Question Can Be Asked Of Those Data?  Do people who say they study for more hours also think they’ll

What Types Of Data Are Collected? What Kinds Of Question Can Be Asked Of Those Data?

Research Is A Partnership Of Questions And Data

Questions That Require Us To Describe Single Features of the Participants
“Continuous” data: How tall are class members, on average? How many hours a week do class members report that they study? … ?
“Categorical” data: How many members of the class are women? What proportion of the class is full-time? … ?

Questions That Require Us To Examine Relationships Between Features of the Participants
“Continuous” data: Do people who say they study for more hours also think they’ll finish their doctorate earlier? Are computer literates less anxious about statistics? … ?
“Categorical” data: Are men more likely to study part-time? Are women more likely to enroll in CCE? … ?

© Willett, Harvard University Graduate School of Education, 04/21/23 S010Y/C10 – Slide 1

S010Y: Answering Questions with Quantitative Data Class 10&11/III.3: Summarizing Relationships Between Continuous Variables


Page 2

Just to remind you, here’s the codebook for the WALLCHT data …

Dataset: WALLCHT.txt
Overview: Summary information on selected aspects of state educational performance outcomes, resource inputs, and population characteristics, in 1988.
Source: US Department of Education and the National Center for Education Statistics.
Sample size: 50 states
Updated: December 5, 2003

Col  Variable  Description                                          Metric
1    STATE     Name of the State.                                   words
2    TCHRSAL   Average teacher salary in the State.                 dollars
3    STRATIO   Average number of students per teacher statewide.    ratio
4    PPEXPEND  Average expenditure per pupil in the State.          dollars
5    HSGRADRT  Average high-school graduation rate statewide.       percentage

Page 3

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (horizontal axis, 10 to 25), one point per state, with a fitted trend line]

This is my “best guess” for a summary linear trend line to represent the HSGRADRT vs. STRATIO relationship. I obtained it by a mysterious process called ordinary least-squares (OLS) regression analysis.

[Fitted line annotations: STRATIO = 13.3 → predicted HSGRADRT = 78.8; STRATIO = 24.7 → predicted HSGRADRT = 66.0]

After I have conducted my “OLS Regression Analysis,” I just pick some sensible values of STRATIO (the MIN and MAX, perhaps?) … and the output from the analysis gives me its best prediction for the values of HSGRADRT (the “predicted values”).

The line that joins up the predicted values is known as the “fitted regression line.”

Page 4

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio, with the fitted trend line]

The “OLS” method that the regression analysis actually used to provide this “best guess” for the trend …

Both the thumbtack-and-elastic-band approach and ordinary least-squares regression find the fitted linear trend line for which the sum of the squared vertical distances of the data points from the line is smallest.
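As a rough illustration of the least-squares criterion, here is a minimal Python sketch (not PROC REG’s actual implementation) that computes the OLS intercept and slope with the textbook closed-form formulas, along with the sum of squared vertical distances. The five data points are hypothetical, invented for illustration; they are not the WALLCHT data. Nudging either coefficient away from the OLS values should only increase the sum of squares.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the ordinary least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-products of deviations / sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The OLS line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_vertical_distances(x, y, intercept, slope):
    """Sum of squared vertical distances of the points from a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (NOT the WALLCHT data): x = ratio, y = rate
x = [14.0, 16.0, 18.0, 20.0, 22.0]
y = [80.0, 77.0, 76.0, 71.0, 70.0]

b0, b1 = ols_fit(x, y)
best = sum_squared_vertical_distances(x, y, b0, b1)
print(f"intercept = {b0:.2f}, slope = {b1:.3f}, SS = {best:.3f}")

# Shifted or re-tilted lines all leave a larger sum of squares
for db0, db1 in [(0.5, 0.0), (0.0, -0.1), (-1.0, 0.05)]:
    other = sum_squared_vertical_distances(x, y, b0 + db0, b1 + db1)
    assert other > best
```

For these toy points the fitted line has intercept 98.2 and slope -1.3; any line you draw by eye can be scored the same way and compared with it.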

Page 5

Here are a couple of things to help you develop better intuition about the nature of fitted trend lines produced by OLS Regression Analysis.

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio]

A simulation that lets you try out the OLS regression fitting algorithm for yourself. It provides data examples, lets you draw your own version of the fitted trend line, and then shows you what an OLS regression analysis would produce, by way of comparison.

Page 6

OPTIONS Nodate Pageno=1;
TITLE1 'S010Y: Answering Questions with Quantitative Data';
TITLE2 'Class 10/Handout 1: Summarizing Relationships Between Continuous Variables';
TITLE3 'The Infamous Wallchart Data';
TITLE4 'Data in WALLCHT.txt';

*--------------------------------------------------------------------------------*
 Input data, name and label variables in the dataset
*--------------------------------------------------------------------------------*;
DATA WALLCHT;
  INFILE 'C:\DATA\A010Y\WALLCHT.txt';
  INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
  LABEL TCHRSAL  = '1988 Average Teacher Salary'
        STRATIO  = '1988 Student/Teacher Ratio'
        PPEXPEND = '1988 Expenditure/Student'
        HSGRADRT = '1988 Statewide H.S. Graduation Rate';
RUN;

*--------------------------------------------------------------------------------*
 Using regression analysis to summarize the relationship of HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC REG DATA=WALLCHT;
  TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
  MODEL HSGRADRT = STRATIO;
RUN;
QUIT;

*--------------------------------------------------------------------------------*
 Plotting the relationship between HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC PLOT DATA=WALLCHT;
  TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
  PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5 VAXIS = 50 TO 100 BY 10;
RUN;
QUIT;

Of course, you can also get PC-SAS to tell you where the OLS-fitted regression line is …

Here are the usual data input statements.

Here are the PC-SAS regression analysis commands; we dissect them in detail on the next slide.

The PROC PLOT step creates another scatterplot of the data for use later.

Page 7

*--------------------------------------------------------------------------------*
 Using regression analysis to summarize the relationship of HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC REG DATA=WALLCHT;
  TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
  MODEL HSGRADRT = STRATIO;


Here’s the part of the PC-SAS program that deals specifically with the OLS Regression Analysis of the HSGRADRT versus STRATIO relationship …

You request an OLS Regression Analysis by specifying a “Regression Model” that identifies the “Outcome” and the “Predictor(s)” to include in the analysis:

Model HSGRADRT = STRATIO

You identify the outcome variable (HSGRADRT) by placing it to the left of the “equals” sign in the MODEL statement.

You identify the predictor variable (STRATIO) by placing it to the right of the “equals” sign in the MODEL statement.

PROC REG is the command in PC-SAS that requests an OLS Regression Analysis

Page 8

The REG Procedure
Model: MODEL1
Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

Analysis of Variance
                                  Sum of          Mean
Source             DF            Squares        Square    F Value    Pr > F
Model               1          337.52168     337.52168       6.07    0.0174
Error              48         2669.04952      55.60520
Corrected Total    49         3006.57120

Root MSE            7.45689    R-Square    0.1123
Dependent Mean     74.27600    Adj R-Sq    0.0938
Coeff Var          10.03943

Parameter Estimates
                                              Parameter    Standard
Variable    Label                        DF    Estimate       Error    t Value    Pr > |t|
Intercept   Intercept                     1    93.69187     7.95093      11.78      <.0001
STRATIO     1988 Student/Teacher Ratio    1    -1.12140     0.45516      -2.46      0.0174

Here’s output from the OLS Regression Analysis of Outcome HSGRADRT on Predictor STRATIO …

The Parameter Estimates table is the major part of the regression output. I unpack it on the next several slides.

Ignore the rest of the output (the Analysis of Variance table and the statistics beneath it). When you go on to S030, you’ll learn what it all means.

Page 9

Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

Parameter Estimates
                                              Parameter    Standard
Variable    Label                        DF    Estimate       Error    t Value    Pr > |t|
Intercept   Intercept                     1    93.69187     7.95093      11.78      <.0001
STRATIO     1988 Student/Teacher Ratio    1    -1.12140     0.45516      -2.46      0.0174

The core part of the OLS Regression Output describes the fitted regression line.

But how do you work with this “Fitted Model”?

These “Parameter Estimates” tell you where PROC REG thinks the fitted trend line should be drawn. By listing them, it is telling you that the fitted trend line has the following algebraic equation:

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

Page 10

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

You substitute reasonable values of the predictor, STRATIO, into the fitted equation and then use it to compute the best predictions (or predicted values) for HSGRADRT. Let’s try a couple. Remember that the fitted equation gives PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance:

1. When STRATIO = 13.3 (the minimum value of STRATIO),
Predicted value of HSGRADRT = (93.69) + (-1.12)(13.3) = 93.69 – 14.90 = 78.8

2. When STRATIO = 24.7 (the maximum value of STRATIO),
Predicted value of HSGRADRT = (93.69) + (-1.12)(24.7) = 93.69 – 27.66 = 66.0

Recognize these values?
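The substitution step can be sketched in a few lines of Python. The coefficients are the unrounded parameter estimates from the PROC REG output (93.69187 and -1.12140); the function name `predict_hsgradrt` is hypothetical, chosen for illustration. Using the unrounded values changes the predictions only in the second decimal place, so they still round to the values on the slide.

```python
# Unrounded parameter estimates reported by PROC REG for MODEL HSGRADRT = STRATIO
INTERCEPT = 93.69187
SLOPE = -1.12140


def predict_hsgradrt(stratio):
    """Predicted HSGRADRT for a given STRATIO, read off the fitted line."""
    return INTERCEPT + SLOPE * stratio


# Evaluate the fitted line at the sample minimum and maximum of STRATIO
for label, value in [("minimum", 13.3), ("maximum", 24.7)]:
    print(f"STRATIO = {value} ({label}): predicted HSGRADRT = "
          f"{predict_hsgradrt(value):.1f}")
```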

Page 11

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio, with the fitted regression line running from a predicted 78.8 at STRATIO = 13.3 to a predicted 66.0 at STRATIO = 24.7]

Here they are … and, of course, by choosing other values of STRATIO, the fitted equation can also tell us the location of every other point on the fitted line in between.

To reproduce the fitted line, I just need to:
1. Systematically substitute all possible values of STRATIO into the fitted equation, and
2. Compute the corresponding predicted values of HSGRADRT.

Then, if I plotted them all, this is what I’d see.

Page 12

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance…

1. When STRATIO = 0 (a value of STRATIO that does not exist in the dataset, but an interesting anchor point nevertheless),
Predicted value of HSGRADRT = (93.69) + (-1.12)(0) = 93.69 – 0 = 93.69

Recognize this value?

Page 13

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance…

1. When STRATIO = 20 (or any other ad-hoc value of STRATIO that is within the sample range),

Predicted value of HSGRADRT = (93.69) + (-1.12)(20) = 93.69 – 22.4 = 71.29

2. When STRATIO = 21 (notice that this is just one unit higher than the previous value of 20)

Predicted value of HSGRADRT = (93.69) + (-1.12)(21) = 93.69 – 23.52 = 70.17

Recognize the difference between these values?
(70.17 – 71.29) = -1.12
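A quick numerical check of why that difference equals the slope: on a straight line, moving STRATIO up by one unit changes the prediction by exactly the slope, whatever the starting value. This sketch uses the rounded coefficients from the slides; the helper name `predicted` is hypothetical.

```python
INTERCEPT = 93.69   # rounded estimates used on the slides
SLOPE = -1.12


def predicted(stratio):
    """Predicted HSGRADRT from the rounded fitted equation."""
    return INTERCEPT + SLOPE * stratio


# Difference in predictions for a one-unit difference in STRATIO,
# starting from several different values within the sample range
for start in [13.3, 15.0, 20.0, 23.7]:
    diff = predicted(start + 1) - predicted(start)
    print(f"from STRATIO = {start}: difference = {diff:.2f}")
```

Every line of output shows a difference of -1.12, which is why the slope can be read as “the difference in predicted HSGRADRT per unit difference in STRATIO.”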

Page 14

$$\widehat{\text{HSGRADRT}} = 93.69 + (-1.12)\,\text{STRATIO}$$

This means that each term in the fitted regression model has a specific interpretation …

HSGRADRT with a hat: This is the predicted value of HSGRADRT, based on the OLS regression fit; its “hat” indicates that it is a prediction. The predicted value represents the value of HSGRADRT that you would expect for a State, based solely on its value of STRATIO.

93.69: This is the estimated intercept of the fitted regression line. It tells you the predicted value of HSGRADRT when STRATIO is zero. In the current context, it doesn’t make much sense to interpret it (why?).

-1.12: This is the estimated slope of the fitted regression line. It summarizes the relationship between HSGRADRT and STRATIO: it tells you the difference in the predicted value of HSGRADRT per unit difference in STRATIO. Here, the slope is negative, meaning that States whose student/teacher ratios are one child bigger have graduation rates that are 1.12 percentage points lower, on average.

STRATIO: This represents the actual values of the predictor, STRATIO.

Page 15

It’s the estimated slope in a regression analysis that captures the relationship between outcome & predictor …

What would the scatterplot look like, and what would the slope be, if states with larger student/teacher ratios tended to have higher graduation rates?

[Sketch axes: STRATIO (horizontal), HSGRADRT (vertical)]

What would the scatterplot look like, and what would the slope be, if there were no relationship between high school graduation rate and student/teacher ratio?

[Sketch axes: STRATIO (horizontal), HSGRADRT (vertical)]
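To connect the picture to the number, here is a sketch that computes closed-form OLS slopes for two small hypothetical datasets (invented for illustration, not WALLCHT): one in which larger student/teacher ratios go with higher graduation rates, and one with no linear relationship. The function name `ols_slope` is hypothetical.

```python
def ols_slope(x, y):
    """Closed-form OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


stratio = [14.0, 17.0, 20.0, 23.0]

# Hypothetical: higher ratios go with higher rates -> positive slope
rising = [68.0, 72.0, 75.0, 80.0]
# Hypothetical: rates rise then fall symmetrically -> no linear trend, zero slope
flat = [70.0, 76.0, 76.0, 70.0]

print(f"rising: slope = {ols_slope(stratio, rising):+.3f}")
print(f"flat:   slope = {ols_slope(stratio, flat):+.3f}")
```

A zero slope only says there is no linear trend; the second toy dataset still has a curved pattern that a straight line cannot capture.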

Here’s a simulation that lets you create datasets with your mouse, and then shows you the OLS fitted line.

Page 16

Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

                                              Parameter    Standard
Variable    Label                        DF    Estimate       Error    t Value    Pr > |t|
Intercept   Intercept                     1    93.69187     7.95093      11.78      <.0001
STRATIO     1988 Student/Teacher Ratio    1    -1.12140     0.45516      -2.46      0.0174

As in our categorical data analysis, we can ask whether we could have reached this same conclusion by an accident of sampling.

Could we have gotten a slope value of -1.12 by sampling from a population in which there was no relationship between HSGRADRT and STRATIO (i.e., by sampling from a null population in which the slope was zero)?

And, again, as in categorical data analysis, PROC REG provides a p-value to help you check out the effects of the idiosyncrasies of sampling:

The p-value for the HSGRADRT/STRATIO regression slope is 0.0174.

Since 0.0174 is less than .05, we can reject the null hypothesis that there is no relationship between HSGRADRT and STRATIO in the population.
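The t value and p-value in the Parameter Estimates table can be roughly reproduced from the estimate and its standard error. This is a sketch under stated assumptions: t = estimate / standard error, referred to a t distribution with 48 degrees of freedom (the Error DF in the output), with the two-sided p-value obtained here by numerically integrating the t density with Simpson’s rule rather than by calling a statistics library. The function names are hypothetical.

```python
import math

ESTIMATE = -1.12140   # STRATIO slope from the PROC REG output
STD_ERROR = 0.45516   # its standard error
DF = 48               # error degrees of freedom (50 states - 2 parameters)


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value = 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(a + i * h, df)
    area = total * h / 3
    return 1 - 2 * area


t_value = ESTIMATE / STD_ERROR
p_value = two_sided_p(t_value, DF)
print(f"t = {t_value:.2f}, p = {p_value:.4f}")
print("Reject H0 at .05" if p_value < 0.05 else "Do not reject H0 at .05")
```

The decision rule at the end mirrors the slide: compare the p-value with .05.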

Page 17

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio, with the fitted regression line]

The Story So Far …

In our investigation of state-level aggregate statistics, we have found that, on average, the percentage of seniors graduating from high school is lower in states with a higher student/teacher ratio.

When state-wide high-school graduation rate (HSGRADRT) is treated as outcome and state-wide student/teacher ratio (STRATIO) is treated as the predictor, we find that the trend-line estimated by ordinary least-squares regression analysis has a slope of –1.12 (p = 0.0174).

This tells us that two states whose student/teacher ratios differ by 1 student per teacher will tend to have graduation rates that differ by 1.12 percentage points, with states that enjoy lower student/teacher ratios tending to have the higher high-school graduation rates.