Continuous Outcome, Dependent Variable (Y-Axis): Child’s Height


Upload: melvyn-young

Post on 19-Jan-2018


DESCRIPTION

Correlation

TRANSCRIPT

[Figure: overview of the analysis. Outcome (dependent variable, Y-axis): child’s height (continuous). Predictors (X-axis): parents’ height (continuous) and gender (categorical). Plot types: histogram, scatter plot, boxplot. Model: linear regression.]

Correlation

Correlation Matrix
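The correlation matrix slide is an image, so here is a minimal R sketch of how such a matrix can be computed with `cor()`. The data frame `heights_sim` is hypothetical: it is simulated with column names borrowed from the slides and is not the original heights data.

```r
# Hypothetical simulated data standing in for the heights data frame
set.seed(42)
n <- 200
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
childHeight <- 16.5 + 0.39 * father + 0.31 * mother + rnorm(n, sd = 3)
heights_sim <- data.frame(father, mother, childHeight)

# Pairwise Pearson correlations between all numeric columns
corMatrix <- cor(heights_sim)
round(corMatrix, 2)
```

The matrix is symmetric with ones on the diagonal; each off-diagonal entry is the `r` for that pair of variables.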

Analytics & History: 1st Regression Line

The first “Regression Line”


Describing a Straight Line

• b_1: regression coefficient for the predictor
  • Gradient (slope) of the regression line
  • Direction/strength of the relationship
• b_0: intercept
  • Value of Y when X = 0
  • Point at which the regression line crosses the Y-axis (ordinate)

$$Y_i = (b_0 + b_1 X_i) + \varepsilon_i$$

where $\varepsilon_i$ is the error (residual) for case i.
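A small R illustration of what the two coefficients mean, using a hypothetical intercept and slope chosen only for the example (not estimates from the slides): the line's value at X = 0 is b_0, and each one-unit step in X changes Y by b_1.

```r
# Hypothetical intercept and slope, chosen only for illustration
b0 <- 40
b1 <- 0.4

# Model part of Y_i = (b0 + b1 * X_i) + error
line_y <- function(x) b0 + b1 * x

line_y(0)                 # intercept: value of Y when X = 0
line_y(71) - line_y(70)   # slope: change in Y for a one-unit increase in X
```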

Which line fits the best?
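In ordinary least squares, the "best" line is the one with the smallest residual sum of squares. The R sketch below, on simulated data (the variables `x` and `y` and the helper `ssr()` are hypothetical), computes the closed-form estimates b_1 = cov(X, Y) / var(X) and b_0 = mean(Y) - b_1 mean(X), checks them against `lm()`, and shows that nudging either coefficient increases the residual sum of squares.

```r
set.seed(1)
# Hypothetical simulated predictor and outcome
x <- rnorm(100, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(100, sd = 3)

# Residual sum of squares for a candidate line with intercept a and slope b
ssr <- function(a, b) sum((y - (a + b * x))^2)

# Least-squares estimates from the closed-form formulas
b1_hat <- cov(x, y) / var(x)
b0_hat <- mean(y) - b1_hat * mean(x)

fit <- lm(y ~ x)
coef(fit)   # same values as b0_hat and b1_hat

# Any other line has a larger residual sum of squares
ssr(b0_hat, b1_hat)
ssr(b0_hat + 1, b1_hat)
ssr(b0_hat, b1_hat + 0.05)
```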

Sum of Squares
• Total sum of squares (SST)
• Model sum of squares (SSM)
• Residual sum of squares (SSR)
• F
• R²


• SST: total variability (variability between the scores and the mean).
• SSR: residual/error variability (variability between the regression model and the actual data).
• SSM: model variability (difference in variability between the model and the mean).
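The three definitions translate directly into R. This sketch uses simulated data (not the Galton heights) and checks the identity SST = SSM + SSR, which holds for a least-squares fit that includes an intercept.

```r
set.seed(2)
# Hypothetical simulated data
x <- rnorm(150, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(150, sd = 3)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)           # observed scores vs. the mean
SSR <- sum((y - fitted(fit))^2)       # observed scores vs. the model
SSM <- sum((fitted(fit) - mean(y))^2) # model vs. the mean

c(SST = SST, SSM = SSM, SSR = SSR, SSM_plus_SSR = SSM + SSR)
```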

Sum of Squares

Testing the Model: ANOVA

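• If the model predicts better than simply using the mean, we expect SSM to be much greater than SSR. The ANOVA compares the two with the F-ratio, $F = MS_M / MS_R$, where $MS_M = SS_M / df_M$ and $MS_R = SS_R / df_R$.
• SST: total variance in the data
• SSM: improvement due to the model
• SSR: error in the model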
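A minimal R sketch of the F-ratio for a one-predictor model on simulated data: the mean squares are the sums of squares divided by their degrees of freedom (1 for the model, n - 2 for the residuals), and the result should match what `summary()` and `anova()` report.

```r
set.seed(3)
# Hypothetical simulated data
x <- rnorm(150, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(150, sd = 3)
fit <- lm(y ~ x)

SSM <- sum((fitted(fit) - mean(y))^2)
SSR <- sum(residuals(fit)^2)
df_M <- 1                    # number of predictors
df_R <- length(y) - 2        # n minus the number of estimated coefficients

F_ratio <- (SSM / df_M) / (SSR / df_R)   # MS_M / MS_R
p_value <- pf(F_ratio, df_M, df_R, lower.tail = FALSE)

F_ratio
summary(fit)$fstatistic["value"]   # F as reported by summary()
anova(fit)                         # ANOVA table with the same F and p-value
```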

Linear Model - Regression

• lm() function – lm stands for ‘linear model’.

Model <- lm(outcome ~ predictor(s), data = dataFrame, na.action = an action)
(an action is a missing-data handler such as na.omit or na.exclude)

model.1 <- lm(childHeight~father, data = heights)
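The template's `na.action` argument controls how rows with missing values are handled. In this R sketch the `heights` data frame is simulated (hypothetical values under the slide's column names) with two missing outcomes. Both `na.omit` and `na.exclude` drop those rows when fitting, but with `na.exclude` the fitted values and residuals are padded with `NA` so they line up with the original rows.

```r
set.seed(4)
# Hypothetical data frame using the slide's column names
heights <- data.frame(father = rnorm(50, mean = 69, sd = 2.5))
heights$childHeight <- 40 + 0.4 * heights$father + rnorm(50, sd = 3)
heights$childHeight[c(3, 10)] <- NA   # two missing outcomes

# na.omit: incomplete rows are dropped and do not appear in fitted()
model.omit <- lm(childHeight ~ father, data = heights, na.action = na.omit)
# na.exclude: same fit, but fitted()/residuals() are padded with NA
model.1 <- lm(childHeight ~ father, data = heights, na.action = na.exclude)

length(fitted(model.omit))   # 48
length(fitted(model.1))      # 50, with NA at rows 3 and 10
```

Padding makes it safe to store `fitted(model.1)` back as a column of `heights`.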

Correlation

Model 1


Testing the Model: R²

• R²
  • The proportion of variance in the outcome accounted for by the regression model.
  • In simple regression (one predictor), the square of the Pearson correlation coefficient r.

$$R^2 = \frac{SS_M}{SS_T}$$
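A quick R check of the two descriptions of R², on simulated data: SS_M / SS_T, the squared Pearson correlation (valid for one predictor only), and `summary(fit)$r.squared` all give the same number.

```r
set.seed(5)
# Hypothetical simulated data
x <- rnorm(150, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(150, sd = 3)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)
SSM <- sum((fitted(fit) - mean(y))^2)

R2_from_SS <- SSM / SST
R2_from_r  <- cor(x, y)^2           # simple regression only
R2_summary <- summary(fit)$r.squared

c(R2_from_SS, R2_from_r, R2_summary)
```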

Residuals

Prediction: predict(model.1)

heights$model1 <- predict(model.1)
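A minimal R sketch of the prediction step on a simulated `heights` data frame (hypothetical values). `predict()` with no `newdata` returns the fitted value for each row. The residual is the observed value minus that prediction. Passing `newdata` predicts for new fathers' heights; the column name `resid1` and the values 66 and 72 are illustrative choices.

```r
set.seed(6)
# Hypothetical data frame using the slide's column names
heights <- data.frame(father = rnorm(100, mean = 69, sd = 2.5))
heights$childHeight <- 40 + 0.4 * heights$father + rnorm(100, sd = 3)
model.1 <- lm(childHeight ~ father, data = heights)

# Fitted (predicted) value for every row, stored alongside the data
heights$model1 <- predict(model.1)
# Residual = observed - predicted
heights$resid1 <- heights$childHeight - heights$model1

# Predictions for new, hypothetical fathers' heights
predict(model.1, newdata = data.frame(father = c(66, 72)))
```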

Compare Models

Model              1       2       12      3       4
Intercept          40.1    46.6    22.6    22.63   22.64
Father             0.385   -       0.36    -       0.01
Mom                -       0.314   0.29    -       NA
midparentHeight    -       -       -       0.637   0.538
R-squared          0.070   0.0395  0.105   0.102   0.1033
r                  0.27    0.2     -       0.32    -
r²                 0.073   0.04    -       0.102   -
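A comparison table like this can be assembled in R by fitting each model and collecting its coefficients and R². The sketch below uses simulated data. `midparentHeight` is assumed here to be the plain average of the parents' heights, which may differ from the original dataset's definition. Model 4 is left out because its predictor set is not stated on the slide. For nested models, `anova()` adds a formal F-test of the improvement.

```r
set.seed(7)
n <- 300
# Hypothetical simulated data using the slide's column names
heights <- data.frame(
  father = rnorm(n, mean = 69, sd = 2.5),
  mother = rnorm(n, mean = 64, sd = 2.3)
)
# Assumption for this sketch: midparent height is the plain average
heights$midparentHeight <- (heights$father + heights$mother) / 2
heights$childHeight <- 16.5 + 0.39 * heights$father + 0.31 * heights$mother +
  rnorm(n, sd = 3)

models <- list(
  `1`  = lm(childHeight ~ father, data = heights),
  `2`  = lm(childHeight ~ mother, data = heights),
  `12` = lm(childHeight ~ father + mother, data = heights),
  `3`  = lm(childHeight ~ midparentHeight, data = heights)
)

# One row per model: intercept and R-squared
comparison <- data.frame(
  model     = names(models),
  intercept = sapply(models, function(m) unname(coef(m)[1])),
  r.squared = sapply(models, function(m) summary(m)$r.squared),
  row.names = NULL
)
comparison

# F-test: does adding mother to model 1 improve the fit?
anova(models[["1"]], models[["12"]])
```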

Box Plot

Descriptive Stats: Box Plot

Regression: Children’s Heights ~ Gender
model.5 <- lm(childHeight ~ gender, data = h)
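With a single two-level factor as the predictor, R codes gender as a dummy variable. The intercept is the mean of the reference level, and the coefficient is the difference between the two group means. This R sketch simulates a hypothetical `h` with group means chosen to roughly echo Model 5 (intercept 64.1, gender 5.13). The "female" level is the reference here only because it sorts first alphabetically.

```r
set.seed(8)
# Hypothetical data: gender as a factor, "female" is the reference level
h <- data.frame(gender = factor(rep(c("female", "male"), each = 100)))
h$childHeight <- ifelse(h$gender == "male", 69.2, 64.1) + rnorm(200, sd = 2.4)

model.5 <- lm(childHeight ~ gender, data = h)
coef(model.5)

# Intercept = female mean; gendermale = male mean minus female mean
group_means <- tapply(h$childHeight, h$gender, mean)
group_means
```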

Linear Regression Comparison

Model              1       2       12      3       4       5       6       7
Intercept          40.1    46.6    22.6    22.6    22.6    64.1    16.5    16.5
Father             0.385   -       0.36    -       x       -       -       0.39
Mom                -       0.314   0.29    -       x       -       -       0.31
midparentHeight    -       -       -       0.637   0.538   -       0.687   -
Gender             -       -       -       -       -       5.13    5.21    5.21
R-squared          0.070   0.0395  0.105   0.102   0.1033  0.5137  0.632   0.634
r                  0.27    0.2     -       0.32    -       0.717   -       -
r²                 0.073   0.04    -       0.102   -       0.5137  -       -

Model Specification & Prediction

Outcome = (Model) + Error

Height = 16.5 + 0.39 × father + 0.31 × mother + 5.21 × gender + error, with gender coded Male = 1, Female = 0.
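To show how the specified model produces a prediction, this R sketch plugs in the rounded Model 7 coefficients from the comparison table (intercept 16.5, father 0.39, mother 0.31, gender 5.21). The function `predict_height()` and the parents' heights (70 and 64) are hypothetical. Because the coefficients are rounded, results are approximate.

```r
# Rounded Model 7 coefficients (father + mother + gender) from the comparison table
b <- c(intercept = 16.5, father = 0.39, mother = 0.31, gender = 5.21)

# gender coded as on the slide: male = 1, female = 0
predict_height <- function(father, mother, gender) {
  unname(b["intercept"] + b["father"] * father +
           b["mother"] * mother + b["gender"] * gender)
}

# Hypothetical parents: father 70, mother 64 (same units as the data)
son      <- predict_height(70, 64, gender = 1)   # 68.85
daughter <- predict_height(70, 64, gender = 0)   # 63.64
c(son = son, daughter = daughter)
```

For the same parents, a son and a daughter differ by exactly the gender coefficient, 5.21.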