
Lecture 1

Stat 211 - D. Gillen

Course Syllabus

Course Outline

Goals and Steps of Data Analysis
Scientific Investigation

Statistical Goals and Modeling Strategies

Examples

Regression Modeling
Towards a general framework

Notation

Generalized regression

Generalized linear regression

Examples of generalized linear regression models

A prelude to estimation


Lecture 1

Introduction
Statistics 211 - Statistical Methods II

Presented January 8, 2018

Dan Gillen
Department of Statistics

University of California, Irvine


Logistics and Contact Information

Lectures: Monday and Wednesday, 9:30-11:50, DBH 1423

Discussion: Tuesday, 11:00-11:50, SST 122

Course web site: http://www.ics.uci.edu/~dgillen/STAT211

Instructor: Dan Gillen
Professor and Chair
Department of Statistics
Office: 2038 Donald Bren Hall
Telephone: 4-9862
E-mail: [email protected]

Office hours: Monday 11:00-12:00 and Tuesday 12:00-1:00, or by appointment


Description and Textbooks

Prerequisites: Statistics 210 or equivalent, or permission of instructor

Description: This course will introduce theory and methods for analyzing non-normal outcomes. We will focus on developing a general theoretical framework for regression models and using that framework to address scientific questions.

Required text: Agresti, A., Foundations of Linear and Generalized Linear Models


Reference texts: McCullagh, P. and Nelder, J., Generalized Linear Models

Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning

Casella, G. and Berger, R., Statistical Inference (2nd ed.)


Software/Computing

• Examples presented in class are primarily done using the R statistical package

• R is free software and can be installed on multiple platforms

• You can download R at http://www.r-project.org/

• I (highly) recommend that you use R, but you may choose to use any other software package that allows you to complete the assigned coursework


Assignments, Exams and Grading

Homework: There will be a total of 6-7 homework assignments. Assignments will typically be due 1-1.5 weeks from the date they are handed out.

Midterm Exam: Tentatively scheduled for Wednesday, Feb 21st. The exam will be in-class (closed-book, closed-note) and will cover material through the Wednesday, Feb 14th lecture.

Final Exam: The final exam is scheduled for Wednesday, March 21st. It will be a take-home exam handed out on Wednesday, March 14th and due on Wednesday, March 21st by 10am.

Grading: Homework 35%, Midterm 30%, Final 35%


Stat 211 Course Outline

Part I - Introduction to generalized regression (0.5 week)

• Goals of scientific studies
• Towards a general regression framework

Part II - Review of linear regression (2.5 weeks)

• Ordinary least squares theory
  • Implications of assumptions
  • Performance of estimators when assumptions fail
• Regression modeling strategies
  • Prediction
  • Association estimation/testing
• Weighted least squares


Part III - Review of asymptotic theory (1 week)

• Review of theoretical tools
  • Asymptotic results
  • Likelihood theory
  • Asymptotic inference

Part IV - GLMs for non-normal data (5.5 weeks)

• Generalized linear models
  • Components of GLMs
  • Fitting GLMs
• Logistic regression
  • Model assumptions and diagnostics
  • Parameter interpretations
• Poisson regression
  • Model assumptions and diagnostics
  • Parameter interpretations


Part IV - GLMs for non-normal data cont’d (5.5 weeks)

• Overdispersion
  • Quasi-likelihood
• Polytomous response methods
  • Proportional odds model
  • Multinomial logistic regression

Part V - Bayesian estimation for GLMs (time permitting)

• Inclusion of prior distributions
• Model fitting
• Interpretation


Scientific Investigation

First Stage of Scientific Investigation

• Hypothesis generation
  • Observation
    • Measurement of existing populations or systems
    • Disadvantages:
      • Confounding
      • Limited ability to establish cause and effect

Further Stages of Scientific Investigation

• Refinement and confirmation of hypotheses
  • Experiment
    • Intervention
  • Elements of experiment
    • Overall goal and specific aims (hypotheses)
    • Materials and methods
    • Collection of data
    • Analysis
    • Interpretation; refinement of hypotheses


Data Analysis

Common aims of data analysis

• A statistical analysis is often geared towards (at least) one of three common objectives:

1. Hypothesis generation (description, exploration)

2. Hypothesis testing (inference about associations between variables in a population)

3. Prediction (for future sampling units)


Distinctions without a difference?

• In application, not enough attention is given to distinguishing between these objectives

• This lack of distinction arises because the same statistical tools (i.e., generalized regression models) can be used to address all of these objectives

• HOWEVER, the strategies used to address each goal should be distinct, and the interpretation of results depends upon the strategy employed!


• Hypothesis generation requires data-driven modeling (the scientific method!), which hopefully yields a simplified model (hypothesis)
  • Data-driven modeling implies that the usual inferential statements (p-values, confidence intervals, posterior probabilities) are invalid due to multiple comparisons

• Hypothesis testing seeks to test a formal (typically parsimonious) hypothesis via valid inferential statements
  • NO data-driven modeling!

• Prediction necessitates model building and data-driven results, since simple parsimonious models (e.g., only linear effects) do not generally lead to a reliable prediction model
  • Requires much stronger (generally untestable) assumptions to make probability statements
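The multiple-comparisons point can be made concrete with a small simulation. The sketch below is written in Python for illustration (the course itself uses R), and its setup is an assumption, not something from the lecture: the response is pure noise, k unrelated candidate predictors are screened, and only the "best" one is tested at the 5% level. If the k tests were independent, the chance that the selected predictor looks significant would be about 1 - 0.95^k (roughly 0.40 for k = 10) rather than 0.05. The function names are hypothetical, and the p-value uses a large-sample normal approximation.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the large-sample
    normal approximation z = r * sqrt(n)."""
    z = abs(r) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2.0))


def selection_false_positive_rate(k, n=100, reps=1000, alpha=0.05, seed=2018):
    """Fraction of null data sets in which the 'best' of k unrelated
    candidate predictors is declared significant at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # response: pure noise
        best_p = min(
            approx_p_value(corr([rng.gauss(0.0, 1.0) for _ in range(n)], y), n)
            for _ in range(k)  # k candidate predictors, all unrelated to y
        )
        if best_p < alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, selection_false_positive_rate(k), round(1 - 0.95 ** k, 3))
```

Running the script prints each simulated rate next to 1 - 0.95^k; the exact simulated values depend on the seed.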


Hypothesis Generation

Example: Factors that influence median life expectancy in stage II breast cancer

• General goal: Want to identify factors that affect prognosis
  • Follow a cohort of newly diagnosed patients and measure survival time (may be censored) along with covariates of interest

• What defines the cohort?
  • All patients from a particular healthcare plan?
  • All patients diagnosed at a particular hospital?
  • All patients in a certain location?

• "Best" estimator of the median survival (??)
  • Model building to identify most important factors
  • Estimation of median survival for various combinations of factors
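For censored survival times, one standard nonparametric answer to the "best estimator" question is the Kaplan-Meier (product-limit) estimator of S(t), with the median read off as the first time S(t) drops to 0.5 or below. The sketch below is a minimal Python version of that idea, not a method prescribed by the slides; the function names are hypothetical, and ties follow the usual convention that subjects censored at time t are still at risk at t.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  : observed follow-up times
    events : 1 if the time is an observed death, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times; subjects censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def km_median(times, events):
    """Smallest death time with S(t) <= 0.5, or None if the estimated
    curve never reaches 0.5 (median not estimable from these data)."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None
```

For example, km_median([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]) returns 3: the estimated survival is 5/6 after t = 1, 2/3 after t = 2, and 4/9 after t = 3.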


Hypothesis Testing

Example: Effect of serum albumin levels on risk of mortality in end-stage renal disease (ESRD) patients

• Randomly sample a cohort of ESRD patients and follow until death (or study end)

• Hypothesize a first-order relationship between albumin and the log relative risk of death

• A priori, specify additional adjustment variables: age, gender, ethnicity, duration of disease, etc.
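One way to write the hypothesized first-order relationship, assuming a proportional hazards formulation (the slide says only "log relative risk of death"), with alb denoting serum albumin, $\lambda(t \mid \cdot)$ the hazard of death, $\lambda_0(t)$ an unspecified baseline hazard, and $Z$ the vector of pre-specified adjustment variables:

```latex
\log \frac{\lambda(t \mid \text{alb}, Z)}{\lambda_0(t)} = \beta_1 \,\text{alb} + \gamma^T Z
\qquad\Longrightarrow\qquad
\frac{\lambda(t \mid \text{alb} + c, Z)}{\lambda(t \mid \text{alb}, Z)} = e^{c\,\beta_1}
```

Under this model, the relative risk associated with a c-unit difference in albumin is the same at every albumin level and every time t, and the a priori hypothesis of no association is H0: β1 = 0.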


Prediction

Example 1: Spam filters

• Goal: Use email content to classify an email as valid or spam

• Potential factors include:
  • Individual words or characters
  • Message length
  • Attachments

• There is likely to be a non-linear association between message length and the probability of spam

• There may be interactions between types of words

• There may be interactions between types of words, length of message, and the presence of attachments
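Non-linear effects and interactions like these enter a regression model as extra columns of the design matrix. The Python sketch below builds one such row for a message and maps it to a probability through the logistic (inverse-logit) function. Everything specific here is a hypothetical illustration, not part of the lecture: the function names, the log-length quadratic used to let the length effect bend, and the particular interactions (each word with attachments, and length with attachments). Choosing among such terms is exactly the data-driven model building that prediction calls for.

```python
import math


def design_row(word_counts, length, has_attachment, vocab):
    """One row of a design matrix X for a spam classifier (illustrative).

    word_counts    : dict mapping word -> number of occurrences
    length         : message length in characters
    has_attachment : True if the message has an attachment
    vocab          : ordered list of words used as predictors
    """
    log_len = math.log1p(length)        # log(1 + length) tames very long messages
    attach = 1.0 if has_attachment else 0.0
    words = [float(word_counts.get(w, 0)) for w in vocab]
    row = [1.0, log_len, log_len ** 2, attach]   # intercept, curved length effect
    row += words                                  # word main effects
    row += [w * attach for w in words]            # word x attachment interactions
    row.append(log_len * attach)                  # length x attachment interaction
    return row


def spam_probability(row, beta):
    """P(spam) = 1 / (1 + exp(-eta)) with eta = row . beta, computed stably."""
    eta = sum(x * b for x, b in zip(row, beta))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)                  # avoids overflow for very negative eta
    return z / (1.0 + z)
```

With all coefficients set to zero, spam_probability returns 0.5 for any message; in practice the coefficients would be estimated from labelled email.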


Prediction

[Figure: HRT use (per 1,000) by calendar date, 1997-2004, with a Gaussian kernel smooth (smoothing parameter 0.5); vertical axis roughly 150 to 550. From S. Haneuse, Biostat/Stat 572]

Example 2: Hormone replacement therapy use

• Hormone replacement therapy (HRT) slows the decrease of bone density in post-menopausal women

• Observational studies also suggest a decrease in cardiovascular disease (CVD)

• Stockbrokers would like to predict future use of HRT

• Goal: Collect monthly HRT use over the past 7 years and predict future use (if done in 1999...oops!)
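The curve in the HRT figure is described as a Gaussian kernel smooth. A common form of such a smoother is the Nadaraya-Watson estimator, which averages nearby observations with Gaussian weights. The Python sketch below assumes that form (the slide does not say which kernel smoother was used) and treats the smoothing parameter as the kernel bandwidth in the units of x, for example years; the function name is hypothetical.

```python
import math


def gaussian_kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = g] at each g in grid,
    using Gaussian weights exp(-0.5 * ((x_i - g) / bandwidth) ** 2)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        total = sum(weights)
        if total == 0.0:
            # g is so far from every x_i that all weights underflow to zero
            raise ValueError(f"no data near grid point {g}")
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted
```

A smoother of this kind summarizes the observed period; using it to forecast beyond the last observed date is extrapolation, which rests on the strong, untestable assumptions that prediction requires.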


Data Analysis

Required steps in an inferential data analysis

1. Aims of a data analysis

• Description, exploration, confirmation, prediction

2. Establish the context of the analysis

• Statistics produces inference about a population based upon a sample
• Need to understand the population sampled
• Understand the data collection procedure (true random sample?)
• Understand the background science
• The scientific goals of the analysis/experiment


3. Develop a statistical model

• A clearly defined (measurable) outcome is essential

• Predictor(s) of interest

• If ‘we’ cannot decide which parameters would be appropriate when measurements are available on the entire population, then there is no chance that statistics can be of help!


4. Evaluation of the properties of the design, model, and estimation procedure

• It is essential that these aspects be addressed as completely as possible prior to data analysis
  • Clear specification of outcomes and predictors
  • Use of robust statistical methods
  • What is the cost of planning not to plan?

5. Computation

• Turn the handle...


6. Interpretation of results

• Present results clearly and precisely

• If applicable, present scientific justification for why results agree with the hypothesis (this should have been done at the design stage)

• The most elegant experiment/data analysis is meaningless unless it can be easily explained to the scientific community it was designed to impact


Summary

The basic distinction in strategies comes with respect to model selection

• When making inferential statements, model selection should be avoided when possible
  • “Know what you want to adjust for, and adjust for it.”
  • As long as robust statistical procedures are used, this will ensure valid probability statements

• When performing data-driven modeling, we need to clearly define our criteria for choosing the ‘best’ model and be careful with probability statements
  • How to choose covariates?
  • What functional form?
  • What does a 95% confidence interval mean at the end of the day?


Towards a general framework

Comparing distributions across subpopulations

• In many situations, the scientific question to be addressed by a statistical analysis can be viewed as comparing the distribution of some random variable or vector (the response or dependent variable) across subpopulations defined by the values of other random variables (the predictors or independent variables).
  • Within each subpopulation, there is a distribution for the response variable.
  • The question is whether that distribution is the same for every subpopulation.
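One simple way to ask whether the response distribution is the same in two subpopulations is a permutation test: under the null hypothesis of identical distributions, the subpopulation labels are exchangeable, so reshuffling them shows how large a between-group difference arises by chance. The Python sketch below uses the absolute difference in means as the test statistic, so it is mainly sensitive to location shifts; the function names and the choice of statistic are illustrative assumptions, not part of the lecture.

```python
import random


def mean(v):
    return sum(v) / len(v)


def permutation_test(a, b, n_perm=2000, seed=1):
    """Two-sided permutation p-value for H0: the response distribution is
    the same in subpopulations A and B, using |mean(A) - mean(B)|."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # relabel subjects at random under H0
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed - 1e-12:        # small tolerance for float ties
            extreme += 1
    return (extreme + 1) / (n_perm + 1)     # count the observed labelling too
```

For instance, two samples with identical values give a p-value of 1, while two well-separated samples give a p-value close to 1/(n_perm + 1).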


General framework

• The standard methods for addressing such questions depend upon the type of response variable and the number of subpopulations being compared.

• In the case of a univariate response with independent observations across subjects, we can make a chart of the basic type of analysis used to test hypotheses according to whether the response variable is binary, categorical, count data, continuous, or censored continuous.

• We can similarly consider whether the number of subpopulations is one, two, several unordered, several ordered, or potentially infinite ordered.


Our Setting

General framework

               Response variable
Predictor      Binary     Categorical   Discrete     Continuous   Censored
Constant       Z-test     χ²            Poisson      t-test       KM
Binary         χ²         χ²            Poisson      t-test       logrank
Categorical    χ²         χ²            log linear   ANOVA        k-sample logrank
Discrete       logistic   polytomous    Poisson      linear       prop hzd
Continuous     logistic   polytomous    Poisson      linear       prop hzd

(KM = Kaplan-Meier estimator; prop hzd = proportional hazards regression.)

(Note that the ‘predictor’ measuring which subpopulation a subject belongs to is, respectively, constant, binary, categorical, discrete, or continuous.)

• Proposition: Under appropriate assumptions, each row of the table above can be regarded as a special case of the rows below it (proof deferred).
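As a small numerical check of one instance of the proposition (not the deferred proof), the pooled two-sample t-statistic (binary predictor, continuous response) equals the t-statistic for the slope in a simple linear regression on a 0/1 indicator of group membership. The Python sketch below computes both; the function names and the data in the usage block are hypothetical.

```python
import math


def pooled_t(a, b):
    """Pooled-variance two-sample t-statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)                  # pooled variance estimate
    return (mb - ma) / math.sqrt(sp2 * (1 / na + 1 / nb))


def slope_t(x, y):
    """t-statistic for the slope in the least squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)                    # residual variance estimate
    return b1 / math.sqrt(sigma2 / sxx)


if __name__ == "__main__":
    group_a = [5.1, 4.8, 6.0, 5.5]            # hypothetical responses, group 0
    group_b = [6.2, 7.1, 6.8, 7.5, 6.9]       # hypothetical responses, group 1
    x = [0] * len(group_a) + [1] * len(group_b)   # 0/1 group indicator
    print(pooled_t(group_a, group_b), slope_t(x, group_a + group_b))
```

Both printed values agree up to rounding error: with a 0/1 predictor, the fitted slope is the difference in group means and the residual sum of squares is the pooled within-group sum of squares.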


Regression modeling

Notation

• The general problem we will address is one where we have, for the $i$th subject:

1. measured variables $R_{i1}, R_{i2}, \ldots$ related to some quality to be compared across the subpopulations, and

2. measured variables (factors) $P_{i1}, P_{i2}, \ldots$ related to subpopulation membership

• We assume that analysis shall proceed by defining, for some functions $f_R$ and $f_{P_j}$, $j = 1, \ldots, p$,

1. the response variable $Y_i = f_R(R_{i1}, R_{i2}, \ldots)$

2. the predictor variables $X_{ij} = f_{P_j}(P_{i1}, P_{i2}, \ldots)$


Basic assumptions for methods development

• Statistical modeling will proceed by using the $X_{ij}$s to model some functional of the distribution of the $Y_i$s.

• When presenting methodologic development, we will presume knowledge of the form of the response and predictors.

• When exploring the application of these methods and their robustness to departures from the underlying assumptions of the theory, we will regard ourselves as free to explore alternative formulations of the predictors and response.


Generalized regression


• In generalized regression, we model the interrelationships among subpopulations (as defined by predictor variables) with respect to the distribution of some response variable

• Unlike linear regression, we shall not always consider modelling the mean, and even when we do, we will not necessarily model the mean as a linear function of the regression parameters
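A small illustration (not from the lecture) of why the modelled functional matters: in an intercept-only model, least squares estimates the mean, whereas least absolute deviations estimates the median, and a regression version of the latter (median, or more generally quantile, regression) models a conditional median instead of a conditional mean. The Python function names below are hypothetical.

```python
def squared_loss_minimizer(y):
    """argmin_c sum (y_i - c)^2, which is the sample mean."""
    return sum(y) / len(y)


def absolute_loss_minimizer(y):
    """argmin_c sum |y_i - c|, which is a sample median.

    The objective is piecewise linear in c, so a minimizer is attained at
    one of the data points; searching them in sorted order returns the
    lower middle value when n is even (any point between the two middle
    values is also a minimizer)."""
    return min(sorted(y), key=lambda c: sum(abs(yi - c) for yi in y))
```

With one large value in the data, [1, 2, 3, 4, 100], the two criteria give 22.0 and 3, respectively.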


• Definition: In the general regression model, we typically have
  • Response $Y \mid \vec{X} \sim F(y; h(\vec{X}, \vec{\beta}), \phi)$, where
    • $\vec{\beta}$ represents the regression parameters, and
    • $\phi$ is some nuisance parameter necessary to know the full distribution of $Y$

The regression model is fully specified by providing the form of $F$, taking arguments $y$, $h$, and $\phi$.

If all of our parameters of interest are in $\vec{\beta}$,
(i) when $\phi$ is finite dimensional, we call the model parametric;
(ii) when $\phi$ is infinite dimensional, we call the model semiparametric.

If $\vec{\beta}$ does not contain all of our parameters of interest,
(iii) when $\phi$ is infinite dimensional, we call the model nonparametric.


• In generalized regression models, $h(\vec{X}, \vec{\beta})$ most often models some parameter of $F$ or some functional of the distribution (e.g., the mean, the median, the hazard, etc.).

• It is quite often the case that we choose

$$h(\vec{X}, \vec{\beta}) = h(\vec{X}^T \vec{\beta}),$$

that is, $F$ depends on $\vec{\beta}$ only through some linear combination of its elements. (Recall that $\vec{X}$ may include arbitrary transformations of the measured factors.)
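For example (an illustration, not from the slides), a quadratic effect of a single factor $x$ still has the form $h(\vec{X}^T\vec{\beta})$, because the transformation lives in $\vec{X}$ rather than in $\vec{\beta}$:

```latex
\vec{X} = (1,\; x,\; x^2)^T, \qquad
h(\vec{X}^T \vec{\beta}) = \exp\!\left(\beta_0 + \beta_1 x + \beta_2 x^2\right)
\quad \text{with } h(u) = e^u
```

By contrast, a model such as $h(\vec{X}, \vec{\beta}) = \beta_0 + \beta_1 e^{\beta_2 x}$ is not of this form, since its dependence on $\vec{\beta}$ cannot be written through a single linear combination of the regression parameters.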


Generalized linear regression

• Definition: A generalized linear regression model has

1. $F$ an exponential family distribution with pmf or pdf

$$f(y; \theta, \phi) = \exp\left( \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right)$$

for some functions $a(\cdot)$, $b(\cdot)$, $c(\cdot, \cdot)$.

2. $E(Y) = \mu = b'(\theta)$ modelled by

$$g(\mu) = \eta = \vec{X}^T \vec{\beta},$$

where $g(\cdot)$ is termed the link function.
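To make the definition concrete, the following sketch writes the Poisson pmf in the exponential-family form above, with $\theta = \log\mu$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$ and $c(y, \phi) = -\log y!$, and checks numerically that $b'(\theta)$ recovers $\mu = E(Y)$. The helper names are hypothetical. With the log link, $g(\mu) = \log\mu = \theta$, so $\eta$ and $\theta$ coincide (the canonical link).

```python
import math


def exp_family_density(y, theta, phi, b, a, c):
    """f(y; theta, phi) = exp{(y * theta - b(theta)) / a(phi) + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / a(phi) + c(y, phi))


# Poisson(mu) in exponential-family form.
def b_pois(theta):
    return math.exp(theta)          # b(theta) = e^theta


def a_pois(phi):
    return 1.0                      # a(phi) = 1: no free dispersion


def c_pois(y, phi):
    return -math.lgamma(y + 1)      # lgamma(y + 1) = log(y!)


def numeric_derivative(fun, x, step=1e-6):
    """Central-difference approximation to fun'(x)."""
    return (fun(x + step) - fun(x - step)) / (2 * step)


mu = 2.5
theta = math.log(mu)                # canonical parameter
for y in range(5):
    direct = math.exp(-mu) * mu ** y / math.factorial(y)  # usual Poisson pmf
    ef = exp_family_density(y, theta, None, b_pois, a_pois, c_pois)
    print(y, direct, ef)

# E(Y) = b'(theta): should be close to mu = 2.5.
print(numeric_derivative(b_pois, theta))
```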

Generalized linear regression

• Definition: A regression model shall consist of

1. $\theta$, a functional of the probability distribution of a response variable $Y$;

2. $\eta = h(\vec{X}, \vec{\beta})$, a predictor function based on the predictors $\vec{X}$ and the regression parameters $\vec{\beta}$ (subpopulations are defined as the sets of individuals having the same value of $\eta$);

3. $g(\cdot)$, a link function describing the relationship between $\theta$ and $\eta$: $\eta = g(\theta)$;

4. $\phi$, a nuisance parameter vector required to fully specify the distribution of the response variable $Y$ in every subpopulation. That is, the distribution of $Y$ depends upon the predictor function $\eta$ and the nuisance parameters $\phi$.
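The four components can be sketched as a small data structure. This is an illustrative assumption, not the course's code: the class name `RegressionModel`, the choice $\theta = E[Y]$ for a binary response with a logit link (so $\theta = g^{-1}(\eta)$ is the inverse logit), and the three hypothetical individuals are all made up. The last step groups individuals by their value of $\eta$, which is exactly how item 2 defines subpopulations. No $\phi$ is needed here because a Bernoulli distribution is fully determined by its mean.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RegressionModel:
    """Hypothetical container for the components of a regression model."""
    h: Callable[[Sequence[float], Sequence[float]], float]  # eta = h(X, beta)
    g_inverse: Callable[[float], float]                     # theta = g^{-1}(eta)

    def theta(self, x, beta):
        """Functional theta for an individual with predictors x."""
        return self.g_inverse(self.h(x, beta))


def linear_h(x, beta):
    """Linear predictor eta = X^T beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


model = RegressionModel(h=linear_h, g_inverse=expit)
beta = [-1.0, 0.5, 0.5]  # hypothetical coefficients
people = {"A": [1.0, 2.0, 0.0], "B": [1.0, 0.0, 2.0], "C": [1.0, 1.0, 0.0]}

# Individuals sharing the same eta form one subpopulation (same distribution of Y).
subpops = defaultdict(list)
for name, x in people.items():
    subpops[round(model.h(x, beta), 12)].append(name)

for eta, names in subpops.items():
    print(eta, names, expit(eta))
```

Individuals A and B have different covariates but the same $\eta$, so under this model they belong to the same subpopulation.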

Examples of commonly used regression models

1. linear regression:

• $F(y, h, \phi) = \Phi\big((y - h)/\sqrt{\phi}\big)$ (the normal cdf)

• $\theta = E[Y]$

• $g(\cdot)$ is the identity link

• $a(\phi) = \sigma^2$
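As a check on the linear regression entries, the sketch below writes the normal density in the exponential-family form with $\theta = \mu$, $b(\theta) = \theta^2/2$, $a(\phi) = \phi = \sigma^2$ and $c(y, \phi) = -y^2/(2\phi) - \tfrac{1}{2}\log(2\pi\phi)$, and compares it with the usual density. It also codes $F$ as the normal cdf via `math.erf`. The covariate and coefficient values are hypothetical.

```python
import math


def F_normal(y, h, phi):
    """F(y, h, phi) = Phi((y - h) / sqrt(phi)), the normal cdf, via math.erf."""
    z = (y - h) / math.sqrt(phi)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(y, mu, sigma2):
    """Usual N(mu, sigma2) density."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_exp_family(y, theta, phi):
    """The same density in exponential-family form:
    theta = mu, b(theta) = theta^2 / 2, a(phi) = phi = sigma^2,
    c(y, phi) = -y^2 / (2 phi) - log(2 pi phi) / 2."""
    b = theta ** 2 / 2
    a = phi
    c = -y ** 2 / (2 * phi) - 0.5 * math.log(2 * math.pi * phi)
    return math.exp((y * theta - b) / a + c)


# Identity link: theta = mu = eta = X^T beta (hypothetical values).
x, beta = [1.0, 3.0], [0.5, 2.0]
eta = sum(xj * bj for xj, bj in zip(x, beta))
sigma2 = 4.0
for y in (4.0, 6.5, 9.0):
    print(y, normal_pdf(y, eta, sigma2), normal_exp_family(y, eta, sigma2),
          F_normal(y, eta, sigma2))
```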

Examples of commonly used regression models

2. logistic regression:

• $F(y, h, \phi) = h^y (1 - h)^{1-y}$ (the Bernoulli pmf)

• $\theta = \operatorname{logit}(E[Y])$ (the log-odds of ‘success’)

• $g(\cdot)$ is the logit link

• $a(\phi) = 1$
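A minimal sketch of the logistic entries: the inverse logit maps a linear predictor to $h = E[Y]$, and the Bernoulli pmf $h^y(1-h)^{1-y}$ equals the exponential-family form with $\theta = \operatorname{logit}(p)$, $b(\theta) = \log(1 + e^{\theta})$, $a(\phi) = 1$ and $c = 0$. The linear predictor value is hypothetical.

```python
import math


def logit(p):
    """Log-odds: the logit link g(mu) = log(mu / (1 - mu))."""
    return math.log(p / (1.0 - p))


def expit(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_pmf(y, h):
    """F(y, h, phi) = h^y (1 - h)^(1 - y) for y in {0, 1}."""
    return h ** y * (1.0 - h) ** (1 - y)


def bernoulli_exp_family(y, theta):
    """Same pmf with theta = logit(p), b(theta) = log(1 + e^theta), a = 1, c = 0."""
    return math.exp(y * theta - math.log1p(math.exp(theta)))


eta = -0.4 + 0.8 * 1.5   # hypothetical linear predictor X^T beta
p = expit(eta)           # E[Y] = P(Y = 1)
theta = logit(p)         # canonical parameter; equals eta under the logit link
print(p, theta, bernoulli_pmf(1, p), bernoulli_exp_family(1, theta))
```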

Examples of commonly used regression models

3. Poisson regression:

• $F(y, h, \phi) = e^{-h} h^y / y!$ (the Poisson pmf)

• $\theta = \log(E[Y])$ (with $E[Y]$ usually standardized as a rate)

• $g(\cdot)$ is the log link

• $a(\phi) = 1$
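The slide notes that $E[Y]$ is usually standardized as a rate. One common way to do this (assumed here; the slide does not spell it out) is to include the log of the exposure $t$ as an offset, so that $\log E[Y] = \log t + \vec{X}^T\vec{\beta}$ and $\exp(\vec{X}^T\vec{\beta})$ is the event rate per unit of exposure. The function names, baseline rate, and follow-up times below are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """F(y, h, phi) = e^{-h} h^y / y! with h = mu = E[Y] > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def poisson_mean(x, beta, exposure):
    """Log link with a log-exposure offset:
    log E[Y] = log(exposure) + X^T beta, so E[Y] / exposure = exp(X^T beta)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return exposure * math.exp(eta)


beta = [math.log(0.02), 0.5]  # hypothetical: baseline rate 0.02 per person-year
x = [1.0, 1.0]                # intercept plus a binary exposure indicator
for t in (100.0, 500.0):      # person-years of follow-up
    mu = poisson_mean(x, beta, t)
    print(t, mu, mu / t, poisson_pmf(3, mu))
```

The expected count grows with follow-up time, but the modelled rate $\mu/t$ is the same for both values of $t$, which is what "standardized as a rate" buys.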

Generalized linear regression

• Note: In order for a regression model to be useful, we must (at minimum) have a means of dealing with the nuisance parameter $\phi$ and of estimating the regression parameters $\vec{\beta}$.

A prelude to parameter estimation

Estimating equations

• Definition: In the abstract setting, suppose we have a function $G$ of the data and the unknown parameters whose expectation satisfies $E[G(\vec{X}, \vec{\beta}, \phi)] = 0$ for all $\vec{\beta}$ and $\phi$, where the expectation is taken under those same parameter values. Then one possible form of estimation is to use such a function as an (unbiased) estimating equation. That is, we search for the values $\hat{\vec{\beta}}$ and $\hat{\phi}$ such that $G(\vec{X}, \hat{\vec{\beta}}, \hat{\phi}) = 0$.
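As a minimal sketch, consider a single parameter with the hypothetical mean model $E[Y \mid x] = \exp(\beta x)$ and the estimating function $G(\beta) = \sum_i x_i\{y_i - \exp(\beta x_i)\}$. It has mean zero whenever the mean model is correct, and it happens to coincide with the Poisson score, although unbiasedness needs only the mean model. The sketch solves $G(\hat\beta) = 0$ by Newton-Raphson (a standard root finder, chosen here for illustration). The data are made up, and no nuisance parameter $\phi$ appears.

```python
import math

# Hypothetical data, for illustration only.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1, 2, 2, 4, 6, 9]


def G(beta):
    """Estimating function sum_i x_i (y_i - exp(beta x_i)).

    Has mean zero if E[Y | x] = exp(beta x) holds."""
    return sum(x * (y - math.exp(beta * x)) for x, y in zip(xs, ys))


def dG(beta):
    """Derivative of G in beta; always negative here, so the root is unique."""
    return -sum(x * x * math.exp(beta * x) for x in xs)


def solve_estimating_equation(beta0=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta <- beta - G(beta) / G'(beta) until the step is tiny."""
    beta = beta0
    for _ in range(max_iter):
        step = G(beta) / dG(beta)
        beta -= step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


beta_hat = solve_estimating_equation()
print(beta_hat, G(beta_hat))
```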

Estimating equations

• There are many constraints that we need to put on such an equation before it stands much chance of producing useful estimates:

1. We would like the estimates produced to be unique as the information about the parameters in our sample goes to infinity

2. We would like the estimates to be efficient and consistent

3. We would like the estimates to be relatively easy to find

4. We would like to be able to estimate the distribution of the estimates
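For item 4, one standard tool (not covered on this slide) is the sandwich variance from M-estimation theory: under regularity conditions, the root of an unbiased estimating equation is approximately normal with variance $A^{-1} B A^{-1}$, where $A = -\partial G/\partial\beta$ and $B = \sum_i g_i^2$ for the per-observation contributions $g_i$. The sketch reuses a hypothetical one-parameter log-linear mean model, solves $G(\beta) = 0$ by bisection, and compares the sandwich variance with the model-based $1/A$, which is valid only if $\operatorname{Var}(Y \mid x) = E(Y \mid x)$. The data are made up.

```python
import math

# Hypothetical data, for illustration only.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1, 2, 2, 4, 6, 9]


def G(beta):
    """Estimating function sum_i x_i * (y_i - exp(beta * x_i))."""
    return sum(x * (y - math.exp(beta * x)) for x, y in zip(xs, ys))


def solve_by_bisection(lo=-5.0, hi=5.0, iters=200):
    """G is strictly decreasing in beta, so bisection finds its unique root."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta_hat = solve_by_bisection()

# Sandwich variance A^{-1} B A^{-1} for a scalar parameter.
A = sum(x * x * math.exp(beta_hat * x) for x in xs)                   # -dG/dbeta
B = sum((x * (y - math.exp(beta_hat * x))) ** 2 for x, y in zip(xs, ys))
var_sandwich = B / A ** 2

# Model-based variance, valid if Var(Y | x) = E(Y | x) (the Poisson assumption).
var_model = 1.0 / A

print(beta_hat, math.sqrt(var_sandwich), math.sqrt(var_model))
```

The sandwich form stays valid when the variance assumption is wrong, at the cost of some efficiency when that assumption is right.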

Notes

• Note 1: Maximum likelihood leads to one such estimating equation (the score equation); however, we may choose other estimating equations that do not necessarily correspond to a likelihood (more later).

• Note 2: A Bayesian estimation framework seeks to incorporate prior "information" in order to make direct probability statements regarding the regression model parameters. Estimation is carried out by specifying an appropriate likelihood and combining it with prior distributions on the model parameters to form a posterior distribution for the parameters.
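The likelihood-times-prior recipe can be sketched in the simplest case: a single Bernoulli probability $p$ rather than a full regression model. The data (7 successes in 20 trials) and the Beta(2, 2) prior are hypothetical choices. The posterior is approximated on a grid by normalizing likelihood × prior, and its mean is checked against the exact conjugate result, a Beta$(2 + 7, 2 + 13)$ posterior with mean $9/24$.

```python
import math

# Hypothetical data: s successes in n Bernoulli trials.
n, s = 20, 7
a0, b0 = 2.0, 2.0  # Beta(2, 2) prior on p (an illustrative choice)


def log_prior(p):
    """Beta(a0, b0) log density, up to an additive constant."""
    return (a0 - 1) * math.log(p) + (b0 - 1) * math.log(1 - p)


def log_lik(p):
    """Bernoulli log-likelihood for s successes out of n."""
    return s * math.log(p) + (n - s) * math.log(1 - p)


# Grid approximation: posterior proportional to likelihood x prior.
m = 10_000
grid = [(i + 0.5) / m for i in range(m)]  # midpoints avoid log(0)
log_post = [log_lik(p) + log_prior(p) for p in grid]
shift = max(log_post)                      # subtract the max for numerical stability
weights = [math.exp(lp - shift) for lp in log_post]
total = sum(weights)
post = [w / total for w in weights]
post_mean = sum(p * w for p, w in zip(grid, post))

# Conjugate check: the posterior is Beta(a0 + s, b0 + n - s).
exact_mean = (a0 + s) / (a0 + b0 + n)
print(post_mean, exact_mean)
```

With a regression model the same recipe applies to $\vec{\beta}$ and $\phi$ jointly, but grids quickly become impractical and sampling methods are typically used instead.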