Page 1: Ch1. Introduction - KOCWcontents.kocw.net/KOCW/document/2015/gachon/kimnamhyoung...Ch1. Introduction Namhyoung Kim Dept. of Applied Statistics Gachon University nhkim@gachon.ac.kr

Ch1. Introduction

Namhyoung Kim

Dept. of Applied Statistics

Gachon University

nhkim@gachon.ac.kr


1.1 Categorical Response Data

• A categorical variable has a measurement scale consisting of a set of categories.

• For example, political philosophy may be measured as “liberal”, “moderate”, or “conservative”.

• Categorical scales are commonly used in the social and health sciences for measuring attitudes, opinions, and responses.

• They also occur in the behavioral sciences, public health, zoology, education, marketing, engineering sciences, and industrial quality control.


Response/Explanatory Variable

• Response variable (dependent variable, or Y variable)

• Explanatory variable (independent variable, or X variable)

• The subject of this course is the analysis of categorical response variables.

• The explanatory variables can be categorical or continuous.


Nominal/Ordinal Scale

• Categorical variables have two main types of measurement scales:

– Ordinal variables: ordered scales, such as attitude toward something, appraisal of a company’s inventory level, response to a medical treatment, and frequency of feeling symptoms of anxiety

– Nominal variables: unordered scales, such as religious affiliation, primary mode of transportation to work, favorite type of music, and favorite place to shop


Nominal/Ordinal Scale

• Methods designed for ordinal variables cannot be used with nominal variables, because nominal categories have no ordering.

• Methods designed for nominal variables can be used with either nominal or ordinal variables, but when applied to ordinal variables they ignore the ordering of the categories, which can result in a serious loss of power.


Problems

• 1.1 In the following examples, identify the response variable and the explanatory variables.

– a. Attitude toward gun control (favor, oppose), Gender (female, male), Mother’s education (high school, college)

– b. Heart disease (yes, no), Blood pressure, Cholesterol level

– c. Race (white, nonwhite), Religion (Catholic, Jewish, Protestant), Vote for president (Democrat, Republican, Other), Annual income

– d. Marital status (married, single, divorced, widowed), Quality of life (excellent, good, fair, poor)


Problems

• 1.2 Which scale of measurement is most appropriate for the following variables: nominal or ordinal?

– a. Political party affiliation (Democrat, Republican, unaffiliated)

– b. Highest degree obtained (none, high school, bachelor’s, master’s, doctorate)

– c. Patient condition (good, fair, serious, critical)

– d. Hospital location (London, Boston, Madison, Rochester, Toronto)

– e. Favorite beverage (beer, juice, milk, soft drink, wine, other)

– f. How often feel depressed (never, occasionally, often, always)


1.2 Probability Distributions for Categorical Data

• Key distributions for categorical data:

– binomial distribution

– multinomial distribution


Binomial Distribution

• The binomial distribution applies to $n$ independent and identical trials, each with two possible outcomes, “success” and “failure”.

• Identical trials: the probability of success is the same for each trial.

• Independent trials: the response outcomes are independent random variables.

• Trials of this kind are called Bernoulli trials.


Binomial Distribution

• Let $Y$ denote the number of successes out of the $n$ trials, and let $\pi$ denote the probability of success for a given trial.

• The probability of outcome $y$ for $Y$ equals

$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, 2, \ldots, n$$

• For fixed $n$, the distribution becomes more skewed as $\pi$ moves toward 0 or 1.

• For fixed $\pi$, it becomes more bell-shaped as $n$ increases.

• When $n$ is large, it can be approximated by a normal distribution with $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1-\pi)}$.
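The binomial formula and its normal approximation can be checked numerically. The sketch below is illustrative only: the function names and the choice of $n = 100$, $\pi = 0.5$ are assumptions made for this example, not values from the slides. It compares the exact probability at the mean with the height of the approximating normal density.

```python
from math import comb, exp, pi as PI, sqrt


def binomial_pmf(y, n, prob):
    """P(Y = y) for Y ~ Binomial(n, prob), using the formula above."""
    return comb(n, y) * prob**y * (1 - prob) ** (n - y)


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * PI))


# Hypothetical setting chosen for illustration (not from the slides).
n, prob = 100, 0.5
mu = n * prob                         # mean of the binomial
sigma = sqrt(n * prob * (1 - prob))   # standard deviation of the binomial

exact = binomial_pmf(50, n, prob)     # exact P(Y = 50)
approx = normal_pdf(50, mu, sigma)    # normal density at the same point

print(f"mu = {mu}, sigma = {sigma}")
print(f"exact P(Y=50) = {exact:.4f}, normal approximation = {approx:.4f}")
```

With a large $n$ and $\pi$ away from 0 and 1, the two numbers should agree to about three decimal places; repeating the comparison with a small $n$ or an extreme $\pi$ shows the approximation degrading.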


Binomial Distribution

• Table 1.1. Binomial distribution with $n = 10$ and $\pi = 0.20$, $0.50$, and $0.80$. The distribution is symmetric when $\pi = 0.50$.

 y    P(y) when π=0.20    P(y) when π=0.50    P(y) when π=0.80
 0         0.107               0.001               0.000
 1         0.268               0.010               0.000
 2         0.302               0.044               0.000
 3         0.201               0.117               0.001
 4         0.088               0.205               0.006
 5         0.026               0.246               0.026
 6         0.006               0.205               0.088
 7         0.001               0.117               0.201
 8         0.000               0.044               0.302
 9         0.000               0.010               0.268
10         0.000               0.001               0.107
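The columns of Table 1.1 can be recomputed directly from the binomial formula. The following is a minimal sketch; the helper name `binomial_pmf` and the dictionary layout are assumptions made for illustration. It also checks that the column for $\pi = 0.80$ is the column for $\pi = 0.20$ in reverse order.

```python
from math import comb


def binomial_pmf(y, n, prob):
    """P(Y = y) for Y ~ Binomial(n, prob)."""
    return comb(n, y) * prob**y * (1 - prob) ** (n - y)


n = 10
# One list of 11 rounded probabilities (y = 0..10) per value of pi.
table = {
    prob: [round(binomial_pmf(y, n, prob), 3) for y in range(n + 1)]
    for prob in (0.2, 0.5, 0.8)
}

print(" y   pi=0.20  pi=0.50  pi=0.80")
for y in range(n + 1):
    print(f"{y:2d}   {table[0.2][y]:.3f}    {table[0.5][y]:.3f}    {table[0.8][y]:.3f}")

# P(y) under pi = 0.80 equals P(n - y) under pi = 0.20.
mirror_holds = table[0.8] == table[0.2][::-1]
print("mirror property holds:", mirror_holds)
```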


Multinomial Distribution

• Some trials have more than two possible outcomes.

• Let $c$ denote the number of outcome categories.

• For $n$ independent observations, the multinomial probability that $n_1$ fall in category 1, $n_2$ fall in category 2, …, $n_c$ fall in category $c$, where category $j$ has probability $\pi_j$ and $\sum_j \pi_j = 1$, equals

$$P(n_1, n_2, \ldots, n_c) = \left(\frac{n!}{n_1!\,n_2! \cdots n_c!}\right)\pi_1^{n_1}\,\pi_2^{n_2} \cdots \pi_c^{n_c}$$
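As a rough illustration of the multinomial formula, the sketch below evaluates it for a made-up case with $c = 3$ categories. The function name, the counts (2, 1, 1), and the probabilities (0.5, 0.3, 0.2) are all hypothetical. With $c = 2$ the same function reduces to the binomial probability.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Multinomial probability of the given category counts."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! n_2! ... n_c!), kept as an integer.
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)
    return coef * prod(p**k for p, k in zip(probs, counts))


# Hypothetical example: n = 4 observations in c = 3 categories.
three_cat = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
print(f"P(2, 1, 1) = {three_cat:.3f}")

# With c = 2 this is the binomial probability of y = 3 when n = 10, pi = 0.2.
two_cat = multinomial_pmf([3, 7], [0.2, 0.8])
print(f"P(3, 7) = {two_cat:.3f}")
```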


1.3 Statistical Inference for a Proportion

• In practice, the parameter values for the binomial and multinomial distributions are unknown.

• Using sample data, we estimate the parameters.

• In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model.


Likelihood Function

• The probability of the observed data, expressed as a function of the parameter, is called the likelihood function.

• For example, in $n = 10$ trials, suppose a binomial count equals $y = 0$.

• From the binomial formula with parameter $\pi$, the probability of this outcome equals

$$P(y = 0) = \frac{10!}{0!\,10!}\,\pi^{0}(1-\pi)^{10} = (1-\pi)^{10}$$
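To see the likelihood as a function of the parameter, one can evaluate it at a few candidate values of $\pi$ for the observed $y = 0$ in $n = 10$ trials. This is only a sketch; the function name and the particular values of $\pi$ tried are assumptions chosen for illustration.

```python
from math import comb


def likelihood(prob, y, n):
    """Binomial likelihood of parameter value prob, given y successes in n trials."""
    return comb(n, y) * prob**y * (1 - prob) ** (n - y)


# Observed data from the example above: y = 0 successes in n = 10 trials,
# so the likelihood is (1 - pi)^10.
y, n = 0, 10
for prob in (0.0, 0.2, 0.4, 0.6):
    print(f"pi = {prob:.1f}: likelihood = {likelihood(prob, y, n):.4f}")
```

The likelihood decreases steadily as $\pi$ moves away from 0, so for these data the value $\pi = 0$ makes the observed outcome most probable.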


Maximum Likelihood Estimation (MLE)

• The maximum likelihood (ML) estimate of a parameter is the parameter value for which the probability of the observed data takes its greatest value.


Maximum Likelihood Estimation (MLE)

• In general, for the binomial outcome of $y$ successes in $n$ trials, the ML estimate of $\pi$ equals $p = y/n$, the sample proportion of successes for the $n$ trials.

• The ML estimate is often denoted by the parameter symbol with a ^ (a “hat”) over it, for example $\hat{\pi}$.
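The claim that the ML estimate equals $y/n$ can be checked numerically by a brute-force search over a grid of candidate values. The sketch below is not how the estimate is derived in theory (that uses calculus on the likelihood); it is an illustrative check, and the data $y = 6$, $n = 10$ and the grid resolution are hypothetical choices.

```python
from math import comb


def likelihood(prob, y, n):
    """Binomial likelihood of parameter value prob, given y successes in n trials."""
    return comb(n, y) * prob**y * (1 - prob) ** (n - y)


def ml_estimate_grid(y, n, steps=1000):
    """Return the grid point in [0, 1] with the largest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda prob: likelihood(prob, y, n))


# Hypothetical data: 6 successes in 10 trials.
pi_hat = ml_estimate_grid(y=6, n=10)
print(f"grid-search ML estimate = {pi_hat}, sample proportion = {6 / 10}")
```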


Significance Test About a Binomial Proportion

• The ML estimator for the parameter $\pi$ is the sample proportion, $p$.

• The sampling distribution of the sample proportion $p$ has mean and standard error

$$E(p) = \pi, \qquad \sigma(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$$

• The sampling distribution of $p$ is approximately normal for large $n$.


Significance Test About a Binomial Proportion

• Null hypothesis $H_0: \pi = \pi_0$

• The test statistic is

$$z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$$

• For large samples, the null sampling distribution of the $z$ test statistic is the standard normal.


Example: Survey Results on Legalizing Abortion

• Let $\pi$ denote the proportion of the American adult population that responds “yes” to the question:

• “Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if she is married and does not want any more children.”


Example: Survey Results on Legalizing Abortion

• Of 893 respondents to this question, 400 replied “yes” and 493 replied “no”.

• $p = 400/893 = 0.448$

• $H_0: \pi = 0.50$, $H_a: \pi \neq 0.50$

• The test statistic is

$$z = \frac{0.448 - 0.50}{\sqrt{(0.50)(0.50)/893}} = -3.1$$

• The two-sided P-value is 0.002.
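The arithmetic of this example can be reproduced with a few lines of code. The sketch below uses the survey counts given above; the function name is an assumption, and the two-sided P-value is obtained from the standard normal tail via the complementary error function rather than from a table.

```python
from math import erfc, sqrt


def proportion_z_test(y, n, pi0):
    """Large-sample z test of H0: pi = pi0 for y successes in n trials."""
    p = y / n
    se0 = sqrt(pi0 * (1 - pi0) / n)    # standard error computed under H0
    z = (p - pi0) / se0
    p_value = erfc(abs(z) / sqrt(2))   # two-sided standard normal tail area
    return p, z, p_value


# Survey data from the example: 400 "yes" out of 893 respondents.
p, z, p_value = proportion_z_test(y=400, n=893, pi0=0.50)
print(f"p = {p:.3f}, z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Because the P-value is far below common significance levels, the data give strong evidence against $H_0: \pi = 0.50$.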


Confidence Intervals for a Binomial Proportion

• A $100(1-\alpha)\%$ confidence interval for $\pi$ is

$$p \pm z_{\alpha/2}\,SE, \quad \text{with } SE = \sqrt{p(1-p)/n}$$

• where $z_{\alpha/2}$ denotes the standard normal percentile having right-tail probability equal to $\alpha/2$.

• However, unless $\pi$ is close to 0.50, this interval does not work well unless $n$ is very large.
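As an illustration of the interval formula, the sketch below applies it to the survey counts from the earlier example (400 “yes” responses out of 893). The function name and the 95% confidence level are assumptions made for this example; the normal percentile comes from Python's standard `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(y, n, conf_level=0.95):
    """Confidence interval p +/- z_{alpha/2} * SE with SE = sqrt(p(1-p)/n)."""
    p = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # z_{alpha/2}
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Survey data reused for illustration: 400 "yes" out of 893 respondents.
lower, upper = wald_interval(400, 893)
print(f"95% confidence interval for pi: ({lower:.3f}, {upper:.3f})")
```

Here $n$ is large and $p$ is near 0.50, which is the setting where this interval is expected to behave reasonably.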


Confidence Intervals for a Binomial Proportion

• A better way to construct confidence intervals uses a duality with significance tests.

• For given $p$ and $n$, the $\pi_0$ values that have test statistic value $z_{\alpha/2}$ are the solutions for $\pi_0$ of the equation

$$\frac{|p - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} = z_{\alpha/2}$$
