ch1. introduction - kocwcontents.kocw.net/kocw/document/2015/gachon/kimnamhyoung...ch1. introduction...
1.1 Categorical Response Data
• A categorical variable has a measurement scale consisting of a set of categories.
• For example, political philosophy may be measured as "liberal", "moderate", or "conservative".
• Categorical scales are commonly used in the social and health sciences for measuring attitudes, opinions, and responses.
• They also occur in the behavioral sciences, public health, zoology, education, marketing, engineering sciences, and industrial quality control.
Response/Explanatory Variable
• Response variable (dependent variable or Y variable)
• Explanatory variable (independent variable or X variable)
• The subject of this course is the analysis of categorical response variables.
• The explanatory variables can be categorical or continuous.
Nominal/Ordinal Scale
• Categorical variables have two main types of measurement scales:
  - Ordinal variables: ordered scales, such as attitude toward something, appraisal of a company's inventory level, response to a medical treatment, and frequency of feeling symptoms of anxiety
  - Nominal variables: unordered scales, such as religious affiliation, primary mode of transportation to work, favorite type of music, and favorite place to shop
Nominal/Ordinal Scale
• Methods designed for ordinal variables cannot be used with nominal variables, because nominal categories have no ordering.
• Methods designed for nominal variables can be used with nominal or ordinal variables, but when applied to ordinal variables they ignore the ordering of the categories, which can cause a serious loss of power.
Problems
• 1.1 In the following examples, identify the response variable and the explanatory variables.
  - a. Attitude toward gun control (favor, oppose), Gender (female, male), Mother's education (high school, college)
  - b. Heart disease (yes, no), Blood pressure, Cholesterol level
  - c. Race (white, nonwhite), Religion (Catholic, Jewish, Protestant), Vote for president (Democrat, Republican, Other), Annual income
  - d. Marital status (married, single, divorced, widowed), Quality of life (excellent, good, fair, poor)
Problems
• 1.2 Which scale of measurement is most appropriate for the following variables: nominal or ordinal?
  - a. Political party affiliation (Democrat, Republican, unaffiliated).
  - b. Highest degree obtained (none, high school, bachelor's, master's, doctorate).
  - c. Patient condition (good, fair, serious, critical).
  - d. Hospital location (London, Boston, Madison, Rochester, Toronto).
  - e. Favorite beverage (beer, juice, milk, soft drink, wine, other).
  - f. How often feel depressed (never, occasionally, often, always).
1.2 Probability Distributions for Categorical Data
• Key distributions for categorical data:
  - the binomial distribution
  - the multinomial distribution
Binomial Distribution
• n independent and identical trials with two possible outcomes, "success" and "failure"
• Identical trials: the probability of success is the same for each trial
• Independent trials: the response outcomes are independent random variables
• Trials of this kind are called Bernoulli trials.
Binomial Distribution
• Let Y denote the number of successes out of the n trials, with $\pi$ the probability of success for a given trial.
• The probability of outcome y for Y equals
  $$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, 2, \ldots, n$$
• For fixed n, the distribution becomes more skewed as $\pi$ moves toward 0 or 1.
• For fixed $\pi$, it becomes more bell-shaped as n increases.
• When n is large, it can be approximated by a normal distribution with $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1-\pi)}$.
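The binomial formula and the normal-approximation parameters can be evaluated directly. The following is a minimal Python sketch, not part of the original slides; the function name `binom_pmf` and the choice n = 10, pi = 0.20 are illustrative assumptions.

```python
from math import comb, sqrt

def binom_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for n Bernoulli trials with success probability pi."""
    # comb(n, y) equals n! / (y! (n - y)!)
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Illustrative (assumed) values: n = 10 trials, pi = 0.20.
n, pi = 10, 0.20
probs = [binom_pmf(y, n, pi) for y in range(n + 1)]

# Mean and standard deviation used by the normal approximation.
mu = n * pi
sigma = sqrt(n * pi * (1 - pi))

print([round(prob, 3) for prob in probs])
print(f"mu = {mu}, sigma = {sigma:.3f}")
```

With a small n and pi far from 0.50, the printed probabilities are visibly right-skewed, which is why the normal approximation is only recommended for large n.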
Binomial Distribution
• Table 1.1. Binomial distribution with n = 10 and $\pi$ = 0.20, 0.50, and 0.80. The distribution is symmetric when $\pi = 0.50$.

   y    P(y), $\pi = 0.20$    P(y), $\pi = 0.50$    P(y), $\pi = 0.80$
   0         0.107                 0.001                 0.000
   1         0.268                 0.010                 0.000
   2         0.302                 0.044                 0.000
   3         0.201                 0.117                 0.001
   4         0.088                 0.205                 0.005
   5         0.027                 0.246                 0.027
   6         0.005                 0.205                 0.088
   7         0.001                 0.117                 0.201
   8         0.000                 0.044                 0.302
   9         0.000                 0.010                 0.268
  10         0.000                 0.001                 0.107
Multinomial Distribution
• Multinomial trials have more than two possible outcomes.
• Let c denote the number of outcome categories.
• For n independent observations, the multinomial probability that $n_1$ fall in category 1, $n_2$ fall in category 2, ..., $n_c$ fall in category c, with category probabilities $\pi_j$ satisfying $\sum_j \pi_j = 1$, equals
  $$P(n_1, n_2, \ldots, n_c) = \left(\frac{n!}{n_1!\,n_2!\cdots n_c!}\right)\pi_1^{n_1}\pi_2^{n_2}\cdots\pi_c^{n_c}$$
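As a rough illustration of the multinomial formula, the sketch below computes the probability of a set of category counts. The function name `multinomial_pmf` and the example counts and probabilities are hypothetical and do not come from the slides.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial probability of the category counts n_1, ..., n_c."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! n_2! ... n_c!), kept in exact integers.
    coef = factorial(n)
    for n_j in counts:
        coef //= factorial(n_j)
    return coef * prod(p_j**n_j for p_j, n_j in zip(probs, counts))

# Hypothetical example: n = 10 observations over c = 3 categories.
example = multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2])

# With c = 2 categories the formula reduces to the binomial probability.
binomial_case = multinomial_pmf([2, 8], [0.2, 0.8])
print(round(example, 5), round(binomial_case, 3))
```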
1.3 Statistical Inference for a Proportion
• In practice, the parameter values for the binomial and multinomial distributions are unknown.
• Using sample data, we estimate the parameters.
• In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model.
Likelihood Function
• The probability of the observed data, expressed as a function of the parameter, is called the likelihood function.
• For example, in n = 10 trials, suppose a binomial count equals y = 0.
• From the binomial formula with parameter $\pi$, the probability of this outcome equals
  $$P(y = 0) = \frac{10!}{0!\,10!}\,\pi^{0}(1-\pi)^{10} = (1-\pi)^{10}$$
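To see the likelihood as a function of the parameter rather than of the data, one can tabulate it over a few candidate values of pi. This is a small sketch under the slide's setting (y = 0, n = 10); the function name `likelihood` and the grid of pi values are assumptions made for illustration.

```python
from math import comb

def likelihood(pi: float, y: int = 0, n: int = 10) -> float:
    """Binomial likelihood of pi for an observed count y in n trials."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Assumed grid of candidate parameter values.
grid = (0.0, 0.1, 0.2, 0.4, 0.6)
values = [likelihood(pi) for pi in grid]

for pi, value in zip(grid, values):
    print(f"pi = {pi:.1f}  likelihood = {value:.3f}")
```

For y = 0 the likelihood (1 - pi)^10 is largest at pi = 0 and falls as pi increases, so smaller values of pi are better supported by these data.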
Maximum Likelihood Estimation (MLE)
• The maximum likelihood (ML) estimate of a parameter is the parameter value for which the probability of the observed data takes its greatest value.
Maximum Likelihood Estimation (MLE)
• In general, for the binomial outcome of y successes in n trials, the ML estimate of $\pi$ equals $p = y/n$ (the sample proportion of successes for the n trials).
• The ML estimate is often denoted by the parameter symbol with a ^ (a "hat") over it, for example $\hat{\pi}$.
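The result that the ML estimate is y/n can be derived by maximizing the log of the binomial likelihood. The derivation below is a standard sketch, not taken from the slides; the symbols $\ell$ (likelihood) and $L$ (log-likelihood) are notation introduced here, and the calculus step assumes 0 < y < n.

```latex
\begin{align*}
\ell(\pi) &= \binom{n}{y}\,\pi^{y}(1-\pi)^{n-y} \\
L(\pi) = \log \ell(\pi) &= \log\binom{n}{y} + y\log\pi + (n-y)\log(1-\pi) \\
\frac{dL}{d\pi} &= \frac{y}{\pi} - \frac{n-y}{1-\pi} = 0
  \;\Longrightarrow\; y(1-\pi) = (n-y)\,\pi
  \;\Longrightarrow\; \hat{\pi} = \frac{y}{n}
\end{align*}
```

The second derivative, $-y/\pi^{2} - (n-y)/(1-\pi)^{2}$, is negative, so this stationary point is a maximum. At the boundaries y = 0 and y = n the likelihood is monotone in $\pi$ and is maximized at 0 and 1 respectively, which again equals y/n.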
Significance Test About a Binomial Proportion
• The ML estimator of the parameter $\pi$ is the sample proportion, p.
• The sampling distribution of the sample proportion p has mean and standard error
  $$E(p) = \pi, \qquad \sigma(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$$
• The sampling distribution of p is approximately normal for large n.
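The mean and standard error of the sample proportion can be checked numerically from the exact binomial distribution, without simulation. The sketch below is illustrative only; the function name `sample_proportion_moments` and the values n = 50, pi = 0.3 are assumptions.

```python
from math import comb, sqrt

def sample_proportion_moments(n: int, pi: float):
    """Exact mean and standard deviation of p = Y/n for Y ~ binomial(n, pi)."""
    pmf = [comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(n + 1)]
    mean = sum((y / n) * prob for y, prob in enumerate(pmf))
    var = sum((y / n - mean) ** 2 * prob for y, prob in enumerate(pmf))
    return mean, sqrt(var)

# Hypothetical values chosen for illustration.
n, pi = 50, 0.3
mean, sd = sample_proportion_moments(n, pi)
formula_se = sqrt(pi * (1 - pi) / n)

print(f"E(p) = {mean:.4f}, sd(p) = {sd:.4f}, formula = {formula_se:.4f}")
```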
Significance Test About a Binomial Proportion
• Null hypothesis $H_0: \pi = \pi_0$
• The test statistic
  $$z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$$
• For large samples, the null sampling distribution of the z test statistic is the standard normal.
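The test can be wrapped in a small helper. The following is a sketch using only the Python standard library; the function name `proportion_z_test` and the example counts (60 successes in 100 trials) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(y: int, n: int, pi0: float):
    """Large-sample z test of H0: pi = pi0; returns (z, two-sided P-value)."""
    p = y / n
    # The standard error is evaluated at the null value pi0, not at p.
    se0 = sqrt(pi0 * (1 - pi0) / n)
    z = (p - pi0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing pi0 = 0.50.
z, p_value = proportion_z_test(60, 100, 0.50)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Using the null value in the standard error matches the formula for z above; it is this choice that distinguishes the test from the interval based on the estimated standard error.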
Example: Survey Results on Legalizing Abortion
• Let $\pi$ denote the proportion of the American adult population that responds "yes" to the question:
• "Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if she is married and does not want any more children."
Example: Survey Results on Legalizing Abortion
• Of 893 respondents to this question, 400 replied "yes" and 493 replied "no".
• $p = 400/893 = 0.448$
• $H_0: \pi = 0.50$, $H_a: \pi \neq 0.50$
• $z = (0.448 - 0.50)/\sqrt{(0.50)(0.50)/893} = -3.1$
• The two-sided P-value is 0.002.
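The arithmetic in this example can be reproduced in a few lines. This is a sketch using the counts stated on the slide (400 "yes" out of 893); the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

yes, n, pi0 = 400, 893, 0.50          # counts and null value from the example
p = yes / n                           # sample proportion
z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))  # both tails of the standard normal

print(f"p = {p:.3f}, z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```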
Confidence Intervals for a Binomial Proportion
• A $100(1-\alpha)\%$ confidence interval for $\pi$ is
  $$p \pm z_{\alpha/2}\,SE, \qquad \text{with } SE = \sqrt{p(1-p)/n}$$
• where $z_{\alpha/2}$ denotes the standard normal percentile having right-tail probability equal to $\alpha/2$.
• Unless $\pi$ is close to 0.50, however, this interval does not work well unless n is very large.
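A minimal sketch of this interval follows, applied to the survey counts from the earlier example (400 of 893) and to an assumed small sample with no successes. The function name `wald_interval` is introduced here for illustration; "Wald" is the usual name for this form of interval, although the slides do not use it.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(y: int, n: int, conf_level: float = 0.95):
    """Interval p +/- z_{alpha/2} * SE with SE = sqrt(p (1 - p) / n)."""
    p = y / n
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # right-tail probability alpha/2
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

lower, upper = wald_interval(400, 893)   # survey counts from the example
degenerate = wald_interval(0, 10)        # hypothetical: no successes in 10 trials

print(f"95% interval: ({lower:.3f}, {upper:.3f})")
print(f"y = 0, n = 10: {degenerate}")
```

The second call shows one way the interval can fail: when y = 0 the estimated standard error is 0, so the interval collapses to a single point regardless of how small n is.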
Confidence Intervals for a Binomial Proportion
• A better way to construct confidence intervals uses a duality with significance tests.
• For given p and n, the $\pi_0$ values that have test statistic value $\pm z_{\alpha/2}$ are the solutions to the equation
  $$\frac{|p - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} = z_{\alpha/2}$$
  for $\pi_0$.
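Squaring both sides of the equation gives a quadratic in $\pi_0$, namely $(1 + z^2/n)\pi_0^2 - (2p + z^2/n)\pi_0 + p^2 = 0$ with $z = z_{\alpha/2}$, whose two roots are the interval endpoints. The sketch below solves it directly; the function name `score_interval` is an assumption (this construction is commonly called the score or Wilson interval, a name the slides do not use), and the inputs reuse the survey counts from the earlier example.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(y: int, n: int, conf_level: float = 0.95):
    """Roots in pi0 of |p - pi0| / sqrt(pi0 (1 - pi0) / n) = z_{alpha/2}."""
    p = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Squaring both sides gives a * pi0^2 + b * pi0 + c = 0.
    a = 1 + z**2 / n
    b = -(2 * p + z**2 / n)
    c = p**2
    disc = sqrt(b**2 - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

lower, upper = score_interval(400, 893)    # survey counts from the example
lower0, upper0 = score_interval(0, 10)     # hypothetical: no successes in 10 trials

print(f"95% interval: ({lower:.3f}, {upper:.3f})")
print(f"y = 0, n = 10: ({lower0:.3f}, {upper0:.3f})")
```

For the large survey sample the endpoints are close to those of the simpler interval, while for y = 0 in 10 trials this construction still yields an interval of positive width.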