CHAPTER 4: MORE ON TWO VARIABLE DATA Section 4.3 – Relations in Categorical Data

Upload: omari-melson

Post on 31-Mar-2015



ANALYZING CATEGORICAL DATA

To analyze categorical data, we use counts or percents of individuals that fall into various categories.

A two-way table describes two categorical variables: the row variable is represented in the rows and the column variable in the columns.
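To make the idea of a two-way table concrete, here is a minimal sketch that counts individuals into (row, column) cells. The records and the helper name `two_way_table` are hypothetical and are not taken from the slides.

```python
from collections import Counter

# Hypothetical records, one per individual: (row category, column category).
records = [
    ("Completed HS", "25-34"),
    ("Completed HS", "55+"),
    ("Did not finish HS", "55+"),
    ("Completed HS", "25-34"),
    ("Did not finish HS", "25-34"),
]

def two_way_table(pairs):
    """Count how many individuals fall into each (row, column) cell."""
    cell_counts = Counter(pairs)
    row_levels = sorted({row for row, _ in pairs})
    col_levels = sorted({col for _, col in pairs})
    # A Counter returns 0 for cells that never occur, so every cell is filled.
    return {r: {c: cell_counts[(r, c)] for c in col_levels} for r in row_levels}

table = two_way_table(records)
for row_label, cells in table.items():
    print(row_label, cells)
```

Each printed line is one row of the table, with one count per column category.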


DEFINITIONS

Marginal distributions – The totals of each row and each column, which appear in the margins of the table, are referred to as the marginal distributions.

Roundoff error – The difference between a total listed in the margin and the sum of the entries it summarizes, caused by rounding. In the table, the entries for the “35-54” age group don’t add up to the total listed. This is because the table is in thousands of persons and each entry is rounded to the nearest thousand.
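The sketch below shows how marginal totals are computed and how roundoff error can arise when entries are reported in thousands. The table `exact` and all of its counts are hypothetical, chosen only to make the mismatch visible. They are not the table from the slide.

```python
# Hypothetical exact counts (persons). NOT the table shown on the slide.
exact = {
    "Group A": {"Yes": 1400, "No": 2400},
    "Group B": {"Yes": 3400, "No": 1400},
}

def row_totals(table):
    """Marginal totals for the row variable."""
    return {row: sum(cells.values()) for row, cells in table.items()}

def column_totals(table):
    """Marginal totals for the column variable."""
    totals = {}
    for cells in table.values():
        for col, count in cells.items():
            totals[col] = totals.get(col, 0) + count
    return totals

def to_thousands(count):
    """Report a count of persons in thousands, rounded to the nearest thousand."""
    return round(count / 1000)

# Sum of the rounded entries in the "Yes" column versus the rounded true total.
rounded_entries_sum = sum(to_thousands(exact[row]["Yes"]) for row in exact)
published_total = to_thousands(column_totals(exact)["Yes"])
print(rounded_entries_sum, published_total)
```

Here the rounded entries add up to 4 (thousand), while the rounded total is 5 (thousand). Neither figure is wrong. The gap is roundoff error.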


MARGINAL DISTRIBUTION

Percents are often more informative than counts. You can represent marginal distributions using percents and a bar graph.

Example: Suppose you want to display the distribution of years of schooling completed among people aged 25 years or older.

Education               Percent
Did not finish HS       15.9
Completed HS            33.1
1-3 years of college    25.4
≥4 years of college     25.6

Each percent is a row total divided by the table total. For “Did not finish HS”:

(27,853 / 175,230) × 100 ≈ 15.9%
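The arithmetic on this slide can be checked with a short sketch. The counts 27,853 and 175,230 and the four percents come from the slide. The function name `percent_of_total` is only an illustrative choice.

```python
def percent_of_total(count, total):
    """Return count as a percent of total, rounded to one decimal place."""
    return round(count / total * 100, 1)

# "Did not finish HS" total divided by the table total, both from the slide.
did_not_finish_hs = percent_of_total(27_853, 175_230)
print(did_not_finish_hs)

# The four marginal percents listed on the slide should account for everyone.
marginal_percents = [15.9, 33.1, 25.4, 25.6]
print(round(sum(marginal_percents), 1))
```

Because every person falls into exactly one education category, the marginal percents add up to 100.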


EXAMPLE 1 – HOW COMMON IS COLLEGE?

Create a bar graph that compares the percents of the three age groups who have completed 4 or more years of college.
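As a rough sketch of such a bar graph, the helper below draws text bars from a dictionary of percents. The function name `text_bar_graph` is hypothetical. The two values used are the “≥4 years of college” percents that this transcript gives for the 25-34 and 55-and-over groups. The 35-54 percent does not appear in the transcript, so it is left out here rather than guessed.

```python
def text_bar_graph(percents):
    """Draw one row of '#' characters per category, one '#' per percentage point."""
    width = max(len(label) for label in percents)
    lines = []
    for label, pct in percents.items():
        bar = "#" * round(pct)
        lines.append(f"{label.ljust(width)} | {bar} {pct}%")
    return "\n".join(lines)

# Percent completing 4 or more years of college, by age group (two of three groups).
college_by_age = {"25-34": 29.3, "55 and over": 18.9}
print(text_bar_graph(college_by_age))
```

Each bar describes a different age group, so the bar heights are not parts of one whole and need not add up to 100.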


CONDITIONAL DISTRIBUTIONS

In the last example we compared only the percents of people who finished college (≥4 years), one percent for each age group. If you look back you can see that these are simply separate percents, not the parts of one whole, so they don’t add up to 100%.

If we instead take only the 25- to 34-year-olds, the percents add up to 100% because every 25- to 34-year-old falls into exactly one of the educational categories. These four percents together form a conditional distribution.

Conditional distribution – The counts or percents of one categorical variable among only the individuals who have a given value of the other variable.

Education for 25- to 34-year-olds    Percent
Did not finish HS                    11.8
Completed HS                         30.6
1-3 years of college                 28.3
≥4 years of college                  29.3
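A conditional distribution can be computed by dividing each count within one group by that group's total. In the sketch below, `counts_for_group` holds hypothetical counts, not the slide's data, and the function name is an illustrative choice. The last lines check the percents listed above for 25- to 34-year-olds.

```python
# Hypothetical counts for ONE group (a single column of a two-way table).
counts_for_group = {
    "Did not finish HS": 12,
    "Completed HS": 30,
    "1-3 years of college": 28,
    "≥4 years of college": 30,
}

def conditional_distribution(counts):
    """Percent of the group falling in each category, to one decimal place."""
    group_total = sum(counts.values())
    return {
        category: round(count / group_total * 100, 1)
        for category, count in counts.items()
    }

distribution = conditional_distribution(counts_for_group)
print(distribution)

# Percents from the slide for 25- to 34-year-olds: together they cover the whole group.
slide_percents = [11.8, 30.6, 28.3, 29.3]
print(round(sum(slide_percents), 1))
```

The divisor is the group total, not the table total. That is what separates a conditional distribution from a marginal one.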


EXAMPLE 2 – CONDITIONAL DISTRIBUTION

Find the conditional distribution of years of school completed among people age 55 and over.

Education for 55 years and over    Percent
Did not finish HS                  25.4
Completed HS                       35.8
1-3 years of college               19.9
≥4 years of college                18.9


READING A TWO-WAY TABLE GENERATED FROM SOFTWARE


Homework: p.245- #’s 53, 54, 59, 62-64, & 68