Statistical Measures Working with Categorical Data

Upload: jaflint718

Post on 29-Nov-2014

Category: Education



Page 1: Statistical measures   categorical data

Statistical Measures

Working with Categorical Data

Page 2: Statistical measures   categorical data

Data

• In statistics we work with data

• Statistical data comes in the form of tables

• Each row of a table is called a case or a subject

• Each column of a table is a characteristic or attribute of the case or subject

• Because the columns potentially hold different information for each row, they are also called variables

Page 3: Statistical measures   categorical data

Data on Titanic Passengers

Survived Age Sex Class

Dead Adult Male Third

Dead Adult Male Crew

Dead Adult Male Third

Dead Adult Male Crew

Dead Adult Male Crew

Dead Adult Male Crew

Alive Adult Female First

Dead Adult Male Third

Dead Adult Male Crew

Each row refers to one passenger (a case)

Each column refers to a characteristic of the case. It is called a variable, because it can hold different values for different cases.

In this table, all of the variables are categorical or qualitative. This means that they are labels and their values are usually words or numbers that are not used in computations, like a zip code or a phone number or a social security number.

rows are horizontal

columns are vertical

Page 4: Statistical measures   categorical data

Frequency Table & Bar Chart

Class Frequency

First 325

Second 285

Third 706

Crew 885

This frequency table comes from counting the occurrence of each individual value of the Class variable from the table in the previous slide.
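The counting step can be sketched in Python. This is a minimal illustration rather than anything taken from the slides: it tallies the Class column of only the nine sample rows shown on the Titanic slide, so its counts are much smaller than those in the full frequency table, and the variable names are assumptions.

```python
from collections import Counter

# Class values of the nine sample rows shown on the Titanic slide
# (an excerpt only, not the full data set behind the frequency table).
sample_classes = [
    "Third", "Crew", "Third", "Crew", "Crew",
    "Crew", "First", "Third", "Crew",
]

# Counter tallies how many times each distinct value occurs.
frequency = Counter(sample_classes)

# Print the values from most to least frequent.
for value, count in frequency.most_common():
    print(f"{value:<6} {count}")
```

Each distinct value of the variable becomes one row of the frequency table, and the counts add up to the number of cases.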

Page 5: Statistical measures   categorical data

Relative Frequency Table & Pie Chart

Class Rel. Frequency (%)

First 14.8

Second 12.9

Third 32.1

Crew 40.2

The degree measure of each slice of the pie is calculated by multiplying the relative frequency (the percent in decimal form) by 360. For example, the First class slice measures about 0.148 × 360 ≈ 53.3 degrees.
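As a rough sketch, the relative frequencies and slice angles can be computed from the counts in the frequency table. The dictionary and variable names here are assumptions made for illustration. The angles use the unrounded proportions, so they can differ slightly from angles computed from the rounded percents.

```python
# Counts from the Class frequency table.
frequency = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(frequency.values())

pie = {}
for value, count in frequency.items():
    proportion = count / total             # relative frequency in decimal form
    percent = round(100 * proportion, 1)   # relative frequency as a percent
    degrees = round(360 * proportion, 1)   # angle of the pie slice
    pie[value] = (percent, degrees)
    print(f"{value:<6} {percent:5.1f}%  {degrees:6.1f} degrees")
```

The percents should add up to about 100 and the angles to about 360, apart from rounding.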

Page 6: Statistical measures   categorical data

Contingency Table

First Second Third Crew Total

Alive 202 118 178 212 710

% of row 28.5% 16.6% 25.1% 29.9% 100%

% of column 62.2% 41.4% 25.2% 24.0% 32.3%

Dead 123 167 528 673 1491

% of row 8.2% 11.2% 35.4% 45.1% 100%

% of column 37.8% 58.6% 74.8% 76.0% 67.7%

Total 325 285 706 885 2201

% of row 14.8% 12.9% 32.1% 40.2% 100%

This table is called a contingency table because it shows how the individuals are distributed along each variable, contingent on (that is, depending on) the value of the other variable. It is also called a two-way table of categorical data.

conditional distribution of class for surviving passengers

marginal distribution of passenger class

conditional distribution of survival status for first class passengers

marginal distribution of survival status

conditional distribution = one row or one column

marginal distribution = total row or total column
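The row and column percentages in a contingency table can be reproduced from the raw counts alone. The following is a minimal sketch, with names such as `row_percent` and `col_percent` chosen here for illustration: dividing a count by its row total gives a "% of row" entry, and dividing by its column total gives a "% of column" entry.

```python
# Counts from the contingency table, one list per survival status.
classes = ["First", "Second", "Third", "Crew"]
counts = {
    "Alive": [202, 118, 178, 212],
    "Dead": [123, 167, 528, 673],
}

row_totals = {status: sum(row) for status, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())


def percent(part, whole):
    """Return part/whole as a percent rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Conditional distribution of class for each survival status (% of row).
row_percent = {
    status: [percent(n, row_totals[status]) for n in row]
    for status, row in counts.items()
}

# Conditional distribution of survival status for each class (% of column).
col_percent = {
    status: [percent(n, t) for n, t in zip(row, col_totals)]
    for status, row in counts.items()
}

# Marginal distributions come from the totals.
marginal_class = [percent(t, grand_total) for t in col_totals]
marginal_status = {s: percent(t, grand_total) for s, t in row_totals.items()}

print("classes:        ", classes)
print("% of row:       ", row_percent)
print("% of column:    ", col_percent)
print("marginal class: ", marginal_class)
print("marginal status:", marginal_status)
```

Each conditional distribution sums to about 100% within its own row or column, which is a quick check that the right total was used as the denominator.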

Page 7: Statistical measures   categorical data

Questions

• What percent of 1st class passengers survived?

• What percent of survivors were 2nd class passengers?

• What percent of all passengers were crew members?

• What percent of all passengers died?

• What percent of those who died were crew members?
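One way to approach questions like these is to decide first which group is the "whole" (the denominator) and then which part of it is being asked about. The sketch below assumes the counts from the Titanic contingency table. The helper names `total` and `percent` are hypothetical and exist only for this illustration.

```python
# Counts from the contingency table, keyed by (survival status, class).
counts = {
    ("Alive", "First"): 202, ("Alive", "Second"): 118,
    ("Alive", "Third"): 178, ("Alive", "Crew"): 212,
    ("Dead", "First"): 123, ("Dead", "Second"): 167,
    ("Dead", "Third"): 528, ("Dead", "Crew"): 673,
}


def total(status=None, cls=None):
    """Sum the counts matching the given status and/or class (None = any)."""
    return sum(
        n for (s, c), n in counts.items()
        if (status is None or s == status) and (cls is None or c == cls)
    )


def percent(part, whole):
    """Return part/whole as a percent rounded to one decimal place."""
    return round(100 * part / whole, 1)


answers = {
    # whole = all 1st class passengers
    "1st class passengers who survived":
        percent(total("Alive", "First"), total(cls="First")),
    # whole = all survivors
    "survivors who were 2nd class":
        percent(total("Alive", "Second"), total(status="Alive")),
    # whole = everyone on board
    "all passengers who were crew":
        percent(total(cls="Crew"), total()),
    "all passengers who died":
        percent(total(status="Dead"), total()),
    # whole = everyone who died
    "those who died who were crew":
        percent(total("Dead", "Crew"), total(status="Dead")),
}

for question, value in answers.items():
    print(f"{question}: {value}%")
```

Questions about "all passengers" use the grand total, while questions about a subgroup use that subgroup's row or column total.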

Page 8: Statistical measures   categorical data

Conditional and Marginal Distributions

• List the conditional distribution of survival status for crew members

• List the conditional relative frequency distribution of passenger class for passengers who died

• List the marginal distribution of passenger class

• List the marginal distribution of survival status

Page 9: Statistical measures   categorical data

Segmented (Stacked) Bar Chart

Segmented Bar Chart of Survival Status by Passenger Class

Page 10: Statistical measures   categorical data

Segmented (Stacked) Bar Chart

Segmented Bar Chart of Passenger Class by Survival Status

Page 11: Statistical measures   categorical data

Independence

The colors of M&Ms are independent of whether the bags are opened by boys or girls.

Survival status is not independent of passenger class. Survival status and passenger class are dependent.
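A simple descriptive way to see this dependence is to compare the survival rate within each class to the overall survival rate. If the two variables were independent, the rates would all be roughly equal. The sketch below uses the counts from the Titanic contingency table. It is an informal comparison under assumed variable names, not a formal statistical test.

```python
# Counts from the Titanic contingency table.
alive = {"First": 202, "Second": 118, "Third": 178, "Crew": 212}
dead = {"First": 123, "Second": 167, "Third": 528, "Crew": 673}

total_alive = sum(alive.values())
total_all = total_alive + sum(dead.values())

# Marginal survival rate: survivors out of everyone on board.
overall_rate = total_alive / total_all

# Conditional survival rate within each class.
survival_rate = {cls: alive[cls] / (alive[cls] + dead[cls]) for cls in alive}

print(f"Overall  {100 * overall_rate:.1f}%")
for cls, rate in survival_rate.items():
    print(f"{cls:<8} {100 * rate:.1f}%")

# Under independence the rates would be about the same, so a large
# gap between the highest and lowest rate suggests dependence.
spread = max(survival_rate.values()) - min(survival_rate.values())
print(f"Spread   {100 * spread:.1f} percentage points")
```

The class-by-class rates differ widely from one another and from the overall rate, which is what dependence between the two variables looks like in the table.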

Page 12: Statistical measures   categorical data

Why?

• Even if two variables are dependent, we cannot assume a causal relationship

• Knowing only that two variables are associated does not tell us why the association exists

Page 13: Statistical measures   categorical data

Assignment

• You are to find actual data with at least two categorical variables

• You might be able to find data online, in a magazine or newspaper (e.g. Consumer Reports, USA Today, etc.)

• Bring your data with you to class on Thursday

• If you have any problems, see me tomorrow

• Don’t wait until Thursday!

Page 14: Statistical measures   categorical data

Kevin’s Project

Name Country Education

Bryan Bickell Canada Juniors

Dave Bolland Canada Juniors

Troy Brouwer Canada Juniors

Jake Dowell USA College

Marian Hossa Europe Juniors

Ryan Johnson Canada College

Patrick Kane USA Juniors

Tomas Kopecky Europe Juniors

Fernando Pisani Europe College

… … …

Source: Chicago Blackhawks Web Site

Page 15: Statistical measures   categorical data

Country

Country Count %

Canada 14 60.9%

Europe 5 21.7%

USA 4 17.4%

Total 23 100%

Page 16: Statistical measures   categorical data

Education

Education Count %

College 13 56.5%

Juniors 10 43.5%

Total 23 100%

Page 17: Statistical measures   categorical data

Country and Education

Juniors College Total

USA 1 3 4

% of row 25% 75% 100%

% of column 7.7% 30% 17.4%

Canada 7 7 14

% of row 50% 50% 100%

% of column 53.8% 70% 60.9%

Europe 5 0 5

% of row 100% 0% 100%

% of column 38.5% 0% 21.7%

Total 13 10 23

% of row 56.5% 43.5% 100%
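A two-way table like this one is built by counting how many cases fall into each combination of the two variables. The sketch below cross-tabulates only the nine players listed on the Kevin's Project slide. The remaining rows are elided in the source, so these counts will not match the full 23-player table. The variable names are assumptions for illustration.

```python
from collections import Counter

# The nine players listed on the project slide (an excerpt; the rest
# of the 23-player roster is not shown in the source).
players = [
    ("Bryan Bickell", "Canada", "Juniors"),
    ("Dave Bolland", "Canada", "Juniors"),
    ("Troy Brouwer", "Canada", "Juniors"),
    ("Jake Dowell", "USA", "College"),
    ("Marian Hossa", "Europe", "Juniors"),
    ("Ryan Johnson", "Canada", "College"),
    ("Patrick Kane", "USA", "Juniors"),
    ("Tomas Kopecky", "Europe", "Juniors"),
    ("Fernando Pisani", "Europe", "College"),
]

# Count each (country, education) combination: one cell of the table.
cell_counts = Counter((country, education) for _, country, education in players)

countries = sorted({country for _, country, _ in players})
levels = sorted({education for _, _, education in players})

# Print the table with a total column.
print("Country", *levels, "Total")
for country in countries:
    row = [cell_counts[(country, level)] for level in levels]
    print(country, *row, sum(row))
```

Row and column percentages then follow by dividing each cell by its row total or column total.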