statistical measures categorical data
Statistical Measures
Working with Categorical Data
Data
• In statistics we work with data
• Statistical data comes in the form of tables
• Each row of a table is called a case or a subject
• Each column of a table is a characteristic or attribute of the case or subject
• Because the columns potentially hold different information for each row, they are also called variables
Data on Titanic Passengers
Survived Age Sex Class
Dead Adult Male Third
Dead Adult Male Crew
Dead Adult Male Third
Dead Adult Male Crew
Dead Adult Male Crew
Dead Adult Male Crew
Alive Adult Female First
Dead Adult Male Third
Dead Adult Male Crew
Each row refers to one passenger (a case)
Each column refers to a characteristic of the case. It is called a variable, because it can hold different values for different cases.
In this table, all of the variables are categorical or qualitative. This means that they are labels and their values are usually words or numbers that are not used in computations, like a zip code or a phone number or a social security number.
Rows are horizontal; columns are vertical.
Frequency Table & Bar Chart
Class Frequency
First 325
Second 285
Third 706
Crew 885
This frequency table comes from counting the occurrence of each individual value of the Class variable from the table in the previous slide.
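The counting step can be sketched in a few lines of Python. This is only an illustration: it tallies the Class column for the nine sample rows shown on the previous slide, not the full data set, and the variable names are made up for the example.

```python
from collections import Counter

# Class values for the nine sample rows shown on the
# "Data on Titanic Passengers" slide (not the full data set).
sample_classes = ["Third", "Crew", "Third", "Crew", "Crew",
                  "Crew", "First", "Third", "Crew"]

# Counter tallies how many times each distinct label occurs.
frequency = Counter(sample_classes)

# Print the labels from most to least frequent.
for label, count in frequency.most_common():
    print(f"{label:<6} {count}")
```

Run on the complete passenger list, the same tally would give the counts in the frequency table.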
Relative Frequency Table & Pie Chart
Class Rel. Frequency (%)
First 14.8
Second 12.9
Third 32.1
Crew 40.2
The degree measure of each slice of the pie is calculated by multiplying the percent (in decimal form) by 360.
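As a rough sketch of that arithmetic, the Python below starts from the class counts in the frequency table and works out each relative frequency and the matching pie-slice angle. The dictionary names are illustrative only.

```python
# Class frequencies taken from the frequency table.
frequency = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(frequency.values())  # everyone in the table

# Relative frequency in decimal form, e.g. 325 / total for First.
relative = {label: count / total for label, count in frequency.items()}

# Central angle of each pie slice: decimal relative frequency times 360.
degrees = {label: 360 * p for label, p in relative.items()}

for label in frequency:
    print(f"{label:<7} {100 * relative[label]:5.1f}%  {degrees[label]:6.1f} degrees")
```

Because the relative frequencies add up to 1, the slice angles add up to 360 degrees.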
Contingency Table
First Second Third Crew Total
Alive 202 118 178 212 710
% of row 28.5% 16.6% 25.1% 29.9% 100%
% of column 62.2% 41.4% 25.2% 24% 32.3%
Dead 123 167 528 673 1491
% of row 8.2% 11.2% 35.4% 45.1% 100%
% of column 37.8% 58.6% 74.8% 76.0% 67.7%
Total 325 285 706 885 2201
% of row 14.8% 12.9% 32.1% 40.2% 100%
This table is called a Contingency Table because it shows how the individuals are distributed along each variable, contingent (based) on the value of the other variable. It is also called a two-way table of categorical data.
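The "% of row" and "% of column" lines can be reproduced from the counts alone. The following is a minimal sketch in plain Python; the helper `percent` and the variable names are assumptions made for this example, and percentages are rounded to one decimal place.

```python
classes = ["First", "Second", "Third", "Crew"]
counts = {
    "Alive": [202, 118, 178, 212],
    "Dead":  [123, 167, 528, 673],
}

# Totals along each margin of the table.
row_totals = {status: sum(row) for status, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

def percent(part, whole):
    """Share of `whole` made up by `part`, as a percent with one decimal."""
    return round(100 * part / whole, 1)

# "% of row": each count divided by the total of its own row.
row_pct = {s: [percent(n, row_totals[s]) for n in row]
           for s, row in counts.items()}

# "% of column": each count divided by the total of its own column.
col_pct = {s: [percent(n, t) for n, t in zip(row, col_totals)]
           for s, row in counts.items()}

for status in counts:
    print(status, counts[status], row_totals[status])
    print("  % of row   ", row_pct[status])
    print("  % of column", col_pct[status])
```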
conditional distribution of class for surviving passengers
marginal distribution of passenger class
conditional distribution of survival status for first class passengers
marginal distribution of survival status
conditional distribution = one row or one column
marginal distribution = total row or total column
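To make the distinction concrete, here is a small Python sketch that computes the four distributions labelled above from the contingency table counts. The function `distribution` and the other names are hypothetical, chosen just for this illustration.

```python
classes = ["First", "Second", "Third", "Crew"]
counts = {
    "Alive": [202, 118, 178, 212],
    "Dead":  [123, 167, 528, 673],
}

def distribution(values):
    """Turn a list of counts into percents of their own total."""
    total = sum(values)
    return [round(100 * v / total, 1) for v in values]

# Conditional distribution of class for surviving passengers:
# use the Alive row only.
class_given_alive = dict(zip(classes, distribution(counts["Alive"])))

# Conditional distribution of survival status for first class passengers:
# use the First column only.
first = classes.index("First")
first_column = [row[first] for row in counts.values()]
status_given_first = dict(zip(counts, distribution(first_column)))

# Marginal distribution of passenger class: use the Total row.
class_totals = [sum(col) for col in zip(*counts.values())]
marginal_class = dict(zip(classes, distribution(class_totals)))

# Marginal distribution of survival status: use the Total column.
status_totals = [sum(row) for row in counts.values()]
marginal_status = dict(zip(counts, distribution(status_totals)))

print(class_given_alive)
print(status_given_first)
print(marginal_class)
print(marginal_status)
```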
Questions
• What percent of 1st class passengers survived?
• What percent of survivors were 2nd class passengers?
• What percent of all passengers were crew members?
• What percent of all passengers died?
• What percent of those who died were crew members?
Conditional and Marginal Distributions
• List the conditional distribution of survival status for crew members
• List the conditional relative frequency distribution of passenger class for passengers who died
• List the marginal distribution of passenger class
• List the marginal distribution of survival status
Segmented (Stacked) Bar Chart
Segmented Bar Chart of Survival Status by Passenger Class
Segmented (Stacked) Bar Chart
Segmented Bar Chart of Passenger Class by Survival Status
Independence
The colors of M&Ms are independent of whether bags are opened by boys or girls
Survival status is not independent of passenger class. Survival status and passenger class are dependent.
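One informal way to see the dependence is to compare the survival rate within each class to the overall survival rate. If the two variables were independent, the rates would all be about the same. The sketch below does that comparison in Python with the contingency table counts; it is a descriptive check only, not a formal test, and the names are illustrative.

```python
classes = ["First", "Second", "Third", "Crew"]
alive = [202, 118, 178, 212]
dead = [123, 167, 528, 673]

# Marginal (overall) proportion of survivors.
overall = sum(alive) / (sum(alive) + sum(dead))

# Conditional proportion of survivors within each class.
survival_rate = {}
for name, a, d in zip(classes, alive, dead):
    survival_rate[name] = a / (a + d)
    gap = 100 * (survival_rate[name] - overall)
    print(f"{name:<7} {100 * survival_rate[name]:5.1f}% survived "
          f"({gap:+.1f} points vs. overall)")

print(f"Overall {100 * overall:5.1f}% survived")
```

The rates differ widely from class to class, which is what dependence between the two variables looks like in the table.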
Why?
• Even if two variables are dependent, we cannot assume a causal relationship
• Just knowing that there is an association between two variables, we may not be able to tell why it exists
Assignment
• You are to find actual data with at least two categorical variables
• You might be able to find data online, in a magazine or newspaper (e.g. Consumer Reports, USA Today, etc.)
• Bring your data with you to class on Thursday
• If you have any problems, see me tomorrow
• Don’t wait until Thursday!
Kevin’s Project
Name Country Education
Bryan Bickell Canada Juniors
Dave Bolland Canada Juniors
Troy Brouwer Canada Juniors
Jake Dowell USA College
Marian Hossa Europe Juniors
Ryan Johnson Canada College
Patrick Kane USA Juniors
Tomas Kopecky Europe Juniors
Fernando Pisani Europe College
… … …
Source: Chicago Blackhawks Web Site
Country
Country Count %
Canada 14 60.9%
Europe 5 21.7%
USA 4 17.4%
Total 23 100%
Education
Education Count %
College 10 43.5%
Juniors 13 56.5%
Total 23 100%
Country and Education
Juniors College Total
USA 1 3 4
% of row 25% 75% 100%
% of column 7.7% 30% 17.4%
Canada 7 7 14
% of row 50% 50% 100%
% of column 53.8% 70% 60.9%
Europe 5 0 5
% of row 100% 0% 100%
% of column 38.5% 0% 21.7%
Total 13 10 23
% of row 56.5% 43.5% 100%