DISTRIBUTIONS

Upload: frederick-wilcox

Post on 01-Jan-2016


Page 1: DISTRIBUTIONS. What is a “distribution”? One distribution for a continuous variable. Each youth homicide is a case. There is one variable: the number

DISTRIBUTIONS


What is a “distribution”?

One distribution for a continuous variable. Each youth homicide is a case. There is one variable: the number of youth homicide victims each month.

Two distributions, each for a single continuous variable: violent crimes and commitments to prison.

Each violent crime is a case. The variable is the number of violent crimes each year per 100,000 population.

Each commitment to prison is a case. The variable is the number of commitments each year per 100,000 population.

One distribution for TWO categorical variables:

Youth’s demeanor (two categories)

Officer disposition (four categories)

Each police encounter with a youth is a case.

A distribution is an arrangement of cases in a sample or population according to their values or scores on one or more variables.

(A case is a single unit that “contains” all the variables of interest)

Distributions can be depicted visually. How that is done depends on how many variables and their type (whether categorical or continuous).


DEPICTING THE DISTRIBUTION OF CATEGORICAL VARIABLES


Depicting distribution of a categorical variable: the bar graph

Distributions depict the frequency (number of cases) at each value of a variable. Here there is one variable with two values: gender (M/F).

A case is a single unit that “contains” all the variables of interest. Here each student is a case.

Frequency means the number of cases – students – at a single value of a variable. Frequencies are always on the Y axis

Values of the variable are always on the X axis

Distributions illustrate how cases cluster or spread out according to the value or score of the variable. Here the proportions of men and women seem about equal.

Figure: bar graph of gender. X axis: value or score of variable. Y axis: how many at each value/score. Bar heights: n=15 and n=17; N = 32.

Bars are “made up of” cases. Here that means students, arranged by the variable gender.
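As a rough illustration of how a bar graph is built from cases, the sketch below counts the frequency at each value of a categorical variable and prints one text "bar" per value. The transcript gives bar heights of 15 and 17 but does not say which bar is which, so the split between M and F here is an assumption made only for the example.

```python
from collections import Counter

# Hypothetical sample of 32 students. The slide reports n=15 and n=17,
# but which gender has which count is an assumption for illustration.
students = ["M"] * 15 + ["F"] * 17

# Frequency = number of cases at each value of the variable
frequencies = Counter(students)

# Values go on the X axis, frequencies on the Y axis; here each bar is a row of '#'
for value in sorted(frequencies):
    count = frequencies[value]
    print(f"{value}: {'#' * count} (n={count})")

print("N =", sum(frequencies.values()))
```

Each `#` stands for one case, which is the sense in which bars are “made up of” cases.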


Using a table to display the distribution of two categorical variables

Table: each axis lists the value or score of one variable (one of them is officer’s disposition). The “cells” hold the number of cases at each value/score.


DEPICTING THE DISTRIBUTION OF CONTINUOUS VARIABLES


Depicting the distribution of continuous variables: the histogram

Figure: histogram. X axis: value or score of variable. Y axis: how many at each value/score.

Distributions depict the frequency (number of cases) at each value of a variable. Here there is one variable: age, measured on a scale of 20-33.

A case is a single unit that “contains” all the variables of interest. Here each student is a case.

Frequency means the number of cases – students – at a single value of a variable. Frequencies are always on the Y axis

Values of the variable are always on the X axis

What is the area under the trend line “made up of”? Cases, meaning students (arranged by age)


Sometimes, bar graphs are used for continuous variables

Figure: bar graph. X axis: value or score of variable. Y axis: how many at each value/score.

What are the bars “made of”? Cases, meaning homicides (arranged by the variable homicides per year)


Continuous variables: What “makes up” the areas under the trend lines?

Cases, that’s what!

Each violent crime is one “case.” Variable: # crimes per 100,000 population each year

Each commitment to prison is one “case.” Variable: # commitments to prison, per 100,000 population, each year

Each murdered youth is one “case.” Variable: # youths murdered each month

Figures: three graphs, each with a trend line. X axis: value or score of variable. Y axis: how many at each value/score.


SUMMARIZING THE DISTRIBUTION OF CATEGORICAL VARIABLES


Summarizing the distribution of categorical variables using percentage

• Instead of using graphs or a lot of words, is there a single statistic that can convey what a distribution “looks like”?

• Percentage is a “statistic.” It’s a proportion with a denominator of 100.

• Percentages are used to summarize categorical data

– 70 percent of students are employed; 60 percent of parolees recidivate

• Since per cent means per 100, any decimal can be converted to a percentage by multiplying it by 100 (moving the decimal point two places to the right)

– .20 = .20 X 100 = 20 percent (twenty per hundred)

– .368 = .368 X 100 = 36.8 percent (thirty-six point eight per hundred)

• When converting, remember that there can be fractions of one percent

– .0020 = .0020 X 100 = .20 percent (two tenths of one percent)

• To obtain a percentage for a category, divide the number of cases in the category by the total number of cases in the sample

50,000 persons were asked whether crime is a serious problem: 32,700 said “yes.” What percentage said “yes”?
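The rule for obtaining a category's percentage (cases in the category divided by total cases, times 100) can be sketched as below. The function name `percentage` is only an illustrative choice, not something from the slides.

```python
def percentage(count, total):
    """Return the share of `count` in `total`, expressed per 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


# Survey from the slide: 32,700 of 50,000 persons said "yes"
yes_share = percentage(32_700, 50_000)
print(f"{yes_share:.1f} percent said yes")  # 65.4 percent said yes
```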


Using percentages to compare datasets

• Percentages are “normalized” numbers (e.g., per 100), so they can be used to compare datasets of different size

– Last year, 10,000 people were polled. Eight-thousand said crime is a serious problem.

– This year 12,000 people were polled. Nine-thousand said crime is a serious problem.

Calculate the second percentage and compare it to the first
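One way to work through the comparison is sketched here, using a small helper named `percentage` (an illustrative name, not from the slides). It shows why raw counts and percentages can point in opposite directions when the samples differ in size.

```python
def percentage(count, total):
    """Return the share of `count` in `total`, expressed per 100."""
    return 100 * count / total


last_year = percentage(8_000, 10_000)   # 8,000 of 10,000 polled
this_year = percentage(9_000, 12_000)   # 9,000 of 12,000 polled

print(f"Last year: {last_year:.0f} percent")   # Last year: 80 percent
print(f"This year: {this_year:.0f} percent")   # This year: 75 percent
print(f"Change: {this_year - last_year:+.0f} percentage points")
```

More people said "yes" this year (9,000 versus 8,000), yet the percentage fell from 80 to 75 because the sample grew by more.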


Practical exercise

Draw two bar graphs, one for each class (Class 1 and Class 2), depicting proportions for gender


Calculating increases in percentage

Figure: three amounts compared. Original; 100% larger = 2 times larger (2X); 200% larger = 3 times larger (3X).

Increases in percentage are computed off the base amount

Example: Jail with 120 prisoners. How many prisoners will there be...

…with a 100 percent increase?

– 100 percent of the base amount, 120, is 120 (120 X 100/100)

– 120 base + 120 increase = 240 (2 times the base amount)

…with a 150 percent increase?

– 150 percent of 120 is 180 (120 X 150/100)

– 120 base plus 180 increase = 300 (2½ times the base amount)

How many will there be with a 200 percent increase?
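The two worked cases follow the same two steps: compute the increase off the base, then add it to the base. A minimal sketch, with `apply_percent_increase` as a made-up helper name:

```python
def apply_percent_increase(base, percent):
    """Return the new amount after increasing `base` by `percent` percent."""
    increase = base * percent / 100   # the increase is computed off the base amount
    return base + increase


base = 120  # prisoners in the jail
for percent in (100, 150):
    new_total = apply_percent_increase(base, percent)
    print(f"{percent} percent increase: {new_total:.0f} prisoners "
          f"({new_total / base:g} times the base)")
```

Calling the same function with 200 is one way to check an answer to the question above.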


Percentage changes can mislead

• Answer to preceding slide – jail with 120 prisoners

200 percent increase

200 percent of 120 is 240 (120 X 200/100)

120 base plus 240 = 360 (3 times the base amount)

• Percentages can make changes seem large when bases are small

Example: Increase from 1 to 3 convictions is 200 (two-hundred) percent

3-1 = 2

2/base = 2/1 = 2

2 X 100 = 200%

• Percentages can make changes seem small when bases are large

Example: Increase from 5,000 to 6,000 convictions is 20 (twenty) percent

6,000 - 5,000 = 1,000

1,000/base = 1,000/5,000 = .20 = 20%
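Both examples use the same three steps: subtract, divide by the base, multiply by 100. The sketch below (with `percent_change` as an illustrative name) puts the two cases side by side so the raw change and the percentage change can be compared.

```python
def percent_change(old, new):
    """Return the change from `old` to `new` as a percentage of `old` (the base)."""
    if old == 0:
        raise ValueError("percentage change is undefined for a base of 0")
    return 100 * (new - old) / old


for old, new in [(1, 3), (5_000, 6_000)]:
    print(f"{old:,} -> {new:,} convictions: "
          f"raw change {new - old:,}, percent change {percent_change(old, new):.0f}%")
```

A raw change of 2 shows up as 200 percent, while a raw change of 1,000 shows up as 20 percent, which is why the base should be reported along with the percentage.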


SUMMARIZING THE DISTRIBUTION OF CONTINUOUS VARIABLES


Four summary statistics for continuous variables

• Continuous variables – review

– Can take on an infinite number of values (e.g., age, height, weight, sentence length)

– Precise differences between cases

– Equivalent differences: the distance between 15 and 20 years is the same as between 60 and 65 years

• Summary statistics for continuous variables

– Mean: arithmetic average of scores

– Median: midpoint of scores (half higher, half lower)

– Mode: most frequent score (or scores, if tied)

– Range: difference between low and high scores


Summarizing the distribution of continuous variables - the mean

• Arithmetic average of scores

– Add up all the scores

– Divide the result by the number of scores

• Example: Compare numbers of arrests for twenty police precincts during a certain shift

• Method: Use mean to summarize arrests at each precinct, then compare the means

Figure: two distributions of arrests, one with a mean of 3.0 and one with a mean of 3.5.

Variable: number of arrests

Unit of analysis: police precincts

Case: one precinct

Issue: Means are pulled in the direction of extreme scores, possibly misleading the comparison


Transforming categorical/ordinal variables into continuous variables, then using the mean

• Ordinal variables are categorical variables with an inherent order

– Small, medium, large

– Cooperative, uncooperative

• Can summarize in the ordinary way: proportions / percentages

• Can also transform them into continuous variables by assigning categories points on a scale, then calculating a mean

• Not always recommended because “distances” between points on scale may not be equal, causing misleading results

• Is the distance between “Admonished” and “Informal” the same as between “Informal” and “Citation”? Between “Citation” and “Arrest”?

Value   Severity of Disposition           Youths: Freq.   %
4       Arrested                          16              24
3       Citation or official reprimand     9              14
2       Informal reprimand                16              24
1       Admonished & released             25              38
        Total (N)                         66             100

Severity of disposition mean = 2.24

[(25 X 1) + (16 X 2) + (9 X 3) + (16 X 4)] / 66 = 2.24
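The mean above is a frequency-weighted average: each scale value is multiplied by the number of youths at that value, the products are summed, and the sum is divided by N. A short sketch using the table's frequencies (variable names are illustrative):

```python
# Scale value -> number of youths, taken from the table
# 1 = admonished & released, 2 = informal reprimand,
# 3 = citation or official reprimand, 4 = arrested
frequencies = {1: 25, 2: 16, 3: 9, 4: 16}

n = sum(frequencies.values())
weighted_sum = sum(value * count for value, count in frequencies.items())
mean_severity = weighted_sum / n

print(f"N = {n}, sum of scores = {weighted_sum}")      # N = 66, sum of scores = 148
print(f"Mean severity of disposition = {mean_severity:.2f}")  # 2.24

# The percentage column can be recovered the same way
for value, count in sorted(frequencies.items(), reverse=True):
    print(f"value {value}: {count} youths, {100 * count / n:.0f} percent")
```

The arithmetic treats the step from 1 to 2 as equal to the step from 3 to 4, which is exactly the assumption the slide cautions about.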


Summarizing the distribution of continuous variables - the median

Arrests: 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6

Median: (3 + 3) / 2 = 3

Compute...

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

• Median can be used with continuous or ordinal variables

• Median is a useful summary statistic when there are extreme scores, making the mean misleading

• In this example, which is identical to the preceding page except for one outlier (16), the mean is 3.5 – .5 higher

• But the medians (3.0) are the same
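The effect of a single outlier on the mean and the median can be checked with Python's standard `statistics` module. The twenty arrest scores come from the slide. The transcript does not show where the outlier sits, so replacing the highest score (6) with 16 is an assumption, though it reproduces the means of 3.0 and 3.5 quoted above.

```python
from statistics import mean, median

# Arrests at twenty precincts, as listed on the slide
arrests = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]

# Assumption: the outlier version replaces the highest score (6) with 16
with_outlier = arrests[:-1] + [16]

for label, scores in [("without outlier", arrests), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(scores):.1f}, median = {median(scores):.1f}")
```

With an even number of scores, `median` averages the two middle scores, which is the (3 + 3) / 2 step shown on the slide.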


Summarizing the distribution of continuous variables - the mode

Figure: distribution of arrests.

• Score that occurs most often (with the greatest frequency)

• Here the mode is 3

• Modes are a useful summary statistic when cases cluster at particular scores – an interesting condition that might otherwise be overlooked

• Symmetrical, bell-shaped distributions like this one are called “normal” distributions. In such distributions the mean, mode and median are the same. Near-normal distributions are common.

• There can be more than one mode (bi-modal, tri-modal, etc.). Identify the modes:

• Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

• Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21


• Answers to preceding slide

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Mode = 5 (unimodal)

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Modes = 5, 21 (bimodal)
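These answers can be checked with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every score tied for the highest frequency. The variable names are illustrative.

```python
from statistics import multimode

exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

for name, scores in [("Exercise 1", exercise_1), ("Exercise 2", exercise_2)]:
    modes = multimode(scores)  # list of all most-frequent scores
    kind = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    print(f"{name}: modes = {modes} ({kind})")
```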

A final way to depict the distribution of continuous variables - the range

• Range: a simple way to convey the distribution of a continuous variable

– Depicts the lowest and highest scores in a distribution

2, 3, 5, 5, 8, 12, 17, 19, 21 – range is “2 to 21”

– Range can also be defined as the difference between the scores (21-2 = 19). If so, minimum and maximum scores should also be given.

– Useful to cite range if there are outliers (extreme scores) that misleadingly distort the shape of the distribution
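Both ways of reporting a range, the low and high scores themselves and the difference between them, can be sketched in a few lines using the scores from the slide:

```python
scores = [2, 3, 5, 5, 8, 12, 17, 19, 21]

low, high = min(scores), max(scores)
difference = high - low

print(f"Range is {low} to {high}")                   # Range is 2 to 21
print(f"Difference between scores: {difference}")    # Difference between scores: 19
```

Reporting `low` and `high` alongside `difference` follows the slide's advice to give the minimum and maximum whenever the range is stated as a single number.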


Practical exercise

• Calculate your class summary statistics for age and height – mean, median, mode and range

• Pictorially depict the distributions for age and height, placing the variables and frequencies on the correct axes


Next week – Every week: Without fail – bring an approved calculator – the same one you will use for the exam.

It must be a basic calculator with a square root key. NOT a scientific or graphing calculator. NOT a cell phone, etc.


Case No.   Income   No. of arrests   Gender
1          15600    4                M
2          21380    3                F
3          17220    5                F
4          18765    2                M
5          23220    1                F
6          44500    0                M
7          34255    0                F
8          21620    0                F
9          14890    1                M
10         16650    2                F
11         44500    1                F
12         16730    3                M
13         23980    3                F
14         14005    0                F
15         21550    2                M
16         26780    4                M
17         18050    1                F
18         34500    1                M
19         33785    3                F
20         21450    2                F

HOMEWORK (link on weekly schedule)

1. Calculate all appropriate summary statistics for each distribution

2. Pictorially depict the distribution of arrests

3. Pictorially depict the distribution of gender