distributions of variables
Anthony J Greene 1
Distributions of Variables
I. Properties of Variables
II. Nominal Data & Bar Charts
III. Ordinal Data
IV. Interval & Ratio Data, Histograms & Frequency Distributions
V. Cumulative Frequency Distributions & Percentile Ranks
Anthony J Greene 2
Variables
Variable: A characteristic that takes on multiple values, i.e., one that varies from one person or thing to another.
Anthony J Greene 3
Variables
Cause and Effect
• The Independent Variable
• The Dependent Variable
Anthony J Greene 4
Distributions
• The distribution of population data is called the population distribution or the distribution of the variable.
• The distribution of sample data is called a sample distribution.
Anthony J Greene 5
Variables
Anthony J Greene 6
Variables
Kinds of Variables (any of which can be an independent or dependent variable)
• Qualitative variable: A nonnumerically valued variable.
• Quantitative variable: A numerically valued variable.
• Discrete Variable: A quantitative variable whose possible values form a finite (or countably infinite) set of numbers.
• Continuous variable: A quantitative variable whose possible values form some interval of numbers.
Anthony J Greene 7
Quantitative Variables
• Discrete data: Data obtained by observing values of a discrete variable.
• Continuous data: Data obtained by observing values of a continuous variable.
Anthony J Greene 8
The Four Scales
• Nominal: Categories
• Ordinal: Sequence
• Interval: Mathematical Scale without a true zero
• Ratio: Mathematical Scale with a true zero
Anthony J Greene 9
The Four Scales
• Nominal: Classes or Categories. Also called a Categorical scale. E.g., Catholic, Methodist, Jewish, Hindu, Buddhist, …
Qualitative Data
Anthony J Greene 10
The Four Scales
• Ordinal: Sequential Categories. e.g., 1st, 2nd, 3rd, … with no indication of the distance between classes
Discrete Data
Anthony J Greene 11
The Four Scales
• Interval: Data where equal spacing in the variable corresponds to equal spacing in the scale, e.g., decades (1940s, 1950s, 1960s, …) or SAT scores.
Discrete or Continuous
Anthony J Greene 12
The Four Scales
• Ratio: An interval scale with a mathematically meaningful zero, e.g., latencies of 1252 ms, 1856 ms, …, or mg of Prozac.
Discrete or Continuous
Anthony J Greene 13
The Four Scales
Nominal: No mathematical operations
Ordinal: <, >, =
Interval: +, -, and ordinal operations
Ratio: ×, ÷, and interval operations
Anthony J Greene 14
Nominal Variables
• Classes: Categories for grouping data.
• Frequency: The number of observations that fall in a class.
• Frequency distribution: A listing of all classes along with their frequencies.
• Relative frequency: The ratio of the frequency of a class to the total number of observations.
• Relative-frequency distribution: A listing of all classes along with their relative frequencies.
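The definitions above can be sketched in a few lines of Python. This is only an illustration: the list `observations` is hypothetical raw data, with counts (180, 130, 90) chosen to agree with the party percentages quoted later in the deck (45.0%, 32.5%, 22.5%).

```python
from collections import Counter

# Hypothetical raw data: one class label per observation.
# The counts are an assumption chosen to match the percentages on a later slide.
observations = ["Democratic"] * 180 + ["Republican"] * 130 + ["Other"] * 90

# Frequency distribution: each class with its frequency.
frequency = Counter(observations)

# Relative frequency: class frequency divided by the total number of observations.
total = sum(frequency.values())
relative_frequency = {cls: count / total for cls, count in frequency.items()}

for cls in frequency:
    print(f"{cls:<12}{frequency[cls]:>5}{relative_frequency[cls]:>8.3f}")
```

The relative frequencies always sum to 1, which is why the same listing can later be read as a probability distribution.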
Anthony J Greene 15
Frequencies of Nominal Variables
Anthony J Greene 16
Sample Pie Charts and Bar Charts of Nominal Data
[Figure: pie chart (Democratic, Republican, Other; slices labeled 44%, 23%, 33%) and bar chart (Republican, Democratic, Other; vertical axis 0 to 50)]
Anthony J Greene 17
Frequency Bar Charts
• Frequency bar chart: A graph that displays the independent variable on the horizontal axis -- categories -- and the frequencies -- dependent variable -- on the vertical axis. The frequency is represented by a vertical bar whose height is equal to the frequency of cases that fall within a given class of the I.V.
Anthony J Greene 18
Frequency Charts of Nominal Data
[Figure: pie chart and bar chart of party frequencies (Democratic, Republican, Other); values shown: 90, 180, 130; vertical axis 0 to 200]
Anthony J Greene 19
Relative Frequency Bar Charts
• Relative-frequency bar chart: A graph that displays the I.V. on the horizontal axis -- categories -- and the relative frequencies -- D.V. -- on the vertical axis. The relative frequency of each class is represented by a vertical bar whose height is equal to the relative frequency of the class.
• The difference between this and a frequency bar chart is that the proportion (always between zero and one) or percentage (between 0% and 100%) is listed instead of the number of cases that fall into a given class.
Anthony J Greene 20
Relative Frequency Charts of Nominal Data
[Figure: relative-frequency pie chart and bar chart (Democratic, Republican, Other; 44%, 23%, 33%); vertical axis 0 to 50%]
Anthony J Greene 21
Probability Distribution and Probability Bar Chart
Frequency Distributions and Charts for a whole population
Probability distribution: A listing of the possible values and corresponding probabilities of a discrete random variable; or a formula for the probabilities.
Probability bar chart: A graph of the probability distribution that displays the possible values of a discrete random variable on the horizontal axis and the probabilities of those values on the vertical axis. The probability of each value is represented by a vertical bar whose height is equal to the probability.
Anthony J Greene 22
Probability Charts of Nominal Data
[Figure: probability pie chart and bar chart (Republican, Democratic, Other; 44%, 23%, 33%); vertical axis 0 to 50%]
Anthony J Greene 23
Bar Chart
Anthony J Greene 24
The Bar Graph: Nominal Data
Anthony J Greene 25
Sum of the Probabilities of a Discrete Random Variable
For any discrete random variable, X, the sum of the probabilities of its possible values equals 1; in symbols, we have ΣP(X = x) = 1.
For example Republicans: 32.5%, Democrats 45.0%, Other 22.5%
0.325 + 0.450 + 0.225 = 1.00 or 100%
Anthony J Greene 26
Ordinal Variables
Note that “Rank” is the ordinal variable. “Mortality” is a ratio variable, but it can easily be downgraded to an ordinal variable with a loss of information.
Anthony J Greene 27
Distributions and Charts for Ordinal Data
• Frequency distributions, relative frequency distribution, and probability distributions are done exactly as they were for Nominal Data
• Bar charts are used.
Anthony J Greene 28
Distribution of Education Level
Level          P(x)
Elementary     0.03
High School    0.45
Associates     0.12
Bachelors      0.28
Masters        0.10
Doctorate      0.02
[Bar chart of P(x) by education level: Elementary, High School, Associates, Bachelors, Masters, Doctorate; vertical axis 0 to 0.5]
Anthony J Greene 29
Interval and Ratio Data
• Frequency: The number of observations that fall in a class.
• Frequency distribution: A listing of all classes along with their frequencies.
• Relative frequency: The ratio of the frequency of a class to the total number of observations.
• Relative-frequency distribution: A listing of all classes along with their relative frequencies.
Anthony J Greene 30
Histograms
• Frequency histogram: A graph that displays the independent variable on the horizontal axis and the frequencies -- dependent variable -- on the vertical axis. The frequency is represented by a vertical bar whose height is equal to the frequency of cases that fall within a given range of the I.V.
Anthony J Greene 31
Interval and Ratio Variables
[Chart: Avg. Income (in thousands), vertical axis 0 to 80, by Years of Education, horizontal axis 0 to 20]
Anthony J Greene 32
Enrollment in Milwaukee Public Elementary Schools
Anthony J Greene 33
Relative Frequency distribution of Enrollments in MPS
Anthony J Greene 34
Probability distribution of a randomly selected elementary-school student
Anthony J Greene 35
Probability distribution of the age of a randomly selected student
Anthony J Greene 36
Probability Histogram
[Probability histogram: horizontal axis values 19 to 27; vertical axis probability 0 to 0.5]
Anthony J Greene 37
Another Example
Anthony J Greene 38
Frequency vs. Relative Frequency
Anthony J Greene 39
Frequency vs. Relative Frequency
This is also the Probability Distribution
Anthony J Greene 40
More Examples: Frequency Histogram
Anthony J Greene 41
More Examples: Grouped Frequency Histogram
Anthony J Greene 42
Grouped Frequency Histogram
Anthony J Greene 43
Anthony J Greene 44
Proportions and Frequency
Anthony J Greene 45
Frequency Groupings
9 intervals with each interval 5 points wide. The frequency column (f) lists the number of individuals with scores in each of the class intervals.
Groupings: There had to be a catch
• What to do with the in-betweens?
• Only a concern for continuous variables
• Real Limits -- those in the “14” bar are really from 13.5 to 14.5
• Upper Real Limits & Lower Real Limits: For whole-number scores, add 0.5 to the highest observed score and subtract 0.5 from the lowest observed score (the observed scores themselves are the “apparent limits”)
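The real-limits rule can be written as a small helper. This is a minimal sketch; the function name `real_limits` and the `unit` parameter (the size of one measurement step, 1 for whole numbers) are illustrative assumptions rather than anything defined in the slides.

```python
def real_limits(low, high, unit=1.0):
    """Return (lower, upper) real limits for the apparent limits low..high.

    `unit` is the measurement step (1.0 for whole-number scores), so half a
    unit is subtracted from the low score and added to the high score.
    """
    half = unit / 2
    return (low - half, high + half)


print(real_limits(14, 14))  # a single whole-number score -> (13.5, 14.5)
print(real_limits(10, 19))  # a grouped class interval    -> (9.5, 19.5)
```

Adjacent classes share a boundary under this rule (the upper real limit of 10-19 equals the lower real limit of 20-29), so no in-between value is left unassigned.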
Anthony J Greene 47
Understanding Real Limits
Anthony J Greene 48
Real Limits & Apparent Limits
Anthony J Greene 49
Frequency & Cumulative Frequency
I.Q. Range   Real Limits    Frequency   Cuml. Freq.
< 52         0 – 52.5       1           1
52-67        52.5-67.5      4           5
68-84        67.5-84.5      11          16
85-100       84.5-100.5     34          50
101-116      100.5-116.5    34          84
117-132      116.5-132.5    11          95
133-148      132.5-148.5    4           99
> 148        148.5 +        1           100
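The cumulative-frequency column is a running total of the frequency column. A short sketch, using the I.Q. frequencies from the table (the variable names are illustrative):

```python
from itertools import accumulate

# Frequencies for the eight I.Q. classes, lowest class first.
frequencies = [1, 4, 11, 34, 34, 11, 4, 1]

# Each cumulative frequency is the count of cases at or below that class.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]  # the last cumulative frequency is the total number of cases

# Dividing by n turns counts into cumulative percentages.
cumulative_percent = [100 * c / n for c in cumulative]

print(cumulative)
print(cumulative_percent)
```

Because this table has exactly 100 cases, the cumulative counts and cumulative percentages happen to be the same numbers.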
[Histogram: Frequency (Normal Distribution); horizontal axis I.Q. classes from below 52 to above 148; vertical axis 0 to 40]
[Line graph: Cumulative Frequency (Ogive); horizontal axis I.Q. classes from below 52 to above 148; vertical axis 0 to 100]
Anthony J Greene 52
Computing Percentile Ranks
Pounds (x)   Real Limits   Freq (f)   Relative Freq.   Cuml. Rel. Freq.   %ile
0            0-0.5         8          0.200            0.200              20.0
1            0.5-1.5       17         0.425            0.625              62.5
2            1.5-2.5       11         0.275            0.900              90.0
3            2.5-3.5       3          0.075            0.975              97.5
4            3.5-4.5       1          0.025            1.000              100
Anthony J Greene 56
Computing Percentile Ranks
• Remember that each value has real limits
• What is the 90th %ile? 2.5 because at or below “2” are 90% of the scores, but “2” includes all from 1.5 to 2.5
• What is the 20th %ile? 0.5
Anthony J Greene 57
Computing Percentile Ranks
• What about the in-betweens?
• What is the 80th %ile?
• What %ile corresponds to 2 lbs?
Anthony J Greene 58
Linear Interpolation
Anthony J Greene 62
Linear Interpolation and Percentiles
• What is the 80th %ile?
The 80th %ile falls in the “2” interval (real limits 1.5 to 2.5), which runs from the 62.5th to the 90th %ile. It lies (80 − 62.5)/27.5 = 17.5/27.5 ≈ 0.64 of the way into the interval. The interval is 1.0 lb wide, so 1.5 + 1(0.64) = 2.14
• What %ile corresponds to 2 lbs?
2 lbs. is halfway into the interval (0.5), so it is halfway between 62.5 and 90.0. Since 27.5% of the scores are in this interval, we need to go up 0.5(27.5%) = 13.75%. 62.5% + 13.75% = 76.25%
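Both directions of the interpolation can be sketched in Python. This is an illustrative sketch, not a standard library routine: the function names `score_at_percentile` and `percentile_rank` are made up for this example, and the real limits and frequencies are taken from the pounds table.

```python
from itertools import accumulate

# Real limits and frequencies for 0, 1, 2, 3, 4 pounds (from the table).
intervals = [(0.0, 0.5), (0.5, 1.5), (1.5, 2.5), (2.5, 3.5), (3.5, 4.5)]
freqs = [8, 17, 11, 3, 1]
n = sum(freqs)
# Cumulative percentage at the upper real limit of each interval.
cum_pct = [100 * c / n for c in accumulate(freqs)]


def score_at_percentile(p):
    """Score at the p-th percentile, by linear interpolation within an interval."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    below = 0.0  # cumulative percentage at the lower real limit
    for (lo, hi), upper in zip(intervals, cum_pct):
        if p <= upper:
            fraction = (p - below) / (upper - below)
            return lo + fraction * (hi - lo)
        below = upper


def percentile_rank(x):
    """Percentile rank of score x, by linear interpolation within an interval."""
    below = 0.0
    for (lo, hi), upper in zip(intervals, cum_pct):
        if lo <= x <= hi:
            fraction = (x - lo) / (hi - lo)
            return below + fraction * (upper - below)
        below = upper
    raise ValueError("x lies outside the real limits of the table")


print(round(score_at_percentile(80), 2))  # about 2.14 lb
print(percentile_rank(2))                 # 76.25
```

Percentiles that sit exactly on a boundary need no interpolation: `score_at_percentile(90)` returns the upper real limit 2.5, and `score_at_percentile(20)` returns 0.5.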
Anthony J Greene 63
The Stem & Leaf Diagram
Anthony J Greene 64
Stem & Leaf Plots
Anthony J Greene 65
Comparison of Frequency Histogram vs.
Stem & Leaf Diagram
Anthony J Greene 66
The Blocked Frequency Histogram
Anthony J Greene 67
The Frequency Distribution Polygon –or– Line Graph
Anthony J Greene 68
Grouped Frequency Polygon
Anthony J Greene 69
The Normal Distribution
Anthony J Greene 70
Variants on the Normal Distribution
Anthony J Greene 71
Comparing Two Distributions
Number of Sentences recalled from each category
Anthony J Greene 72
Comparing Distributions
Anthony J Greene 73
Distributions
Anthony J Greene 74
Variables and Distributions
In Class Exercise
Anthony J Greene 75
The Math You’ll Need To Know
Calculate:
ΣX =
ΣX² =
(ΣX)² =
X
1
2
0
4
Anthony J Greene 78
The Math You’ll Need To Know
Calculate:
ΣX = 7
ΣX² = 21
(ΣX)² = 49
X
1
2
0
4
Anthony J Greene 79
The Math You’ll Need To Know
Calculate:
ΣX =
ΣY =
ΣX ΣY =
ΣXY =
ΣX² =
(ΣX)² =
ΣY² =
(ΣY)² =
X Y
1 3
3 1
0 -2
2 -4
Anthony J Greene 87
The Math You’ll Need To Know
Calculate:
ΣX = 6
ΣY = -2
ΣX ΣY = -12
ΣXY = -2
ΣX² = 14
(ΣX)² = 36
ΣY² = 30
(ΣY)² = 4
X Y
1 3
3 1
0 -2
2 -4
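The order of operations is the point of this exercise: squaring or multiplying before summing gives a different result from summing first. A short Python sketch with the X and Y columns from the slide (variable names are illustrative):

```python
X = [1, 3, 0, 2]
Y = [3, 1, -2, -4]

sum_x = sum(X)                                      # ΣX
sum_y = sum(Y)                                      # ΣY
product_of_sums = sum_x * sum_y                     # ΣX ΣY: sum each column, then multiply
sum_of_products = sum(x * y for x, y in zip(X, Y))  # ΣXY: multiply each pair, then sum
sum_of_squares_x = sum(x ** 2 for x in X)           # ΣX²: square each score, then sum
square_of_sum_x = sum_x ** 2                        # (ΣX)²: sum the scores, then square
sum_of_squares_y = sum(y ** 2 for y in Y)           # ΣY²
square_of_sum_y = sum_y ** 2                        # (ΣY)²

print(sum_x, sum_y, product_of_sums, sum_of_products)
print(sum_of_squares_x, square_of_sum_x, sum_of_squares_y, square_of_sum_y)
```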
Anthony J Greene 88
The Math You’ll Need To Know
The Mean
Σx/n = M
where n = sample size
X
1
4
8
3
Anthony J Greene 89
The Math You’ll Need To Know
Calculate:
Σ(x − M) =
Σ(x − M)² =
Σ(x² − M²) =
M = 4
X
1
4
8
3
Anthony J Greene 92
The Math You’ll Need To Know
Calculate:
Σ(x − M) = 0
Σ(x − M)² = 26
Σ(x² − M²) = 26
M = 4
X
1
4
8
3
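It is no coincidence that the last two sums agree. A short derivation, using only the definition M = Σx/n (so that Σx = nM), shows why; the numerical check uses the four scores on this slide.

```latex
\begin{align*}
\sum (x - M)^2 &= \sum x^2 - 2M \sum x + nM^2 \\
               &= \sum x^2 - 2M(nM) + nM^2 \\
               &= \sum x^2 - nM^2 \\
               &= \sum \left(x^2 - M^2\right)
\end{align*}
% Check with x = 1, 4, 8, 3 and M = 4:
% \sum x^2 = 1 + 16 + 64 + 9 = 90, \quad nM^2 = 4 \cdot 16 = 64, \quad 90 - 64 = 26
```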
Anthony J Greene 93
The Math You’ll Need To Know
Calculate:
s_p = 13
n_1 = 8
n_2 = 10
$\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
Anthony J Greene 94
The Math You’ll Need To Know
Calculate:
s_p = 13
n_1 = 8
n_2 = 10
$\sqrt{\frac{169}{8} + \frac{169}{10}} = \sqrt{21.125 + 16.9} = \sqrt{38.025} = 6.17$
Anthony J Greene 95
The Math You’ll Need To Know
Calculate:
s_p = 13
n_1 = 8
n_2 = 10
$\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = 6.17$
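The same arithmetic can be checked in Python. This sketch assumes the expression on the slide is the square root of s_p²/n_1 + s_p²/n_2, as reconstructed from the values 169/8 and 169/10; the variable names are illustrative.

```python
import math

s_p = 13   # value given on the slide
n1 = 8     # first sample size
n2 = 10    # second sample size

# Square s_p first (169), divide by each n, add the two terms, then take the root.
term1 = s_p ** 2 / n1   # 21.125
term2 = s_p ** 2 / n2   # 16.9
result = math.sqrt(term1 + term2)

print(round(result, 2))  # 6.17
```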
Anthony J Greene 96
What Type of Data?
Years Spent in the Military
Anthony J Greene 97
What Type of Data?
Military Rank:
Lieutenant
Captain
Major
Lt. Colonel
Colonel
General
Anthony J Greene 98
What Type of Data?
Branch of Service:
Army
Air Force
Navy
Marine Corps
Coast Guard
Anthony J Greene 99
What Type of Data?
Time taken to complete a 30 mile bicycle race
Anthony J Greene 100
What Type of Data?
Finishing place in a 30 mile bicycle race
Anthony J Greene 101
Frequency Dist. & Percentile
Raw Scores:
15, 18, 21, 23, 27, 33, 33, 35, 36, 36, 39, 41
44, 47, 49, 50
Anthony J Greene 102
Frequency Dist. & Percentile
X f
10-19 2
20-29 3
30-39 6
40-49 4
50-59 1
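The grouped table can be built from the raw scores by assigning each score to a class interval of width 10. A minimal sketch (the use of integer division to find the class is one possible approach, not necessarily how the table was produced):

```python
from collections import Counter

raw_scores = [15, 18, 21, 23, 27, 33, 33, 35, 36, 36, 39, 41, 44, 47, 49, 50]
width = 10  # each class interval covers 10 score values, e.g. 10-19

# Map each score to the lower apparent limit of its class, then count per class.
counts = Counter((score // width) * width for score in raw_scores)

for low in sorted(counts):
    print(f"{low}-{low + width - 1}  {counts[low]}")
```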
Anthony J Greene 104
Frequency Dist. & Percentile
Compute the 52%ile
X       f   Cum f
10-19   2   2
20-29   3   5
30-39   6   11
40-49   4   15
50-59   1   16
Anthony J Greene 105
Frequency Dist. & Percentile
Compute the 52%ile
• The 52%ile is somewhere between 30-39.
X       f   Cum f   Cum proportion
10-19   2   2       0.125
20-29   3   5       0.3125
30-39   6   11      0.6875
40-49   4   15      0.9375
50-59   1   16      1.0
Anthony J Greene 108
Frequency Dist. & Percentile
Compute the 52%ile
• The 52%ile is somewhere between 30-39.
• That interval is from 0.3125 – 0.6875
• That interval is 0.375 wide
• To get from 0.3125 to 0.52 we go 0.2075 into the interval
Anthony J Greene 110
Frequency Dist. & Percentile
• That’s 0.553 of the way into the interval (0.2075/0.375)
• The real limits are from 29.5 to 39.5 (a range of 10)
• 52%ile is 29.5 + 5.53 = 35.03
This process is called Linear Interpolation
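The whole calculation can be collected into one line. The subscript notation X_{52} for the score at the 52nd percentile is introduced here only for illustration; 29.5 is the lower real limit of the 30-39 interval and 10 is its width.

```latex
\[
X_{52} = 29.5 + \frac{0.52 - 0.3125}{0.6875 - 0.3125} \times 10
       = 29.5 + \frac{0.2075}{0.375} \times 10
       \approx 29.5 + 5.53 = 35.03
\]
```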