1
Quantitative Methods
Topic 5: Probability Distributions
2
Outline
Probability distributions
For categorical variables
For continuous variables
Concept of making inference
3
Reading
Chapters 4, 5 and 6 (particularly Chapter 6)
Fundamentals of Statistical Reasoning in Education,
Coladarci et al.
4
Tossing a coin 10 times - 1
If the coin is not biased, we would expect “heads” to turn up 50% of the time.
However, in 10 tosses, we will not always get exactly 5 “heads”. Sometimes it could be 4 heads out of 10 tosses, sometimes 3 heads, and so on.
5
Tossing a coin 10 times - 2
What is the probability of getting
No ‘heads’ in 10 tosses
1 ‘head’ in 10 tosses
2 ‘heads’ in 10 tosses
3 ‘heads’ in 10 tosses
……
6
Do an experiment in EXCEL
See animated demo CoinToss1_demo.swf
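The same experiment can be sketched in Python. This is a minimal simulation, assuming a fair coin and 50 sets of 10 tosses (the number of sets used in the slides that follow). The function name `simulate_head_counts` and the fixed seed are illustrative choices, not part of the course material.

```python
import random
from collections import Counter


def simulate_head_counts(n_sets=50, n_tosses=10, p_heads=0.5, seed=None):
    """Return the number of heads observed in each set of tosses."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_heads for _ in range(n_tosses))
        for _ in range(n_sets)
    ]


# The seed is fixed only so that the run can be repeated exactly.
counts = simulate_head_counts(seed=1)
frequencies = Counter(counts)

# Frequency table with a simple text histogram.
for heads in range(11):
    freq = frequencies.get(heads, 0)
    print(f"{heads:2d} heads: {freq:2d} {'*' * freq}")
```

Running it with a different seed gives a different table, which is the fluctuation an empirical distribution is subject to.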
7
Frequencies of 50 sets of coin tosses
8
Histogram of 50 sets of coin tosses
9
Some terminology
Random variable
A variable the values of which are determined by chance.
Examples of random variables
Number of heads in 10 tosses of a coin
Test score of students
Height
Income
10
Probability distribution (function)
Shows the frequency (or chance) of occurrence of each value of the random variable.
11
Probability Distribution of Coin Toss - 1
Slide 8 shows the empirical probability distribution.
The theoretical one can be computed.
See animated demo Binomial Probability_demo.swf
Number of heads in 10 tosses    Probability
0 0.001
1 0.010
2 0.044
3 0.117
4 0.205
5 0.246
6 0.205
7 0.117
8 0.044
9 0.010
10 0.001
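The table can be reproduced with the binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), taking n = 10 tosses and p = 0.5 for an unbiased coin. The sketch below assumes that this is how the theoretical probabilities were obtained; the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k, n=10, p=0.5):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


table = {k: binomial_pmf(k) for k in range(11)}

for k, prob in table.items():
    print(f"{k:2d}  {prob:.3f}")
```

The eleven probabilities sum to 1, as they must for a probability distribution.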
12
Probability Distribution of Coin Toss - 2
[Bar chart: theoretical probabilities (vertical axis, 0.000 to 0.300) against number of heads in 10 tosses (horizontal axis, 0 to 10)]
13
How can we use the probability distribution - 1?
Provide information about “central tendency” (where the middle is, typically captured by Mean or Median), and variation (typically captured by standard deviation).
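For the coin-toss distribution, the mean and standard deviation can be computed directly from the probabilities. The following is a small sketch, assuming the binomial distribution with n = 10 and p = 0.5; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 10, 0.5
probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean: each value weighted by its probability.
mean = sum(k * pk for k, pk in enumerate(probs))

# Variance: squared distance from the mean, weighted by probability.
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
sd = sqrt(variance)

print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

The mean comes out at 5 heads, and the standard deviation is the square root of 2.5, about 1.58 heads.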
14
How can we use the probability distribution - 2?
Use the distribution as a point of reference.
Example:
If we find that we obtain only 1 head in 10 coin tosses 20% of the time, when the theoretical probability is about 1%, we may conclude that the coin is biased (not a 50-50 chance of tossing a head).
The theoretical distribution is a better point of reference than an empirical distribution, because an empirical distribution fluctuates from one collection of data to the next.
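To make the comparison concrete, the sketch below computes the probability of exactly 1 head in 10 tosses for a fair coin and for a biased coin. The biased coin with P(heads) = 0.2 is a hypothetical value chosen only for illustration; it does not come from the slides.

```python
from math import comb


def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


fair = prob_heads(1, p=0.5)
biased = prob_heads(1, p=0.2)  # hypothetical biased coin

print(f"P(1 head | fair coin)       = {fair:.3f}")
print(f"P(1 head | P(heads) = 0.2)  = {biased:.3f}")
```

An outcome that is rare under the fair-coin distribution but common in the data is what points towards a biased coin.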
15
Random variables that are continuous
Collect a sample of height measurements of people.
Form an empirical probability distribution.
Typically, the probability distribution will be a bell-shaped curve.
Compute the mean and standard deviation.
An empirical distribution is obtained.
Can we obtain a theoretical distribution?
16
Normal distribution - 1
[Figure: bell-shaped normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45]
17
Normal distribution - 2
A random variable, X, that has a normal distribution with mean $\mu$ and standard deviation $\sigma$ can be transformed to a variable, Z, that has a standard normal distribution, where the mean is 0 and the standard deviation is 1.
z-score: $z = \frac{x - \mu}{\sigma}$
Need only discuss properties of the standard normal distribution
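The transformation to a z-score subtracts the mean and divides by the standard deviation. Here is a minimal sketch; the function name `z_score` and the height figures (mean 170 cm, standard deviation 10 cm) are hypothetical values used only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical heights in cm: mean 170, standard deviation 10.
print(z_score(185, mean=170, sd=10))  # 1.5 standard deviations above the mean
print(z_score(160, mean=170, sd=10))  # 1 standard deviation below the mean
```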
18
Standard normal distribution - 1
[Figure: standard normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45. Marked regions: 5% of the area lies below -1.64; 2.5% of the area lies above 1.96]
19
Standard normal distribution - 2
2.5% of the observations are greater than 1.96.
So around 5% are less than -1.96 or greater than 1.96.
Hence the general statement that around 95% of the observations are within -2 and 2.
More generally, for any normal distribution, around 95% of the observations are within ± 2 standard deviations of the mean.
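These areas can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `central_area` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def central_area(c):
    """Proportion of observations lying between -c and +c."""
    return Z.cdf(c) - Z.cdf(-c)


for c in (1.0, 1.96, 2.0):
    print(f"within ±{c}: {central_area(c):.4f}")

# Area in the upper tail beyond 1.96, and the cut-off that leaves
# 2.5% in each tail.
upper_tail = 1 - Z.cdf(1.96)
cutoff_95 = Z.inv_cdf(0.975)
print(f"above 1.96: {upper_tail:.4f}")
print(f"cut-off for the middle 95%: {cutoff_95:.2f}")
```

The result for ±2 is slightly above 95%, which is why ±1.96 is the strict value and ±2 the convenient rule of thumb.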
20
Standard normal distribution - 3
Around 95% of the observations lie within ± two standard deviations (strictly, ±1.96)
[Figure: standard normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45. The central region is labelled "95% in this region"]
21
Standard normal distribution - 4
Around 68% of the observations lie within ± one standard deviation
[Figure: standard normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45. The central region is labelled "68% in this region"]
22
Computing normal probabilities in EXCEL
See animated demo NormalProbability_demo.swf
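For readers working outside Excel, the same kind of calculation can be sketched in Python with the standard `statistics` module. The helper names and the example distribution (mean 500, standard deviation 100) are hypothetical and are not taken from the demo.

```python
from statistics import NormalDist


def prob_below(x, mean=0.0, sd=1.0):
    """P(X < x) for a normal variable with the given mean and sd."""
    return NormalDist(mean, sd).cdf(x)


def prob_above(x, mean=0.0, sd=1.0):
    """P(X > x), the area in the upper tail."""
    return 1.0 - prob_below(x, mean, sd)


def prob_between(a, b, mean=0.0, sd=1.0):
    """P(a < X < b)."""
    return prob_below(b, mean, sd) - prob_below(a, mean, sd)


# Hypothetical scores: mean 500, standard deviation 100.
print(f"P(X < 400)       = {prob_below(400, 500, 100):.4f}")
print(f"P(X > 600)       = {prob_above(600, 500, 100):.4f}")
print(f"P(400 < X < 600) = {prob_between(400, 600, 500, 100):.4f}")
```

The last line is the area within one standard deviation of the mean, so it should be close to 68%.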
23
Exercise - 1
For the data set distributed in Week 2, TIMSS2003AUS.sav, for the variable bsmmat01 (second last variable, maths estimated ability), compute the score range where the middle 95% of the scores lie:
Use the observed scores and compute the percentiles from the observations.
Assume the population is normally distributed.
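The two approaches can be sketched as follows, for anyone working in Python rather than a statistics package. The scores here are synthetic stand-ins generated at random (mean 500, standard deviation 80, both hypothetical); they are not the TIMSS data, and the function names are illustrative.

```python
import random
from statistics import mean, quantiles, stdev


def middle_95_empirical(scores):
    """2.5th and 97.5th percentiles computed from the observations."""
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(scores, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def middle_95_normal(scores):
    """Mean ± 1.96 standard deviations, assuming a normal population."""
    m, s = mean(scores), stdev(scores)
    return m - 1.96 * s, m + 1.96 * s


# Synthetic stand-in data, NOT the bsmmat01 scores.
rng = random.Random(0)
scores = [rng.gauss(500, 80) for _ in range(2000)]

low_e, high_e = middle_95_empirical(scores)
low_n, high_n = middle_95_normal(scores)
print(f"from percentiles:   {low_e:.1f} to {high_e:.1f}")
print(f"assuming normality: {low_n:.1f} to {high_n:.1f}")
```

When the data are close to normal the two ranges should be similar; a large gap between them suggests that the normal assumption is questionable.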
24
Exercise - 2
Dave scored 538. What percentage of students obtained scores higher than Dave?
Use the observed scores and compute the percentiles from the observations.
Assume the population is normally distributed.
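A sketch of both calculations is given below. The ten scores are hypothetical and serve only to show the mechanics; they are not drawn from TIMSS2003AUS.sav, and the function names are illustrative.

```python
from statistics import NormalDist, mean, stdev


def percent_above_empirical(scores, value):
    """Percentage of observed scores strictly higher than value."""
    return 100 * sum(s > value for s in scores) / len(scores)


def percent_above_normal(scores, value):
    """Percentage above value under a normal curve fitted to the scores."""
    dist = NormalDist(mean(scores), stdev(scores))
    return 100 * (1 - dist.cdf(value))


# Hypothetical scores, NOT the bsmmat01 data.
scores = [420, 455, 480, 505, 520, 538, 550, 575, 610, 640]

print(f"from observations:  {percent_above_empirical(scores, 538):.1f}%")
print(f"assuming normality: {percent_above_normal(scores, 538):.1f}%")
```

With the real data set, replace the list with the bsmmat01 column and compare the two percentages.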