binomial distribution - cornell universitypi.math.cornell.edu/~web1105/slides/9_17_17.pdfbinomial...
TRANSCRIPT
Binomial Distribution
Binomial Experiment
1 The same experiment is repeated a fixed number of times.
2 There are only two possible outcomes, success and failure; P(success) = p, P(failure) = 1 − p.
3 The repeated trials are independent, so that the probability of success remains the same for each trial.
The Binomial Distribution is
P(exactly k successes in n trials) = C(n, k) p^k (1 − p)^{n−k}.
Examples are a 2 showing in a roll of dice, or H in a toss of coins from before, but NOT birthdays.
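The formula above can be evaluated directly. The following is a minimal Python sketch; the function name binomial_pmf and the choice of five rolls of a fair die are illustrative assumptions, not part of the slides.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative case: exactly two 2's in five rolls of a fair die (p = 1/6).
two_twos_in_five = binomial_pmf(2, 5, 1 / 6)
print(two_twos_in_five)

# The probabilities over k = 0, ..., n must add up to 1.
total = sum(binomial_pmf(k, 5, 1 / 6) for k in range(6))
print(total)
```

Birthdays do not fit this pattern because there are more than two outcomes per trial.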
Dan Barbasch Math 1105 Chapter 8 Week of September 17 1 / 22
Binomial Distribution, Examples I
Example (#48 Section 8.4)
A hospital receives 1/5 = 0.2 of its flu vaccine shipments from Company X and the remainder of its shipments from other companies. Each shipment contains a very large number of vaccine vials. For Company X’s shipments, 10% of the vials are ineffective. For every other company, 2% of the vials are ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one vial is ineffective. What is the probability that this shipment came from Company X?
Binomial Distribution, Examples II
Answer.
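Here X means the shipment came from Company X, NX means it came from another company, D means a vial is ineffective, and 1D/30 means exactly 1 of the 30 tested vials is ineffective.
For X, p = 0.1 and 1 − p = 0.9; for NX, p = 0.02 and 1 − p = 0.98.
P(X) = 0.2 P(NX) = 0.8
P(D | X) = 0.1 P(D | NX) = 0.02
P(1D/30 | X) = C(30, 1) × (0.1)^1 × (0.9)^{29}
P(1D/30 | NX) = C(30, 1) × (0.02)^1 × (0.98)^{29}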
Draw the usual tree diagram for Bayes’s theorem and compute.
P(X | 1D/30) = P(1D/30 and X) / P(1D/30)
= [C(30, 1) · (0.1)^1 · (0.9)^{29} · 0.2] / [C(30, 1) · (0.1)^1 · (0.9)^{29} · 0.2 + C(30, 1) · (0.02)^1 · (0.98)^{29} · 0.8].
The answer is (close to) 0.1.
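The Bayes computation above can be checked numerically. This is a sketch under the numbers stated in the example; the variable names (prior_x, like_x, and so on) are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Priors: where the shipment came from.
prior_x, prior_nx = 0.2, 0.8

# Likelihoods: exactly 1 ineffective vial among 30 tested.
like_x = binomial_pmf(1, 30, 0.10)
like_nx = binomial_pmf(1, 30, 0.02)

# Bayes's theorem: P(X | 1D/30) = P(1D/30 and X) / P(1D/30).
posterior_x = prior_x * like_x / (prior_x * like_x + prior_nx * like_nx)
print(posterior_x)  # a little under 0.1
```

Note that the factor C(30, 1) appears in every term, so it cancels in the ratio.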
Pascal’s Triangle I
The triangular array of numbers shown below is called Pascal’s triangle in honor of the French mathematician Blaise Pascal (1623 - 1662), who was one of the first to use it extensively. The triangle was known long before Pascal’s time and appears in Chinese and Islamic manuscripts from the eleventh century.
          1
        1   1
      1   2   1
    1   3   3   1
  1   4   6   4   1
1   5  10  10   5   1
The array provides a quick way to find binomial probabilities. The nth row of the triangle, where n = 0, 1, 2, 3, ..., gives the coefficients C(n, r) for r = 0, 1, 2, 3, ..., n. For example, for n = 4, 1 = C(4, 0), 4 = C(4, 1), 6 = C(4, 2), and so on.
Pascal’s Triangle II
Each number in the triangle is the sum of the two numbers directly above it. For example, in the row for n = 4, 1 is the sum of 1, the only number above it, 4 is the sum 1 + 3, 6 = 3 + 3, and so on.
The general formula is
C(n, r) = C(n − 1, r − 1) + C(n − 1, r).
The number of ways to choose r out of n is the sum of two counts: the ways to choose r out of n − 1 (all r are chosen from 1, ..., n − 1), plus the ways to choose r − 1 out of n − 1 (choose n, and then r − 1 more out of 1, ..., n − 1).
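The recurrence C(n, r) = C(n − 1, r − 1) + C(n − 1, r) is enough to build the triangle row by row. A minimal Python sketch follows; the function name pascal_rows is an illustrative assumption.

```python
from math import comb

def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of Pascal's triangle, built only from the recurrence."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # The ends of each row are 1; every interior entry is the sum
        # of the two entries directly above it.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(6)
for row in rows:
    print(row)

# Each entry agrees with the binomial coefficient C(n, r).
matches = all(rows[n][r] == comb(n, r) for n in range(7) for r in range(n + 1))
print(matches)
```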
Example, Sports I
In many sports championships, such as the World Series in baseball and the Stanley Cup final series in hockey, the winner is the first team to win four games. For this exercise, assume that each game is independent of the others, with a constant probability p that one specified team (say, the National League team) wins.
a. Find the probability that the series lasts for four, five, six, and seven games when p = 0.5.
b. Morrison and Schmittlein have found that the Stanley Cup finals can be described by letting p = 0.73 be the probability that the better team wins each game. Find the probability that the series lasts for four, five, six, and seven games.
Source: Chance.
Example, Sports II
Answer.
P(End in exactly 4) = P(AAAA or BBBB) = 2 · (0.5)^4,
P(End in exactly 5) = C(4, 1)(0.5)^5 + C(4, 3)(0.5)^5,
P(End in exactly 6) = C(5, 2)(0.5)^6 + C(5, 3)(0.5)^6,
P(End in exactly 7) = C(6, 3)(0.5)^7 + C(6, 3)(0.5)^7.
From the triangle,
C (4, 1) = C (3, 1) + C (3, 0) = 3 + 1 = 4,
C (4, 3) = C (3, 3) + C (3, 2) = 1 + 3 = 4,
C (5, 2) = C (4, 2) + C (4, 1) = 6 + 4 = 10,
C (5, 3) = C (4, 3) + C (4, 2) = 4 + 6 = 10,
C (6, 3) = C (5, 3) + C (5, 2) = 10 + 10 = 20.
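The same counting works for any p, which also covers part b. The sketch below assumes the following reading: a series ends in exactly n games when the winner takes game n and exactly 3 of the first n − 1 games. The function name series_length_probs is an illustrative assumption.

```python
from math import comb

def series_length_probs(p: float, wins_needed: int = 4) -> dict[int, float]:
    """P(series lasts exactly n games), for a first-to-wins_needed series."""
    q = 1 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The winner takes game n and wins_needed - 1 of the first n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        # Either team can be the winner, so add both cases.
        probs[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return probs

fair = series_length_probs(0.5)     # part a
skewed = series_length_probs(0.73)  # part b
print(fair)
print(skewed)
```

For p = 0.5 this reproduces the values on the slide: 2/16, 8/32, 20/64 and 40/128. In both cases the four probabilities add up to 1, since the series must end in four to seven games.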
Fermat and Pascal I
F and P are playing a game. They toss a coin, p = P(H) = 0.3. F wins the toss if H, P wins if T. F leads 8 to 7. The game ends whenever one reaches 20. What is the probability the game ends after another
1 12
2 20
tosses?
Fermat and Pascal II
Answer.
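For (1), p^{12}. Only F can win.
For (2), C(19, 11) p^{12} (1 − p)^8 + C(19, 12) p^7 (1 − p)^{13}. The sum of probabilities that F wins and that P wins: F wins on toss 20 if exactly 11 of the first 19 tosses are H and toss 20 is H; P wins on toss 20 if exactly 12 of the first 19 tosses are T and toss 20 is T.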
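A minimal Python sketch of this count, assuming that "ends after another t tosses" means the deciding toss is exactly toss t. The winner must win toss t and be one win short after the first t − 1 tosses, which gives the coefficient C(t − 1, 11) for F (who needs 12 more) and C(t − 1, 12) for P (who needs 13 more). The function name prob_game_ends_at is an illustrative assumption.

```python
from math import comb

def prob_game_ends_at(t: int, p: float, f_needs: int = 12, p_needs: int = 13) -> float:
    """Probability that the game ends on exactly toss t, counted from 8 to 7."""
    q = 1 - p
    total = 0.0
    # F wins on toss t: f_needs - 1 heads in the first t - 1 tosses, then a head.
    tails = t - f_needs
    if 0 <= tails < p_needs:
        total += comb(t - 1, f_needs - 1) * p**f_needs * q**tails
    # P wins on toss t: p_needs - 1 tails in the first t - 1 tosses, then a tail.
    heads = t - p_needs
    if 0 <= heads < f_needs:
        total += comb(t - 1, p_needs - 1) * q**p_needs * p**heads
    return total

p = 0.3
after_12 = prob_game_ends_at(12, p)  # only F can have reached 20
after_20 = prob_game_ends_at(20, p)
# The game must end somewhere between toss 12 and toss 24.
all_lengths = sum(prob_game_ends_at(t, p) for t in range(12, 25))
print(after_12, after_20, all_lengths)
```

The check that the probabilities over all possible lengths add up to 1 is a useful sanity test on the coefficients.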
Random Variables I
Random Variable A random variable X is a function that assigns a real number to each outcome of an experiment.
Probability Distribution The probability distribution of a random variable is {P(X = k) = p_k} with 0 ≤ p_k ≤ 1 and the sum of the p_k equal to 1, ∑_{k=0}^{n} p_k = 1. This definition is for when X takes finitely many values only.
Expected Value E(X) = ∑_k k P(X = k).
Example
Toss a coin. Let X = 1 if H, and X = 0 if T. The coin satisfies P(H) = p and P(T) = 1 − p. The probability distribution is P(X = 1) = p and P(X = 0) = 1 − p. Then EX = 1 · p + 0 · (1 − p) = p.
If X = 1 if H, and X = −1 if T, then EX = 1 · p + (−1) · (1 − p) = 2p − 1.
Random Variables II
Example
Toss two fair dice. Let X be the sum of the faces.
P(X = 2) = 1/36 P(X = 3) = 2/36 P(X = 4) = 3/36
P(X = 5) = 4/36 P(X = 6) = 5/36 P(X = 7) = 6/36
P(X = 8) = 5/36 P(X = 9) = 4/36 P(X = 10) = 3/36
P(X = 11) = 2/36 P(X = 12) = 1/36
Then EX = 2 · 1/36 + 3 · 2/36 + 4 · 3/36 + 5 · 4/36 + 6 · 5/36 + 7 · 6/36 + 8 · 5/36 + 9 · 4/36 + 10 · 3/36 + 11 · 2/36 + 12 · 1/36 = 7.
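The distribution and the expected value can be reproduced by listing all 36 equally likely outcomes. A minimal Python sketch, with exact fractions; the names expected_value and dice_sum are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def expected_value(dist: dict) -> Fraction:
    """E(X) = sum over k of k * P(X = k), for a finite distribution."""
    return sum(k * pk for k, pk in dist.items())

# Build the distribution of the sum of two fair dice from the 36 outcomes.
dice_sum = {}
for a, b in product(range(1, 7), repeat=2):
    dice_sum[a + b] = dice_sum.get(a + b, 0) + Fraction(1, 36)

print(dice_sum[7])               # P(X = 7)
print(expected_value(dice_sum))  # EX
```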
Motivation
Suppose you have a coin that comes up H 30% of the time and T 70% of the time. You get paid $2 if H, and you pay out $1 if T. What do you expect to have after 100 tosses? The intuition says it should be the average, 2 · 30 − 1 · 70 = −10. For one toss you’d expect 2 · 0.3 + (−1) · 0.7 = −0.1. Repeat 100 times, and you expect to have lost $10.
The mathematics is the Expected Value. We interpret the frequencies as P(H) = 0.3 and P(T) = 0.7.
The expected value is EX = 2 · P(H) + (−1) · P(T).
For n tosses you expect nEX.
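The intuition can be tested by simulation. The sketch below plays the 100-toss game many times and averages the final amounts; the seed, the number of repetitions, and the function name play are arbitrary illustrative choices.

```python
import random

def play(n_tosses: int, rng: random.Random, p_heads: float = 0.3) -> int:
    """Net winnings after n_tosses: +2 for each H, -1 for each T."""
    total = 0
    for _ in range(n_tosses):
        total += 2 if rng.random() < p_heads else -1
    return total

expected_per_toss = 2 * 0.3 + (-1) * 0.7  # EX for a single toss

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 5000
average_total = sum(play(100, rng) for _ in range(trials)) / trials

print(expected_per_toss)  # close to -0.1
print(average_total)      # should land near 100 * EX = -10
```

A single game of 100 tosses can end far from −10; it is the average over many games that settles near nEX.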
Expected Value of a Sum of Independent Variables I
Definition
Two random variables X1, X2 are called independent if P(X1 = a, X2 = b) = P(X1 = a) · P(X2 = b) for all values a, b. More generally, X1, ..., Xn are called independent if P(X_{i1} = a1, ..., X_{ik} = ak) = P(X_{i1} = a1) ··· P(X_{ik} = ak) for any choice of a subset X_{i1}, ..., X_{ik} of the variables.
Theorem
Let X1, ..., Xn be independent random variables. Then
E(X1 + ··· + Xn) = EX1 + ··· + EXn.
We illustrate the proof for the case n = 2, E(X1 + X2) = EX1 + EX2, where each variable takes only two values. This is the warmup.
P(X1 = a1) = p P(X1 = a2) = 1− p
P(X2 = b1) = q P(X2 = b2) = 1− q.
Expected Value of a Sum of Independent Variables II
EX1 = a1 p + a2 (1 − p)
EX2 = b1 q + b2 (1 − q)
E(X1 + X2) = (a1 + b1)pq + (a1 + b2)p(1 − q) + (a2 + b1)(1 − p)q + (a2 + b2)(1 − p)(1 − q)
Gather the terms according to the a’s and b’s, and do the algebra.
a1(pq + p(1 − q)) = a1 p
a2((1 − p)q + (1 − p)(1 − q)) = a2(1 − p)
b1((1 − p)q + pq) = b1 q
b2(p(1 − q) + (1 − p)(1 − q)) = b2(1 − q).
Adding the four lines gives EX1 + EX2.
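The algebra can be spot-checked numerically. The sketch below computes E(X1 + X2) from the joint distribution of two independent variables and compares it with EX1 + EX2; the values and probabilities in x1 and x2 are hypothetical, chosen only for illustration.

```python
from itertools import product

def expectation(dist: dict) -> float:
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(v * pv for v, pv in dist.items())

def expectation_of_sum(dist1: dict, dist2: dict) -> float:
    """E(X1 + X2) for independent X1, X2: joint probabilities are products."""
    return sum((a + b) * pa * pb
               for (a, pa), (b, pb) in product(dist1.items(), dist2.items()))

# Hypothetical two-valued variables: a1 = 2, a2 = -1 with p = 0.3,
# and b1 = 5, b2 = 1 with q = 0.6.
x1 = {2.0: 0.3, -1.0: 0.7}
x2 = {5.0: 0.6, 1.0: 0.4}

lhs = expectation_of_sum(x1, x2)
rhs = expectation(x1) + expectation(x2)
print(lhs, rhs)
```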
Expected Value of a Sum of Independent Variables III
Example (Binomial distribution)
For n independent identical trials each with two possible outcomes, S and F, with probability p and 1 − p, and X the number of S, the distribution is P(X = k) = C(n, k) p^k (1 − p)^{n−k}. The expected value is
EX = ∑_{k=0}^{n} k C(n, k) p^k (1 − p)^{n−k} = np.
For a single trial, EX = p · 1 + 0 · (1 − p) = p.
The general case can be computed directly using algebra.
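Before doing the algebra, the identity EX = np can be confirmed exactly for a few cases. This is a sketch with exact fractions; the function name binomial_mean and the choice p = 3/10 are illustrative assumptions. The last check is the identity k C(n, k) = n C(n − 1, k − 1), which is one standard route for the direct computation.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n: int, p: Fraction) -> Fraction:
    """Sum of k * C(n, k) * p^k * (1 - p)^(n - k) over k = 0, ..., n."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

p = Fraction(3, 10)
means = {n: binomial_mean(n, p) for n in (1, 5, 30)}
print(means)

# Identity behind the algebra: k * C(n, k) = n * C(n - 1, k - 1).
identity_holds = all(k * comb(n, k) == n * comb(n - 1, k - 1)
                     for n in range(1, 12) for k in range(1, n + 1))
print(identity_holds)
```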
Binomial Distribution
We apply this to the binomial distribution, X1, ..., Xn i.i.d. (independent identically distributed random variables) with probability distribution P(X = 1) = p, P(X = 0) = 1 − p. Then
EX = 1 · p + 0 · (1 − p) = p.
So
E(X1 + ··· + Xn) = p + ··· + p (n terms) = np.
The case of two dice is similar; X = X1 + X2. Then
EX1 = EX2 = 1·1/6+2·1/6+3·1/6+4·1/6+5·1/6+6·1/6 = 21/6 = 7/2.
So E (X1 + X2) = 7/2 + 7/2 = 7.
Statistics, Measures of Central Tendency I
We are considering a random variable X with a probability distribution which has some parameters. We want to get an idea what these parameters are. We perform an experiment n times and record the outcome. This means we have X1, ..., Xn i.i.d. random variables, with the same probability distribution as X. We want to use the outcome to infer what the parameters are.
Mean The outcomes are x1, ..., xn. The Sample Mean is x̄ := (x1 + ··· + xn)/n. Also sometimes called the average. The expected value of X, EX, is also called the mean of X. Often denoted by µ. Sometimes called population mean.
Median The number so that half the values are below, half above. If the sample is of even size, you take the average of the middle terms.
Mode The number that occurs most frequently. There could be several modes, or no mode.
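The three measures are available in Python's standard statistics module. A short sketch on a hypothetical sample of even size with two modes; the data values are made up for illustration.

```python
import statistics

# Hypothetical sample, already sorted for readability.
sample = [3, 5, 5, 7, 8, 8, 10, 12]

mean = statistics.mean(sample)        # (x1 + ... + xn) / n
median = statistics.median(sample)    # average of the two middle terms
modes = statistics.multimode(sample)  # every value tied for most frequent

print(mean, median, modes)
```

When no value repeats, multimode returns every value in the sample, which corresponds to the "no mode" case.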
Statistics, Measures of Central Tendency II
Example
You have a coin for which you know that P(H) = p and P(T) = 1 − p. You would like to estimate p. You toss it n times. You count the number of heads. The sample mean should be an estimate of p.
EX = p, and E(X1 + ··· + Xn) = np. So
E((X1 + ··· + Xn)/n) = p.
Descriptive Statistics I
Frequency Distribution Divide the range of values into a number of equal disjoint intervals. For each interval count the number of elements of the sample that fall in it.
Histogram see the next slide
Grouped Data Mean Essentially calculate the mean of the frequency distribution. Intervals are used, rather than single values. It is assumed that all these values are located at the midpoint of the interval. The letter x_M is used to represent the midpoints and f represents the frequencies: x̄ = (∑ x_M f)/n.
Frequency Polygon Connect the middles of the tops of each interval.
Histogram
A histogram is a graphical representation of the distribution of numerical data. It is a kind of bar graph. To construct a histogram, the first step is to "bin" the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.
Bin              Count
−3.5 to −2.51    9
−2.5 to −1.51    32
−1.5 to −0.51    109
−0.5 to 0.49     180
0.5 to 1.49      132
1.5 to 2.49      34
2.5 to 3.49      4
Mean: ((−3)·9 + (−2)·32 + (−1)·109 + 0·180 + 1·132 + 2·34 + 3·4)/500 = 12/500 = 0.024
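The grouped mean for this table is a weighted average of the bin midpoints. A minimal sketch, assuming (as in the mean formula above) that the midpoints are taken to be −3, −2, ..., 3; the variable names are illustrative.

```python
# (midpoint x_M, frequency f) for each bin of the histogram table.
bins = [(-3, 9), (-2, 32), (-1, 109), (0, 180), (1, 132), (2, 34), (3, 4)]

n = sum(f for _, f in bins)                       # sample size
grouped_mean = sum(x_m * f for x_m, f in bins) / n  # (sum of x_M * f) / n

print(n, grouped_mean)
```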
Example
The table on the next page gives the number of days in June and July of recent years in which the temperature reached 90 degrees or higher in New York’s Central Park. Source: The New York Times and Accuweather.com.
a. Prepare a frequency distribution with a column for intervals and frequencies. Use seven intervals, starting with [0, 4].
b. Sketch a histogram and a frequency polygon, using the intervals in part a.
c. Find the mean for the original data.
d. Find the mean using the grouped data from part a.
e. Explain why your answers to parts c and d are different.
f. Find the median and the mode for the original data.
Temperature Data
9.1 Frequency Distributions; Measures of Central Tendency 417
a. Use this table to estimate the mean income for white households in 2008.
b. Compare this estimate with the estimate found in Exercise 39. Discuss whether this provides evidence that white American households have higher earnings than African American households.
41. Airlines The number of consumer complaints against the top U.S. airlines during the first six months of 2010 is given in the following table. Source: U.S. Department of Transportation.
Airline          Complaints   Complaints per 100,000 Passengers Boarding
Delta            1175         2.19
American         660          1.56
United           487          1.84
US Airways       428          1.69
Continental      350          1.64
Southwest        149          0.29
Skywest          77           0.65
American Eagle   68           0.87
Expressjet       56           0.70
Alaska           34           0.44
a. By considering the numbers in the column labeled “Complaints,” calculate the mean and median number of complaints per airline.
b. Explain why the averages found in part a are not meaningful.
c. Find the mean and median of the numbers in the column labeled “Complaints per 100,000 Passengers Boarding.” Discuss whether these averages are meaningful.
Life Sciences
42. Pandas The size of the home ranges (in square kilometers) of several pandas were surveyed over a year’s time, with the following results.
Home Range   Frequency
0.1–0.5      11
0.6–1.0      12
1.1–1.5      7
1.6–2.0      6
2.1–2.5      2
2.6–3.0      1
3.1–3.5      1
a. Sketch a histogram and frequency polygon for the data.
b. Find the mean for the data.
43. Blood Types The number of recognized blood types varies by species, as indicated by the table below. Find the mean, median, and mode of this data. Source: The Handy Science Answer Book.
Animal           Number of Blood Types
Pig              16
Cow              12
Chicken          11
Horse            9
Human            8
Sheep            7
Dog              7
Rhesus monkey    6
Mink             5
Rabbit           5
Mouse            4
Rat              4
Cat              2
General Interest
44. Temperature The following table gives the number of days in June and July of recent years in which the temperature reached 90 degrees or higher in New York’s Central Park. Source: The New York Times and Accuweather.com.
a. Prepare a frequency distribution with a column for intervalsand frequencies. Use six intervals, starting with 0–4.
b. Sketch a histogram and a frequency polygon, using theintervals in part a.
c. Find the mean for the original data.
d. Find the mean using the grouped data from part a.
e. Explain why your answers to parts c and d are different.
f. Find the median and the mode for the original data.
Year  Days    Year  Days    Year  Days
1972  11      1985  4       1998  5
1973  8       1986  8       1999  24
1974  11      1987  14      2000  3
1975  3       1988  21      2001  4
1976  8       1989  10      2002  13
1977  11      1990  6       2003  11
1978  5       1991  21      2004  1
1979  7       1992  4       2005  12
1980  12      1993  25      2006  5
1981  12      1994  16      2007  4
1982  11      1995  14      2008  10
1983  20      1996  0       2009  0
1984  7       1997  10      2010  20
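The computations asked for in parts a, c, d and f can be sketched in Python as follows. This assumes six intervals of width 5 (0–4, 5–9, ..., 25–29) with midpoints 2, 7, ..., 27; the variable names are illustrative, and the data are copied row by row from the table.

```python
import statistics

# Days per year, one tuple per table row (three year columns per row).
rows = [
    (11, 4, 5), (8, 8, 24), (11, 14, 3), (3, 21, 4), (8, 10, 13),
    (11, 6, 11), (5, 21, 1), (7, 4, 12), (12, 25, 5), (12, 16, 4),
    (11, 14, 10), (20, 0, 0), (7, 10, 20),
]
days = [d for row in rows for d in row]

# Part a: frequency distribution over 0-4, 5-9, ..., 25-29.
width, n_intervals = 5, 6
freq = [0] * n_intervals
for d in days:
    freq[d // width] += 1

# Part c: mean of the original data.
mean = sum(days) / len(days)

# Part d: grouped mean, treating every value as its interval midpoint.
midpoints = [width * i + 2 for i in range(n_intervals)]
grouped_mean = sum(m * f for m, f in zip(midpoints, freq)) / len(days)

# Part f: median and mode of the original data.
median = statistics.median(days)
modes = statistics.multimode(days)

print(freq, mean, grouped_mean, median, modes)
```

The two means differ (part e) because the grouped calculation replaces each value by the midpoint of its interval, so information about where values sit inside an interval is lost.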