
Introductory Statistics

THOMAS H. WONNACOTT
Associate Professor of Mathematics
University of Western Ontario

RONALD J. WONNACOTT
Professor of Economics
University of Western Ontario

JOHN WILEY & SONS, INC.
New York · London · Sydney · Toronto

Copyright © 1969 by John Wiley & Sons, Inc.

All rights reserved. No part of this book may be reproduced by any means, nor transmitted, nor translated into a machine language without the written permission of the publisher.

10 9 8 7 6 5 4 3

Library of Congress Catalog Card Number: 69-16041
SBN 471 95965 0
Printed in the United States of America

INTRODUCTORY STATISTICS

Monique and Eloise

PREFACE

Our objective has been to write a text that would come into the statistics market between the two texts written by Paul G. Hoel (or the two texts written by John E. Freund). We have tried to cover most of the material in their mathematical statistics books, but we have used mathematics only slightly more difficult than that used in their elementary books. Calculus is used only in sections where the argument is difficult to develop without it; although this puts the calculus student at an advantage, we have made a special effort to design these sections so that a student without calculus can also follow.

By requiring a little more mathematics than many other elementary texts, we have been able to treat many important topics normally covered only by books in mathematical statistics: for example, the relation of sampling inference to the theory of probability and random variables. Another objective has been to show the logical relation between topics that have often appeared in texts as separate and isolated chapters: for example, the equivalence of interval estimation and hypothesis testing, of the t test and F test, and of analysis of variance and regression using dummy variables. In every case our motivation has been twofold: to help the student appreciate, indeed enjoy, the underlying logic, and to help him arrive at answers to practical problems.

We have placed high priority on the regression model, not only because regression is widely regarded as the most powerful tool of the practicing statistician, but also because it provides a good focal point for understanding such related techniques as correlation and analysis of variance.

Our original aim was to write an introduction to statistics for economics students, but as our efforts increased, so it seems did our ambitions. Accordingly, this book is now written for students in economics and other social sciences, for business schools, and for service courses in statistics provided by mathematics departments. Some of the topics covered are typically omitted from introductory courses, but are of interest to such a broad audience: for example, multiple comparisons, multiple regression, Bayesian decisions, and game theory.

A statistics text aimed at several audiences, including students with and without calculus, raises major problems of evenness and design. The text itself is kept simple, with the more difficult interpretations and developments reserved for footnotes and starred sections. In all instances these are optional; a special effort has been made to allow the more elementary student to skip these completely without losing continuity. Moreover, some of the finer points are deferred to the instructor's manual. Thus the instructor is allowed, at least to some degree, to tailor the course to his students' background.

Problems are also starred (*) if they are more difficult, or set with an arrow (⇒) if they introduce important ideas taken up later in the text, or bracketed ( ) if they duplicate previous problems, and thus provide optional exercise only.

Our experience has been that this is about the right amount of material for a two-semester course; a single-semester introduction is easily designed to include the first 7, 8, or 9 chapters. We have also found that majors in economics who may be pushed a bit harder can cover the first 10 chapters in one semester. This has allowed us in the second semester to use our forthcoming Econometrics text, which provides more detailed coverage of the material in Chapters 11 to 15 of this book, plus additional material on serial correlation, identification, and other econometric problems.

So many have contributed to this book that it is impossible to thank them all individually. However, a special vote of thanks should go, without implication, to the following for their thoughtful reviews: Harvey J. Arnold, David A. Belsley, Ralph A. Bradley, Edward Greenberg, Leonard Kent, R. W. Pfouts, and especially Franklin M. Fisher. We are also indebted to our teaching assistants and the students in both mathematics and economics at the University of Western Ontario and Wesleyan (Connecticut) who suggested many improvements during a two-year classroom test.

London, Ontario, Canada
September, 1968

Thomas H. Wonnacott
Ronald J. Wonnacott

CONTENTS

1 Introduction
    1-1 Example
    1-2 Induction and Deduction
    1-3 Why Sample?
    1-4 How to Sample

2 Descriptive Statistics for Samples
    2-1 Introduction
    2-2 Frequency Tables and Graphs
    2-3 Centers (Measures of Location)
    2-4 Deviations (Measures of Spread)
    2-5 Linear Transformations (Coding)

3 Probability
    3-1 Introduction
    3-2 Elementary Properties of Probability
    3-3 Events and Their Probabilities
    3-4 Conditional Probability
    3-5 Independence
    3-6 Other Views of Probability

4 Random Variables and Their Distributions
    4-1 Discrete Random Variables
    4-2 Mean and Variance
    4-3 Binomial Distribution
    4-4 Continuous Distributions
    4-5 The Normal Distribution
    4-6 A Function of a Random Variable
    4-7 Notation

5 Two Random Variables
    5-1 Distributions
    5-2 Functions of Two Random Variables
    5-3 Covariance
    5-4 Linear Combination of Two Random Variables

6 Sampling
    6-1 Introduction
    6-2 Sample Sum
    6-3 Sample Mean
    6-4 Central Limit Theorem
    6-5 Sampling from a Finite Population, without Replacement
    6-6 Sampling from Bernoulli Populations
    6-7 Summary of Sampling Theory

7 Estimation I
    7-1 Introduction: Confidence Interval for the Mean
    7-2 Desirable Properties of Estimators
    7-3 Maximum-Likelihood Estimation (MLE)

8 Estimation II
    8-1 Difference in Two Means
    8-2 Small Sample Estimation: the t Distribution
    8-3 Estimating Population Proportions: The Election Problem Once Again
    8-4 Estimating the Variance of a Normal Population: The Chi-Square Distribution

9 Hypothesis Testing
    9-1 Testing a Simple Hypothesis
    9-2 Composite Hypotheses
    9-3 Two-Sided Tests vs. One-Sided Tests
    9-4 The Relation of Hypothesis Tests to Confidence Intervals
    9-5 Conclusions

10 Analysis of Variance
    10-1 Introduction
    10-2 One-Factor Analysis of Variance
    10-3 Two-Factor Analysis of Variance

11 Introduction to Regression
    11-1 An Example
    11-2 Possible Criteria for Fitting a Line
    11-3 The Least Squares Solution
    Appendix 11-1 An Alternative Derivation of Least Squares Estimates Without Calculus

12 Regression Theory
    12-1 The Mathematical Model
    12-2 The Nature of the Error Term
    12-3 Estimating α and β
    12-4 The Mean and Variance of a and b
    12-5 The Gauss-Markov Theorem
    12-6 The Distribution of a and b
    12-7 Confidence Intervals and Testing Hypotheses about β
    12-8 Prediction Interval for Y₀
    12-9 Dangers of Extrapolation
    12-10 Maximum Likelihood Estimation
    12-11 The Characteristics of the Independent Variable

13 Multiple Regression
    13-1 Introductory Example
    13-2 The Mathematical Model
    13-3 Least Squares Estimation
    13-4 Multicollinearity
    13-5 Interpreting an Estimated Regression
    13-6 Dummy Variables
    13-7 Regression, Analysis of Variance, and Analysis of Covariance

14 Correlation
    14-1 Simple Correlation
    14-2 Partial Correlation
    14-3 Multiple Correlation

15 Decision Theory
    15-1 Prior and Posterior Distributions
    15-2 Optimal Decisions
    15-3 Estimation as a Decision
    15-4 Estimation: Bayesian Versus Classical
    15-5 Critique of Bayesian Methods
    15-6 Hypothesis Testing as a Bayesian Decision
    15-7 Game Theory

Appendix Tables
    Table I Squares and Square Roots
    Table II Random Digits and Normal Variates
    Table III Binomial Coefficients and Probabilities
    Table IV Standard Normal Probabilities
    Table V Student's t Critical Points
    Table VI Modified Chi-Square Critical Points
    Table VII F Critical Points
    Table VIII Common Logarithms

Acknowledgements
Answers to Odd-Numbered Problems
Glossary of Symbols
Index

chapter 1

Introduction

The word "statistics" originally meant the collection of population and economic information vital to the state. From that modest beginning, statistics has grown into a scientific method of analysis now applied to all the social and natural sciences, and one of the major branches of mathematics. The present aims and methods of statistics are best illustrated with a familiar example.

1-1 EXAMPLE

Before every presidential election, the pollsters try to pick the winner; specifically, they try to guess the proportion of the population that will vote for each candidate. Clearly, canvassing all voters would be a hopeless task. As the only alternative, they survey a sample of a few thousand in the hope that the sample proportion will be a good estimate of the total population proportion. This is a typical example of statistical inference or statistical induction: the (voting) characteristics of an unknown population are inferred from the (voting) characteristics of an observed sample.

As any pollster will admit, it is an uncertain business. To be sure of the population, one has to wait until election day when all votes are counted. Yet if the sampling is done fairly and adequately, we can have high hopes that the sample proportion will be close to the population proportion. This allows us to estimate the unknown population proportion π from the observed sample proportion (P), as follows:

    π = P ± a small error    (1-1)

with the crucial questions being, "How small is this error?" and "How sure are we that we are right?"

Since this typifies the very core of the book, we state it more precisely in the language of Chapter 7 (where the reader will find the proof and a fuller understanding).

If the sampling is random and large enough, we can state with 95% confidence that

    π = P ± 1.96 √(P(1 − P)/n)    (1-2)

where π and P are the population and sample proportion, and n is the sample size.

As an illustration of how this formula works, suppose we have sampled 1,000 voters, with 600 choosing the Democratic candidate. With this sample proportion of .60, equation (1-2) becomes

    .60 ± 1.96 √(.60(1 − .60)/1000)

or approximately

    .60 ± .03    (1-3)

Thus, with 95% confidence, we estimate the population proportion voting Democrat to be between .57 and .63.
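The arithmetic behind (1-2) and (1-3) is easy to check by machine. The following short Python sketch is our own illustration, not part of the original text; the function name and layout are invented for the example:

```python
from math import sqrt

def proportion_interval(p, n, z=1.96):
    """Confidence interval for a population proportion, equation (1-2):
    p plus-or-minus z * sqrt(p(1 - p)/n)."""
    error = z * sqrt(p * (1 - p) / n)
    return p - error, p + error

# 600 Democrats among 1,000 voters sampled, so P = .60
low, high = proportion_interval(0.60, 1000)
print(round(low, 2), round(high, 2))    # 0.57 0.63, matching (1-3)
```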

This is referred to as a confidence interval, and making estimates of this kind will be one of our major objectives in this book. The other objective is to test hypotheses. For example, suppose we wish to test the hypothesis that the Republican candidate will win the election. On the basis of the information in equation (1-3) we would reject this claim; it is no surprise that a sample result that pointed to a Democratic majority of 57 to 63% of the vote will also allow us to reject the hypothesis of a Republican victory. In general, there is a very close association of this kind between confidence intervals and hypothesis tests; indeed, we will show that in many instances they are equivalent procedures.

We pause to make several other crucial observations about equation (1-3).

1. The estimate is not made with certainty; we are only 95% confident. We must concede the possibility that we are wrong, and wrong because we were unlucky enough to draw a misleading sample. Thus, even if less than half the population is in fact Democratic, it is still possible, although unlikely, for us to run into a string of Democrats in our sample. In such circumstances, our conclusion (1-3) would be dead wrong. Since this sort of bad luck is possible, but not likely, we can be 95% confident of our conclusion.

2. Luck becomes less of a factor as sample size increases; the more voters we canvass, the less likely we are to draw a predominantly Democratic sample from a Republican population. Hence, the more precise our prediction. Formally, this is confirmed in equation (1-2); in this formula we note that the error term decreases with sample size. Thus, if we increased our sample to 10,000 voters, and continued to observe a Democratic proportion of .60, our 95% confidence interval would become the more precise

    .60 ± .01    (1-4)

3. Suppose our employer indicates that 95% confidence is not good enough. "Come back when you are 99% sure of your conclusion." We now have two options. One is to increase our sample size; as a result of this additional cost and effort we will be able to make an interval estimate with the precision of (1-4) but at a higher level of confidence. But if the additional resources for further sampling are not available, then we can increase our confidence only by making a less precise statement--i.e., that the proportion of Democrats is

    .60 ± .02

The less we commit ourselves to a precise prediction, the more confident we can be that we are right. In the limit, there are only two ways that we can be certain of avoiding an erroneous conclusion. One is to make a statement so imprecise that it cannot be contradicted.¹ The other is to sample the whole population²; but this is not statistics, it is just counting. Meaningful statistical conclusions must be prefaced by some degree of uncertainty.

1-2 INDUCTION AND DEDUCTION

Figure 1-1 illustrates the difference between inductive and deductive reasoning. Induction involves arguing from the specific to the general, or (in our case) from the sample to the population. Deduction is the reverse--arguing from the general to the specific, i.e., from the population to the sample.³ Equation (1-1) represents inductive reasoning; we are arguing from a sample proportion to a population proportion. But this is only possible

¹ E.g., π = .50 ± .50.
² Or, almost the whole population. Thus it would not be necessary to poll the whole population to determine the winner of an election; it would only be necessary to continue canvassing until one candidate comes up with a majority. (It is always possible, of course, that some people change their mind between the sample survey and their actual vote, but we don't deal with this issue here.)
³ The student can easily keep these straight with the help of a little Latin, and recognition that the population is the point of reference. The prefix in means "into" or "towards." Thus induction is arguing towards the population. The prefix de means "away from." Thus deduction is arguing away from the population. Finally, statistical inference is based on induction.

[Figure 1-1 contrasts the two directions of reasoning: in (a) the population is unknown and the sample known; in (b) the population is known and the sample unknown.]

FIG. 1-1 Induction and deduction contrasted. (a) Induction (statistical inference). (b) Deduction (probability).

if we study the simpler problem of deduction first. Specifically, in equation (1-1), we note that the inductive statement (that the population proportion can be inferred from the sample proportion) is based on a prior deduction (that the sample proportion is likely to be close to the population proportion).

Chapters 2 through 5 are devoted to deduction. This involves, for example, the study of probability, which is useful for its own sake (e.g., in Game Theory); but it is even more useful as the basis for statistical induction dealt with in Chapters 7 through 10. In short, in the first 6 chapters we ask, "With a given population, how will a sample behave? Will the sample be on target?" Only when this deductive issue is resolved can we move to questions of statistical inference. This involves, in the later chapters, turning the argument around and asking "How precisely can we make inferences about an unknown population from an observed sample?"

    1-3 WHY SAMPLE?

We sample, rather than study the whole population, for any one of three reasons:

(1) Limited resources.
(2) Limited data available.
(3) Destructive testing.

1. Limited resources almost always play some part. In our example of preelection polls, funds were not available to observe the whole population; but this is not the only reason for sampling.

2. Sometimes there is only a small sample available, no matter what cost may be incurred. For example, an anthropologist may wish to test the theory that the two civilizations on islands A and B have developed independently, with their own distinctive characteristics of weight, height, etc. But there is no way in which he can compare the two civilizations in toto. Instead he must make an inference from the small sample of the 50 surviving inhabitants of island A and the 100 surviving inhabitants of island B. The sample size is fixed by nature, rather than by the researcher's budget.

There are many examples in business. An allegedly more efficient machine may be introduced for testing, with a view to the purchase of additional similar units. The manager of quality control simply cannot wait around to observe the entire population this machine will produce. Instead a sample run must be observed, with the decision on efficiency based on an inference from this sample.

3. Sampling may involve destructive testing. For example, suppose we have produced a thousand light bulbs and wish to know their average life. It would be folly to insist on observing the whole population of bulbs until they burn out.

1-4 HOW TO SAMPLE

In statistics, as in business or any other profession, it is essential to distinguish between bad luck and bad management. For example, suppose a man bets you $100 at even odds that you will get an ace (i.e., 1 dot) in rolling a die. You accept the challenge, roll an ace, and he wins. He's a bad manager and you're a good one; he has merely overcome his bad management with extremely good luck. Your only defense against this combination is to get him to keep playing the game with your dice.

If we now return to our original example of preelection polls, we note that the sample proportion of Democrats may badly misrepresent the population proportion for either (or both) of these reasons. No matter how well managed and designed our sampling procedure may be, we may be unlucky enough to turn up a Democratic sample from a Republican population. Equation (1-2) relates to this case; it is assumed that the only complication is the luck of the draw, and not mismanagement. From that equation we confirm that the best defense against bad luck is to "keep playing"; by increasing our sample size, we improve the reliability of our estimate.

The other problem is that sampling can be badly mismanaged or biased. For example, in sampling a population of voters, it is a mistake to take their names from a phone book, since poor voters who often cannot afford telephones are badly underrepresented.

Other examples of biased samples are easy to find and often amusing. "Straw polls" of people on the street are often biased because the interviewer tends to select people that seem civil and well dressed; the surly worker or harassed mother is overlooked. A congressman cannot rely on his mail as an unbiased sample of his constituency, for this is a sample of people with strong opinions, and includes an inordinate number of cranks and members of pressure groups.

The simplest way to ensure an unbiased sample is to give each member of the population an equal chance of being included in the sample. This, in fact, is our definition of a "random" sample.⁴ For a sample to be random, it cannot be chosen in a sloppy or haphazard way; it must be carefully designed. A sample of the first thousand people encountered on a New York street corner will not be a random sample of the U.S. population. Instead, it is necessary to draw some of our sample from the West, some from the East, and so on. Only if our sample is randomized will it be free of bias and, equally important, only then will it satisfy the assumptions of probability theory, and allow us to make scientific inferences of the form of (1-2).

In some circumstances, the only available sample will be a nonrandom one. While probability theory often cannot be strictly applied to such a sample, it still may provide the basis for a good educated guess, or what we might term the art of inference. Although this art is very important, it cannot be taught in an elementary text; we, therefore, consider only scientific

⁴ Strictly speaking, this is called "simple random sampling," to distinguish it from more complex types of random sampling.

inference based on the assumption that samples are random. The techniques for ensuring this are discussed further in Chapter 6.

FURTHER READINGS

For readers who wish a more extensive introduction to statistics, we highly recommend the following.

1. Huff, Darrell, "How to Lie with Statistics." New York: Norton, 1954.
2. Huff, Darrell, "How to Take a Chance." New York: Norton, 1957.
3. Wallis, W. A., and Roberts, H. V., "The Nature of Statistics." Free Press Paperback, 1956.
4. McDonald, J., and Osborn, R., "Strategy in Poker, Business, and War." New York: Norton, 1950.
5. Slonim, M. J., "Sampling." Simon and Schuster Paperback, 1966.

chapter 2

    Descriptive Statistics for Samples

    2-1 INTRODUCTION

We have already discussed the primary purpose of statistics: to make an inference to the whole population from a sample. As a preliminary step, the sample must be simplified, and reduced to a few descriptive numbers; each is called a sample statistic.¹

In the very simple example of Chapter 1, the pollster would record the answers of the 1,000 people in his sample, obtaining a sequence such as D D R D R ..., where D and R represent Democrat and Republican. The best way of describing this sample by a single number is the statistic P, the sample proportion of Democrats; this will be used to make an inference about π, the population proportion. Admittedly, this statistic is trivial to compute. In the sample of the previous chapter, computing the sample proportion (.60) required only a count of the number voting Democrat (600), followed by a division by sample size (n = 1,000).

We now turn to the more substantial computations of statistics to describe two other samples:

(a) The results when a die is thrown 50 times.
(b) The average height of a sample of 200 American men.

2-2 FREQUENCY TABLES AND GRAPHS

(a) Discrete Example

Each time we toss the die, we record the number of dots X, which takes on the values 1, 2, ..., 6. X is called a "discrete" random variable because it assumes only a finite (or countably infinite) number of values.

¹ Later, we shall have to define a statistic more rigorously; but for now, this will suffice.

The 50 throws yield a string of 50 numbers such as given in Table 2-1.

TABLE 2-1 Results of Tossing a Die 50 Times
[the raw string of 50 outcomes is not recoverable from the scan]

To simplify, we keep a running tally of each of the six possible outcomes in Table 2-2. In column 3 we note that 9 is the frequency f (or total number of times) that we rolled a 1; i.e., we obtained this outcome on 9/50 of our tosses. Formally, this proportion (.18) is called relative frequency (f/n); it is computed in column 4.

TABLE 2-2 Calculation of the Frequency and Relative Frequency of the Number of Dots in 50 Tosses of a Die

(1) Number of Dots   (2) Tally   (3) Frequency (f)   (4) Relative Frequency (f/n)
 1                                 9                   .18
 2                                12                   .24
 3                                 6                   .12
 4                                 8                   .16
 5                                10                   .20
 6                                 5                   .10
                                  Σf = 50 = n          Σ(f/n) = 1.00

(Tally marks are not reproduced.) Here Σf is "the sum of all f."

The information in column 3 is called a "frequency distribution," and is graphed in Figure 2-1. The relative frequency distribution in column 4 can be similarly graphed; the student who does so will note that the two graphs are identical except for the vertical scale. Hence, a simple change of vertical scale transforms Figure 2-1 into a relative frequency distribution. This now gives us an immediate picture of the sample result.
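A frequency tally like Table 2-2 is mechanical to produce. Here is a small Python sketch of our own, not the book's; the 50 tosses are simulated rather than taken from Table 2-1:

```python
import random
from collections import Counter

random.seed(1)                                   # arbitrary seed
tosses = [random.randint(1, 6) for _ in range(50)]

freq = Counter(tosses)                           # frequency f of each face
n = len(tosses)
for dots in range(1, 7):
    print(dots, freq[dots], freq[dots] / n)      # x, f, relative frequency f/n

print(sum(freq.values()) == n)                   # check: the f's sum to n = 50
```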

(b) Continuous Example

Suppose that a sample of 200 men is drawn from a certain population, with the height of each recorded in inches. The ultimate aim will be an inference about the average height of the whole population; but first we must efficiently summarize and describe our sample.

[Figure 2-1 is a bar graph of the Table 2-2 frequencies, with frequency (0 to 15) on the left vertical scale, relative frequency (10% to 30%) on the right, and number of dots (1 to 6) on the horizontal axis.]

FIG. 2-1 Frequency and relative frequency distribution of the results of a sample of 50 tosses of a die.

In this example, height (in inches) is our random variable X. In this case, X is continuous; thus an individual's height might be any value, such as 64.328 inches.² It no longer makes sense to talk about the frequency of this specific value of X; chances are we'll never again observe anyone exactly 64.328 inches tall. Instead we can tally the frequency of heights within a class or cell (e.g., 58.5″ to 61.5″), as in column 3 of Table 2-3.

TABLE 2-3 Frequency and Relative Frequency of the Heights of a Sample of 200 Men

Cell No.   (1) Cell Boundaries   (2) Cell Midpoint   (3) Tally   (4) Frequency f   (5) Relative Frequency f/n
 1         55.5-58.5             57                               2                 .010
 2         58.5-61.5             60                               7                 .035
 3         61.5-64.5             63                              22                 .110
 4         64.5-67.5             66                              13                 .065
 5         67.5-70.5             69                              44                 .220
 6         70.5-73.5             72                              36                 .180
 7         73.5-76.5             75                              32                 .160
 8         76.5-79.5             78                              13                 .065
 9         79.5-82.5             81                              21                 .105
10         82.5-85.5             84                              10                 .050
                                                                 Σf = 200 = n      Σf/n = 1.00

(Tally marks are not reproduced.)

² We shall overlook the fact that although height is conceptually continuous, in practice the measured height is rounded to a few decimal places at most, and is therefore discrete.

Then the frequency and relative frequency are tabulated as before.

The cells have been chosen somewhat arbitrarily, but with the following convenience in mind:

1. The number of cells is a reasonable compromise between too much detail and too little.
2. Each cell midpoint, which hereafter will represent all sample values in the cell, is a convenient whole number.

The grouping of the 200 observations into cells is illustrated in Figure 2-2, where each observation is represented by a dot. For simplicity, we have assumed that the observations are recorded exactly, rather than being rounded.

[Figure 2-2, the dot diagram of the 200 grouped observations, is not recoverable from the scan, and some intervening text is missing.]


In fact, there are two highly useful descriptions: the first is the central point of the distribution, and the second is its spread.

    2-3 CENTERS (MEASURES OF LOCATION)

There are several different concepts of the "center" of a frequency distribution. Three of these, the mode, the median, and the mean, are discussed below. We shall start with the simplest.

(a) The Mode³

This is defined as the most frequent value. In our example of heights, the mode is 69 inches, since this cell has the greatest frequency, or highest bar in Figure 2-3. Generally, the mode is not a good measure of central tendency, since it often depends on the arbitrary grouping of the data. (The student will note that, by redefining cell boundaries, the mode can be shifted up or down considerably.) It is also possible to draw a sample where the largest frequency (highest bar in the group) occurs at two (or even more) heights; this unfortunate ambiguity is left unresolved, and the distribution is "bimodal."

(b) The Median

This is the 50th percentile; i.e., the value below which half the values in the sample fall. Since it splits the observations into two halves, it is sometimes called the middle value. In the sample of 200 shown in Figure 2-2, the median (say, 71.46) is most easily derived by reading off the 100th value⁴ from the left; but if the only information available is the frequency distribution in Figure 2-3, it must be calculated by choosing an appropriate value within the median cell.⁵

³ "Mode" means fashion, in French.
⁴ Or 101st value. This ambiguity is best resolved by defining the median as the average of the 100th and 101st values. In a sample with an odd number of observations, this ambiguity does not arise.
⁵ The median cell is clearly the 6th, since this leaves 44% (i.e., 88) of the sample values below and 38% (i.e., 76) above. The median value can be closely approximated by moving through this median cell from left to right to pick up another 6% of the observations. Since this cell includes 18% of the observations, we move 6/18 of the way through this cell. Thus our median approximation is 70.5 + (6/18 × 3) = 71.5.

(c) The Mean (X̄)

This is sometimes called the arithmetic mean, or simply the average. This is the most common central measure. The original observations (X₁, X₂, ..., Xₙ) are simply summed, then divided by n. Thus

    X̄ ≝ (1/n)(X₁ + X₂ + ··· + Xₙ)    (2-1a)

where Xᵢ represents the ith value of X, and ≝ means "equals, by definition."

The average height of our sample could be computed by summing all 200 observations and dividing by 200. However, this tedious calculation can be greatly simplified by using the grouped data in Table 2-3. Let f₁ represent the number of observations in cell 1, where each observation may be approximated⁶ by the cell midpoint, x₁. Similar approximations hold for all the other cells too, so that

    X̄ ≈ (1/n)[(x₁ + x₁ + ··· + x₁) + (x₂ + x₂ + ···) + ··· + (x₁₀ + ··· + x₁₀)]
               (f₁ times)             (f₂ times)            (f₁₀ times)

where ≈ represents approximate equality; it follows that

    X̄ ≈ (1/n)(f₁x₁ + f₂x₂ + ··· + f₁₀x₁₀)

In general,

    X̄ ≈ Σᵢ₌₁ᵐ (fᵢ/n) xᵢ    (2-1b)

⁶ In approximating each observed value by the midpoint of its cell, we sometimes err positively, sometimes negatively; but unless we are very unlucky, these errors will tend to cancel. Even in the unluckiest case, however, the error must be smaller than half the cell width. Note that the cell midpoints are designated by the small xᵢ, to distinguish them from the observed values Xᵢ.
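Equation (2-1b) is just a weighted sum, which a few lines of Python make concrete. This is a sketch of ours using the midpoints and frequencies of Table 2-3; the book's Table 2-4 is not reproduced here:

```python
# Cell midpoints x_i and frequencies f_i from Table 2-3
midpoints   = [57, 60, 63, 66, 69, 72, 75, 78, 81, 84]
frequencies = [ 2,  7, 22, 13, 44, 36, 32, 13, 21, 10]
n = sum(frequencies)                                   # 200

# X-bar is approximately the sum over cells of (f_i/n) * x_i
x_bar = sum(f / n * x for f, x in zip(frequencies, midpoints))
print(x_bar)                                           # about 71.8 inches
```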

where (fᵢ/n) = relative frequency in the ith cell, and m = number of cells. We number this equation (2-1b) to emphasize that it is the equivalent formulation of (2-1a), appropriate for grouped data. In our example, the calculation of (2-1b) is based on the data in Table 2-3, and is shown in column 3 of Table 2-4. We can think of this as a "weighted" average, with each x value weighted appropriately by its relative frequency.

(d) Comparison of Mean, Median and Mode

These three measures of center are compared in Figure 2-4. In part a we show a distribution which has a single peak and is symmetric (i.e., one half is the mirror image of the other); in this case all three central measures coincide. But when the distribution is skewed to the right as in b, the median falls to the right of the mode, and the mean further to the right still.

[Figure 2-4 shows the two distributions, with mode, median, and mean marked on the horizontal axis.]

FIG. 2-4 (a) A symmetric distribution with a single peak. The mode, median, and mean coincide at the point of symmetry. (b) A right-skewed distribution, showing the mode, median, and mean separated.

[Several intervening pages are missing from the transcript; the text resumes in the problems of Chapter 3.]

3-11 A class of 100 students consists of several groups, in the proportions:

                        Men       Women
    Taking math         17/100    38/100
    Not taking math     23/100    22/100

If a student is chosen by lot to be class president, what is the chance the student will be:
(a) A man?
(b) Taking math?
(c) A man, or taking math?
(d) A man, and taking math?
(e) If the class president in fact turned out to be a man, what is the chance that he is taking math? Not taking math?

Problems marked by arrows are important, because they introduce a later section in the text.


3-12 The students of a certain school engage in various sports in the following proportions:

    Football, 30% of all students.
    Basketball, 20%.
    Baseball, 20%.
    Both football and basketball, 5%.
    Both football and baseball, 10%.
    Both basketball and baseball, 5%.
    All three sports, 2%.

If a student is chosen by lot for an interview, what is the chance that he will be:
(a) An athlete (playing at least one sport)?
(b) A football player only?
(c) A football player or a baseball player?

If an athlete is chosen by lot, what is the chance that he will be:
(d) A football player only?
(e) A football player or a baseball player?

Hint. Use a Venn diagram.
(f) Use your result in (a) to generalize (3-14).

    3-4 CONDITIONAL PROBABILITY

Continuing with the experiment of fairly tossing 3 coins, suppose that the tossing is completed, and we are informed that there were fewer than 2 heads, i.e., that event G had occurred. Given this condition, what is the probability that event I (no heads) occurred? This is an example of "conditional probability," and is denoted as Pr(I/G), or "the probability of I, given G."

The problem may be solved by keeping in mind that our relevant outcome set is reduced to G. From Figure 3-5 it is evident that Pr(I/G) = 1/4.

The second illustration in this figure shows the conditional probability of H (all coins the same), given G (less than 2 heads). Our knowledge of G means that the only relevant part of H is H ∩ G ("no heads" = I) and thus Pr(H/G) = 1/4. This example is immediately recognized as equivalent to the preceding one; we are just asking the same question in two different ways.

Suppose Pr(G), Pr(H), and Pr(G ∩ H) have already been computed for the original sample space S. It may be convenient to have a formula for Pr(H/G) in terms of them. We therefore turn to the definition (3-1) of probability as relative frequency. We imagine repeating the experiment n times, with G occurring n(G) times, of which H also occurs n(H ∩ G) times.

[Figure 3-5 shows two Venn diagrams. (a) Pr(I/G): knowledge that G has occurred makes the original sample space S irrelevant; G becomes the new sample space, and I includes one of its four equiprobable outcomes, so Pr(I/G) = 1/4. (b) Pr(H/G): knowledge that G has occurred makes the original sample space S (including the outcome in H outside G) irrelevant; H ∩ G is the only relevant part of H.]

FIG. 3-5 Venn diagrams to illustrate conditional probability. (a) Pr(I/G). (b) Pr(H/G). Note Pr(H/G) is identical to Pr(I/G).

The ratio n(H ∩ G)/n(G) is the conditional relative frequency, and in the limit²

    Pr(H/G) ≝ lim n(H ∩ G)/n(G)    (3-21)

On dividing numerator and denominator by n, we obtain

    Pr(H/G) = lim [n(H ∩ G)/n] / [n(G)/n]

    Pr(H/G) = Pr(H ∩ G)/Pr(G)    (3-22)

This formula is often used in a slightly different form, obtained by cross multiplying:

    Pr(H ∩ G) = Pr(G) Pr(H/G)    (3-23)

² In this section and the next, we shall assume all events under consideration have nonzero probabilities. This permits us to divide legitimately by various probabilities at will.
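Because the 8 outcomes are equiprobable, (3-22) can be checked by brute-force counting. A Python sketch of ours, with G = fewer than 2 heads and H = all three coins the same, as in the text:

```python
from itertools import product

outcomes = list(product("HT", repeat=3))        # 8 equiprobable outcomes

G = {w for w in outcomes if w.count("H") < 2}   # fewer than 2 heads
H = {w for w in outcomes if len(set(w)) == 1}   # all coins the same

pr_G       = len(G) / len(outcomes)             # 4/8
pr_H_and_G = len(G & H) / len(outcomes)         # 1/8, the single outcome TTT

print(pr_H_and_G / pr_G)                        # Pr(H/G) = 1/4, as in Figure 3-5
```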

PROBLEMS

(3-13) Flip 3 coins over and over again, recording your results as in the following table.


Trial Number n   G Occurs?   Accumulated Frequency n(G)   If G Occurs, Then H Also Occurs?   Accumulated Frequency n(H ∩ G)   Conditional Relative Frequency n(H ∩ G)/n(G)
1                No          0
2                Yes         1                            Yes                                1                                1.00
3                No          1
4                Yes         2                            No                                 1                                .50
5                Yes         3                            Yes                                2                                .67

After 50 trials, is the relative frequency n(H ∩ G)/n(G) close to the probability calculated theoretically in the previous section? (If not, it is because of insufficient trials, so pool the data from the whole class.)

3-14 Using the unfair coins and definitions of Problem 3-7, calculate
(a) Pr(G/H)
(b) Pr(H/G)
(c) Pr(K/L)
(d) Pr(R/L)

3-15 (a) A consumer may buy brand X or brand Y, but not both. The probability of buying brand X is .06, and brand Y is .15. Given that the consumer bought either X or Y, what is the probability that he bought brand X?
(b) If events A and B are mutually exclusive (and of course nonempty, i.e., include at least one possible outcome), is it always true that

    Pr(A/A ∪ B) = Pr(A)/[Pr(A) + Pr(B)]?

3-16 A bowl contains 3 red chips (numbered R₁, R₂, R₃) and 2 white chips (numbered W₁, W₂). A sample of 2 chips is drawn, one after the other. List the sample space. For each of the following events, diagram the subset of outcomes included and find its probability.
(a) Second chip is red.
(b) First chip is red.
(c) Second chip is red, given the first chip is red.
(d) First chip is red, given the second chip is red.
(e) Both chips are red.

Then note the following features, which are perhaps intuitively obvious also:

(1) The answers to (a) and (b) agree, as do the answers to (c) and (d).
(2) Show that the answer to (e) can be found alternatively by applying (3-23) to parts (b) and (c).
(3) Extension of part (2): if 3 chips are drawn, what is the probability that all 3 are red? Can you now generalize Theorem (3-23)?

(3-17) Two cards are drawn from an ordinary deck. What is the probability that:
(a) They are both aces?
(b) They are the two black aces?
(c) They are both honor cards (ace, king, queen, jack or ten)?

3-18 A poker hand (5 cards) is drawn from an ordinary deck of cards. What is the chance of drawing, in order,
(a) 2 aces, then 3 kings?
(b) 2 aces, then 2 kings, finally a queen?
(c) 4 aces, then a king?
What is the chance of drawing, in any order whatsoever,
(d) 4 aces and a king?
(e) 4 aces?
(f) "Four of a kind" (i.e., 4 aces, or 4 kings, or 4 jacks, etc.)?
If the 5 cards are drawn with replacement (i.e., each card is replaced in the deck before drawing the next card, so that it is no longer a real poker deal), what is the probability of drawing, in any order,
(g) Exactly 4 aces?

3-19 A supply of 10 light bulbs contains 2 defective bulbs. If the bulbs are picked up in random order, what is the chance that
(a) The first two bulbs are good?
(b) The first defective bulb was picked 6th?
(c) The first defective bulb was not picked until the 9th?

⇒ 3-20 Two dice are thrown. Let

    E: first die is 5
    F: total is 7
    G: total is 10

Compute the relevant probabilities using Venn diagrams. Show that:
(a) Pr(F/E) = Pr(F).
(b) Pr(G/E) ≠ Pr(G).
(c) Is it true that Pr(E/F) = Pr(E)? Do you think this is closely related to (a), or just an accident?

3-21 If E and F are any 2 mutually exclusive events (and both are nonempty, of course), what can be said about Pr(E/F)?

3-22 A company employs 100 persons--75 men and 25 women. The accounting department provides jobs for 12% of the men and 20%


of the women. If a name is chosen at random from the accounting department, what is the probability that it is a man? That it is a woman?

⇒ 3-23 (Bayes' Theorem). In a population of workers, suppose 40% are grade school graduates, 50% are high school graduates, and 10% are college graduates. Among the grade school graduates, 10% are unemployed; among the high school graduates, 5% are unemployed; and among the college graduates, 2% are unemployed.

If a worker is chosen at random and found to be unemployed, what is the probability that he is
(a) A grade school graduate?
(b) A high school graduate?
(c) A college graduate?
(This problem is important as an introduction to Chapter 15; therefore its answer is given in full.)

Answer. Think of probability as proportion of the population, if you like.

[Diagram: the old sample space, the population of workers, is divided into the classes C₁, C₂, C₃; the effect E (unemployment), shaded, is the new sample space, with Pr(E) = .067.]

    Pr(E ∩ Cᵢ) = Pr(E/Cᵢ) Pr(Cᵢ):    .040    .025    .002

    Pr(E) = Σ Pr(E ∩ Cᵢ) = .067

In the new sample space shaded, (3-22) gives

    (a) Pr(C₁/E) = .040/.067 = .597
    (b) Pr(C₂/E) = .025/.067 = .373
    (c) Pr(C₃/E) = .002/.067 = .030
                 check, sum = 1.000
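The answer is (3-22) applied mechanically, and a short Python sketch (our own illustration) reproduces it:

```python
priors  = {"grade school": 0.40, "high school": 0.50, "college": 0.10}
p_unemp = {"grade school": 0.10, "high school": 0.05, "college": 0.02}

# Joint probabilities Pr(E and C_i) = Pr(E/C_i) * Pr(C_i)
joint = {c: p_unemp[c] * priors[c] for c in priors}
pr_E = sum(joint.values())                       # 0.067

for c in priors:                                 # posteriors Pr(C_i/E), by (3-22)
    print(c, round(joint[c] / pr_E, 3))          # 0.597, 0.373, 0.03
```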

Notes on Bayes' Theorem. Problem 3-23 is an example of Bayes' Theorem, which may be stated as follows:

Certain "causes" (education levels) C₁, C₂, ..., Cₘ have prior probabilities Pr(Cᵢ). In a sense the causes produce an "effect" E (unemployment), not with certainty, but with conditional probabilities Pr(E/Cᵢ). Using conditional probability manipulations, one eventually calculates the probability of a cause, given the effect, Pr(Cᵢ/E):

    Given Pr(E/Cᵢ)  →  Deduced Pr(Cᵢ/E)

⇒ 3-24 In a certain country it rains 40% of the days and shines 60% of the days. A barometer manufacturer, in testing his instrument in the lab, has found that it sometimes errs: on rainy days it erroneously predicts "shine" 10% of the time, and on shiny days it erroneously predicts "rain" 30% of the time.
(a) In predicting tomorrow's weather before looking at the barometer, the (prior) chance of rain is 40%. After looking at the barometer and seeing it predict "rain," what is the (posterior) chance of rain?
(b) What is the posterior chance of rain if an improved barometer (error rates of 10 and 20% respectively) predicts "rain"?
(c) What is the posterior chance of shine if the improved barometer predicts "rain"?

3-5 INDEPENDENCE

In Problem 3-20 we noticed that Pr(F/E) = Pr(F). This means that the chance of F, knowing E, is exactly the same as the chance of F, without knowing E; knowledge of E does not change the probability of F at all. It seems reasonable, therefore, to call F statistically independent of E. In fact, this is the basis for the general definition:

Definition. An event F is called statistically independent of an event E if

    Pr(F/E) = Pr(F)    (3-24)

Of course, in the case of events G and E, where Pr(G/E) ≠ Pr(G), we would say that G was statistically dependent on E. In this case, knowledge of E changes the probability of G.

We now develop the consequences of F being independent of E. Substituting (3-22) into (3-24), we obtain

    Pr(F ∩ E)/Pr(E) = Pr(F)

hence

    Pr(F ∩ E) = Pr(F) Pr(E)    (3-25)

We can reverse this argument, and work backwards from (3-25) as follows:

    Pr(F ∩ E)/Pr(F) = Pr(E)

    Pr(E/F) = Pr(E)    (3-26)

That is, E is independent of F whenever F is independent of E. In other words, the result in Problem 3-20(c) above was no accident. In view of this symmetry, we may henceforth simply state that E and F are statistically independent of each other, whenever any of the three logically equivalent statements (3-24), (3-25), or (3-26) is true. Usually, statement (3-25) is the preferred form, in view of its symmetry. Sometimes, in fact, this "multiplication formula" is taken as the definition of statistical independence. But this is just a matter of taste.

Notice that so far we have insisted on the phrase "statistical independence," in order to distinguish it from other forms of independence: philosophical, logical, or whatever. For example, we might be tempted to say that in our dice problem, F was "somehow" dependent on E because the total of the two tosses depends on the first die. This vague notion of dependence is of no use to the statistician, and will be considered no further. But let it serve as a warning that statistical independence is a very precise concept, defined by (3-24), (3-25), or (3-26) above.

Now that we clearly understand statistical independence, and agree that this is the only kind of independence we shall consider, we shall run no risk of confusion if we are lazy and drop the word "statistical."

Our results so far are summarized as follows:

General Theorem:
    Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F)
    Pr(E ∩ F) = Pr(F) · Pr(E/F)

Special Cases:
    Pr(E ∪ F) = Pr(E) + Pr(F)
        if E and F are mutually exclusive, i.e., if Pr(E ∩ F) = 0
    Pr(E ∩ F) = Pr(F) · Pr(E)
        if E and F are independent, i.e., if Pr(E/F) = Pr(E)
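The special case for independent events is easy to test by enumeration. A Python sketch of ours using the two-dice events of Problem 3-20, E: first die is 5 and F: total is 7:

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))    # 36 equiprobable outcomes

E = {r for r in rolls if r[0] == 5}             # first die is 5
F = {r for r in rolls if sum(r) == 7}           # total is 7

def pr(event):
    return len(event) / len(rolls)

# Multiplication formula (3-25): Pr(E and F) = Pr(E) * Pr(F)
print(pr(E & F), pr(E) * pr(F))                 # both 1/36 = 0.0277...
```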

PROBLEMS

3-25 Three coins are fairly tossed.

    E₁: first two coins are heads
    E₂: last coin is a head
    E₃: all three coins are heads

Try to answer the following questions intuitively (does knowledge of the condition affect your betting odds?). Then verify by drawing the sample space and calculating the relevant probabilities for (3-24).
(a) Are E₁ and E₂ independent?
(b) Are E₂ and E₃ independent?

3-26 Repeat Problem 3-25 using the three unfair coins whose sample space is as follows (compare Problem 3-7).

    e          Pr(e)
    (H H H)    .15
    (H H T)    .10
    (H T H)    .10
    (H T T)    .15
    (T H H)    .15
    (T H T)    .10
    (T T H)    .10
    (T T T)    .15

3-27 A certain electronic mechanism has 2 bulbs which have been observed on or off with the following long-run relative frequencies:

                     Bulb 2
                     On      Off
    Bulb 1   On      .15     .45
             Off     .10     .30

This table means, for example, that both bulbs were simultaneously off 30 percent of the time.
(a) Is "bulb 1 on" independent of "bulb 2 on"?
(b) Is "bulb 1 off" independent of "bulb 2 on"?

3-28 A single card is drawn from a deck of cards, and let

    E: it is an ace
    F: it is a heart.


Are E and F independent, when we use
(a) An ordinary 52-card deck.
(b) An ordinary deck, with all the spades deleted.
(c) An ordinary deck, with all the spades from 2 to 9 deleted.

    3-6 OTHER VIEWS OF PROBABILITY

In Section 3-1 we defined probability as the limit of relative frequency. There are several other approaches, including symmetric probability, axiomatic probability, and subjective probability.

(a) Symmetric Probability

The physical symmetry of a fair die assures us that all six of its outcomes are equally probable. Thus

    Pr(e₁) = Pr(e₂) = ··· = Pr(e₆)

In order that these six probabilities sum to one, each must be 1/6 (compare to (3-5)).

In general, for an experiment having N equally likely outcomes or points, for each point eⱼ

    Pr(eⱼ) = 1/N

Then, for an event E consisting of N_E points, the probability is given by (3-9) as

    Pr(E) = Σ Pr(eⱼ)

where the summation Σ extends only over points eⱼ in E (N_E in number). Thus, for equally probable outcomes

    Pr(E) = N_E/N    (3-27)

For example, in rolling a fair die consider the event

    E: number of dots is an even number.

E consists of three of the six equiprobable elementary outcomes (2, 4, or 6 dots); thus its probability is 3/6.

Symmetric probability theory begins with (3-27) as the definition of probability, and gives a simpler development than our earlier relative frequency approach. However, our earlier analysis was more general; although the examples we cited often involved equiprobable outcomes, the theory we developed was in no way limited to such cases. In reviewing it, you should confirm that it may be applied whether or not outcomes are equiprobable; special attention should be given to those cases (e.g., Problem 3-26) where outcomes were not equiprobable.

Not only is symmetric probability limited because it lacks generality; it also has a major philosophical weakness. Note how the definition of probability in (3-27) involves the phrase "equally probable"; we are guilty of circular reasoning.

Our own relative frequency approach to probability suffers from the same philosophical weakness. We might ask what sort of limit is meant in equation (3-1)? It is logically possible that the relative frequency n₁/n behaves badly, even in the limit; for example, no matter how often we toss a die, it is just conceivable that the ace will keep turning up every time, making lim n₁/n = 1. Therefore, we should qualify equation (3-1) by stating that the limit occurs with high probability, not logical certainty. In using the concept of probability in the definition of probability, we are again guilty of circular reasoning.
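While the limit in (3-1) cannot be demonstrated logically, it can be watched empirically. A simulation sketch of ours, not the book's, shows the relative frequency of aces drifting toward 1/6:

```python
import random

random.seed(0)        # arbitrary seed; the drift, not the exact values, matters
aces = 0
n = 0
for target in (100, 10_000, 1_000_000):
    while n < target:
        n += 1
        aces += (random.randint(1, 6) == 1)   # count aces so far
    print(n, aces / n)                        # n1/n drifts toward 1/6 = .1667
```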

(b) Axiomatic (Objective) Probability

The only philosophically sound approach, in fact, is an abstract axiomatic approach. In a simplified version, the following properties are taken as axioms:

Axioms.
    Pr(eᵢ) ≥ 0    (3-2) repeated
    Pr(e₁) + Pr(e₂) + ··· + Pr(e_N) = 1    (3-4) repeated
    Pr(E) = Σ Pr(eᵢ)    (3-9) repeated

Then the other properties, such as (3-1), (3-3), and (3-20), are theorems derived from these axioms, with axioms and theorems together comprising a system of analysis that appropriately describes probability situations such as die tossing, etc.

Equation (3-1) is particularly important, and is known as the law of large numbers. Equations (3-3) and (3-20) may be proved very easily, so easily in fact that we shall give the proof to illustrate how nicely this axiomatic theory can be developed. We can prove even stronger results: for any event E,

Theorems.
    0 ≤ Pr(E)    (3-28), like (3-2)
    Pr(E) ≤ 1    (3-29), like (3-3)
    Pr(Ē) = 1 − Pr(E)    (3-30), repeating (3-20)

Proof. According to axioms (3-9) and (3-2), Pr(E) is the sum of positive terms, and is therefore positive; thus (3-28) is proved.

To prove (3-30), we write out axiom (3-4):

    Pr(e₁) + Pr(e₂) + ··· + Pr(e_N) = 1
    (terms for E)   (terms for Ē)

According to (3-9), this is just

    Pr(E) + Pr(Ē) = 1    (3-31)

from which (3-30) follows.

In (3-28) we proved that every probability is positive or zero. In particular Pr(Ē) is positive or zero; substituting this into (3-31) ensures that

    Pr(E) ≤ 1    (3-29) proved.

Thus our above theorems are established; other theorems may similarly be derived.

    (c) Subjective Probability

Sometimes called personal probability, this is an attempt to deal with events that cannot be repeated, even conceptually, and hence cannot be given any frequency interpretation. For example, consider events such as an increase in the stock market average tomorrow, or the overthrow of a certain government within the next month. These events are described by the layman as "likely" or "unlikely," even though there is no hope of estimating this by observing their relative frequency. Nevertheless, their likelihood vitally influences policy decisions, and as a consequence must be estimated in some rough-and-ready way. It is only then that decisions can be made on what risks are worth taking.

To answer this practical need, an axiomatic theory of personal probability has been developed. Roughly speaking, personal probability is defined by the odds one would give in betting on an event; we shall find this a useful concept later in decision theory (Chapter 15).

Review Problems

3-29 A tetrahedral (four-sided) die has been loaded. Find Pr(e₄) if possible, given the following conditions. (If the problem is impossible, state so.)
(a) Pr(e₁) = .2; Pr(e₂) = .4; Pr(e₃) = .1
(b) Pr(e₁) = .4; Pr(e₂) = .4; Pr(e₃) = .3
(c) Pr(e₁) = .6; Pr(e₃) = .2
(d) Pr(e₁) = .7; Pr(e₂) = .5

3-30 In a family of 3 children, what is the chance of
(a) At least one boy?
(b) At least 2 boys?
(c) At least 2 boys, given at least one boy?
(d) At least 2 boys, given that the eldest is a boy?

3-31 Suppose that the last 3 customers out of a restaurant all lose their hat-checks, so that the girl has to hand back their 3 hats in random order. What is the probability
(a) That no man will get the right hat?
(b) That exactly 1 man will?
(c) That exactly 2 men will?
(d) That all 3 men will?

3-32 What is the probability that
(a) 3 people picked at random have different birthdays?
(b) A roomful of 30 people all have different birthdays?
(c) In a roomful of 30 people there is at least one pair with the same birthday?

3-33 A bag contains a thousand coins, one of which has heads on both sides. A coin is drawn at random. What is the probability that it is the loaded coin, if it is flipped and turns up heads without fail
(a) 3 times in a row?
(b) 10 times in a row?
(c) 20 times in a row?

3-34 Repeat Problem 3-33 when the loaded coin in the bag has both H and T faces, but is biased so that the probability of H is 3/4.

chapter 4

Random Variables and Their Distributions

    4-1 DISCRETE RANDOM VARIABLES

Again consider the experiment of fairly tossing 3 coins. Suppose that our only interest is the total number of heads. This is an example of a random variable or variate, and is customarily denoted by a capital letter thus:

    X = the total number of heads    (4-1)

The possible values of X are 0, 1, 2, 3; however, they are not equally likely. To find what the probabilities are, it is necessary to examine the original sample space in Figure 4-1. Thus, for example, the event "two heads" (X = 2) consists of 3 of the 8 equiprobable outcomes; hence its probability is 3/8. Similarly, the probability of each of the other events is computed. Thus in Figure 4-1 we obtain the probability function of X.

The mathematical definition of a random variable is "a numerical-valued function defined over a sample space." But for our purposes we can be less abstract; it is sufficient to observe that:

    A discrete random variable takes on various values with probabilities specified in its probability function.    (4-2)

In our specific example, the random variable X (number of heads) takes on the values 0, 1, 2, 3, with probabilities specified by the probability function in Figure 4-1b.¹

¹ Although the intuitive definition (4-2) will serve our purposes well enough, it is not always as satisfactory as the more rigorous mathematical definition, which stresses the random variable's relation to the original sample space. Thus, for example, in tossing 3 coins, the random variable Y = total number of tails is seen to be a different random variable from X = total number of heads. Yet X and Y have the same probability distribution, (cont'd)

[Figure 4-1 maps the old sample space of 8 equiprobable outcomes, (T,T,T), (T,T,H), ..., (H,H,H), onto the new, smaller sample space of values of X, with probability function:

    x    p(x)
    0    1/8
    1    3/8
    2    3/8
    3    1/8]

FIG. 4-1 (a) X, the random variable "number of heads in three tosses." (b) Graph of the probability function.
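Deriving p(x) from the sample space, as in Figure 4-1, is mechanical. A Python sketch (our own illustration):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

outcomes = list(product("HT", repeat=3))          # the 8 equiprobable outcomes
counts = Counter(w.count("H") for w in outcomes)  # X = total number of heads

for x in sorted(counts):
    p = Fraction(counts[x], len(outcomes))        # p(x) = (outcomes with X = x)/8
    print(x, p)                                   # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```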

    In the ge teral case of defining a probability function, as in Figure4-2,we begin by c\177nsidering in the original sample space events such as (X = 0),(X = 1),... jin general (J\177 = x); (note that capital J\177represents the randomvariable, and Ismall x a specific value it lnay take). For these events we calculate the probabfi\177tzes and denote- them \234(0), \234(1),., .\234(x) .... Thisprobability fu \177ction \234(x) may be presented equally well in any of 3ways:

1. Table form, as in Figure 4-1a.
2. Graph form, as in Figure 4-1b.
3. Formula form, as in (4-7) below.

and anyone who used the loose definition (4-2) might be deceived into thinking that they were the same random variable. In conclusion, there is more to a random variable than its probability function.

² This notation, like any other, may be regarded simply as an abbreviation for convenience. Thus, for example, p(3) is short for Pr (X = 3), which in turn is short for "the probability that the number of heads is three." Note that when X = 3 is abbreviated to 3, Pr is correspondingly abbreviated to p.


FIG. 4-2 A general random variable X as a mapping of the original outcome set onto a condensed set of numbers. (The set of numbers illustrated is 0, 1, 2, ..., the set of positive integers. We really ought to be more general, however, allowing both negative values and fractional (or even irrational) values as well. Thus our notation, strictly speaking, should be x_1, x_2, ..., x_i, ... rather than 0, 1, 2, ..., x, ....)

Thus a complicated sample space (outcome set) is reduced to a much smaller, numerical sample space. The original sample space is introduced to enable us to calculate the probability function p(x) for the new space; having served its purpose, the old unwieldy space is then forgotten. The interesting questions can be answered very easily in the new space. For example, referring to Figure 4-3, what is the probability of 1 head or fewer? We simply add up the relevant probabilities in the new sample space

Pr (X ≤ 1) = p(0) + p(1) = 1/8 + 3/8 = 4/8

FIG. 4-3 The event X ≤ 1 in the new sample space, with Prob = 4/8.

The answer could have been found, but with more trouble, in the original sample space.

EXAMPLE

In the same experiment of 3 fair tosses of a coin, let Y = number of changes in the sequence. For example, for the sequence HTT, the value of Y is 1, because there is one changeover from H to T. In Figure 4-4 we use the technique developed above to define this random variable and its probability function p(y).

FIG. 4-4 The random variable Y ("number of changes in sequence of 3 tosses of a coin") and its probability distribution.
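Readers with access to a computer may like to check such tabulations mechanically. Here is a minimal sketch in Python (the code and all its names are ours, not the text's), which enumerates the 8 equiprobable outcomes and tabulates both X = number of heads and Y = number of changes:

    from itertools import product
    from collections import Counter

    outcomes = list(product("HT", repeat=3))        # the 8 equiprobable outcomes

    def heads(seq):                                 # X = total number of heads
        return seq.count("H")

    def changes(seq):                               # Y = changes in the sequence
        return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

    for name, g in [("X", heads), ("Y", changes)]:
        counts = Counter(g(seq) for seq in outcomes)
        print(name, {value: f"{n}/8" for value, n in sorted(counts.items())})

    # X {0: '1/8', 1: '3/8', 2: '3/8', 3: '1/8'}
    # Y {0: '2/8', 1: '4/8', 2: '2/8'}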


PROBLEMS

In each case, tabulate the probability function of the random variable, by first constructing a sample space of the experimental outcomes.

4-1 In 4 fair tosses of a coin, let
(a) X = number of heads.
(b) Y = number of changes in sequence.

4-2 Let X be the total number of dots showing when two fair dice are tossed.

4-3 Two boxes each contain 6 slips of paper numbered 1 to 6. Two slips of paper are drawn, one from each of the boxes. Let X be the difference between the numbers drawn (absolute value).


⇒ 4-4 To review Chapter 2, consider the experiment of tossing 3 coins; the number of heads X may be 0, 1, 2, or 3. Repeat this experiment 50 times³ to obtain 50 values of X, so that you can
(a) Construct a relative frequency table of X.
(b) Graph this relative frequency table.
(c) Calculate the sample x̄ from (2-1b).
(d) Calculate the mean squared deviation from (2-5b).
(e) If the experiment were repeated millions of times, to what value would
1. The relative frequencies tend?
2. x̄ tend?
3. MSD (mean squared deviation) tend?
4. s² tend?

    4-2 MEAN AND VARIANCE

In Chapter 3 we defined probability as limiting relative frequency. Now we notice the close relation between the relative frequency table observed in Problem 4-4 and the probability table calculated in Figure 4-1, for tossing 3 coins. If the sample size were increased without limit (i.e., if we continued to toss ad infinitum), the relative frequency table would settle down to the probability table.

From the relative frequency table (Problem 4-4), we calculated the mean x̄ and variance s² of our sample.⁴ It is natural to calculate analogous population values from the probability table, and call them the mean μ and variance σ² of the probability distribution p(x), or of the random variable X itself. Thus

Population mean, μ ≡ Σ x p(x)    (4-3) cf. (2-1b)

Population variance, σ² ≡ Σ (x − μ)² p(x)    (4-4) cf. (2-5b)

³ Or simulate this by consulting the random numbers in Appendix Table IIa (with an even number representing a head, and an odd number a tail); or else use the authors' results, as follows: 03220 11232 11221 22213 13332 12212 12121 11233 21112 11213

⁴ Strictly speaking, we calculated the mean squared deviation, rather than s². However, as n → ∞, they become practically indistinguishable.


For our example of tossing three coins, we compute μ and σ² in Table 4-1.⁵ Note the analogy here to our calculations in Table 2-4.

We call μ the "population mean," since it is based on the population of all possible tosses of the coins. On the other hand, the mean x̄ is called a sample mean, since it is based on a mere sample of tosses drawn from the parent population of all possible tosses. Similarly, σ² and s² represent population and sample variance, respectively. A clear distinction between population and sample values is crucial; we return to this point in Chapters 6 and 7.

TABLE 4-1 Calculation of the Mean and Variance of a Random Variable
(Columns: given probability function; calculation of μ from (4-3); calculation of σ² from (4-4); easier calculation of σ², using (4-5).)

x    p(x)    x p(x)      (x − μ)    (x − μ)²    (x − μ)² p(x)    x² p(x)
0    1/8     0           −3/2       9/4         9/32             0
1    3/8     3/8         −1/2       1/4         3/32             3/8
2    3/8     6/8         +1/2       1/4         3/32             12/8
3    1/8     3/8         +3/2       9/4         9/32             9/8
             μ = 12/8                           σ² = 24/32       Σ x² p(x) = 24/8
               = 1.50                              = .75         μ² = 18/8
                                                                 σ² = 6/8

Since the definitions of μ and σ² parallel those of x̄ and s², we find parallel interpretations. We continue to think of the mean μ as a weighted average, using probability weights rather than relative frequency weights. The mean is also a fulcrum and center. The standard deviation is a measure of spread.⁵

⁵ The computation of σ² is often simplified by using:

σ² = Σ x² p(x) − μ²    (4-5)

This formula, with its proof, is analogous to (2-7). The computation is illustrated in the last column of Table 4-1.

Proof that (4-5) is equivalent to (4-4). Re-express (4-4) as:

σ² = Σ (x² − 2μx + μ²) p(x)

and, noting that μ is a constant:

σ² = Σ x² p(x) − 2μ Σ x p(x) + μ² Σ p(x)

Since Σ x p(x) = μ and Σ p(x) = 1, we have

σ² = Σ x² p(x) − 2μ(μ) + μ²
   = Σ x² p(x) − μ²    (4-5) proved
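As a further check on Table 4-1 and on the shortcut formula (4-5), here is a small Python sketch (ours, not the text's) that computes μ and σ² both ways with exact fractions:

    from fractions import Fraction as F

    p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}      # p(x) for 3 coins

    mu = sum(x * px for x, px in p.items())                    # (4-3)
    var_def = sum((x - mu) ** 2 * px for x, px in p.items())   # (4-4)
    var_easy = sum(x * x * px for x, px in p.items()) - mu**2  # (4-5)

    print(mu, var_def, var_easy)    # 3/2 3/4 3/4, agreeing with Table 4-1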


When a random variable is linearly transformed, the new mean and variance behave in exactly the same way as when sample observations were transformed in Section 2-5 (the proof is quite analogous and is left as an exercise). For future reference, we state these results in Table 4-2. We could write out verbally all the information in this table, working across the rows, as follows:

TABLE 4-2 Linear Transformation (Y) of a Random Variable (X)

Random Variable    Mean         Variance    Standard Deviation
X                  μ_X          σ_X²        σ_X
Y = a + bX         a + bμ_X     b²σ_X²      |b| σ_X

"Consider the random variable X, with mean μ_X and variance σ_X². If we define a new random variable Y as a linear function of X (specifically Y = a + bX), then the mean of Y will be a + bμ_X, and its variance will be b²σ_X²."
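The rows of Table 4-2 can be verified on the 3-coin example. A brief Python sketch (ours; the choice Y = 4 + 2X echoes Problem 4-7 below):

    from fractions import Fraction as F

    p_x = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}
    a, b = 4, 2                                      # Y = 4 + 2X

    p_y = {a + b * x: px for x, px in p_x.items()}   # the linear map is one-to-one

    def mean(p):
        return sum(v * pv for v, pv in p.items())

    def var(p):
        m = mean(p)
        return sum((v - m) ** 2 * pv for v, pv in p.items())

    print(mean(p_y), a + b * mean(p_x))   # 7 7   (mean of Y = a + b*mu_X)
    print(var(p_y), b**2 * var(p_x))      # 3 3   (variance of Y = b^2 * var_X)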

    PROBLEMS

4-5 Compute μ and σ² for the probability distributions in Problem 4-1. As a check, compute σ² in 2 ways: from the definition (4-4), and from the easy formula (4-5).

(4-6) Compute μ and σ² for the random variables of
(a) Problem 4-2.
(b) Problem 4-3.

4-7 Letting X = the number of dots rolled on a fair die, find μ_X and σ_X. If Y = 2X + 4, calculate μ_Y and σ_Y in 2 ways:
(a) By tabulating the probability function of Y, then using (4-3) and (4-5).
(b) By Table 4-2.

(4-8) A bowl contains tags numbered from 1 to 10. There are ten 10's, nine 9's, etc. Let X denote the number on a tag drawn at random.
(a) Make a table of its probability function.
(b) Find μ_X and σ_X.

4-9 A student is given 4 questions, each with a choice of 3 answers. Let X be the number of correct answers when the student has to guess each answer. Compute the probability function and the mean and variance of X.


⇒ 4-10 Let X be a random variable with mean μ and standard deviation σ. What are the mean and standard deviation of Z, where Z = (X − μ)/σ? (This introduces Section 4-5.)

4-11 Suppose that the whole population of American families yields the following table for family size. (For simplicity, the data is slightly altered by truncating at 6.)

No. children             0    1    2    3    4    5    6
Proportion of families  .43  .18  .17  .11  .06  .03  .02

Source: Statistical Abstract of U.S., 1963, p. 41.

(a) Let X be the number of children in a family selected at random. (This selection may be done by lots: imagine each family being represented on a slip of paper, the slips well mixed, and then one slip drawn.) The probability function of X is given in the table, of course. Find μ_X and σ_X.
(b) Now let a child be selected at random (rather than a family), and let Y be the number of children in his family. (This selection may be done by a teacher, for example, who picks a child by lot from the register of children.) What are the possible values of Y? Complete the probability table, and compute μ_Y and σ_Y.
(c) Is μ_X or μ_Y more properly called the "average family size"?


4-3 BINOMIAL DISTRIBUTION

There are many types of discrete random variables. We shall study one, the binomial, as an example of how a general formula can be developed for the probability function p(x).

The classical example of a binomial variable is

X = number of heads in n tosses of a coin

In order to generalize, we shall speak of n independent "trials," each resulting in either "success" or "failure," with respective probabilities π and (1 − π). Then the total number of successes X is defined as a binomial random variable.

There are many practical random variables of this type, some of which are listed in Table 4-3. We shall now derive a simple formula for the probability


TABLE 4-3 Examples of Binomial Variables

"Trial"                    "Success"   "Failure"      π                 n                X
Tossing a fair coin        Head        Tail           1/2               n tosses         Number of heads
Birth of a child           Boy         Girl           Practically 1/2   Family size      Number of boys in family
Throwing 2 dice            7 dots      Anything else  6/36              n throws         Number of sevens
Drawing a voter in a poll  Democrat    Republican     Proportion of     Sample size      Number of Democrats
                                                      Democrats in the                   in the sample
                                                      population
The history of one atom    Decay       No change      Very small        Very large, the  Number of radioactive
which may radioactively                                                 number of atoms  decays
decay during a certain                                                  in the sample
time period

function p(x). First, consider the special case in which we compute the probability of getting 3 heads in tossing 5 coins (Figure 4-5a). Each point in our outcome set is represented as a sequence of five of the letters S (success) and F (failure). We concentrate on the event three heads (X = 3), and show all outcomes that comprise this event.

FIG. 4-5 Computing binomial probability. (a) Special case: 3 heads in 5 tosses of a coin. (b) General case: x successes in n trials.

In each of these outcomes S appears three times and F twice. Since the probability of S is π and F is (1 − π), the probability of the sequence SSSFF is

π · π · π · (1 − π) · (1 − π) = π³(1 − π)²

this multiplication being justified by the independence of the trials. We further note that any outcome in this event has the same probability. For example, the probability of SFSSF is

π · (1 − π) · π · π · (1 − π) = π³(1 − π)²

The same factors appear; they are only ordered differently.

Now we only have to determine how many such sequences (outcomes) are included in this event. This is precisely the number of ways in which the three S's and two F's can be arranged. This number of ways is designated⁶ as

C(5, 3) = 5!/(3! 2!) = 10

Our event includes C(5, 3) = 10 outcomes, each with a probability

π³(1 − π)² = (1/2)³(1/2)² = (1/2)⁵

Hence its probability is:

p(3) = Pr (X = 3) = C(5, 3)(1/2)³(1/2)² = [5!/(3! 2!)](1/2)⁵ = 10/32

In general, the probability of the sequence

SS...S FF...F    (x times, then n − x times)

is π^x (1 − π)^(n−x), or, in general

p(x) = Pr (X = x) = C(n, x) π^x (1 − π)^(n−x)    (4-7)

where C(n, x) = n!/(x!(n − x)!). We summarize with Figure 4-6.

⁶ This formula is developed as follows. Suppose we wish to fill five spots with five objects, designated S1, S2, S3, F1, F2. We have a choice of 5 objects to fill the first spot, 4 the second, and so on; thus the number of options we have is: 5·4·3·2·1 = 5!    (4-6)  (cont'd)


FIG. 4-6 Computing binomial probability of x successes in n trials.

As a final example, we return to our previous experiment of tossing three fair coins. What is the probability of two heads? Each toss is an independent trial in which π = 1/2. Noting also that n = 3 and x = 2, we have

Pr (X = 2) = C(3, 2)(1/2)²(1/2)¹ = [3!/(2! 1!)](1/2)³ = 3/8    (4-8)

a confirmation of a previous result.

⁶ (cont'd) But this is not the problem at hand; in our case we cannot distinguish between S1, S2, and S3, all of which appear as S. Thus many of our separate and distinct arrangements in (4-6) (e.g., S1S2S3F1F2 and S2S1S3F1F2) cannot be distinguished, and appear as the single arrangement SSSFF. Thus (4-6) involves serious double counting. How much?

We double counted 3·2·1 = 3! times, because we assumed in (4-6) that we could distinguish between S1, S2, and S3 when in fact we cannot. (3! is simply the number of distinct ways of arranging S1, S2, and S3.) Similarly, we double counted 2·1 = 2! times, because we assumed in (4-6) that we could distinguish between F1 and F2, when in fact we cannot. When (4-6) is deflated for double-counting in both these ways, we have

C(5, 3) = 5!/(3! 2!) = 5!/(3!(5 − 3)!)
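Formula (4-7) is easy to check by machine. A minimal Python sketch (ours, not the text's; math.comb requires Python 3.8 or later):

    from math import comb
    from fractions import Fraction as F

    def binomial_pmf(x, n, pi):
        # p(x) = C(n, x) * pi^x * (1 - pi)^(n - x), formula (4-7)
        return comb(n, x) * pi**x * (1 - pi) ** (n - x)

    print(binomial_pmf(3, 5, F(1, 2)))   # 5/16, i.e. 10/32, as derived above
    print(binomial_pmf(2, 3, F(1, 2)))   # 3/8, confirming (4-8)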

The binomial coefficients C(n, x), as well as the complete binomial distribution p(x), are tabulated in Table III of the Appendix, for your optional use.

PROBLEMS

4-12 (a) Construct a diagram similar to Figure 4-6 to obtain the probability function for the number of heads X when 4 coins are tossed; use general π.
(b) Then set π = 1/2 to obtain the results for a fair coin.
(c) From (b), calculate μ and σ².
(d) Graph the probability function of (b), showing μ.

    I n4-13 A b\17711 is drawn from a bowl containi g 2 red, 1 blue, and 7 blackballsl The ball is replaced, and a secondball is drawn, and so On until3 bats have been drawn (sam tin with replacement)(a) Eet X = the total number of red balls drawn, Tabulate its proba-bilit,!function. Find ,it and cr. Graph.(b) irePeat (a), for Y = the total number of blue ballsdraw\177.

4-14 Check the probability function of Problem 4-9 using the formulas of this section.

(4-15) In rolling 3 dice, let X be the number of aces that occur. Tabulate the probability function of X. Find μ and σ². Graph.

4-16 On a blind toss of a dart, suppose the probability of hitting the target is 1 . What is the probability that in 6 tosses you will hit the target exactly 2 times? At most 2 times? At least 3 times?

⇒ 4-17 On the basis of these questions, can you guess the mean of a general binomial variable, in terms of n and π? Can you guess the variance? (This leads into Section 6-6.)

*4-18 (For calculus students only, leading into Section 4-5.) Graph the function f(z) = e^(−z²/2), showing its
(a) Symmetry.
(b) Asymptotes.
(c) Maximum.
(d) Points of inflection.

4-4 CONTINUOUS DISTRIBUTIONS

In Chapter 2 we saw how a continuous quantity such as height was best graphed with a relative frequency histogram.

* Starred problems are optional, since they are more theoretical and/or difficult than the rest.


The histogram of heights of Figure 2-3 is reproduced in Figure 4-7a below. (For purposes of illustration, we measure height in feet, rather than inches. Furthermore, the y-axis has been shrunk to the same scale as the x-axis.) Note that in Figure 4-7a relative frequency is given by the height of each bar; but since its width (or base) is 1/4, its area (height times width) is numerically only 1/4 as large. Thus we can't use area in this figure to represent relative frequency, since it would badly understate. In fact, if we wish area to represent relative frequency, each

FIG. 4-7 Relative frequency histogram (a) transformed into relative frequency density in (b), making total area = 1.

    height must be increased fourfold. This is done in Figure 4-7b, where thearea of each bar is relative frequency, and the height of each bar is calledrelative frequency density.

    In general

(relative frequency density)(cell width) = (relative frequency)

    area of any bar = relative frequency.
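A tiny numerical illustration of this rule, with made-up relative frequencies (not the book's height data), sketched in Python:

    width = 0.25                                   # cell width, in feet
    rel_freq = [0.05, 0.20, 0.50, 0.20, 0.05]      # assumed relative frequencies (sum to 1)

    density = [f / width for f in rel_freq]        # bar heights, as in Figure 4-7b
    areas = [d * width for d in density]           # (density)(cell width) = relative frequency

    print(sum(rel_freq), sum(areas))               # 1.0 1.0: the total area is 1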

    There is but one more important observation. In Figure 4-7a, the heightssum to one (the sum of all relative frequencies must be one). From the

FIG. 4-8 How relative frequency density may be approximated by a probability density function as sample size increases and cell size decreases. (a) Small n, as in Fig. 4-7b. (b) Large enough n to stabilize relative frequencies. (c) Even larger n, to permit finer cells while keeping relative frequencies stable. (d) For very large n, this becomes (approximately) a smooth probability density curve.


    numerical equivalence of height in Figure 4-7a to area in Figure 4-7b, itfollows that the areas in Figure 4-7b must also sum to one. And this is a keycharacteristic of a density function in statistics: it encloses an area numericallyequal to 1.

In Figure 4-8 we show what happens to the relative frequency density of a continuous random variable as

1. Sample size increases.
2. Cell size decreases.

With a small sample, chance fluctuations influence the picture. But as sample size increases, chance is averaged out, and relative frequencies settle down to probabilities. At the same time, the increase in sample size allows a finer definition of cells. While the area remains fixed at 1, the relative frequency density becomes approximately a curve, the so-called probability density function, which we shall refer to simply as the probability function, designated p(x).

If we wish to compute the mean and variance from Figure 4-8c, the discrete formulas (4-3) and (4-4) can be applied. But if we are working with the probability density function in Figure 4-8d, then integration (which calculus students will recognize as the limiting case of summation) must be used; if a and b are the limits of X, then (4-3) and (4-4) become

Mean, μ = ∫[a to b] x p(x) dx    (4-9)

Variance, σ² = ∫[a to b] (x − μ)² p(x) dx    (4-10)

All the theorems that we state about discrete random variables are equally valid for continuous random variables, with summations replaced by integrals. Proofs are also very similar. Therefore, to avoid tedious duplication, we give theorems for discrete random variables only, leaving it to the reader to supply the continuous case himself, if he so desires.
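Readers without calculus can still check (4-9) and (4-10) by accumulating thin rectangles. A minimal Python sketch (ours, not the text's), applied to an assumed uniform density p(x) = 1 on [0, 1], for which μ = 1/2 and σ² = 1/12:

    def mean_var(p, a, b, steps=100_000):
        # Approximate (4-9) and (4-10) by summing thin rectangles (midpoint rule).
        dx = (b - a) / steps
        xs = [a + (i + 0.5) * dx for i in range(steps)]
        mu = sum(x * p(x) * dx for x in xs)                # (4-9)
        var = sum((x - mu) ** 2 * p(x) * dx for x in xs)   # (4-10)
        return mu, var

    print(mean_var(lambda x: 1.0, 0.0, 1.0))   # approximately (0.5, 0.0833...)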

    4-5 THE NORMAL DISTRIBUTION

For many random variables, the probability density function is a specific bell-shaped curve, called the normal curve, or Gaussian curve, as shown in Figures 4-9 to 4-12. It is the single most useful probability function in statistics. Many variables are normally distributed; for example, errors that are made in measuring physical and economic phenomena often are normally distributed. In addition, there are other useful probability functions (such as the binomial) which often can be approximated by the normal curve.

(a) Standard Normal Distribution

The probability function of the standard normal variable Z is

p(z) = (1/√(2π)) e^(−z²/2)    (4-11)

The constant 1/√(2π) is a scale factor required to make the total area 1. The symbols π and e denote important mathematical constants, approximately 3.14⁸ and 2.718 respectively.

FIG. 4-9 (a) Standard normal curve. (b) Vertical axis rescaled.

⁷ In Problem 4-18 you may have confirmed that the graph of (4-11) is that shown in Figure 4-9.
⁸ The mathematical constant π = 3.14 is not to be confused with the π used in Section 4-3 to designate probability of success.

We draw the normal curve in⁷ Figure 4-9 to reach a maximum at z = 0. We confirm in (4-11) that this is so: as we move to the left or right of 0, z² increases; since its negative exponent is increasing


in size, p(z) decreases. Moreover, the further we move away from zero, the more p(z) decreases; as z takes on very large (positive or negative) values, the negative exponent in (4-11) becomes very large and p(z) approaches zero. Finally, this curve is symmetric. Since z appears only in squared form, −z generates the same probability in (4-11) as +z. This confirms the shape of this standard normal curve as we have drawn it in Figure 4-9. The mean and variance of Z can be calculated by integration using (4-9) and (4-10); since

FIG. 4-10 Probability enclosed by the normal curve between 0 and z0.

this requires calculus, we quote the results without proof:

μ_Z = 0    σ_Z = 1

It is for this very reason, in fact, that Z is called a standard normal variable. Later, when we speak of "standardizing" any variable, this is precisely what we mean: shifting it so that its mean is 0, and shrinking (or stretching) it so that its standard deviation (or variance) is one.

The probability (area) enclosed by the normal curve between the mean (0) and any specified value (say z0) also requires calculus to evaluate precisely, but may be easily pictured in Figure 4-10.

This evaluation of probability, done once and for all, has been recorded in Table IV of the Appendix. Students without calculus can think of this as accumulating the area of the approximating rectangles, as in Figure 4-8c.

To illustrate this table, consider the probability that Z falls between .6 and 1.3, as shown in Figure 4-11a. From Table IV in the Appendix we note that the probability that Z falls between 0 and .6 is .2257; similarly, the probability that Z falls between 0 and 1.3 is .4032. We require the difference of these two, namely:

Pr (.6 ≤ Z ≤ 1.3) = .4032 − .2257 = .1775

FIG. 4-11 Standard normal probabilities.

Next, consider the probability that Z falls between −1 and 2, as shown in Figure 4-11b. By symmetry, the probability that Z falls

between 0 and −1 is identical to the probability between 0 and +1, which is .3413. In this instance we add this to the probability of Z between 0 and 2, namely .4772, which yields

Pr (−1 ≤ Z ≤ 2) = .3413 + .4772 = .8185

Finally, the student may confirm that the probability enclosed between one standard deviation above and below the mean (−1 ≤ Z ≤ +1) is .6826, or just over 2/3 of the area of the normal curve.
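These table look-ups can be reproduced numerically. The sketch below (ours, not the text's) uses the standard identity relating the normal area to the error function:

    from math import erf, sqrt

    def phi(z):
        # Cumulative area under the standard normal curve up to z:
        # Phi(z) = (1 + erf(z / sqrt(2))) / 2
        return (1 + erf(z / sqrt(2))) / 2

    print(round(phi(1.3) - phi(0.6), 4))    # 0.1775, as computed above
    print(round(phi(2.0) - phi(-1.0), 4))   # 0.8186 (the table's rounded entries give .8185)
    print(round(phi(1.0) - phi(-1.0), 4))   # 0.6827, i.e. just over 2/3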

PROBLEMS

If Z is a standard normal variable, use Appendix Table IV to evaluate: Pr (−2 ...


(b) General Normal Distribution

The probability function of the general normal random variable X, with mean μ and standard deviation σ, is

p(x) = (1/(σ√(2π))) e^(−(x−μ)²/2σ²)    (4-12)

We notice that in the very special case in which μ = 0 and σ = 1, (4-12) reduces to the standard normal distribution (4-11). But more important, regardless of what μ and σ may be, we can translate any normal variable X in (4-12) into the standard form (4-11) by defining:

Z = (X − μ)/σ    (4-13)

FIG. 4-12 Linear transformation of any normal variable X into the standard normal variable Z. (Sometimes the scale is stretched; sometimes it is shrunk.)

    Z is recognized as just a linear transformation of X, as shown in Figure 4-12.Notice that whereas the mean and standard deviation of a general normalvariate X can take on any values, the standard normal variate Z is uniquewith mean 0 and standard deviation 1 as proved in Problem 4-10.

To evaluate any normal variate X, we therefore translate X into Z, and then evaluate Z in the standard normal table (Appendix Table IV). For example, suppose that X is normal, with μ = 100 and σ = 5. What is the probability of getting an X value of 110 or more? That is, we wish to evaluate

Pr (X ≥ 110)    (4-14)

First, (4-14) can be written equivalently⁹ as

Pr ((X − 100)/5 ≥ (110 − 100)/5)    (4-15)

⁹ Any inequality is preserved if both sides are diminished by the same amount (100) and divided by the same positive amount (5).

which, noting (4-13), is

Pr (Z ≥ 2)    (4-16)

We see that (4-16) is the standardized form of (4-14), and from Table IV we evaluate this probability to be .0228. Moreover, the standardized form (4-16) allows a clearer interpretation of our original question; in fact, we were asking "What is the probability of getting a normal value at least two standard deviations above the mean?" The answer is: very small, about one in fifty.

As a final example, suppose a bolt picked at random from a production line has a length X which is a normal random variable with mean 10 cm and standard deviation 0.2 cm. What is the probability that its length will be between 9.9 and 10.1 cm? That is

Pr (9.9 ≤ X ≤ 10.1)

This may be written in the standardized form

Pr ((9.9 − 10)/0.2 ≤ (X − 10)/0.2 ≤ (10.1 − 10)/0.2) = Pr (−.5 ≤ Z ≤ .5) = .1915 + .1915 = .3830
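Both worked examples can be confirmed with the same device. A short Python sketch (ours, not the text's):

    from math import erf, sqrt

    def phi(z):
        return (1 + erf(z / sqrt(2))) / 2    # standard normal cumulative area

    # Pr(X >= 110) for a normal X with mu = 100, sigma = 5:
    print(round(1 - phi((110 - 100) / 5), 4))      # 0.0228
    # Pr(9.9 <= X <= 10.1) for the bolt length, mu = 10, sigma = 0.2:
    lo, hi = (9.9 - 10) / 0.2, (10.1 - 10) / 0.2   # standardize to Z
    print(round(phi(hi) - phi(lo), 4))             # 0.3829 (table entries give .3830)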


    4-6 A FUNCTION OF A RANDOM VARIABLE

Looking again at the experiment of tossing three coins, let us suppose that there is a reward R depending upon the number X of heads we toss. Formally, we might state that R is a function of X, or

R = g(X)

    Now let us suppose that the specific form of this function is

R = (X − 1)²

which is equally well given by Table 4-4.

TABLE 4-4 Tabled Form of the Function R = (X − 1)²

Value of X    Value of R, r = g(x) = (x − 1)²
0             (0 − 1)² = 1
1             (1 − 1)² = 0
2             (2 − 1)² = 1
3             (3 − 1)² = 4

The values of R are customarily rearranged in order as shown in the third column of Table 4-5. Furthermore, the values of R have certain probabilities, which may be deduced from the previous probabilities of X (just as the probabilities of X were deduced from the probabilities in our original sample

TABLE 4-5 Calculation of the Probability of Each R Value from the Probabilities of Various X Values

(1) x    (2) p(x)     (3) r    (4) p(r)           (5) r p(r)
1        3/8          0        3/8                0
0, 2     1/8, 3/8     1        1/8 + 3/8 = 4/8    4/8
3        1/8          4        1/8                4/8
                                                  μ_R = 8/8 = 1.0


space, in Figure 4-1). Thus we note from Table 4-4 that two values of X (0 or 2) give rise to an R value of 1. This is indicated with arrows in Table 4-5. The third and fourth columns in this table show the probability distribution of R. The last column shows the calculation of the mean of R, μ_R.

R is a random variable; although it has been "derived" from X, it has all the properties of an ordinary random variable. The mean of R can be computed from its probability distribution, as in Table 4-5, and is found to be 1.0. But if it is more convenient, the answer can be derived from the probability distribution of X, as in Table 4-6.

TABLE 4-6 Mean of R = (X − 1)², Calculated from p(x)

x    g(x)    p(x)    g(x) p(x)
0    1       1/8     1/8
1    0       3/8     0
2    1       3/8     3/8
3    4       1/8     4/8
                     μ_R = 8/8 = 1.0

It is easy to see why this works; in a disguised way we are calculating μ_R in the same way as in Table 4-5. The first and third lines of Table 4-6 appear together as the second line of Table 4-5. Also, the second and fourth lines of Table 4-6 correspond to the first and third lines of Table 4-5. Thus Table 4-6 contains precisely the same information as Table 4-5; it therefore yields the same value for μ_R. The only difference in the two tables is that 4-6 is ordered according to X values, while 4-5 is ordered (and condensed) according to R values.

This example can be generalized, as follows. If X is a random variable, and g is any function, then R = g(X) is a random variable. μ_R may be calculated either from the probability function of R, or alternatively from the probability function of X according to

Theorem:    μ_R = Σ g(x) p(x)    (4-17a)
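This theorem is easily checked by machine on the reward example above. A minimal Python sketch (ours, not the text's), computing μ_R by both routes:

    from fractions import Fraction as F
    from collections import defaultdict

    p_x = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}
    g = lambda x: (x - 1) ** 2                  # the reward R = (X - 1)^2

    # Route 1: condense into p(r), as in Table 4-5, then take the sum of r p(r).
    p_r = defaultdict(F)
    for x, px in p_x.items():
        p_r[g(x)] += px
    mu_r1 = sum(r * pr for r, pr in p_r.items())

    # Route 2: stay with p(x), as in Table 4-6: the sum of g(x) p(x).
    mu_r2 = sum(g(x) * px for x, px in p_x.items())

    print(mu_r1, mu_r2)   # 1 1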

4-7 NOTATION

Some new notation will help us better understand the various viewpoints of the mean. For any random variable, X let us say, all the following terms


mean exactly the same thing:¹⁰

μ_X = mean of X
    = average of X
    = expectation of X
    = E(X), the expected value of X

The term E(X) is introduced because it is useful as a reminder that it represents a weighted sum, i.e.,

E(X) = Σ x p(x)    (4-3)

With this new notation, result (4-17a) can be written

E(R) = Σ g(x) p(x)    (4-17b)

Finally, we recall that R was just an abbreviation for g(X), so that we may equally well write (4-17b) in an easily remembered form:

Theorem:    E[g(X)] = Σ g(x) p(x)    (4-17c)

    As an example of this notation, we may write

E(X − μ)² = Σ (x − μ)² p(x)    (4-18)

By (4-4),

E(X − μ)² = σ²    (4-19)

Thus we see that σ² may be regarded as just a kind of expectation: namely, the expectation of the random variable (X − μ)².

    PROBLEMS

4-24 As in Problem 4-1, let X be the number of heads when 4 coins are fairly flipped.
(a) If R(X) = X² − 3X, find its probability function, and μ_R and σ_R.
(b) Find E|X − 2| in 2 ways:
(1) Using the probability function of |X − 2|; and
(2) Using the probability function of X in (4-17).
(c) Find E(X²).
(d) Find E(X − μ_X)². Is this related to σ_X in any way?

¹⁰ The reason for the plethora of names is historical. For example, gamblers and economists use the term "expected gain," meteorologists use the term "mean annual rainfall," and teachers use the term "average grade."