TRANSCRIPT
Statistics
ECON-UA 18
Contents
1 Lecture 1: January 29th, 2019
  1.1 Goals
  1.2 Course Policies
    1.2.1 Grades
  1.3 What are Probability and Statistics
    1.3.1 Probability
    1.3.2 Statistics
  1.4 Experiments, Sample Spaces, and Events
    1.4.1 Experiment
    1.4.2 Sample outcome
    1.4.3 Sample space
    1.4.4 Event
  1.5 A Gambling Example

2 Lecture 2: January 31st, 2019
  2.1 Axioms of Probability
    2.1.1 Revisit the gambling problem
    2.1.2 Example
  2.2 Set Theory
    2.2.1 Example
    2.2.2 Set Theory in Probability
    2.2.3 Example
    2.2.4 Another Example
  2.3 Conditional Probability
    2.3.1 Example
    2.3.2 Rules and Theorems of Conditional Probability
    2.3.3 Example: Economic Opportunity at NYU
    2.3.4 Bayes' Theorem Example

3 Lecture 3: February 5th, 2019
  3.1 Review of Conditional Probability
  3.2 Independence
    3.2.1 What makes two events independent?
    3.2.2 Another way to think about independence
    3.2.3 Example
    3.2.4 Second Example
  3.3 Probability of Repeated Independent Events
    3.3.1 Examples: Multiplicative Rule
    3.3.2 Gambling Example of Multiplicative Rule
  3.4 Permutations

4 Lecture 4: February 7th, 2019
  4.1 Review of Permutations
    4.1.1 Finding the Number of Unique Combinations
    4.1.2 Example: What is the Probability of a four of a kind?
    4.1.3 Order with multiple types
  4.2 Review of Counting
  4.3 Binomial Probability Distributions
    4.3.1 Binomial Distribution Example

5 Lecture 5: February 12th, 2019
  5.1 Review of Binomial Probability Distributions
    5.1.1 Binomial Probability Example
  5.2 Random Variables
  5.3 Expected Value
    5.3.1 Expected Value Example

6 Lecture 6: February 14th, 2019
  6.1 Expected Value Principles
    6.1.1 Expected Value of cX + k
    6.1.2 Expected Value of a Sum
    6.1.3 Expected Value of a Binomial Random Variable
  6.2 Probability Distribution versus Cumulative Distribution Functions
    6.2.1 Example
  6.3 Interpreting Expected Value

7 Lecture 7: February 19th, 2019
  7.1 Variance
    7.1.1 Variance Examples
    7.1.2 Principles of Variance
    7.1.3 Expected Value and Variance with Real World Data
  7.2 Continuous Random Variables

8 Lecture 8: February 21st, 2019
  8.1 Joint Distribution Function
    8.1.1 Example of Jointly Distributed Random Variables
    8.1.2 Definition of Covariance
    8.1.3 Conditional Probability Function
    8.1.4 Independent Random Variables
    8.1.5 Example: Discrete Joint Probability Function

9 Lecture 9: February 26th, 2019
  9.1 Joint Distribution Functions Continued
    9.1.1 Summation (Integration) of Joint Probability is the Individual Probability
    9.1.2 Summing Over Both Variables Must Equal 1
    9.1.3 Conditional Expectation
  9.2 Principles of Expectation, Variance, and Covariance
  9.3 Application to Finance
  9.4 Variance of the Sum of Independent Random Variables
    9.4.1 Importance of Sample Size

10 Lecture 10: February 28th, 2019
  10.1 Standard Deviation
  10.2 Normal Distribution
  10.3 Standard Normal Distribution
  10.4 Extended Example of Using Normal Distribution
  10.5 Example of Using Normal Distribution: Airline Passengers

11 Lecture 11: March 5th, 2019
  11.1 Central Limit Theorem
  11.2 Law of Large Numbers
  11.3 Example of Using Central Limit Theorem and LLN

12 Lecture 12: March 12th, 2019
  12.1 Example
  12.2 Confidence Interval

13 Lecture 13: March 14th, 2019
  13.1 Estimation
  13.2 Unbiasedness
    13.2.1 Unbiasedness of a Sample Mean
    13.2.2 Extended Example of Unbiasedness
    13.2.3 Intuition Behind the Unbiased Estimator of Variance
  13.3 Efficiency
    13.3.1 Example of Efficiency

14 Lecture 14: March 26th, 2019
  14.1 Hypothesis Testing
  14.2 Hypothesis Testing Process
  14.3 One-Sided versus Two-Sided Tests

15 Lecture 15: March 28th, 2019
  15.1 Review of Hypothesis Testing
    15.1.1 Critical z values
  15.2 Real Data: Examples of Hypothesis Testing
    15.2.1 First Example: Sports Gambling
    15.2.2 Second Example: Math Scores
    15.2.3 Time of Death Example

16 Lecture 16: April 2nd, 2019
  16.1 Type 1 and Type 2 Errors
    16.1.1 Example of Type 1 and 2 Errors: Sports Gambling
    16.1.2 Relationship between Statistical Significance and Type 1 and 2 Error Rates
    16.1.3 Larger Samples Produce Fewer Type 2 Errors
    16.1.4 Likelihood of Type 2 Error Depends on How Far Off the Null Hypothesis Is

17 Lecture 17: April 4th, 2019
  17.1 Power
    17.1.1 Sample Size Can Significantly Increase Power
  17.2 Finding the Required Sample Size

18 Lecture 18: April 9th, 2019
  18.1 Student's t Distribution
    18.1.1 Unknown Variance
    18.1.2 The t statistic
    18.1.3 t-statistic Example
    18.1.4 Extreme t statistics happen more frequently
    18.1.5 Importance of Sample Size
  18.2 Using t statistics for hypothesis testing

19 Lecture 19: April 11th, 2019
  19.1 Using t-distribution for Confidence Intervals
    19.1.1 Example
  19.2 t-test Example

20 Lecture 20: April 16th, 2019
  20.1 Hypothesis Tests with Two Populations
  20.2 Second Example: Two-Sample t-tests

21 Lecture 21: April 18th, 2019
  21.1 Two-sample t test with equal variances
  21.2 Example: Two-sample t-test with equal variance
    21.2.1 Two-sample test Using the Binomial Distribution

22 Lecture 22
  22.1 Testing Hypotheses about Variance
    22.1.1 χ² distribution
    22.1.2 χ² test Example
  22.2 Testing for Equality of Variances Using Samples from Two Populations
    22.2.1 Example
    22.2.2 Example: F-test

23 Lecture 23
  23.1 Statistics with Jointly Distributed Random Variables
    23.1.1 Correlation
    23.1.2 Correlation Coefficient
Chapter 1
Lecture 1: January 29th, 2019
Suppose there are three roommates who live together in Brooklyn and commute to NYU by different modes of transit: subway, bike, and Uber. Each one thinks their method is fastest. They decide to record their commute times every day for a week. The table below shows the results.
Commute in Minutes

            Subway   Bike   Uber
Monday        33      29     45
Tuesday       31      31     35
Wednesday     36      30     26
Thursday      28      30     55
Friday        40      29     22
• Is it faster to commute by one method versus another?
• Are there any other characteristics of the commute methods that are interesting?

• How do we know this isn't just due to random chance? Are five days enough to tell which method is fastest? How many days of observations would we need to be confident?

– Maybe Uber is faster; there just happened to be really bad traffic on Monday and Thursday.

– Or maybe the subway got lucky because most weeks there is a much longer delay than was experienced that week.

My goal for this course is for you to be able to see data like this and rigorously answer these types of questions.
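Questions like these are ultimately answered with summary statistics. As a first taste, here is a quick sketch of how the table above could be summarized — shown in Python for illustration, though the course's own tool is R:

```python
# Commute times in minutes from the roommates' table (Monday through Friday).
subway = [33, 31, 36, 28, 40]
bike = [29, 31, 30, 30, 29]
uber = [45, 35, 26, 55, 22]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Spread of the times around their mean, divided by n - 1.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

for name, times in [("Subway", subway), ("Bike", bike), ("Uber", uber)]:
    print(f"{name}: mean = {mean(times):.1f}, variance = {sample_variance(times):.1f}")
```

On this data, bike has the lowest mean (29.8 minutes) and by far the smallest day-to-day variance (0.7), while Uber's mean is the highest and its spread is enormous — exactly the kind of pattern the bullet points above are hinting at.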
CHAPTER 1. LECTURE 1: JANUARY 29TH, 2019 2
1.1 Goals
1. Understand the mathematical foundations of statistics used by economists
2. Interpret descriptions of data and tests of hypotheses
3. Know how to test hypotheses using data and the R programming language.
• If I give you some economic data and a hypothesis, you should know how to test it, implement that test in R, and report your results
1.3 What are Probability and Statistics
1.3.1 Probability
Definition: The likelihood of a given event being the outcome of a random process.

An example of a probability question:
• If I flip a coin: what is the probability that I get heads?
1.3.2 Statistics
Statistics has two main types: descriptive statistics and inferential statistics.
Descriptive statistics
Definition: Summarizing characteristics of an entire population with numbers.

Examples of descriptive statistics:
• The average income of an NYU graduate.
• The average income of people in New York City below age 25.
Inferential statistics
Definition: The science of observing data and making inferences about the process that generated it.

An example of an inferential statistics question:
• I have a coin that might be fair and might be unfair. I flip it 10 times, 8 timesI get heads, and 2 times I get tails. Based on that data, should I conclude thatthe coin is not fair?
1.4 Experiments, Sample Spaces, and Events
1.4.1 Experiment
Definition: A random process that:
1. Can be repeated
2. Has a well-defined set of outcomes
Examples of an experiment:
• Flipping a coin
• Rolling a die
• Buying a lottery ticket
1.4.2 Sample outcome
Definition: One of the potential outcomes of a given experiment.

Examples of sample outcomes:
Experiment              Example of a Sample Outcome
Flip a coin             Tails
Roll a die              2
Buy a lottery ticket    Not win anything
1.4.3 Sample space
Definition: All of the potential outcomes of an experiment.

Examples of sample spaces:
Experiment              Sample Space
Flip a coin             Heads and Tails
Roll a die              1, 2, 3, 4, 5, 6
Buy a lottery ticket    Win any amount of money (including 0)
Quick questions:
• I flip a coin two times and it lands on heads, then tails. Is this an experiment?
• What is the sample space?
1.4.4 Event
Definition: Any collection of sample outcomes, including the entire sample space.

Examples of events:
Experiment              Event Example
Roll a die              2 or 4
Flip a coin             Heads or Tails
Buy a lottery ticket    Win at least $100
If every sample outcome in the sample space of an experiment is equally likely, then the probability of an event is

P(Event) = (Number of Outcomes in Event)/(Number of Outcomes in Sample Space)
1.5 A Gambling Example
We flip a coin two times.
• What is the Sample Space? Sample Space = {HH, HT, TH, TT} (H = Heads, T = Tails)

I make a bet that there will be at least as many heads as tails.
• What is the event in which I win the bet? Win = {HH, HT, TH}
• What is the probability I win this bet?
P(Win) = (Number of Outcomes in Event)/(Number of Outcomes in Sample Space) = 3/4
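Since the sample space is tiny, the counting argument can be sanity-checked by brute-force enumeration (a Python sketch for illustration; the course's exercises use R):

```python
from itertools import product

# All four equally likely outcomes of two coin flips: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))
# Winning outcomes: at least as many heads as tails.
win = [o for o in sample_space if o.count("H") >= o.count("T")]

p_win = len(win) / len(sample_space)
print(len(win), len(sample_space), p_win)   # 3 4 0.75
```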
Chapter 2

Lecture 2: January 31st, 2019
2.1 Axioms of Probability
Let S be the sample space of an experiment, and let A be an event.
1. 0 ≤ P (A) ≤ 1, the probability of an event is between 0 and 1
2. P (S) = 1, the probability of an outcome in the sample space occurring isprecisely 1
3. If A = {s1, s2, . . . , sk}, then:

P(A) = P(s1) + P(s2) + · · · + P(sk) = ∑_{i=1}^{k} P(si)

The probability of an event is the sum of the probabilities of the sample outcomes it contains. (Note: axiom (3) relies on the fact that each sample outcome is distinct from every other one; in other words, if one sample outcome occurs, all other sample outcomes did not occur.)
2.1.1 Revisit the gambling problem
Each of the four outcomes in S has an equal probability, .25. If A is the event thatthere are at least as many heads as tails, then:
P(A) = P(HH) + P(HT) + P(TH) = .25 + .25 + .25 = .75
2.1.2 Example
You roll two dice.
CHAPTER 2. LECTURE 2: JANUARY 31ST, 2019 9
Sample Space from Rolling Two Dice

1-1  2-1  3-1  4-1  5-1  6-1
1-2  2-2  3-2  4-2  5-2  6-2
1-3  2-3  3-3  4-3  5-3  6-3
1-4  2-4  3-4  4-4  5-4  6-4
1-5  2-5  3-5  4-5  5-5  6-5
1-6  2-6  3-6  4-6  5-6  6-6
What is the probability of each of the following events?
1. The two dice add up to the number 4
2. The two dice show the same number
3. The two dice add up to four or both dice show the same number
4. The two dice add up to four and both dice show the same number
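All four probabilities can be found by enumerating the 36 outcomes above; a quick Python sketch of that count (illustrative — the course itself uses R):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))                  # all 36 outcomes
add_to_four = [r for r in rolls if sum(r) == 4]               # event 1
same_number = [r for r in rolls if r[0] == r[1]]              # event 2
either = [r for r in rolls if sum(r) == 4 or r[0] == r[1]]    # event 3
both = [r for r in rolls if sum(r) == 4 and r[0] == r[1]]     # event 4

for event in (add_to_four, same_number, either, both):
    print(len(event), "/ 36")
# 3, 6, 8, and 1 outcome(s) respectively
```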
2.2 Set Theory
• The sample space is the set of all sample outcomes.
• An event is a set of sample outcomes. It is a subset of the sample space.
Suppose that A and B are events in a sample space S:
• Intersection of A and B, written A ∩ B: the event whose outcomes belong to both A and B

• Union of A and B, written A ∪ B: the event whose outcomes belong to either A or B

• Complement of A, written Aᶜ: the event containing the sample outcomes in S that are not in A

• Disjoint: A and B are disjoint if they do not share any elements: A ∩ B = ∅
2.2.1 Example
Let’s return to throwing two dice.
• Let A be the event that the dice add up to 4
• Let B be the event that the dice show the same number
Questions:
1. What is A ∩B?
2. What is A ∪B?
3. What is Ac ∩Bc?
4. What is (A ∪B)c?
2.2.2 Set Theory in Probability
We have two events, A and B defined for an experiment with sample space S
• Union of two events
P (A or B) = P (A ∪B) = P (A) + P (B)− P (A ∩B)
• Complement of an Event

P(Aᶜ ∪ A) = P(A) + P(Aᶜ) − P(A ∩ Aᶜ) = 1

(since Aᶜ ∪ A = S, so P(Aᶜ ∪ A) = 1, and A ∩ Aᶜ = ∅, so P(A ∩ Aᶜ) = 0)

→ P(A) = 1 − P(Aᶜ)
• Suppose you have a series of disjoint events whose union is the sample space:

E1 ∪ E2 ∪ · · · ∪ En = Sample Space   and   Ei ∩ Ej = ∅ for all i ≠ j

Then for any event F:

∑_{i=1}^{n} P(F ∩ Ei) = P(F)

A simplified version of this principle is, for any events E and F within the same sample space:

P(F) = P(F ∩ E) + P(F ∩ Eᶜ)
Let's check these principles by looking at our example, where A is the event that the two dice add up to 4 and B is the event that the two dice show the same number.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B):   8/36 = 3/36 + 6/36 − 1/36

P(A) = 1 − P(Aᶜ):   3/36 = 1 − 33/36

P(B) = P(B ∩ A) + P(B ∩ Aᶜ):   6/36 = 1/36 + 5/36
2.2.3 Example
A and B are both events in a sample space S. We know the following about them:
• P (A) = .4
• P (B) = .3
• P (A ∩B) = .2.
What is the probability of either A or B occurring?
2.2.4 Another Example
• You have two investments that either make money or lose money.
• Each one has a 60 percent chance of making money, and a forty percent chanceof losing money.
• There is a 30 percent chance that both investments make money.
What is the probability that both lose money?
Let W1 be the event that the first investment makes money and W2 the event that the second makes money. Then:

P(W1 ∩ W2) = .3 → P(W1ᶜ ∪ W2ᶜ) = 1 − .3 = .7

P(W1ᶜ ∪ W2ᶜ) = P(W1ᶜ) + P(W2ᶜ) − P(W1ᶜ ∩ W2ᶜ)

.7 = .4 + .4 − P(W1ᶜ ∩ W2ᶜ)

→ P(W1ᶜ ∩ W2ᶜ) = .4 + .4 − .7 = .1
There is a 10% chance that both investments lose money.
2.3 Conditional Probability
• Probability: the likelihood of event A occurring.

• Conditional Probability: the likelihood of event A occurring, if you know for a fact that event F occurred.

In English: What is the likelihood of one thing happening if you know that another thing happens?
2.3.1 Example
Go back to the example of rolling two dice:
A = The sum of the dice is greater than 10
What is P (A)?
P(A) = (Number of sample outcomes in A)/36 = 3/36 = 1/12
B = The first die is five
What is P (A|B)?
P(A|B) = (# sample outcomes in A ∩ B)/(# sample outcomes in B) = 1/6 = (1/36)/(6/36) = P(A ∩ B)/P(B)
Conditioning on B effectively shrinks the number of potential sample outcomes in the sample space (the denominator) from 36 to 6. But it also shrinks the number of potential outcomes in A to those that are also in B. Therefore
P(A|B) = P(A ∩ B)/P(B)
Another example:
• Let F be the event that both dice land on 5. What is P(F)? Answer: 1/36

• Let V be the event that at least one of the two dice lands on five. What is P(F|V)? Answer: The sample space effectively shrinks to the eleven outcomes in V, so it becomes 1/11.
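The "shrinking sample space" intuition is easy to verify by counting outcomes directly (a quick Python sketch; the course uses R for its own exercises):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))
V = [r for r in rolls if 5 in r]              # at least one die shows five
F_and_V = [r for r in V if r == (5, 5)]       # both dice show five

p_F_given_V = len(F_and_V) / len(V)
print(len(V), p_F_given_V)   # 11 outcomes in V, so P(F|V) = 1/11
```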
2.3.2 Rules and Theorems of Conditional Probability
Suppose you have two events C and D within the same sample space for an experiment.
• Results of Conditional Probability
P(C|D) = P(C ∩ D)/P(D)

P(C ∩ D) = P(D)P(C|D)
What is P (D|C)?
• We can just flip the terms around
P(D|C) = P(C ∩ D)/P(C)
Therefore
• Definition: Bayes’ Theorem If you have two events, A and B:
P(B|A) = P(B)P(A|B)/P(A)
2.3.3 Example: Economic Opportunity at NYU
A few facts about low-income students and alumni at NYU:
• 6.1 percent of students come from families with a household income below the 20th percentile (≈ $25,000)

• 3.6 percent of students come from a family below the 20th percentile of income, but achieve an income above the 80th percentile by age 34 (≈ $54,000)
What is the probability that a student at NYU whose parents are below the 20th percentile for household income will have a future income above the 80th percentile for people of her or his age?

Hint:
P(A|B) = P(A ∩ B)/P(B)
P(Student above 80 | Parents below 20) = P(Student above 80 ∩ Parents below 20)/P(Parents below 20) = .036/.061 ≈ .59
2.3.4 Bayes’ Theorem Example
Suppose there is a test that screens for a disease. Only one in 1,000 people have it. If a person has the disease, the test will successfully detect it 90% of the time. If a person does not have the disease, the test will incorrectly diagnose them as having it 1% of the time.
1. What is the probability of testing positive on the test?
P(Positive) = P(Positive|Disease)P(Disease) + P(Positive|No Disease)P(No Disease)

= .9 × .001 + .01 × .999 = .01089
2. What is the probability a person who tested positive has the disease?
P(Disease|Positive) = P(Disease)P(Positive|Disease)/P(Positive) = (.001 × .9)/.01089
≈ 8 percent chance of having the disease if the test is positive!
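The arithmetic above is a textbook base-rate calculation, and a small Python sketch makes it easy to experiment with different base rates or error rates (illustrative only — the course's computing is done in R):

```python
p_disease = 0.001            # 1 in 1,000 people have the disease
p_pos_given_disease = 0.90   # detection (true positive) rate
p_pos_given_healthy = 0.01   # false positive rate

# Law of total probability: overall chance of a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: chance of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_pos)                 # ≈ 0.01089
print(p_disease_given_pos)   # ≈ 0.083
```

Even with a 90% detection rate, the tiny base rate means the overwhelming majority of positive tests are false positives.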
Chapter 3
Lecture 3: February 5th, 2019
3.1 Review of Conditional Probability
Suppose you have two events C and D within the same sample space for an experiment.
• Results of Conditional Probability
P(C|D) = P(C ∩ D)/P(D)

P(C ∩ D) = P(D)P(C|D)
What is P (D|C)?
• We can just flip the terms around
P(D|C) = P(C ∩ D)/P(C)
Therefore
• Definition: Bayes’ Theorem If you have two events, A and B:
P(B|A) = P(A ∩ B)/P(A) = P(B)P(A|B)/P(A)
CHAPTER 3. LECTURE 3: FEBRUARY 5TH, 2019 16
3.2 Independence
3.2.1 What makes two events independent?
• A and B are considered independent if the following conditions hold:
P (A|B) = P (A) and P (B|A) = P (B)
• Recall
P(A|B) = P(A ∩ B)/P(B) → P(A|B)P(B) = P(A ∩ B)
• If A and B are independent:
P (A|B) = P (A)→ P (A)P (B) = P (A ∩B)
3.2.2 Another way to think about independence
• If A and B are independent
P (A) = P (A|B) = P (A|Bc)
In other words, it doesn't matter if B occurs or B does not occur (Bᶜ); the probability of A occurring is the same.
3.2.3 Example
A is the event that the second die is a 6: P(A) = 6/36 = 1/6.
B is the event that the first die is a 1 or 3: P(B) = 12/36 = 1/3.
Are A and B independent? In other words,
P (A|B) = P (A)? and P (B|A) = P (B)?
P(A|B) = P(A ∩ B)/P(B) = (2/36)/(12/36) = 1/6 = P(A) → Independent
3.2.4 Second Example
Suppose you survey the population of people who are either working or actively seeking work (unemployed). You find that 34.1% of the people have a college degree, and 4.6% of them are unemployed. If being unemployed and being a college graduate were independent, what percentage of people would be both college grads and unemployed?

Answer:

.046 × .341 ≈ .0157 = 1.57%

In fact, you find that only .9% of the people are both unemployed and college grads. Based on this, do you think being a college graduate and being unemployed are independent?
                          Unemployed (Y=0)   Employed (Y=1)   Total
Non-college grads (X=0)         .037              .622         .659
College grads (X=1)             .009              .332         .341
Total                           .046              .954           1
Test your conclusion by finding the following:
P (Unemployed|College grad)
P (College grad|Employed)
and comparing them to P (Unemployed) and P (College grad).
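That check can be done straight from the joint probabilities in the table; a Python sketch of the computation (illustrative — the course uses R):

```python
# Joint probabilities read off the table above.
p_college_unemp = 0.009
p_college_emp = 0.332
p_noncollege_unemp = 0.037

p_college = p_college_unemp + p_college_emp           # marginal: 0.341
p_unemployed = p_college_unemp + p_noncollege_unemp   # marginal: 0.046
p_employed = 1 - p_unemployed                         # 0.954

p_unemp_given_college = p_college_unemp / p_college   # ≈ 0.026, well below 0.046
p_college_given_emp = p_college_emp / p_employed      # ≈ 0.348, above 0.341

print(p_unemp_given_college, p_college_given_emp)
```

College grads are unemployed at roughly 2.6% versus 4.6% overall, so the conditional and unconditional probabilities differ and the two events are not independent.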
3.3 Probability of Repeated Independent Events
We proved before that if A and B are independent, then
P (A ∩B) = P (A)× P (B)
This would also work with three independent events:
P (A ∩B ∩ C) = P (A)× P (B ∩ C) = P (A)× P (B)× P (C)
In fact, this can be extended to a case with many independent events:
Definition: Multiplicative rule
If A1, A2, A3, . . . , An are each independent events, then

P(A1 ∩ A2 ∩ A3 ∩ · · · ∩ An) = ∏_{i=1}^{n} P(Ai)
3.3.1 Examples: Multiplicative Rule
• What is the probability I flip a coin 6 times in a row and it lands on heads every single time?
(1/2)6 = 1/64
• Suppose I buy a lottery ticket where I have to pick five random numbers between 0 and 99. If I guess all five correctly, I win $500 million. What is the probability that I win?

(1/100)⁵ = 1/10,000,000,000 = 1 in 10 billion
3.3.2 Gambling Example of Multiplicative Rule
Your friend offers you two options for a $100 bet:
1. You roll a single die four times; if on any of the four rolls it lands on a six, you win $100, and if not you lose $100.

2. You roll two dice 24 times; if on any of the 24 rolls they land on double sixes, you win $100, and if not you lose $100.
Are either of these a good bet? Which one is better? We need to find the probability of winning each one. Let A1 be the event you get a six on the first roll, A2 be the event you get a six on the second roll, and so on. We need to find
P (A1 ∪ A2 ∪ A3 ∪ A4)
How do we do this? One way to simplify this is to realize that:

(A1 ∪ A2 ∪ A3 ∪ A4)ᶜ = A1ᶜ ∩ A2ᶜ ∩ A3ᶜ ∩ A4ᶜ

P(A1 ∪ A2 ∪ A3 ∪ A4) = 1 − P(A1ᶜ ∩ A2ᶜ ∩ A3ᶜ ∩ A4ᶜ)

To find P(A1ᶜ ∩ A2ᶜ ∩ A3ᶜ ∩ A4ᶜ), we need to recognize two things:

1. P(Aiᶜ) = 5/6

2. A1ᶜ is independent of A2ᶜ, A3ᶜ, and A4ᶜ
Therefore
P(A1ᶜ ∩ A2ᶜ ∩ A3ᶜ ∩ A4ᶜ) = P(A1ᶜ) × P(A2ᶜ) × P(A3ᶜ) × P(A4ᶜ) = (5/6)⁴ = 625/1296 ≈ .482

P(A1 ∪ A2 ∪ A3 ∪ A4) = 1 − .482 = .518
Using the same method, we can find the probability of winning the second option for the bet. Let B1 be the event that the first roll of the two dice lands on double sixes:

P(B1 ∪ B2 ∪ · · · ∪ B24) = 1 − P(B1ᶜ ∩ B2ᶜ ∩ · · · ∩ B24ᶜ) = 1 − (35/36)²⁴ ≈ .491
So one bet you will win slightly more than half the time, and the other you will win slightly less than half the time. As a gambler, it's important to be able to figure out which one is which!
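These two bets are also a nice target for simulation. A Python sketch that estimates both winning probabilities by repeated trials (the seed is arbitrary, chosen only to make the run reproducible; the course's own simulations would be in R):

```python
import random

random.seed(1)

def bet1():
    # Four rolls of one die: win if any roll shows a six.
    return any(random.randint(1, 6) == 6 for _ in range(4))

def bet2():
    # 24 rolls of two dice: win if any roll shows double sixes.
    return any(random.randint(1, 6) == 6 and random.randint(1, 6) == 6
               for _ in range(24))

n = 100_000
est1 = sum(bet1() for _ in range(n)) / n
est2 = sum(bet2() for _ in range(n)) / n
print(est1, est2)   # close to the exact values .518 and .491
```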
3.4 Permutations
Suppose you have five siblings: Ana, Bob, Chris, Dan, and Ed. They all share one bathroom and they all have to take showers in the morning. Ana thinks they should shower in alphabetical order, but Ed disagrees. Finally they settle on randomizing the order. What is the probability that it will end up being alphabetical order anyway?
P(alphabetical) = P(A1) × P(B2|A1) × P(C3|A1 ∩ B2) × P(D4|A1 ∩ B2 ∩ C3) × P(E5|A1 ∩ B2 ∩ C3 ∩ D4)

= (1/5) × (1/4) × (1/3) × (1/2) × 1

P(alphabetical) = 1/(5 × 4 × 3 × 2 × 1) = 1/120

(Here A1 is the event that Ana showers first, B2 that Bob showers second, and so on.)
Suppose you had to order 100 siblings; what would be the probability of any particular order?
(1/100) × (1/99) × (1/98) × · · · × (1/3) × (1/2) × (1/1) = 1/100!
Definition: Permutation rule
If a set has n distinct elements, it can be ordered in n! different ways; the probability of any particular order (if each order has equal probability) is 1/n!.
Suppose you had a baseball team with 20 players and a nine-player batting order. If players are chosen to bat in random order, how many different potential starting line-ups are there?
20 × 19 × 18 × · · · × 12 = 20!/(20 − 9)!
Therefore, if all orders had equal likelihood, the probability of any particular order of baseball players would be:
P(Any particular order) = 1/(20!/11!)
Theorem: If you have n elements in a set, and you randomly choose k of them for an ordered sequence, the number of possible ordered sequences will be:

n!/(n − k)!
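Both counting results above are one-line factorial computations; a Python sketch (math.perm performs the same ordered-sequence count directly, while the course's own tooling is R):

```python
import math

# Probability the five siblings end up in alphabetical order: 1/5!.
p_alphabetical = 1 / math.factorial(5)
print(p_alphabetical)          # 1/120 ≈ 0.00833

# Ordered nine-player line-ups from 20 players: 20!/(20 - 9)!.
lineups = math.factorial(20) // math.factorial(20 - 9)
print(lineups)                 # the same count as math.perm(20, 9)
```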
Chapter 4
Lecture 4: February 7th, 2019
4.1 Review of Permutations
• Definition: Permutation rule
If a set has n distinct elements, it can be ordered in n! different ways; the probability of any particular order (if each order has equal probability) is 1/n!.
• Theorem: If you have n elements in a set, and you randomly choose k of them for an ordered sequence, the number of possible ordered sequences will be:

n!/(n − k)!
4.1.1 Finding the Number of Unique Combinations
What if order does not matter? You have a deck of 52 cards, and you want to know how many five-card poker hands there are. The order of your cards does not matter, only the combination of cards. First figure out how many combinations of cards there are if order does matter (how many unique orderings of 5 cards there are).
# of hands with unique order = 52!/(52 − 5)! = 52 × 51 × 50 × 49 × 48
But the order does not, in fact, matter; each unique combination of cards will have 5! unique orders that it could potentially be in.
# of hands with unique combo = (# of hands with unique order)/5! = 52!/((52 − 5)! × 5!)
CHAPTER 4. LECTURE 4: FEBRUARY 7TH, 2019 22
This leads to a generalizable principle: if you have a set with n elements, and you are randomly choosing a combination of k of those elements, where the order chosen does not matter, the number of unique combinations will be:
n!/((n − k)! k!)
This means the probability of any given unique combination is:
1/(n!/((n − k)! k!))
if each combination is equally likely.
4.1.2 Example: What is the Probability of a four of a kind?
We already know the number of unique combinations of cards, which is our sample space. How many of those combinations are four of a kind?
• 13 different values we could have four of a kind of (A, 2, 3, . . . , 10, J, Q, K)
• For each potential four of a kind, there are 48 possibilities for the 5th card.
• Therefore, there are 13× 48 ways to get four of a kind
Therefore
P(four of a kind) = (13 × 48)/(52!/((52 − 5)! 5!)) ≈ .00024
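math.comb computes the "n choose k" quantity n!/((n − k)! k!) directly, so this probability is easy to check (a Python sketch; the course itself uses R):

```python
import math

hands = math.comb(52, 5)     # 52!/((52 - 5)! * 5!) five-card hands
four_of_a_kind = 13 * 48     # 13 ranks, times 48 choices of the fifth card

p = four_of_a_kind / hands
print(hands, four_of_a_kind, p)   # p ≈ 0.00024
```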
4.1.3 Order with multiple types
A pastry in a vending machine costs 85 cents. In how many ways can a customer put in two quarters, three dimes, and one nickel? So we have six coins all together: {n1, d1, d2, d3, q1, q2}.

Suppose each of the nickels, dimes, and quarters were distinct (perhaps they were minted in different years). How many potential orders would there be?
6!
Now, assume they are NOT distinct: each dime is indistinguishable from the others. Therefore the dimes could all be rearranged and the order would look the same:

{d1, d2, d3, n1, q1, q2} = {d3, d1, d2, n1, q2, q1}
CHAPTER 4. LECTURE 4: FEBRUARY 7TH, 2019 23
So for each order, we have to account for the potential rearrangement of the quarters and dimes. The quarters can be rearranged in 2! ways, and the dimes can be rearranged in 3! ways. This means there are actually

6!/(2! 3!) = 60

unique orders when the quarters, dimes, and nickels are not distinct.
General Principle:
If we have a set with n elements of k different types (n1 is the number of elements of type 1, n2 the number of elements of type 2, and so on), then there are

n!/(n1! × n2! × · · · × nk!)

different potential orders.
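For a set this small, the multinomial count can be confirmed by brute force — generate every permutation of the six coins and keep only the distinct ones. A Python sketch of that check:

```python
import math
from itertools import permutations

coins = ["q", "q", "d", "d", "d", "n"]   # two quarters, three dimes, one nickel

# Formula: n!/(n1! * n2! * ... * nk!)
by_formula = math.factorial(6) // (
    math.factorial(2) * math.factorial(3) * math.factorial(1))

# Brute force: distinct orderings of the multiset.
by_enumeration = len(set(permutations(coins)))

print(by_formula, by_enumeration)   # 60 60
```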
4.2 Review of Counting
1. If a set has n distinct elements, it can be ordered in n! different ways; the probability of any particular order is 1/n!.
2. If you have n elements in a set, and you randomly choose k of them for an ordered sequence, the number of possible ordered sequences will be:

n!/(n − k)!
3. If you have a set with n elements, and you are choosing a combination of k of those elements, where the order chosen does not matter, the number of unique combinations will be:

n!/((n − k)! k!)

This means the probability of any given unique combination is:

1/(n!/((n − k)! k!))
4. If we have a set with n elements of k different types (n1 is the number of elements of type 1, n2 the number of type 2, and so on), then there are

n! / (n1! × n2! × · · · × nk!)

different potential orders.
4.3 Binomial Probability Distributions
Suppose I flip a coin 100 times. What is the probability it lands on heads precisely 57 times? Well, what is the likelihood of getting 57 straight heads, and then 43 straight tails?
P(H)^57 × P(T)^43 = .5^57 × .5^43 = .5^100
But of course, we know that the order of the heads and the order of the tails do not matter. There are many, many potential ways to order 57 heads and 43 tails. Specifically:

100! / (57! × 43!)

P(57 heads out of 100 flips) = 100! / (57! × 43!) × .5^57 × .5^43 ≈ .03

where the first factor is the number of orders and the second is the probability of any one order.
General Case of the Binomial Probability Distribution
Consider a series of n independent trials, each resulting in one of two possible outcomes, “success” or “failure.” Let p = P(success occurs at any given trial) and assume that p remains constant from trial to trial. Then

P(k successes) = n! / ((n − k)! k!) × p^k × (1 − p)^(n−k),  k = 0, 1, . . . , n

where n!/((n − k)! k!) is the number of orders and p^k (1 − p)^(n−k) is the probability of one order with k successes. We can break down this equation into component parts:

P(k successes) = (# of ways to arrange k successes and n − k failures) × (probability of one particular order with k successes and n − k failures)
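The formula above can be coded directly. This is my own sketch (not part of the notes); `math.comb` counts the orders:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n trials): number of orders times the probability of one order."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The 57-heads-out-of-100-flips example from above: roughly 0.03.
print(binomial_pmf(57, 100, 0.5))
```

Summing the formula over all k from 0 to n gives 1, which is a handy sanity check.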
4.3.1 Binomial Distribution Example
Suppose you roll a die three times. If it lands on 6 on any of those three rolls, you win; if it doesn't, you lose. What is the sample space and the probability of each outcome? Let S be a six, and N be not a six:

Sample Outcome   Probability
S-S-S            1/6 × 1/6 × 1/6 = 1/6³
S-S-N            1/6 × 1/6 × 5/6 = 5/6³
S-N-N            1/6 × 5/6 × 5/6 = 25/6³
N-N-N            5/6 × 5/6 × 5/6 = 125/6³
N-S-N            5/6 × 1/6 × 5/6 = 25/6³
N-N-S            5/6 × 5/6 × 1/6 = 25/6³
N-S-S            5/6 × 1/6 × 1/6 = 5/6³
S-N-S            1/6 × 5/6 × 1/6 = 5/6³
There are four possible numbers of successes: 0, 1, 2, 3. Let’s go through the probability of each one:

• P(0 sixes): likelihood of a particular order = 125/6³; number of potential orders (N-N-N) = 1; probability = 1 × 125/6³

• P(1 six): likelihood of a particular order = 25/6³; number of potential orders (N-N-S, N-S-N, S-N-N) = 3; probability = 3 × 25/6³

• P(2 sixes): likelihood of a particular order = 5/6³; number of potential orders (N-S-S, S-S-N, S-N-S) = 3; probability = 3 × 5/6³

• P(3 sixes): likelihood of a particular order = 1/6³; number of potential orders (S-S-S) = 1; probability = 1 × 1/6³
Instead of doing all this work, we can just plug into the binomial probability distribution formula:

P(k sixes) = 3! / (k! (3 − k)!) × (1/6)^k × (5/6)^(3−k)
So for example, the probability of one six would be:

3! / (2! 1!) × (1/6)^1 × (5/6)^2 = 3 × (1/6) × (5/6)^2 = 75/6³
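The sample-space table and the formula should agree, and a brute-force enumeration confirms it. A small sketch of my own (not from the lecture), using exact fractions:

```python
from fractions import Fraction
from itertools import product
from math import comb

# Count the number of sixes over all 6^3 equally likely outcomes of three rolls.
counts = {k: 0 for k in range(4)}
for rolls in product(range(1, 7), repeat=3):
    counts[sum(r == 6 for r in rolls)] += 1

# Compare each count against the binomial formula, with no floating-point error.
for k in range(4):
    brute = Fraction(counts[k], 6**3)
    formula = comb(3, k) * Fraction(1, 6)**k * Fraction(5, 6)**(3 - k)
    assert brute == formula
    print(k, counts[k], "/ 216")
```

The counts come out 125, 75, 15, and 1, matching the table of orders above.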
Chapter 5
Lecture 5: February 12th, 2019
5.1 Review of Binomial Probability Distributions
General Case of the Binomial Probability Distribution
Consider a series of n independent trials, each resulting in one of two possible outcomes, “success” or “failure.” Let p = P(success occurs at any given trial) and assume that p remains constant from trial to trial. Then

P(k successes) = n! / ((n − k)! k!) × p^k × (1 − p)^(n−k),  k = 0, 1, . . . , n

where the first factor is the number of orders and the second is the probability of one order with k successes. We can break down this equation into component parts:

P(k successes) = (# of ways to arrange k successes and n − k failures) × (probability of one particular order with k successes and n − k failures)
5.1.1 Binomial Probability Example
Suppose you are forming a statistics study group to work on the latest problem set. Based on past experience forming study groups, 70 percent of the people you invite to join the group do in fact join. You invite five people; what is the probability that exactly 3 people join? You can quickly solve this using the binomial probability distribution:
• n is 5
CHAPTER 5. LECTURE 5: FEBRUARY 12TH, 2019 28
• p is .7
• k is 3
P(k = 3 | n = 5, p = .7) = 5! / (3! (5 − 3)!) × (.7)^3 × (1 − .7)^(5−3) = .3087

where the first factor counts the orders with k = 3 and the second is the probability of one such order.
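The same arithmetic in a few lines of Python (my own illustration, not part of the notes):

```python
from math import comb

# Study-group example: n = 5 invitations, k = 3 joiners, p = 0.7 join rate.
n, k, p = 5, 3, 0.7
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(round(prob, 4))  # 0.3087
```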
5.2 Random Variables
Discrete Random Variable: a function that transforms each sample outcome from an experiment into a number. For example, suppose I roll a die three times; that is an experiment with 6³ sample outcomes. A random variable would be the # of times it lands on a six. The random variable would only have four values:

Value   Probability
0       125/6³
1       75/6³
2       15/6³
3       1/6³
The mapping above between potential values of the random variable and the probability of each value is called a probability function. In this case, because this is a discrete random variable, it is a discrete probability function. Discrete probability functions are best represented as bar graphs, where the X axis is the different potential values of the discrete random variable, and the height of the bars (the Y axis) corresponds to their probabilities. Figure 5.1 is a bar graph showing the probability function for the number of sixes on three rolls of a die. Figure 5.2 shows the probability function for the number of heads on 100 coin flips, and Figure 5.3 shows the probability function for the number of sixes when rolling a die 100 times.
Figure 5.1: # of sixes when rolling a die 3 times
Figure 5.2: # of heads when flipping a coin 100 times
Figure 5.3: # of sixes when rolling a die 100 times
5.3 Expected Value
Suppose you are offered the following investment opportunity: you pay $1 and you will get back $S, where S is the number of sixes when rolling a die three times. Should you invest?
Probability of each value of S:

P(S = 0) = 125/216
P(S = 1) = 75/216
P(S = 2) = 15/216
P(S = 3) = 1/216
In order to determine whether or not it’s a good investment, you need to determine the expected value of S, E[S].

E[S] = 0 × 125/216 + 1 × 75/216 + 2 × 15/216 + 3 × 1/216 = 108/216 = .5

Since the expected payoff of $0.50 is less than the $1 cost, you should not invest.
Expected Value of a Discrete Random Variable
The expected value of a discrete random variable X that takes on values v1 through vn is

E[X] = P(X = v1) × v1 + · · · + P(X = vn) × vn = ∑_{i=1}^{n} P(X = vi) × vi
5.3.1 Expected Value Example
As a reminder, a random variable is a function that converts sample outcomes into numbers. For example, suppose I roll two dice. There are 36 possible sample outcomes. There are different functions that can convert each outcome to a number:

Random variable A: sum of the two dice
Random variable B: product of the two dice
Random variable C: absolute difference between the two dice
A, B, and C are functions that take sample outcomes as input, and output numbers.Suppose we roll a two and a four:
A(roll 2 & 4) = 6
B(roll 2 & 4) = 8
C(roll 2 & 4) = 2
CHAPTER 5. LECTURE 5: FEBRUARY 12TH, 2019 32
What is E[C]?

Difference   Probability
0            6/36
1            10/36
2            8/36
3            6/36
4            4/36
5            2/36

E[C] = 0 × 6/36 + 1 × 10/36 + 2 × 8/36 + 3 × 6/36 + 4 × 4/36 + 5 × 2/36 = 70/36 ≈ 1.94
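Enumerating the 36 rolls directly reproduces E[C]. A quick sketch of my own (not from the lecture):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes; C is the absolute difference.
outcomes = list(product(range(1, 7), repeat=2))
expected = sum(Fraction(abs(a - b), 36) for a, b in outcomes)
print(expected)  # 35/18, i.e. 70/36
```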
Chapter 6
Lecture 6: February 14th, 2019
6.1 Expected Value Principles
6.1.1 Expected Value of cX + k
Suppose I roll a die, and the random variable D is the number that shows on that die. What is

E[cD + k]

where c and k are constants?
E[cD + k] = ∑_{i=1}^{n} (cD_i + k) p(D_i)
= c ∑_{i=1}^{n} D_i p(D_i) + ∑_{i=1}^{n} k p(D_i)
= cE[D] + k ∑_{i=1}^{n} p(D_i) = cE[D] + k

since ∑_{i=1}^{n} p(D_i) = 1.
This generalizes to, for any random variable X and constants c and k:
E[cX + k] = cE[X] + k
6.1.2 Expected Value of a Sum
Suppose I flip a coin three times; let H be the number of heads. Suppose also that I roll a die; let D be the number that shows. What is

E[H + D]?
CHAPTER 6. LECTURE 6: FEBRUARY 14TH, 2019 34
E[H + D] = ∑_{i=0}^{3} ∑_{j=1}^{6} (H_i + D_j) P(H_i ∩ D_j)
= ∑_{i=0}^{3} ∑_{j=1}^{6} H_i P(H_i ∩ D_j) + ∑_{i=0}^{3} ∑_{j=1}^{6} D_j P(H_i ∩ D_j)
= ∑_{i=0}^{3} H_i ∑_{j=1}^{6} P(H_i ∩ D_j) + ∑_{j=1}^{6} D_j ∑_{i=0}^{3} P(H_i ∩ D_j)
= ∑_{i=0}^{3} H_i P(H_i) + ∑_{j=1}^{6} D_j P(D_j)
= E[H] + E[D]
Generalizable principle:

E[∑_{i=1}^{n} X_i] = ∑_{i=1}^{n} E[X_i]
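The coin-and-die example can be verified by exact joint enumeration. This sketch is my own (not from the lecture); E[H] = 3/2 and E[D] = 7/2, so the sum should be 5:

```python
from fractions import Fraction
from itertools import product

# H: number of heads in 3 fair flips; D: one fair die roll. 8 * 6 joint outcomes.
ev = Fraction(0)
for flips in product('HT', repeat=3):
    for d in range(1, 7):
        h = flips.count('H')
        ev += Fraction(h + d, 8 * 6)  # each joint outcome has probability 1/48
print(ev)  # 5
```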
6.1.3 Expected Value of a Binomial Random Variable
Suppose you make an investment that has the following properties:
• Every year it either increases in value by $1 or remains the same value. Itincreases with a probability p, and remains the same with a probability 1− p.
• The probability p does not change from year to year.
What is the expected return on the investment over the course of 10 years?
Answer: The return over 10 years is a random variable R with a binomial distribution. The probability function will be

P(R = k) = 10! / ((10 − k)! k!) × p^k × (1 − p)^(10−k)
So its expected value would be:

E[R] = ∑_{k=0}^{10} k × P(R = k)

That would take a long time to calculate, so let’s figure out what it will be in a more intuitive way. Define R_1 as the return in year 1, and R_i as the return in year i. Then

R = ∑_{i=1}^{10} R_i → E[R] = E[∑_{i=1}^{10} R_i] = ∑_{i=1}^{10} E[R_i] = 10 × E[R_i]
So now all we have to do is find E[Ri].
E[Ri] = 0× (1− p) + 1× p = p→ E[R] = 10p
Generalizable Principle
The expected value of a binomial random variable X with n trials, each with probability of success p, is np:

E[X] = E[X1 + X2 + · · · + Xn] = E[X1] + · · · + E[Xn] = p + · · · + p = np
6.2 Probability Distribution versus Cumulative Distribution Functions
Recall the random variable C, equal to the absolute difference of two dice that are rolled. It has the following probability distribution function that we already derived:

Difference   Probability
0            6/36
1            10/36
2            8/36
3            6/36
4            4/36
5            2/36
We would write the probability distribution function for C as:
pC(c) = P (C(s) = c)
Where s is some sample outcome, C is the random variable (function) that convertssample outcomes into numbers. So, to give a few examples:
pC(2) = 8/36,  pC(3) = 6/36,  pC(5) = 2/36
We also can derive a cumulative distribution function, which is defined as theprobability of a random variable being less than or equal to a given number:
FC(c) = P (C(s) ≤ c)
So for example
FC(4) = pC(0) + pC(1) + pC(2) + pC(3) + pC(4) = 34/36
FC(2) = pC(0) + pC(1) + pC(2) = 24/36
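Building the CDF by accumulating the pmf is mechanical, as in this illustration of my own (not from the lecture):

```python
from fractions import Fraction

# pmf of C = |D1 - D2| for two dice, as derived above.
p_C = {0: Fraction(6, 36), 1: Fraction(10, 36), 2: Fraction(8, 36),
       3: Fraction(6, 36), 4: Fraction(4, 36), 5: Fraction(2, 36)}

def F_C(c):
    """CDF: probability that C is less than or equal to c."""
    return sum(p for value, p in p_C.items() if value <= c)

print(F_C(4))  # 17/18, i.e. 34/36
print(F_C(2))  # 2/3, i.e. 24/36
```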
6.2.1 Example
Suppose you roll two dice; let the random variable X be the larger of the two. Find F_X(t).
F_X(t) = P(X ≤ t) = P(D1 ≤ t ∩ D2 ≤ t) = P(D1 ≤ t) × P(D2 ≤ t) = (t/6) × (t/6) = t²/36
Given this cumulative distribution function, what are the following probabilities?

P(2 < X ≤ 5) = F_X(5) − F_X(2) = 25/36 − 4/36 = 21/36

P(X > 4) = 1 − F_X(4) = 1 − 16/36 = 20/36
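The closed form t²/36 can be checked against direct enumeration of the 36 rolls. My own sketch (not from the lecture):

```python
from fractions import Fraction
from itertools import product

def F_X(t):
    """P(max of two dice <= t), by counting favorable outcomes."""
    favorable = sum(1 for a, b in product(range(1, 7), repeat=2) if max(a, b) <= t)
    return Fraction(favorable, 36)

print(F_X(5) - F_X(2))  # P(2 < X <= 5) = 21/36 = 7/12
print(1 - F_X(4))       # P(X > 4) = 20/36 = 5/9
```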
6.3 Interpreting Expected Value
There are approximately 8 million people who live in New York City. If you wanted to find the average income of people in NYC, you would add up the incomes of all 8 million people and divide by the number of people (8 million). In fact, the average income in New York City is $64,000. Suppose you were to randomly choose 1 person living in NYC and ask them their income. Let I be the income of the randomly chosen person. What would E[I] equal? If I_i is the income of the ith person, and p(i) is the probability that person i is chosen, then you will have
E[I] = ∑_{i=1}^{8 million} I_i p(i)

If you are randomly choosing from the population, then p(i) = 1/8 million for each person living in NYC. Therefore

E[I] = (I_1 + I_2 + · · · + I_{8 million}) / 8 million = Average Income
The average is equivalent to the expected value of a randomly chosen observation,when each observation has an equal probability of being randomly chosen.
Chapter 7
Lecture 7: February 19th, 2019
7.1 Variance
Even if the average income in NYC is $64,000, there is still a lot more to know about incomes in New York. For example, a city with a couple hundred people making over $1 billion per year, and everyone else making an average of $40,000 per year, could have the same average income as a city where everyone makes between $60,000 and $70,000 per year. The expected value is a way to measure a central value of a distribution, but we need variance to measure dispersion.
To use another example, compare two binomial distributions.
1. One where n = 1, 000 and p = .05
2. One where n = 80 and p = 5/8
Both distributions have an expected value of 50, but if you compare the probability distributions in Figure 7.1, you will see that for one, the probability of being far from the expected value is much larger than for the other. For example, the probability of 60 successes for the first distribution is:

P(k = 60 | n = 1,000, p = .05) = 1,000! / (60! 940!) × (.05)^60 × (.95)^940 = 0.0197

But for the second distribution, 60 successes is much more unlikely:

P(k = 60 | n = 80, p = 5/8) = 80! / (60! 20!) × (5/8)^60 × (3/8)^20 = 0.0061
Loosely, variance is a way of measuring how much a typical observation differs from the expected value. For some random variables, the most likely outcomes are pretty
CHAPTER 7. LECTURE 7: FEBRUARY 19TH, 2018 38
close to the expected value; for others, many of the most likely outcomes are very far.
Definition: Variance
If X is a random variable with potential values X_1, . . . , X_n:

Var[X] = E[(X − E[X])²] = ∑_{i=1}^{n} (X_i − E[X])² × p(X = X_i)
7.1.1 Variance Examples
1. X is the number that shows when you roll a die.
E[X] = ∑_{i=1}^{6} i × P(X = i) = 1 × 1/6 + 2 × 1/6 + · · · + 6 × 1/6 = 3.5

Var(X) = ∑_{i=1}^{6} (i − E[X])² × P(X = i) = (1 − 3.5)² × 1/6 + (2 − 3.5)² × 1/6 + · · · + (6 − 3.5)² × 1/6 ≈ 2.9167
2. Y is a binomial random variable with n = 1 and p = .6
E[Y ] = n× p = .6
Var(Y) = ∑_{y=0}^{1} (y − E[Y])² P(Y = y) = (0 − .6)² × (1 − .6) + (1 − .6)² × .6 = .36 × .4 + .16 × .6 = .24
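The die example works out to exactly 35/12 ≈ 2.9167, which exact fractions confirm. A sketch of my own (not from the lecture):

```python
from fractions import Fraction

# X: one fair die roll, each value 1..6 with probability 1/6.
values = range(1, 7)
mean = sum(Fraction(v, 6) for v in values)             # E[X] = 7/2
var = sum((v - mean)**2 / 6 for v in values)           # E[(X - E[X])^2]
print(mean, var, float(var))
```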
7.1.2 Principles of Variance
Var[X] = E[X²] − E[X]²

Derivation:

Var[X] = E[(X − E[X])²] = E[X² − 2XE[X] + E[X]²] = E[X²] − 2E[X]E[X] + E[X]² = E[X²] − E[X]²

Let X be a random variable, and c be some constant. Var[cX] = c²Var(X).
Derivation:

Var(cX) = E[(cX)²] − E[cX]² = c²E[X²] − c²E[X]² = c²(E[X²] − E[X]²) = c²Var(X)
Figure 7.1: Probability Distributions with the Same Expected Value
7.1.3 Expected Value and Variance with Real World Data
Let L_E be the life expectancy of a randomly chosen European country, and L_A be the life expectancy of a randomly chosen African country. Using Figure 7.3, answer the following questions:
Which is higher, E[L_E] or E[L_A]?
Which is higher, Var(L_E) or Var(L_A)?
Figure 7.2: Log of GDP per Capita and Life Expectancy for All Countries in 2007
Figure 7.3: Log of GDP per Capita and Life Expectancy for Europe and Africa in 2007
7.2 Continuous Random Variables
• So far we have dealt only with discrete sample spaces. They are discrete because the sample outcomes are:

– Finite (e.g. rolling a die two times has 36 outcomes)

– Infinite but countable (e.g. the # of times I roll a die until I get a 4, which is potentially infinite)
• Continuous sample spaces are uncountable
– Amount of time until your computer stops working
– Length in time of a randomly selected telephone call
Discrete or Continuous?
1. Weight of a Chipotle burrito
2. Number of black beans in a Chipotle burrito
3. Number of students who attend a professor’s office hours
4. Total time professor spends with students
Continuous variables can be infinitely precise. For example, a professor could spend3.456793854893 hours with students.
Discrete                                        Continuous
Each outcome has a probability                  Each interval has a probability
Prob. of outcome = p_X(x_i)                     Prob. of interval = ∫_i^j p_X(x) dx
1 = ∑_{i=1}^{n} p_X(x_i) if n outcomes          1 = ∫_{−∞}^{∞} p_X(x) dx
E[X] = ∑_{i=1}^{n} x_i p_X(x_i)                 E[X] = ∫_{−∞}^{∞} x p_X(x) dx
Var(X) = ∑_{i=1}^{n} (x_i − E[X])² p_X(x_i)     Var(X) = ∫_{−∞}^{∞} (x − E[X])² p_X(x) dx
Graphing a Probability Density Function: Let X be a continuous random variable uniformly distributed between 0 and 10. What is p(x) for x between 0 and 10?
[Figure: the density p(X) of X, constant at 1/10 for 0 ≤ X ≤ 10]
1 = ∫_0^10 p(x) dx = ∫_0^10 (1/10) dx = (1/10) × 10 − (1/10) × 0 = 1
What is the probability that X is between 7 and 8?
CHAPTER 7. LECTURE 7: FEBRUARY 19TH, 2018 44
∫_7^8 (1/10) dx = (1/10) × 8 − (1/10) × 7 = 1/10
• What is the probability that X is between 3 and 7?

∫_3^7 (1/10) dx = (1/10) × 7 − (1/10) × 3 = 4/10
• What is the probability that X is exactly 7.64839235923?

∫_{7.64839235923}^{7.64839235923} (1/10) dx = 0
Since X is continuous, p(X) is called a probability density function, or pdf.
Chapter 8
Lecture 8: February 21st, 2019
For each probability density function there is also a cumulative distribution function (or CDF).
• p_X(x) is a probability density function such that

∫_{−∞}^{∞} p_X(x) dx = 1

• F_X(t) is the cumulative distribution function corresponding to p_X(x) if

F_X(t) = ∫_{−∞}^{t} p_X(x) dx
The CDF, F (t) is the probability of having a value below t.
FX(t) = P (X < t)
P (c < X < c+ k) = F (c+ k)− F (c)
Write the following in terms of a cumulative distribution function:

1. The probability that a random variable X is greater than a value t, P(X > t). Answer: 1 − F(t)

2. 1. Answer: F(∞)

3. 0. Answer: F(−∞)
CHAPTER 8. LECTURE 8: FEBRUARY 21ST, 2019 46
Suppose the probability density function is defined as:

p_X(x) = 4x³  if 0 ≤ x ≤ 1
       = 0    if x > 1 or x < 0
1. What is the CDF, F(x)? Answer: Use the anti-derivative: F_X(x) = x⁴ for 0 ≤ x ≤ 1.

2. What is F(1/2)? Answer: F(1/2) = (1/2)⁴ = 1/16
CHAPTER 8. LECTURE 8: FEBRUARY 21ST, 2019 47
• What is the probability that .25 < x < .75?

F(.75) − F(.25) = .75⁴ − .25⁴ = .3125
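The antiderivative can be cross-checked against a numerical integral of the density. This sketch is my own (not from the lecture), using a simple midpoint Riemann sum:

```python
def pdf(x):
    """Density 4x^3 on [0, 1], zero elsewhere."""
    return 4 * x**3 if 0 <= x <= 1 else 0.0

def F(x):
    """CDF: the antiderivative x^4, clamped to [0, 1]."""
    return min(max(x, 0.0), 1.0) ** 4

# Midpoint Riemann sum of the pdf over [0.25, 0.75] should match F(.75) - F(.25).
a, b, n = 0.25, 0.75, 100_000
h = (b - a) / n
approx = sum(pdf(a + (i + 0.5) * h) * h for i in range(n))
print(F(0.75) - F(0.25))  # 0.3125
```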
8.1 Joint Distribution Function
Let S be a sample space with sample outcomes denoted by s. Let X and Y be random variables (functions) that translate sample outcomes into real numbers:
X(s) = a real # and Y (s) = a real #
Then the joint distribution function of X and Y is:
pX,Y (x, y) = P (the set of s such that X(s) = x and Y (s) = y)
For example, with A and B from the example in the next subsection:

p_{A,B}(1, 1) = P(A = 1 ∩ B = 1) = 2/8
8.1.1 Example of Jointly Distributed Random Variables
Flip a coin 3 times. Let A be the number of heads on the first two flips, and B be the square of the number of tails on the last two flips.

1st  2nd  3rd   A   B
T    T    T     0   4
H    T    T     1   4
T    H    T     1   1
T    T    H     0   1
H    H    T     2   1
H    T    H     1   1
T    H    H     1   0
H    H    H     2   0
E[A] = 0 × 2/8 + 1 × 4/8 + 2 × 2/8 = 1

E[B] = 0 × 2/8 + 1 × 4/8 + 4 × 2/8 = 1.5
8.1.2 Definition of Covariance
If X and Y are jointly distributed random variables, their covariance is:
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
Loosely, the covariance describes how two jointly distributed variables move together.
X is          E[Y | X] is   Covariance is
Above E[X]    Above E[Y]    Positive
Above E[X]    Below E[Y]    Negative
Below E[X]    Below E[Y]    Positive
Below E[X]    Above E[Y]    Negative
Anything      E[Y]          0
1st  2nd  3rd   A   B   (A − E[A])(B − E[B])
T    T    T     0   4   (0 − 1) × (4 − 1.5) = −2.5
H    T    T     1   4   (1 − 1) × (4 − 1.5) = 0
T    H    T     1   1   (1 − 1) × (1 − 1.5) = 0
T    T    H     0   1   (0 − 1) × (1 − 1.5) = .5
H    H    T     2   1   (2 − 1) × (1 − 1.5) = −.5
H    T    H     1   1   (1 − 1) × (1 − 1.5) = 0
T    H    H     1   0   (1 − 1) × (0 − 1.5) = 0
H    H    H     2   0   (2 − 1) × (0 − 1.5) = −1.5

Cov(A, B) = (−2.5 + 0 + 0 + .5 − .5 + 0 + 0 − 1.5)/8 = −4/8 = −.5
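The table above is easy to reproduce by enumerating the 8 equally likely flip sequences. My own sketch (not from the lecture):

```python
from fractions import Fraction
from itertools import product

# A: heads on flips 1-2; B: (tails on flips 2-3) squared, as defined above.
outcomes = list(product('HT', repeat=3))
A = [sum(c == 'H' for c in o[:2]) for o in outcomes]
B = [sum(c == 'T' for c in o[1:])**2 for o in outcomes]

EA = sum(Fraction(a, 8) for a in A)
EB = sum(Fraction(b, 8) for b in B)
cov = sum((a - EA) * (b - EB) for a, b in zip(A, B)) / 8
print(EA, EB, cov)  # 1, 3/2, -1/2
```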
8.1.3 Conditional Probability Function
Suppose you have a discrete random variable X and a discrete random variable Y that are jointly distributed:

p_{Y|x}(y) = P(Y = y | X = x) = p_{X,Y}(x, y) / p_X(x)

This also implies that p_{Y|x}(y) p_X(x) = p_{X,Y}(x, y).
With A and B from above, the conditional probability function p_{A|B=1}(a) would be:

p_{A|B=1}(0) = P(A = 0 | B = 1) = 1/4
p_{A|B=1}(1) = P(A = 1 | B = 1) = 2/4
p_{A|B=1}(2) = P(A = 2 | B = 1) = 1/4
8.1.4 Independent Random Variables
Two random variables X and Y are independent if for all values x and y

p_{X|Y=y}(x) = p_X(x)

Since the joint probability function can always be written as p_{X,Y}(x, y) = p_{X|Y=y}(x) p_Y(y), this implies that for independent random variables, the joint probability function is simply the product of the individual probability functions:

p_{X,Y}(x, y) = p_X(x) p_Y(y)
Are A and B from the above example independent of each other?
8.1.5 Example: Discrete Joint Probability Function
               Values of B
Values of A    0      1      4
0              0      1/8    1/8
1              1/8    2/8    1/8
2              1/8    1/8    0
• What is P(B = 4 | A = 1)?

P(B = 4 | A = 1) = (1/8) / (1/8 + 2/8 + 1/8) = 1/4
• What is P(A = 1 | B = 4)?

P(A = 1 | B = 4) = (1/8) / (1/8 + 1/8) = 1/2
• Find p_{A|B=1}(a)

p_{A|B=1}(0) = P(A = 0 | B = 1) = (1/8) / (1/8 + 2/8 + 1/8) = 1/4
p_{A|B=1}(1) = P(A = 1 | B = 1) = (2/8) / (1/8 + 2/8 + 1/8) = 2/4
p_{A|B=1}(2) = P(A = 2 | B = 1) = (1/8) / (1/8 + 2/8 + 1/8) = 1/4
• Find p_A(a) and p_B(b)

a   p_A(a)
0   0 + 1/8 + 1/8 = 2/8
1   1/8 + 2/8 + 1/8 = 4/8
2   1/8 + 1/8 + 0 = 2/8

b   p_B(b)
0   0 + 1/8 + 1/8 = 2/8
1   1/8 + 2/8 + 1/8 = 4/8
4   1/8 + 1/8 + 0 = 2/8
• Are A and B independent? No. There are two ways to tell.

1. p_A(a) × p_B(b) ≠ p_{A,B}(a, b): you can check this by picking various values of a and b. (Note that by chance, p_A(1) × p_B(1) = p_{A,B}(1, 1), but for the other values of a and b this does not hold, which makes them not independent.)

2. p_{A|B=b}(a) ≠ p_A(a): you can check this by picking values of a and b. (Note that by chance, p_{A|B=1}(a) = p_A(a) and p_{B|A=1}(b) = p_B(b), but when you condition on other values of A and B this does not hold, which makes them not independent.)
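An exhaustive check of the product condition over this joint table takes only a few lines. The sketch is my own (not from the lecture):

```python
from fractions import Fraction as F

# Joint pmf p(a, b) from the table above.
joint = {(0, 0): F(0),    (0, 1): F(1, 8), (0, 4): F(1, 8),
         (1, 0): F(1, 8), (1, 1): F(2, 8), (1, 4): F(1, 8),
         (2, 0): F(1, 8), (2, 1): F(1, 8), (2, 4): F(0)}

# Marginals by summing out the other variable.
pA = {a: sum(p for (aa, b), p in joint.items() if aa == a) for a in (0, 1, 2)}
pB = {b: sum(p for (a, bb), p in joint.items() if bb == b) for b in (0, 1, 4)}

# Independence requires p(a, b) = pA(a) * pB(b) for EVERY cell.
independent = all(joint[(a, b)] == pA[a] * pB[b] for (a, b) in joint)
print(independent)  # False: e.g. p(0, 0) = 0 but pA(0) * pB(0) = 1/16
```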
Chapter 9
Lecture 9: February 26th, 2019
9.1 Joint Distribution Functions Continued
9.1.1 Summation (Integration) of Joint Probability is the Individual Probability

For discrete:

p(x) = ∑_{all y} p(x, y)  and  p(y) = ∑_{all x} p(x, y)
For continuous:

p(x) = ∫_{−∞}^{∞} p(x, y) dy  and  p(y) = ∫_{−∞}^{∞} p(x, y) dx
9.1.2 Summing Over Both Variables Must Equal 1

∑_{all x} ∑_{all y} p(x, y) = ∑_{all x} p(x) = 1

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = ∫_{−∞}^{∞} p(y) dy = 1

(The inner sum, or inner integral, collapses the joint probability to the individual probability.)
CHAPTER 9. LECTURE 9: FEBRUARY 26TH, 2019 52
9.1.3 Conditional Expectation
Let X and Y be two discrete random variables, where x_1, x_2, . . . , x_n are the potential values of X, and y_1, y_2, . . . , y_m are the potential values of Y.

E[X | Y = y_j] = ∑_{i=1}^{n} x_i p_{X|y_j}(x_i)
Conditional expectation of X is exactly the same as E[X], except it uses a conditionalprobability function instead of the unconditional probability function. Returning tothis joint probability function:
               Values of B
Values of A    1      2      3      4
1              1/12   0      1/4    1/12
2              1/3    1/6    0      1/12
• What is E[A | B = 1]?

∑_{a=1}^{2} a × p_{A|B=1}(a) = 1 × (1/12)/(1/12 + 1/3) + 2 × (1/3)/(1/12 + 1/3) = 1 × 1/5 + 2 × 4/5 = 9/5
• What is E[A | B = 4]?

∑_{a=1}^{2} a × p_{A|B=4}(a) = 1 × (1/12)/(1/12 + 1/12) + 2 × (1/12)/(1/12 + 1/12) = 1 × 1/2 + 2 × 1/2 = 1.5
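Conditional expectation is just an expected value taken with the conditional pmf, which this sketch of mine (not from the lecture) makes concrete for the table above:

```python
from fractions import Fraction

# Joint pmf p(a, b) from the table above.
joint = {(1, 1): Fraction(1, 12), (1, 2): Fraction(0),    (1, 3): Fraction(1, 4), (1, 4): Fraction(1, 12),
         (2, 1): Fraction(1, 3),  (2, 2): Fraction(1, 6), (2, 3): Fraction(0),    (2, 4): Fraction(1, 12)}

def cond_expect_A(b):
    """E[A | B = b]: weight each a by the conditional probability p(a | b)."""
    pb = sum(p for (a, bb), p in joint.items() if bb == b)  # marginal p_B(b)
    return sum(a * p / pb for (a, bb), p in joint.items() if bb == b)

print(cond_expect_A(1))  # 9/5
print(cond_expect_A(4))  # 3/2
```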
9.2 Principles of Expectation, Variance, and Covariance
• If X and Y are two independent random variables
E[XY ] = E[X]E[Y ]
Proof:

E[XY] = ∑_{all x} ∑_{all y} x y p_{X,Y}(x, y)
E[XY] = ∑_{all x} ∑_{all y} x y p_X(x) p_Y(y)
E[XY] = ∑_{all x} x p_X(x) ∑_{all y} y p_Y(y) = ∑_{all x} x p_X(x) E[Y] = E[Y] ∑_{all x} x p_X(x) = E[X]E[Y]
The same logic applies with continuous random variables:

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y p_{X,Y}(x, y) dx dy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y p_X(x) p_Y(y) dx dy
= ∫_{−∞}^{∞} x p_X(x) dx ∫_{−∞}^{∞} y p_Y(y) dy = E[X]E[Y]
• If X and Y are random variables then

Cov(X, Y) = E[XY] − E[X]E[Y]

Proof:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
= E[XY − E[X]Y − XE[Y] + E[X]E[Y]]
= E[XY] − E[X]E[Y] − E[X]E[Y] + E[X]E[Y]
= E[XY] − E[X]E[Y]
Therefore what is the covariance of independent random variables? Zero!!
Cov(X, Y ) = E[XY ]− E[X]E[Y ] = E[X]E[Y ]− E[X]E[Y ] = 0
• If X is a random variable:

Var(X) = E[X²] − E[X]²
• If X and Y are jointly distributed random variables:

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
Proof:

Var(aX + bY) = E[(aX + bY)²] − E[aX + bY]²
= E[a²X² + 2abXY + b²Y²] − (E[aX] + E[bY])²
= a²E[X²] + 2abE[XY] + b²E[Y²] − (E[aX]² + 2E[aX]E[bY] + E[bY]²)
= (a²E[X²] − a²E[X]²) + (b²E[Y²] − b²E[Y]²) + (2abE[XY] − 2abE[X]E[Y])
= a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
9.3 Application to Finance
You are trying to choose how to allocate a portfolio of $1,000. There are three potential investments: A, B, and C. The following table gives the expected rate of return of each investment and the variance of its return:

Investment   Rate of Return   Variance of Return
A            .05              .07
B            .05              .08
C            .05              .035

A and B each have a minimum investment of $500. C has a minimum investment of $1,000. A and B have a covariance of −.02. What is the risk-minimizing portfolio? What is the variance of its return?
Answer: The “risk-minimizing” portfolio minimizes the variance of returns. Answering this correctly requires taking advantage of the principle we just derived:

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
If you cut your $1,000 portfolio in half and invest half of it in A, and the other half in B, the variance of your portfolio overall will be:

Var(½A + ½B) = (½)²Var(A) + (½)²Var(B) + 2(½)(½)Cov(A, B)
= (.07 + .08 + 2(−.02))/4 = .0275
If you allocate 100% of your portfolio to C, its variance will be .035. Despite the fact that A and B have higher individual variances than C, together they produce a safer portfolio (less variance).
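The portfolio comparison is a one-formula computation; here is a sketch of my own (not from the lecture, helper name mine):

```python
def portfolio_var(a, b, var_x, var_y, cov_xy):
    """Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)."""
    return a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy

# Half in A, half in B, with Var(A) = .07, Var(B) = .08, Cov(A, B) = -.02.
half_a_half_b = portfolio_var(0.5, 0.5, 0.07, 0.08, -0.02)
print(half_a_half_b)   # about 0.0275, beating C's 0.035
```

The negative covariance is what makes the mixed portfolio safer than either asset alone.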
9.4 Variance of the Sum of Independent Random Variables
Recall:

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

Cov(X, Y) = E[XY] − E[X]E[Y] = E[X]E[Y] − E[X]E[Y] = 0 if X, Y are independent.
So the variance of X + Y if they are independent is simply
V ar(X + Y ) = V ar(X) + V ar(Y )
If you have many independent variables, then the variance of all of them added together is the following:

Var(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} Var(X_i)
Suppose you have a binomial random variable with n = 1,000 and p = .2. What is the variance?
• Treat each separate trial as a random variable equal to either 0 or 1.
• Find the variance of each trial.
• Add up variances
Let X_i be the outcome of the ith trial.

E[X_i] = 0 × (1 − p) + 1 × p = p

Var(X_i) = (0 − p)² × (1 − p) + (1 − p)² × p
= p²(1 − p) + (1 − 2p + p²)p
= p² − p³ + p − 2p² + p³
= p − p² = p(1 − p)
Let X be the binomial random variable, so:

X = ∑_{i=1}^{n} X_i → Var(X) = Var(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} Var(X_i) = n × p(1 − p)

So the variance of a binomial random variable with n trials is np(1 − p). For n = 1,000 and p = .2, that is 1,000 × .2 × .8 = 160.
9.4.1 Importance of Sample Size
Suppose 55 percent of the population supports candidate A and 45 percent supports candidate B. The population has 100 million people. Suppose you randomly choose 16 people from the population and ask them which candidate they support; what is the variance in the proportion that supports candidate A?
Answer: Let X be the number of people supporting candidate A out of the sixteen. This is essentially a binomial random variable with n = 16 and p = .55. Therefore
E[X] = np = 16 × .55 = 8.8    Var(X) = np(1 − p) = 16 × .55 × (1 − .55) = 3.96

Let Y be the proportion of the 16 supporting A:

Y = X/16

What is E[Y]? What is Var(Y)?

E[Y] = E[X/16] = E[X]/16 = 8.8/16 = .55

Var(Y) = Var(X/16) = Var(X)/16² ≈ .0155
This can be generalized. The variance of a binomial random variable divided by the number of trials (aka the proportion of successes) is:

(Variance of binomial random variable) / (Number of trials squared) = np(1 − p)/n² = p(1 − p)/n
So if we had a sample of 1,000 people instead of just 16, the variance of the proportion supporting candidate A would be:

.55 × (1 − .55) / 1,000 = .0002475
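The shrinking-with-n behavior of p(1 − p)/n is easy to see numerically. A sketch of my own (not from the lecture, helper name mine):

```python
def proportion_variance(n, p):
    """Variance of the sample proportion of successes: p(1 - p)/n."""
    return p * (1 - p) / n

print(proportion_variance(16, 0.55))    # about 0.0155 for the 16-person sample
print(proportion_variance(1000, 0.55))  # about 0.0002475 for 1,000 people
```

A sample roughly 60 times larger cuts the variance of the proportion by the same factor.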
Chapter 10
Lecture 10: February 28th, 2019
10.1 Standard Deviation
The standard deviation is the square root of the variance. Suppose a discrete random variable X has n potential values (x_1, x_2, x_3, . . . , x_n):

Var(X) = ∑_{i=1}^{n} (x_i − E[X])² p_X(x_i)

So the standard deviation is just the square root:

Standard deviation of X = √Var(X) = √(∑_{i=1}^{n} (x_i − E[X])² p_X(x_i))
Frequently, we use the symbol σ²_X to refer to the variance of X, so σ_X is the standard deviation:

Var(X) = σ²_X    Standard deviation of X = σ_X
Just like the variance, the higher the standard deviation, the higher the probabilitythat the random variable will be equal to values far from the expected value. Thesmaller the standard deviation, the higher the probability that the random variablewill be equal to a value close to its expected value.
CHAPTER 10. LECTURE 10: FEBRUARY 28TH, 2019 59
10.2 Normal Distribution
The normal distribution is also called the Gaussian distribution. Let X be a normal random variable with the following parameters:

E[X] = µ    Var(X) = σ²

Its pdf is:

p_X(x) = (1/(√(2π) × σ)) e^(−½((x − µ)/σ)²)
The normal distribution forms an extremely famous bell curve:
[Figure: the standard normal bell curve, p(z) against z]
One common way to analyze normally distributed random variables is to convert them into z-scores. A z-score is essentially the number of standard deviations a value is away from the mean. Let X be a normal random variable:

z = (X − µ_X)/σ_X

By construction, the z distribution is a normally distributed random variable with µ = 0 and σ = 1. This produces a greatly simplified pdf:

p_Z(z) = (1/√(2π)) e^(−z²/2)
Approximately 68% of the normal distribution is within one standard deviation of the mean:

∫_{−1}^{1} (1/√(2π)) e^(−z²/2) dz ≈ .6827
Approximately 95% is within 2 (more precisely, 1.96):

∫_{−1.96}^{1.96} (1/√(2π)) e^(−z²/2) dz ≈ .9500
Approximately 2.5% of the distribution is 1.96 or more standard deviations above the mean:

∫_{1.96}^{∞} (1/√(2π)) e^(−z²/2) dz ≈ .0250
When n is large, a binomial random variable is well-approximated by a normal random variable:
[Figure: probability function for the number of heads on 100 coin flips, closely tracking a normal curve]
What is a quick and easy way to approximate the likelihood of getting between 40and 60 heads on 100 coin flips?
σ²_H = 100 × .5 × (1 − .5) = 25    σ_H = √25 = 5    µ_H = 100 × .5 = 50
Find the z-scores of 40 and 60:

(60 − µ_H)/σ_H = (60 − 50)/5 = 2    (40 − µ_H)/σ_H = (40 − 50)/5 = −2
Based on properties of normal distribution, the probability of having a z score be-tween 2 and -2 is ≈95%.
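The ≈95% figure can be computed from the standard normal CDF, which the standard library's error function provides. This sketch is my own (not from the lecture; the helper `Phi` is mine):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(40 <= heads <= 60) on 100 flips, using mu = 50 and sigma = 5:
approx = Phi((60 - 50) / 5) - Phi((40 - 50) / 5)
print(round(approx, 4))  # 0.9545
```

Note this is the normal approximation; the exact binomial sum is slightly higher.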
10.3 Standard Normal Distribution
Suppose X is a normal random variable with mean µ and variance σ². What is

E[(X − µ)/σ]?

E[(X − µ)/σ] = E[X/σ] − E[µ/σ] = E[X]/σ − µ/σ = µ/σ − µ/σ = 0
What is

Var((X − µ)/σ)?

Var((X − µ)/σ) = Var(X/σ) + Var(−µ/σ) + 2Cov(X/σ, −µ/σ)
= Var(X)/σ² + (µ/σ)² × Var(1) + 2 × (−µ/σ) × Cov(X/σ, 1)

Both of the last two terms are zero, since a constant has zero variance and zero covariance with anything, so:

= Var(X)/σ² = σ²/σ² = 1
We call a normal distribution with a mean of 0 and a standard deviation/varianceof 1 the z, or standard normal distribution.
10.4 Extended Example of Using the Normal Distribution
• Suppose NYU aims to have a freshman class of 5,000.
• The probability of a student admitted to NYU choosing to enroll is 50%
• NYU admits 10,000 students per year
• If 5,100 students enroll, the university will not have enough space for all the students. If fewer than 4,950 students enroll, they will need to take students off the wait list.
What is the likelihood that too many students enroll? What is the likelihood they take students off the wait list?
• This is a binomial random variable with n = 10,000 and p = .5:

P(Students = k) = P_S(k) = 10000! / (k! (10000 − k)!) × .5^k × (1 − .5)^(10000−k)

Var(S) = np(1 − p) = 10,000 × .5² = 2,500 → SD(S) = √2,500 = 50
• When there are many independent trials (n is large) the binomial distribution can be approximated with a normal distribution with the following parameters:
\[
\mu = E[S] = np = 10{,}000 \times .5 = 5{,}000 \qquad \sigma = \sqrt{\operatorname{Var}(S)} = \sqrt{np(1-p)} = \sqrt{10{,}000 \times .5 \times .5} = 50
\]
The pdf of a normal distribution is:
\[
p_S(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)} \qquad -\infty < x < \infty
\]
So the pdf of the number of enrolled students would be:
\[
p_S(x) = \frac{1}{\sqrt{2\pi} \times 50}\, e^{-(x-5{,}000)^2/(2(50)^2)} \qquad -\infty < x < \infty
\]

\[
P(S \geq 5{,}100) \approx \int_{5{,}100}^{\infty} \frac{1}{\sqrt{2\pi} \times 50}\, e^{-(x-5{,}000)^2/(2(50)^2)}\,dx
\]
• z is the standard normal distribution, a normal distribution with µ = 0 and σ = 1.

• Any normal random variable X with mean µX and standard deviation σX can be transformed into z in the following way:

\[
z = \frac{X - \mu_X}{\sigma_X}
\]
• z is essentially the number of standard deviations a normal random variable is from its expected value.
P (−1 < z < 1) ≈ .68
P (−2 < z < 2) ≈ .95
P (−3 < z < 3) ≈ .997
\[
P(z < k) = \int_{-\infty}^{k} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx
\]
• Translate 5,100 and 4,950 into z-scores using µ = 5,000 and σ = 50:
\[
z = \frac{5{,}100 - 5{,}000}{50} = 2 \qquad z = \frac{4{,}950 - 5{,}000}{50} = -1
\]
Use those z-scores to find the probability of having too many students and of needing to use the wait list:
P (Waitlist) = P (z < −1) ≈ .1587
P (Too many students) = P (z > 2) = 1− P (z < 2) ≈ .02275
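Both tail probabilities can be verified directly from the standard normal CDF. A minimal sketch (Python here; the equivalent R calls would use pnorm):

```python
from statistics import NormalDist

mu, sigma = 5000, 50       # mean and sd of the number of enrollees
z = NormalDist()           # standard normal distribution

p_waitlist = z.cdf((4950 - mu) / sigma)        # P(z < -1), about .1587
p_too_many = 1 - z.cdf((5100 - mu) / sigma)    # P(z > 2), about .0228
```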
10.5 Example of Using the Normal Distribution: Airline Passengers
• An airplane has 168 seats.
• 178 tickets were sold for its next flight
• For past flights on this route, 90% of passengers who bought tickets show up for the flight.
What is the likelihood that more than 168 passengers will show up to claim a seat on the plane? We can treat the number of passengers who show up, P, as a binomial random variable with n = 178 and p = .9.
\[
P_P(k) = P(\text{Passengers} = k) = \frac{178!}{k!(178-k)!} \times .9^k (1-.9)^{178-k}
\]
\[
P(\text{Passengers} > 168) = \sum_{i=169}^{178} P_P(i)
\]
Seems like a lot of arithmetic and kind of boring. Is there an easier way to approximate this?
\[
E[P] = np = 178 \times .9 = 160.2 \qquad \operatorname{Var}(P) = np(1-p) = 178 \times .9 \times (1-.9) = 16.02
\]
With a large n, the binomial distribution is approximately normal with µ = E[P] and σ = √Var(P). What is the z-score of having 169 passengers (one too many)?

\[
z = \frac{169 - 160.2}{\sqrt{16.02}} \approx 2.199
\]

\[
P(z > 2.199) = 1 - P(z < 2.199) = P(z < -2.199) = 1 - .9861 = .0139
\]
The actual probability of getting between 169 and 178 for a binomial random variable with n = 178 and p = .9 is .0132, slightly lower than the approximation.
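That comparison can be reproduced directly: sum the exact binomial tail and compare it with the normal tail (a Python sketch; the last digit may differ slightly from the lecture's rounding):

```python
from math import comb
from statistics import NormalDist

n, p = 178, 0.9
mu = n * p                         # 160.2
sigma = (n * p * (1 - p)) ** 0.5   # sqrt(16.02)

# Exact probability that 169 or more ticket-holders show up
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(169, n + 1))

# Normal approximation using the z-score of 169
approx = 1 - NormalDist(mu, sigma).cdf(169)
```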
Figure 10.1: An example of a z-table
Table entry for z is the area under the standard normal curve to the left of z. [Standard normal probabilities for z from −3.4 to −0.0, with columns .00 through .09; for example, the entry for z = −1.00 is .1587 and for z = −2.00 is .0228.]
Chapter 11
Lecture 11: March 5th, 2019
11.1 Central Limit Theorem
Let Z be the sum of many independent and identical random variables Xi:
\[
Z = X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i
\]
Then Z is approximately normally distributed if n is large, regardless of what kind of probability distribution the Xi have.
11.2 Law of Large Numbers
The sample mean is the sum of all observations in a sample divided by the number of observations:

\[
\bar{X} = \sum_{i=1}^{n} X_i / n
\]

\[
\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\sum_{i=1}^{n} X_i / n\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i/n) = \sum_{i=1}^{n} \operatorname{Var}(X_i)/n^2
\]
Since all the observations are drawn independently from the same population, they have the same expected value and variance.
\[
E[X_i] = \mu_X \qquad \operatorname{Var}(X_i) = \sigma^2_X
\]
CHAPTER 11. LECTURE 11: MARCH 5TH, 2019 70
Therefore
\[
\operatorname{Var}(\bar{X}) = \sum_{i=1}^{n} \operatorname{Var}(X_i)/n^2 = \sum_{i=1}^{n} \sigma^2/n^2 = \frac{n\sigma^2}{n^2} = \sigma^2/n
\]
Therefore
\[
\text{Variance of sample mean} = \frac{\sigma^2}{n}
\]
Regardless of how large the population variance is, if the sample is sufficiently large, the variance of the sample mean will converge towards zero.
This is a remarkably powerful finding. It means that using a sample of a much larger population, we can construct estimates of the population characteristics that are accurate within a relatively narrow margin of error.
11.3 Example of Using Central Limit Theorem
and LLN
Suppose the following:
• Average income in NYC is $50,000
• Standard deviation of income is $10,000
• The income distribution in NYC is NOT NORMAL
• We draw a sample of 100 people
Answer the following
1. What is the probability that the sample mean is above $52,000 or below $48,000?
2. What is the probability that the sample mean is below $49,000?
3. What would you suspect if your sample mean was $55,000?
\[
\text{Variance of sample mean income} = \frac{\text{Variance of Income in NYC}}{100}
\]
Recall that the standard deviation of income in NYC, σIncome in NYC, is $10,000. So the variance of income will be:

\[
\text{Variance of Income in NYC} = \sigma^2_{\text{Income in NYC}} = 10{,}000^2
\]
So the standard deviation of the sample mean will be:

\[
\sigma_{\bar{X}} = \sqrt{\sigma^2_{\bar{X}}} = \sqrt{\frac{10{,}000^2}{100}} = \frac{10{,}000}{\sqrt{100}} = 1{,}000
\]
So X̄ is normally distributed with

• µX̄ = µIncome in NYC = $50,000

• σX̄ = σIncome in NYC/√100 = 1,000
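Given that sampling distribution, the three questions above can be answered with the standard normal CDF. A short sketch (Python; the course's R snippets would use pnorm):

```python
from statistics import NormalDist

xbar = NormalDist(mu=50_000, sigma=1_000)   # sampling distribution of the mean
z = NormalDist()                            # standard normal

# 1. P(sample mean above $52,000 or below $48,000): |z| > 2, about 5%
p_outside = xbar.cdf(48_000) + (1 - xbar.cdf(52_000))

# 2. P(sample mean below $49,000): z = -1, about 16%
p_below = xbar.cdf(49_000)

# 3. A sample mean of $55,000 is z = 5: essentially impossible if mu really
#    is $50,000, so we would suspect the sample is not representative.
p_55k = 1 - z.cdf(5)
```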
[Figure: probability densities of the population income distribution and of the distribution of sample means, both centered at $50,000]
≈5% of the potential sample means are above $52,000 or below $48,000. What should we conclude if the sample mean is $55,000?
[Figure: densities of the sample-mean distribution and the population distribution of income; $55,000 lies far in the tail of the sample-mean distribution]
Chapter 12
Lecture 12: March 12th, 2019
12.1 Example
• You have a coin with an unknown probability, p, of landing on heads.
• You plan to estimate p by flipping the coin 100 times and using the following formula:

\[
\hat{p} = \frac{X}{100}
\]

where X is the number of times it lands on heads.
• Suppose X is 60 and therefore p̂ is .6.

• How can we estimate what the standard deviation of p̂ would be?

\[
\operatorname{Var}(\hat{p}) = \operatorname{Var}(X/100) = \operatorname{Var}(X)/100^2 = 100p(1-p)/100^2 = p(1-p)/100
\]

So we can use p̂ to construct an estimate of Var(p̂):

\[
\widehat{\operatorname{Var}}(\hat{p}) = \hat{p} \times (1-\hat{p})/100 = .6 \times (1-.6)/100 = .0024
\]

• What is the estimated standard deviation of p̂?

\[
\sqrt{\widehat{\operatorname{Var}}(\hat{p})} = \sqrt{.0024} \approx .0490 = \sigma_{\hat{p}}
\]
• Find a range of values that we can say, with 95% confidence, the true p lies within.
CHAPTER 12. LECTURE 12: MARCH 12TH, 2019 74
Answer: We know that p̂ is going to be (approximately) a normally distributed random variable, which means it has a 95% probability of falling in the range between 1.96 standard deviations below its expected value and 1.96 standard deviations above its expected value:

\[
P(E[\hat{p}] - 1.96\sigma_{\hat{p}} < \hat{p} < E[\hat{p}] + 1.96\sigma_{\hat{p}}) = .95
\]

Subtract p̂ and E[p̂] from all three parts:

\[
P(-\hat{p} - 1.96\sigma_{\hat{p}} < -E[\hat{p}] < -\hat{p} + 1.96\sigma_{\hat{p}}) = .95
\]

Multiply the entire inequality by negative 1 (which means you have to flip the direction of the inequalities):

\[
P(\hat{p} + 1.96\sigma_{\hat{p}} > E[\hat{p}] > \hat{p} - 1.96\sigma_{\hat{p}}) = .95
\]

Of course E[p̂] = p, so

\[
P(\hat{p} + 1.96\sigma_{\hat{p}} > p > \hat{p} - 1.96\sigma_{\hat{p}}) = .95
\]
12.2 Confidence Interval
Suppose we draw a sample of 300 NYU students and ask them the rent they pay
• Sample mean is $1,000
• We happen to know that the standard deviation of rent paid by all NYU students is $100.
Answer the following
1. What is the interval within which, with 95% confidence, we can say the true average rent of NYU students lies?
\[
P(\mu - 1.96 \times \sigma_{\bar{X}} < \bar{X} < \mu + 1.96 \times \sigma_{\bar{X}}) = .95
\]
\[
P(\mu - 1.96 \times 100/\sqrt{300} < \bar{X} < \mu + 1.96 \times 100/\sqrt{300}) = .95
\]
\[
P(-\bar{X} - 1.96 \times 100/\sqrt{300} < -\mu < -\bar{X} + 1.96 \times 100/\sqrt{300}) = .95
\]
\[
P(\bar{X} + 1.96 \times 100/\sqrt{300} > \mu > \bar{X} - 1.96 \times 100/\sqrt{300}) = .95
\]
\[
P(1{,}000 + 1.96 \times 100/\sqrt{300} > \mu > 1{,}000 - 1.96 \times 100/\sqrt{300}) = .95
\]
We can be 95% confident that the true mean rent falls somewhere in the range between 988.68 and 1011.32.
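The interval arithmetic can be checked in a few lines (a Python sketch; R's qnorm(.975) gives the same 1.96):

```python
from statistics import NormalDist

xbar, sigma, n = 1_000, 100, 300
se = sigma / n ** 0.5                  # standard error of the sample mean
z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval

lo, hi = xbar - z * se, xbar + z * se  # roughly (988.68, 1011.32)
```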
Chapter 13
Lecture 13: March 14th, 2019
13.1 Estimation
Probability distributions can be classified into families. We distinguish between each member of the family based on parameters:
• Binomial distribution. Parameters are n and p.
• Normal distribution. Parameters are µ and σ2
We write that X is normally distributed with parameters µ and σ² in the following way:
X ∼ N (µ, σ2)
[Figure: normal probability densities for parameters (mean = 0, sd = 1), (mean = 0, sd = 2), (mean = 0, sd = 3), and (mean = 5, sd = 1)]
The sample mean (adding up the values of all observations and dividing by the number of observations) is an example of an estimator. We use it to estimate the value of µ in the normal distribution.
13.2 Unbiasedness
An estimator is unbiased if its expected value is equal to the parameter it is used to estimate, no matter what the value of that parameter is.
13.2.1 Unbiasedness of a Sample Mean
Suppose we want to estimate the mean of a population, µ, based on a sample of n observations, X1, X2, X3, ..., Xn. One natural way to do this would be with the sample mean, which we will denote with X̄.
\[
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
\]
Since the sample mean is based on a sum of random variables, it is itself a random variable. It may be higher than µ or lower than µ. We want to know if its expected
value is equal to µ. If so, that would mean the estimator is unbiased, meaning that on average the sample mean is equal to the population mean (µ).
\[
E[\bar{X}] = E\left[\frac{\sum_{i=1}^{n} X_i}{n}\right] = \frac{\sum_{i=1}^{n} \overbrace{E[X_i]}^{=\mu}}{n} = \frac{n \times \mu}{n} = \mu
\]
Therefore the sample mean is an unbiased estimator of the population mean, µ.
13.2.2 Extended Example of Unbiasedness
You randomly choose five people from a neighborhood in New York City with 100,000 people in it. The five people you randomly choose have the following ages:
Person #   Age
1          37
2          35
3          11
4          7
5          4
Mean age in the sample:
\[
\bar{X} = \frac{37 + 35 + 11 + 7 + 4}{5}
\]
Variance of age in the sample:
\[
\sigma^2_{\text{sample}} = \frac{(37-\bar{X})^2 + (35-\bar{X})^2 + (11-\bar{X})^2 + (7-\bar{X})^2 + (4-\bar{X})^2}{5}
\]
What if we used the mean and variance of the sample to estimate the mean and variance of age in this New York City neighborhood where these five people were sampled from? (This is called method of moments estimation.)

Estimated mean = X̄ = 18.8
\[
\text{Estimated variance} = \hat{\sigma}^2 = \sum_{i=1}^{5} (\text{Age}_i - \bar{X})^2/5 = 202.6
\]
Would these be unbiased estimators? Expectation of X̄:
\[
E[\bar{X}] = E\left[\frac{\text{Age}_1 + \text{Age}_2 + \text{Age}_3 + \text{Age}_4 + \text{Age}_5}{5}\right]
= \frac{E[\text{Age}_1]}{5} + \frac{E[\text{Age}_2]}{5} + \frac{E[\text{Age}_3]}{5} + \frac{E[\text{Age}_4]}{5} + \frac{E[\text{Age}_5]}{5}
= \frac{\mu_{\text{Age}} + \mu_{\text{Age}} + \mu_{\text{Age}} + \mu_{\text{Age}} + \mu_{\text{Age}}}{5} = \mu_{\text{Age}}
\]
Therefore X̄ is an unbiased estimator of the mean age in the neighborhood. Expectation of σ̂²:
\[
E[\hat{\sigma}^2] = E\left[\frac{(\text{Age}_1-\bar{X})^2 + (\text{Age}_2-\bar{X})^2 + \dots + (\text{Age}_5-\bar{X})^2}{5}\right]
\]
\[
= E\left[\frac{\text{Age}_1^2 - 2\text{Age}_1\bar{X} + \bar{X}^2 + \dots + \text{Age}_5^2 - 2\text{Age}_5\bar{X} + \bar{X}^2}{5}\right]
\]
\[
= E\left[\frac{\sum_{i=1}^{5} \text{Age}_i^2 + \sum_{i=1}^{5} (-2\text{Age}_i\bar{X}) + 5\bar{X}^2}{5}\right]
\]
\[
= E\left[\frac{\sum_{i=1}^{5} \text{Age}_i^2}{5}\right] - 2E\left[\bar{X} \times \frac{\sum_{i=1}^{5} \text{Age}_i}{5}\right] + E[5\bar{X}^2/5]
\]
\[
= \frac{\sum_{i=1}^{5} E[\text{Age}_i^2]}{5} - 2E[\bar{X}^2] + E[\bar{X}^2] = E[\text{Age}^2] - E[\bar{X}^2]
\]
\[
= \underbrace{\sigma^2_{\text{Age}} + E[\text{Age}]^2}_{=E[\text{Age}^2]} - \big(\underbrace{\operatorname{Var}(\bar{X})}_{\sigma^2_{\text{Age}}/n} + \underbrace{E[\bar{X}]^2}_{E[\text{Age}]^2}\big) = \sigma^2_{\text{Age}} - \sigma^2_{\text{Age}}/5
\]
\[
E[\hat{\sigma}^2] = \sigma^2_{\text{Age}} - \sigma^2_{\text{Age}}/5 = \frac{4}{5}\sigma^2_{\text{Age}} \neq \sigma^2_{\text{Age}}
\]
Therefore the estimator
\[
\hat{\sigma}^2 = \sum_{i=1}^{5} (\text{Age}_i - \bar{X})^2/5
\]

is biased. But given that E[σ̂²] = 4/5 × σ²Age, it is very easy to construct an unbiased estimator of variance:

\[
E\left[\frac{5}{4}\hat{\sigma}^2\right] = \frac{5}{4}E[\hat{\sigma}^2] = 5/4 \times 4/5 \times \sigma^2_{\text{Age}} = \sigma^2_{\text{Age}}
\]

\[
\frac{5}{4}\hat{\sigma}^2 = 5/4 \times \sum_{i=1}^{5} (\text{Age}_i - \bar{X})^2/5 = \sum_{i=1}^{n} (\text{Age}_i - \bar{X})^2/(n-1)
\]
13.2.3 Intuition Behind the Unbiased Estimator of Variance
This can be generalized to create an unbiased estimator of variance based on a sample, σ̂²:

\[
\hat{\sigma}^2 = \frac{n}{n-1} \times \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}
\]
Intuition for why the unbiased estimator of population variance is not the same as the method of moments estimator:
• We are estimating variance based on deviations from the estimated mean, not the true mean.

• The estimated mean is based on the values of the observations.

• The estimated mean will be slightly closer to the observations on average than the true mean.
• The smaller the number of observations, the bigger a problem this is.
• We have to multiply by n/(n− 1) to correct for this.
• If the sample size is sufficiently large, this is practically irrelevant
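A small simulation makes the bias visible. The sketch below (Python; the particular population and seed are arbitrary choices for illustration) repeatedly draws samples of size 5 and averages the two estimators:

```python
import random
from statistics import mean

random.seed(0)
n, reps = 5, 20_000
# Population: normal with sd = 10, so the true variance is 100
biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0, 10) for _ in range(n)]
    xbar = mean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)
    biased.append(ss / n)          # method-of-moments estimator, E = (4/5)*100 = 80
    unbiased.append(ss / (n - 1))  # divides by n - 1 instead
```

Averaged over the replications, the uncorrected estimator lands near 80 (four-fifths of the true variance) while dividing by n − 1 centers it near 100.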
13.3 Efficiency
[Figure: sampling distributions of two estimators, one red and one blue, over parameter estimates from −5 to 5]

The true value of the parameter is 0. Which estimator, red or blue, is:
• Unbiased?
• Likelier to be closer to the true value of the parameter?
• Suppose we are trying to estimate a parameter θ of a probability distribution based on a random sample {X1, X2, . . . , Xn}.

• Suppose further that we have two different estimators, θ̂1 and θ̂2, which are both unbiased.

• We say that θ̂1 is more efficient if the following condition holds:

\[
\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)
\]
13.3.1 Example of Efficiency
• Let Y1, Y2, and Y3 be a random sample from a normal distribution.
• We have two estimators of µ: µ̂1 and µ̂2

\[
\hat{\mu}_1 = \frac{Y_1}{4} + \frac{Y_2}{2} + \frac{Y_3}{4} \qquad \hat{\mu}_2 = \frac{Y_1}{3} + \frac{Y_2}{3} + \frac{Y_3}{3}
\]
• Which of the two estimators is unbiased?
• Which of the two estimators is more efficient?
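Both questions can be answered by looking at the weights each estimator puts on the observations; a sketch:

```python
# Weights on (Y1, Y2, Y3) for each estimator
w1 = [1/4, 1/2, 1/4]   # mu-hat-1
w2 = [1/3, 1/3, 1/3]   # mu-hat-2, the ordinary sample mean

# Unbiased: each set of weights sums to 1, so E[sum w_i * Y_i] = mu for both
sums = (sum(w1), sum(w2))

# Var(sum w_i * Y_i) = sigma^2 * sum w_i^2 for independent Y_i,
# so comparing sum w_i^2 compares the variances
rel_var1 = sum(w * w for w in w1)   # 6/16 = 0.375
rel_var2 = sum(w * w for w in w2)   # 3/9, so mu-hat-2 is more efficient
```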
Chapter 14
Lecture 14: March 26th, 2019
14.1 Hypothesis Testing
• Your friend likes to post on Facebook the most interesting thing she learned in her classes at NYU at the end of each weekday.

• She usually posts at 8 PM

• The number of likes she gets on each post is normally distributed with an average of 50 and a standard deviation of 10.

• Someone suggests that if she posts at 8 AM instead of 8 PM she will get more likes for her posts.
• How could you test if this is true?
She decides to run an experiment:
• She decides to post at 8 AM for the next 20 weeks (100 weekdays)

• At the end of the 20 weeks she collects her data and finds the following:
Sample mean = 51
Sample Variance = 100
• Does posting at 8 AM improve the number of likes she gets? Maybe!
– On the one hand: 51 is higher than the expected sample mean of 50.
82
CHAPTER 14. LECTURE 14: MARCH 26TH, 2019 83
– On the other hand: The number of likes is random. She could easily get one more like on average through random chance.

– How can we decide if this represents a true change in the number of likes she gets, or just random chance?

• One question we could ask is: How likely would we be to get a sample mean of 51 or higher if posting at 8 AM does absolutely nothing to increase the number of likes she receives for her posts?

If posting at a different time does not affect the number of likes a post is expected to get,
• We expect a sample mean using 100 observations to have a normal distribution:
– Mean µ of 50 likes
– Standard deviation of the sample mean of:

\[
\sigma_{\bar{X}} = \frac{10}{\sqrt{100}} = 1
\]

– A sample mean of 51 with 100 observations would have a z-score of 1:

\[
z = \frac{51 - \mu}{\sigma_{\bar{X}}} = \frac{51 - 50}{1} = 1
\]
14.2 Hypothesis Testing Process
• We start out with the hypothesis that changing the time at which she posts on Facebook does not affect the number of likes she gets. We call this the null hypothesis, and denote it with H0.
H0 : Posting at 8 AM will not increase the number of likes
We also have an alternative hypothesis, H1, which refers to H0 not being true.
H1 : Posting at 8 AM will increase the number of likes
• We find the probability of getting the sample mean we got, assuming the nullhypothesis is true.
P (Sample Mean ≥ 51|H0) = .158
We call this the p-value.
Figure 14.1: p-Value for 51 likes
[Figure: normal density of the sample mean, centered at 50 likes; the shaded area above 51 has probability ≈ .1587]
Figure 14.2: p-Value for 51.5 likes
[Figure: normal density of the sample mean, centered at 50 likes; the shaded area above 51.5 has probability ≈ .0668]
• If the p-value is sufficiently low, we reject H0 and conclude that the alternative hypothesis is true.
If she had gotten 51.5, 52, or 53 likes, the p-value would have been progressively lower, and we would then be that much more confident that the difference between the sample mean and the pre-existing average (population mean) could not be attributed to random chance. What cutoff in the p-value should we use to decide whether the null hypothesis, H0, is false?
• The higher it is, the more likely you are to incorrectly conclude that the time of day has an effect on likes when it is just random chance.

• The lower it is, the less likely you are to detect an effect of time of day on the number of likes even if one is really there.
• Typically economists use the cutoff of .05
• The cutoff is frequently called the significance level. It is often denoted with the Greek letter α. So if an economist writes α = .05, that means she is using a significance level of 5%.

– A 5% significance level means that 5% of the time we will reject the null hypothesis when it is true:

\[
P(\text{reject } H_0 \mid H_0 \text{ is true}) = .05
\]

– The lower the significance level, the harder it is to reject the null hypothesis, even when it is false.
In this example, how high would her average likes have to be to have a p-value of .05?

\[
P(z > 1.645) = .05
\]

\[
\frac{\bar{x} - 50}{10/\sqrt{100}} > 1.645 \rightarrow \bar{x} > 51.645
\]
Therefore, if the null hypothesis is true, she has a 5% probability of getting a sample mean above 51.645. So if she does, we would reject the null hypothesis.
The significance level of a test is frequently denoted by the Greek letter α. So if you are doing a test with a 5% significance level, then α = .05. The z-score that corresponds to the cutoff for a test with significance level α is written zα. So for a test with a 5% significance level, we would reject if our z-score is greater than z.05. For a one-sided test, z.05 = 1.645 (more on what a one-sided test is up next).
Figure 14.3: 5% probability of getting 51.645 or more likes if null is true
[Figure: normal density of the sample mean; the shaded area above 51.645 has probability ≈ .0495]
14.3 One-Sided versus Two-Sided Tests
Suppose we were interested not in whether changing the timing of posts increased the number of likes, but whether it affected the number of likes at all, positively or negatively. In this case our null and alternative hypotheses would be slightly different:
H0 : Posting at 8 AM will not change the number of likes
Recall that before it was:
H0 : Posting at 8 AM will not increase the number of likes
We will also have a new alternative hypothesis:
H1 : Posting at 8 AM will increase or decrease the number of likes
Before we had:
H1 : Posting at 8 AM will increase the number of likes
Why do these distinctions matter? Suppose we were simply interested in whether changing the time of the posts increased or decreased the number of likes.

• This is a two-sided test, meaning we could reject the null hypothesis by having a very high z-score (more likes) or by having a very low and negative z-score (fewer likes).

• Suppose we used the cutoff we had before, 1.645, but also used the negative of that, −1.645, as the cutoff for negative z-scores.

• Suppose we reject the null hypothesis if the sample mean is 1.645 standard deviations above the mean, or if it is 1.645 standard deviations below the mean.

• How likely would we be to reject the null hypothesis if it is in fact true?
[Figure: normal density of the sample mean with both tails beyond ±1.645 shaded; probability = 2 × .05 = .1]
Likelihood of z > 1.645 OR z < −1.645 is 2× .05 = .1
• If we use 1.645 as the cutoff for a two-sided test, the significance level will be 10% instead of the desired 5%.

• To keep the 5% significance level, we need a different cutoff. We need a zα such that

\[
2 \times (1 - F(z_\alpha)) = .05
\]

Or

\[
2 \times F(-z_\alpha) = .05 \rightarrow F(-z_\alpha) = .025
\]
What is the z-score we need to use in a two-tailed test to have a significance level of .05?
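The answer comes from inverting the standard normal CDF at .975; a two-line check (Python; R's qnorm(.975) returns the same value):

```python
from statistics import NormalDist

# Two-sided test at alpha = .05: each tail gets .025, so we need F(z) = .975
z_two_sided = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96
z_one_sided = NormalDist().inv_cdf(1 - 0.05)       # about 1.645, as before
```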
Chapter 15
Lecture 15: March 28th, 2019
15.1 Review of Hypothesis Testing
Suppose you want to test if a coin is weighted. In other words, you want to test ifthe probability that the coin lands on heads is .5
1. Formulate a null hypothesis and alternative hypothesis.
• H0: p = .5 or the coin is fair
• H1: p ≠ .5 or the coin is NOT fair
2. Collect data by flipping the coin n times, and find p̄ = .52, the fraction of times it lands on heads.
3. Find the p-value. If you have a sample size of 625, and H0 is true, then p̄ will have a standard deviation of

\[
\sigma_{\bar{p}} = \sqrt{\frac{np(1-p)}{n^2}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.5(1-.5)}{625}} = .5/\sqrt{625} = .02
\]

and regardless of sample size

\[
E[\bar{p}] = p = .5
\]
Since p̄ is normally distributed, we can find this probability by converting .52 into a z-score:
\[
z = \frac{.52 - E[\bar{p} \mid H_0 \text{ true}]}{\sigma_{\bar{p}}} = \frac{.52 - .5}{.02} = 1
\]
CHAPTER 15. LECTURE 15: MARCH 28TH, 2019 91
So we can find the p-value:

\[
\text{p-value} = P(|\bar{p} - p| \geq .02 \mid H_0 \text{ is true}, p = .5) = \underbrace{F_z(-1)}_{P(z<-1)} + \underbrace{(1 - F_z(1))}_{P(z>1)} = \underbrace{2 \times F_z(-1)}_{\approx 2 \times .16}
\]
[Figure: density of p̄ under H0; the shaded two-sided tail probability is ≈ .3173]
4. Compare our p-value to α, our significance level. Typically, the significance level used is .05. Our p-value is not less than .05, so we do not reject H0. We do not have evidence to conclude the coin is not fair.
What if the coin had landed on heads 55% of the time (p̄ = .55)? Then we would have gotten a different z-score and p-value:
\[
z = \frac{.55 - .5}{.02} = 2.5
\]

\[
P(|z| > 2.5) = \underbrace{F_z(-2.5)}_{P(z<-2.5)} + \underbrace{(1 - F_z(2.5))}_{P(z>2.5)} = \underbrace{2 \times F_z(-2.5)}_{\approx 2 \times .006}
\]
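Both p-values can be computed the same way; a short sketch:

```python
from statistics import NormalDist

n, p0 = 625, 0.5
se = (p0 * (1 - p0) / n) ** 0.5   # .02 under the null

# Two-sided p-values for observed fractions .52 and .55
p_values = {
    p_hat: 2 * NormalDist().cdf(-abs((p_hat - p0) / se))
    for p_hat in (0.52, 0.55)
}
```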
[Figure: density of p̄ under H0; the shaded two-sided tail probability is ≈ .0124]
In this situation, our p-value would be substantially below the significance level of .05, so we would reject the null hypothesis and conclude that the coin must be unfair.
15.1.1 Critical z values
• To reject the null hypothesis in a two-sided test with α = .05, you need |z| > 1.96 because P(|z| > 1.96) = .05

• To reject the null hypothesis in a one-sided test with α = .05, you need a z-score greater than 1.645 because P(z > 1.645) = .05

• To reject the null hypothesis in a two-sided test with α = .1, you need |z| > 1.645 because P(|z| > 1.645) = .1
15.2 Real Data: Examples of Hypothesis Testing
15.2.1 First Example: Sports Gambling
• In sports gambling, the point spread is a theoretical addition to the score of the weaker of two teams

• For example, if the Knicks (a bad team) are playing the Golden State Warriors (a good team) there might be a spread of 9.5 points.

– If the Knicks lose by 9 points, they will have “beat the spread.”

– If the Golden State Warriors win by 10 points, they will have “beat the spread.”

• Vegas oddsmakers set the spread so that they believe each team will have a 50% chance of beating the spread

• This is a difficult thing to do

A recent study examined 124 NFL games, and found that in 54% of games (67 of 124), the stronger team beat the spread. Should we conclude that the Vegas oddsmakers are doing a bad job setting the spread and failing to give each team a 50% chance of beating the spread?

• If p is the probability that a favored team beats the spread, what is the null hypothesis?
H0 : p = .5
• Is this a one-sided or two-sided test? What is z.05?
• How do we find the z-score?
\[
z = \frac{\bar{p} - p_{H_0}}{\sigma_{\bar{p}}} = \frac{67/124 - .5}{\underbrace{\sqrt{.5 \times (1-.5)/124}}_{\sigma_{\bar{p}} \text{ if } H_0 \text{ true}}} \approx .9
\]
• What do we conclude about the Vegas Oddsmakers’ spreads?
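The z-score arithmetic, plus the two-sided p-value the test implies, in a short sketch:

```python
from statistics import NormalDist

n = 124
p_hat = 67 / 124                       # 54% of games
se = (0.5 * 0.5 / n) ** 0.5            # sd of p-hat if H0 (p = .5) is true
z = (p_hat - 0.5) / se                 # about .9
p_value = 2 * NormalDist().cdf(-abs(z))
```

The p-value is far above .05, so we cannot reject H0: the data are consistent with the oddsmakers giving each side a 50% chance of beating the spread.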
15.2.2 Second Example: Math Scores
• Bayview High was chosen to participate in the evaluation of a new algebra and geometry curriculum.

• In the recent past, Bayview’s students were considered “typical” and earned scores on standardized tests consistent with the national averages.

• A cohort of 86 Bayview students were randomly selected to receive the new curriculum.

• After two years of the new curriculum, the 86 chosen students had an average math score of 502.

• The national average was 494, with a standard deviation of 124.

• What is our null hypothesis?

\[
H_0: \mu = 494
\]

where µ is what the national average would be if all students received the new curriculum.
• We can think of the average of the scores of the 86 students with the new curriculum as a sample mean, X̄, with n = 86. Therefore,

\[
E[\bar{X}] = \mu \qquad \sigma_{\bar{X}} = \frac{124}{\sqrt{86}}
\]
• Should we use a one-sided or two-sided test to determine if the curriculum had an impact on test scores? Answer: Two-sided; the curriculum could have reduced scores.
Can we claim that the new curriculum increased the scores of the randomly selected students with a significance level of .05? Our cutoff z-statistic will be 1.96, since this is a two-sided test with α = .05:

\[
z = \frac{\bar{X} - \mu_{H_0}}{\sigma_{\bar{X}}} = \frac{502 - 494}{124/\sqrt{86}} \approx .6
\]

What is the p-value?

\[
p = 2 \times (1 - F(.6)) = .548
\]
What can we conclude about the new curriculum?
15.2.3 Time of Death Example
• There is a theory that people may hold on to life a little longer until after some event, such as a birthday, that has particular meaning to them has passed.

• A study looked at obituaries in a newspaper for a period of time and found that among the 747 people who died, only 8% (about 60 people) died in the 3-month period prior to their birthday.

• How many would be expected to die in the 3-month period before their birthday if deaths are spread evenly across the year? 25%

• Is this evidence that people are less likely to die right before their birthday?

\[
z = \frac{\bar{p} - p_{H_0}}{\sigma_{\bar{p}}} = \frac{60/747 - .25}{\sqrt{.25 \times (1-.25)/747}} \approx -10.7
\]

Can the reduction in people’s likelihood of dying be explained simply by random chance? Or should we reject the null hypothesis?
Chapter 16
Lecture 16: April 2, 2019
16.1 Type 1 and Type 2 Errors
• The null hypothesis is either true or not
• When we conduct a hypothesis test, we either reject the null hypothesis or not.

• This means every hypothesis test has four possible outcomes:

                              Null Hypothesis is
Result of Hypothesis Test     True            False
Reject H0                     Type 1 Error    Success
Fail to Reject H0             Success         Type 2 Error

• If α, the significance level of our test, is .05, what is the probability of a Type 1 error if the null hypothesis is true?

• The probability of a Type 1 error is .05.

• What is the probability of a Type 2 error if the null hypothesis is false?

• It depends on the standard deviation of the estimator and how far off the null hypothesis is.
16.1.1 Example of Type 1 and 2 Errors: Sports Gambling
Let’s revisit the sports gambling example. There were 124 games played and the nullhypothesis was that:
H0 : p = .5
CHAPTER 16. LECTURE 16: APRIL 2, 2019 97
where p is the probability of the favored team “beating the spread.” Assuming H0 is true, what is the probability we reject it? We reject the null hypothesis if the z-score is greater than 1.96 or less than −1.96. Let p̄ be the fraction of times the favored team beats the spread.
\[
\frac{\bar{p} - .5}{\sqrt{.5 \times (1-.5)/124}} = z \geq 1.96 \rightarrow \bar{p} \geq .588
\]

Or

\[
\frac{\bar{p} - .5}{\sqrt{.5 \times (1-.5)/124}} = z \leq -1.96 \rightarrow \bar{p} \leq .412
\]
[Figure: distribution of the number of times the favored team beats the spread under H0; probability of a Type 1 error: .05]
Suppose our H0 is false and the favored team has a 55% probability of beating the spread:

\[
p \neq .5 \qquad p = .55
\]

What is the probability that we would fail to reject the null hypothesis that p = .5?

• We need to find the z-scores associated with .588 and .412 if p = .55:

\[
\frac{.588 - .55}{\sqrt{.55 \times (1-.55)/124}} \approx .86 \qquad \frac{.412 - .55}{\sqrt{.55 \times (1-.55)/124}} \approx -3.09
\]
Find Fz(.86)− Fz(−3.09) to find the probability of a type 2 error:
pnorm(.86) - pnorm(-3.09)
[1] 0.8041047
This means that if the null hypothesis is false and p = .55, then 80% of the time our test will fail to detect it and fail to reject the null. The remaining 20% of the time, our test will correctly reject the null. So the test has a Type 2 error rate, β, of .8, and a power, 1 − β, of .2.
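The whole Type 2 calculation can be strung together in one place; a sketch (small differences from the lecture's hand-rounded z-scores are expected):

```python
from statistics import NormalDist

n, p0, p_true = 124, 0.5, 0.55
se0 = (p0 * (1 - p0) / n) ** 0.5   # sd of p-hat under H0

# Fail-to-reject region at alpha = .05 (two-sided)
lo, hi = p0 - 1.96 * se0, p0 + 1.96 * se0   # about (.412, .588)

# Probability p-hat lands in that region when the truth is p = .55
se1 = (p_true * (1 - p_true) / n) ** 0.5
truth = NormalDist(p_true, se1)
beta = truth.cdf(hi) - truth.cdf(lo)   # Type 2 error rate, about .80
power = 1 - beta                       # about .20
```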
16.1.2 Relationship between Statistical Significance and Type 1 and 2 Error Rates
How would our likelihood of a Type 2 error change if we used a .1 significance level instead? We reject the null hypothesis if:

\[
\frac{\bar{p} - .5}{\sqrt{.5 \times (1-.5)/124}} = z \geq 1.645 \rightarrow \bar{p} \geq .574
\]

Or

\[
\frac{\bar{p} - .5}{\sqrt{.5 \times (1-.5)/124}} = z \leq -1.645 \rightarrow \bar{p} \leq .426
\]

Find the z-scores associated with .574 and .426 if p = .55:

\[
\frac{.574 - .55}{\sqrt{.55 \times (1-.55)/124}} \approx .53 \qquad \frac{.426 - .55}{\sqrt{.55 \times (1-.55)/124}} \approx -2.78
\]

\[
F(.53) - F(-2.78) \approx 70\ \text{percent}
\]
In general, if we have a less strict significance level (higher Type 1 error rate), it lowers our Type 2 error rate. In other words, if we lower the threshold required to reject H0, we are more likely to reject H0 when it is true, but less likely to fail to reject it when it is false.
16.1.3 Larger Samples Produce Fewer Type 2 Errors
Suppose we had a sample of 1,000 games instead of a sample of just 124. We reject the null hypothesis if:

\[
\frac{\bar{p} - .5}{\sqrt{.5 \times (1-.5)/1{,}000}} = z \geq 1.96 \rightarrow \bar{p} \geq .531
\]

Or

\[
\frac{\bar{p} - .5}{\sqrt{.5 \times (1-.5)/1{,}000}} = z \leq -1.96 \rightarrow \bar{p} \leq .469
\]

Find the z-scores associated with .531 and .469 if p = .55 instead of .5:

\[
\frac{.531 - .55}{\sqrt{.55 \times (1-.55)/1{,}000}} \approx -1.2 \qquad \frac{.469 - .55}{\sqrt{.55 \times (1-.55)/1{,}000}} \approx -5.12
\]

\[
F(-1.2) - F(-5.12) \approx 11.5\ \text{percent}
\]
So with 1,000 observations instead of 124, the probability of making a Type 2 error drops from 80% to 11.5%. The power of the test rises from correctly rejecting the null hypothesis 20% of the time to 88.5% of the time.
[Figure: distribution of the number of times the favored team beats the spread with 1,000 observations; probability of a Type 2 error: .115]
16.1.4 Likelihood of Type 2 Error Depends on How Far Off the Null Hypothesis Is

Suppose the bookies are simply way off, and in fact the favored team has a 70% probability of beating the spread:

\[
p \neq .5 \qquad p = .7
\]

What is the probability that we would fail to reject the null hypothesis that p = .5?

• We need to find the z-scores associated with .588 and .412 if p = .7:

\[
\frac{.588 - .7}{\sqrt{.7 \times (1-.7)/124}} \approx -2.72 \qquad \frac{.412 - .7}{\sqrt{.7 \times (1-.7)/124}} \approx -7
\]

Find F(−2.72) − F(−7) to find the probability of a Type 2 error:
> pnorm(-2.72) - pnorm(-7)
[1] 0.003264096
In summary, the likelihood of a Type 2 error depends on three factors:
• The significance level (the harder it is to reject H0, the higher the likelihood of a Type 2 error)

• The standard deviation of the estimator (which can be lowered with a larger sample size)

• How far off the null hypothesis is from the truth
Chapter 17
Lecture 17: April 4th, 2019
17.1 Power
• α is our significance level, or the probability of making a Type 1 Error if H0 istrue
• β is the probability of making a Type 2 Error if H0 is false
• 1 - β, the likelihood of correctly rejecting the null hypothesis if it is false, iscalled the power of a test.
• Just like the probability of making a Type 2 error, the power of a test isdetermined by:
– The significance level
– How far off the null hypothesis is
– The standard deviation of the estimator (strongly affected by sample size)
We can graph how the power of a test changes depending on the true value of the parameter we are trying to estimate (in this case p). This is called a power curve.
CHAPTER 17. LECTURE 17: APRIL 4TH, 2019
[Figure: power curves for both the .05 and .1 significance levels, with power on the vertical axis and the true value of p (from .3 to .7) on the horizontal axis]
Which curve is for α = .05 and which one is for α = .1?
17.1.1 Sample Size Can Significantly Increase Power
[Figure: power curves by sample size (n = 100, 500, 2,500, and 10,000), with power on the vertical axis and the true value of p (from .45 to .55) on the horizontal axis]
If you have a sample of 10,000 and you make a Type 2 Error, how far from .5 would you expect the true value of p to be?
17.2 Finding the Required Sample Size
Suppose you are trying to determine the average amount of money spent by NYU students on coffee per semester. Your null hypothesis is that the average student spends no more than $100.
H0 : µ ≤ 100
H1 : µ > 100
Assume that coffee consumption is normally distributed, with a standard deviation of $14 per semester.
• To test this hypothesis, you plan to gather a random sample of students.
• You want to do this in the cheapest way possible, and the more students you sample, the more expensive it will be.
• Suppose you want your test to have a power of at least .6 if µ = 103, and α = .05
• How would you determine how large your sample needs to be?
Answer
• Is this a one-sided or two-sided test? One sided
• Given that, and α = .05, what z-score do we need to reject the null hypothesis?
1− F (z) = .05→ F (z) = .95
Find the z score with a cumulative probability of .95 in your z-table, or use R:
> qnorm(.95)
[1] 1.644854
Or approximately z = 1.645
• To reject the null hypothesis you need a z ≥ 1.645
• We need to find the cutoff at which we will reject H0:
(y∗ − 100)/(14/√n) = 1.645 → y∗ = 100 + 1.645 × 14/√n

Where y∗ is the cutoff determining when you reject H0 (reject H0 if Ȳ > y∗) and n is the sample size.
• We also need power to be .6 if µ = 103
P(Reject H0 | µ = 103) = P(Ȳ ≥ y∗ | µ = 103) = .6
If Ȳ > y∗ when µ = 103, then the z-score of Ȳ has to be greater than (y∗ − 103)/(14/√n):

P( (Ȳ − 103)/(14/√n) ≥ (y∗ − 103)/(14/√n) ) = .6
We need to find a z-score such that the probability of getting a z-score greater than it is .6, and then we will be able to find another equation relating y∗ to n. (Remember n is the sample size necessary to get power of .6 when µ = 103 and H0 : µ ≤ 100.)
1− F (z) = .6→ z = −.25
So if

(y∗ − 103)/(14/√n) = −.25
Then

P( (Ȳ − 103)/(14/√n) ≥ (y∗ − 103)/(14/√n) ) = P(z > −.25) = .6

And our condition of having power of .6 will hold.
We have found two equations with two unknowns (y∗ and n):
y∗ = 100 + 1.645 × 14/√n

(y∗ − 103)/(14/√n) = −.25 → y∗ = 103 − .25 × 14/√n
We can combine and solve:

100 + 1.645 × 14/√n = 103 − .25 × 14/√n
Solving for n, we find we need a sample of at least 78 students.
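The algebra can be checked numerically: moving both 14/√n terms to one side gives (1.645 + .25) × 14/√n = 3, which solves directly for n in R:

```r
# Solve 100 + 1.645 * 14/sqrt(n) = 103 - .25 * 14/sqrt(n) for n
n <- ((1.645 + .25) * 14 / 3)^2
n   # about 78.2
```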
Chapter 18
Lecture 18: April 9th, 2019
18.1 Student’s t Distribution
18.1.1 Unknown Variance
• In prior examples, we were looking at the binomial distribution, in which n was known, and we were making inferences about p.
– In the binomial distribution, p and n together determine the variance and standard deviation (σ² = np(1 − p))
• Frequently, we do not know the underlying distribution of the population, or its variance.
• In the last example, we assumed that the standard deviation of coffee consumption was $14 per semester.
– It is very unusual to know the standard deviation of a distribution if you do not know the mean
• Most of the time, we have to make inferences about BOTH the standard deviation/variance AND the mean
18.1.2 The t statistic
• When we make inferences, we use a z-score:

z = (X̄ − µ)/(σ/√n)
CHAPTER 18. LECTURE 18: APRIL 9TH, 2019
Where X̄ is the sample mean, µ is the H0 value of the population mean, σ is the population standard deviation, and n is the sample size.
• We know from the central limit theorem that z is approximately normally distributed as n gets large
• If we don't know the true value of σ, we estimate it with s:
s = √( Σᵢ (Xi − X̄)² / (n − 1) )
What is the distribution of the statistic we call t?
t = (X̄ − µ)/(s/√n)

The statistic t has what's called a Student's t, or simply just t, distribution. Let Y1, Y2, . . . , Yn be a random sample from a normal distribution with mean µ and standard deviation σ. Then

Tn−1 = (Ȳ − µ)/(s/√n)
t distributions differ based on the number of “degrees of freedom,” which is essentially the number of observations minus 1. The t-statistic has two sources of uncertainty:
• X̄ can differ significantly from the population mean µ (same as z)
• s (the estimated standard deviation) can differ significantly from σ (the population standard deviation). This is the key difference between the t and z distributions, and it accounts for why extreme t statistics are far more likely in small samples than extreme z scores are.
Reminder about what z, t, and s are:
z = (X̄ − µ)/(σ/√n)    t = (X̄ − µ)/(s/√n)    s = √( Σᵢ (Xi − X̄)² / (n − 1) )
18.1.3 t-statistic Example
• Suppose you are drawing from a distribution with a mean of 10, and a standard deviation of 2.
• You sample four observations
• The four observations are: 10.8, 11.1, 10.9, and 11.2
– What is X̄?

X̄ = (10.8 + 11.1 + 10.9 + 11.2)/4 = 11

– What is s?

s = √( ((10.8 − 11)² + (11.1 − 11)² + (10.9 − 11)² + (11.2 − 11)²) / (4 − 1) ) = √(.1/3)

t = (11 − 10)/(√(.1/3)/√4) = 10.9    z = (11 − 10)/(2/√4) = 1
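The same arithmetic in R, using the four observations from the example:

```r
x <- c(10.8, 11.1, 10.9, 11.2)
xbar <- mean(x)                      # 11
s <- sd(x)                           # sqrt(.1/3), about .18
(xbar - 10) / (s / sqrt(length(x)))  # t, about 10.95
(xbar - 10) / (2 / sqrt(length(x)))  # z, exactly 1
```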
18.1.4 Extreme t statistics happen more frequently
The prior example illustrates how, in a small sample, it is not that hard to get a t statistic that is much larger than you would ever get in the normal distribution.
• It is much more likely to get a sample mean that is 5 estimated standard deviations away from the true mean than it is to get a sample mean that is 5 true standard deviations away.
• What if you just get four observations that happen to be above average andvery close together?
• Therefore, t distributions have fatter tails than the standard normal (z) distribution
18.1.5 Importance of Sample Size
• The way

t = (X̄ − µ)/(s/√n)

is distributed varies with the sample size
• As n gets larger, not only does the denominator shrink (because of the √n), but s becomes more and more accurate (s is a consistent estimator).
• This means that as the sample size increases, the t distribution converges to the normal distribution.
• The sample size is usually denoted using the “degrees of freedom” of the t distribution, which is actually equal to the sample size minus 1.
[Figure: densities of the standard normal distribution and of t distributions with 1, 5, and 10 degrees of freedom; the fewer the degrees of freedom, the fatter the tails]
18.2 Using t statistics for hypothesis testing
• In practice, we usually compute t statistics, not z-scores, because the population standard deviation is unknown.
• Our acceptance or rejection of the H0 in these cases hinges on having a t-statistic greater than some critical threshold, not a z-score
• When using a z-score for the hypothesis test, the critical threshold for rejecting H0 is 1.96 for a two-sided test with α = .05. What would it be when using a t-statistic?
– It depends on the degrees of freedom; you have to check a t-table
If the df are greater than 120, you can simply use the normal approximation (1.96 if α = .05 and a two-sided test)
Chapter 19
Lecture 19: April 11, 2019
19.1 Using t-distribution for Confidence Intervals
Let y1, y2, . . . , yn be a random sample of size n from a normal distribution with (unknown) mean µ. A 100(1 − α)% confidence interval for µ is the set of values:

( ȳ − tα/2,n−1 × s/√n , ȳ + tα/2,n−1 × s/√n )

Where tα/2,n−1 is the critical value for a two-sided hypothesis test with a significance level of α and n − 1 degrees of freedom. So, for example, if n = 10 and α = .05, then

tα/2,n−1 = t.025,9 = 2.262

So if the sample mean ȳ = 100 and the sample standard deviation s = 20, then the 95% confidence interval would be:

( 100 − 2.262 × 20/√10 , 100 + 2.262 × 20/√10 )

19.1.1 Example
We have a sample of 20 observations of a random variable:
2.5   .1    .2    1.3
3.2   .1    .1    1.4
.5    .2    .4    11.2
.4    7.4   1.8   2.1
.3    8.6   .3    10.1
CHAPTER 19. LECTURE 19: APRIL 11, 2019
We can skip the arithmetic: X̄ = 2.6 and s = 3.6. How would we find a 95% confidence interval for the true mean of the population this data was drawn from?

2.6 ± t.025,19 × 3.6/√20
So we need to find t.025,19 on a t table, and it is equal to 2.093
> qt(.025, df = 19)
[1] -2.093024
2.6 ± 2.093 × 3.6/√20 = (.9, 4.3)
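The whole interval can be reproduced in R from the raw data (transcribed from the table above):

```r
x <- c(2.5, .1, .2, 1.3,  3.2, .1, .1, 1.4,
       .5, .2, .4, 11.2,  .4, 7.4, 1.8, 2.1,
       .3, 8.6, .3, 10.1)
mean(x)   # about 2.61
sd(x)     # about 3.62
mean(x) + c(-1, 1) * qt(.975, df = 19) * sd(x) / sqrt(20)   # about (.92, 4.30)
```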
[Table A.2: Upper percentiles of Student t distributions, reproduced from the textbook's statistical tables]
19.2 t-test Example
• Three banks serve a metropolitan area's inner-city neighborhoods:
– Federal Trust
– American United
– Third Union
• The state banking commission is concerned that loan applications from inner-city residents are being denied more frequently than equally qualified loan applications from rural areas.
• Records show that last year 62% of all the mortgage applications filed by ruralresidents were approved
• Both rural and inner-city residents feel they are being discriminated against and receiving unfair treatment. Below are data on loan approvals for 12 branch offices that serve inner-city neighborhoods:
Location             Affiliation   % Approved
3rd & Morgan         AU            59
Jefferson Pike       TU            65
East 150th & Clark   TU            69
Midway Mall          FT            53
N. Charter Highway   FT            60
Lewis & Abbot        AU            53
West 10th & Lorain   FT            58
Highway 70           FT            64
Parkway Northwest    AU            46
Lanier & Tower       TU            67
King & Tara Court    AU            51
Bluedot Corners      FT            59
We want to test H0 : µ = 62%; in other words, we want to test whether inner-city residents have the same loan approval rate or not.
Sample mean:
(59 + 65 + 69 + 53 + 60 + 53 + 58 + 64 + 46 + 67 + 51 + 59)/12 = 58.667%
How can we test whether this is significantly different from 62% at the α = .05 significance level?
• Find s (estimate the population standard deviation using the sample)
s = √( ((59 − 58.667)² + (65 − 58.667)² + · · · + (59 − 58.667)²) / (12 − 1) ) = 6.946

t = (58.667 − 62)/(6.946/√12) = −1.66
• Compare the t-statistic to the relevant cutoffs
Cutoffs: 1.796 for a one-tailed test, 2.201 for a two-tailed test. Our t-statistic is below both cutoffs in absolute value, so regardless of what type of test we do, the difference is not statistically significant. Should we therefore conclude there is no discrimination? Maybe not:
Affiliations
           American United   Third Union   Federal Trust
           59, 53, 46, 51    65, 69, 67    53, 60, 58, 64, 59
Average    52.25             67            58.8
s          5.38              2             3.96
t          −3.63             4.33          −1.81
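The per-affiliation t statistics can be sketched in R (data transcribed from the branch table above):

```r
pct  <- c(59, 65, 69, 53, 60, 53, 58, 64, 46, 67, 51, 59)
bank <- c("AU", "TU", "TU", "FT", "FT", "AU",
          "FT", "FT", "AU", "TU", "AU", "FT")
# t statistic against H0: mu = 62, computed within each affiliation
by(pct, bank, function(x) (mean(x) - 62) / (sd(x) / sqrt(length(x))))
# AU about -3.6, TU about 4.3, FT about -1.8
```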
Chapter 20
Lecture 20: April 16th, 2019
20.1 Hypothesis Tests with Two Populations
• Previously, we have had data (a sample drawn from a population) and compared its characteristics against an H0:
Data                                  Null Hypothesis
Facebook posts at 8 AM                50 likes/post
Football games                        Favored team beats spread
Date of deaths relative to birthday   3 months before 25% of the time
Commute lengths in minutes            Average commute time is 36 min.
• We hypothesize a population mean and then look at the data to test whether our sample mean is similar to or substantially different from our hypothesized mean, using a t or z statistic (depending on whether σ is known)

z = (X̄ − µ)/(σ/√n)    t = (X̄ − µ)/(s/√n)
• If z or t is above the critical value (p-value < α) then we reject H0
Suppose you were interested in whether students in online courses learn material as well as students in in-person courses
• You have two groups of students, none of whom have ever taken statistics before
• The first group takes a course taught by a professor in person
• The second group views video lectures instead of in-person lectures and does not interact with the professor in person, but completes the same assignments and takes the same final exam.
• We have data on how both groups performed on the final exam
[Figure: distribution of final exam test scores (30 to 100) by class type, in-person vs. online]

[Figure: separate histograms of final exam test scores for the in-person and online groups]
Do the in-person students score better? It seems that most of the top students are online students, but so are most of the bottom students. Why is this the case? Most likely the variance of scores is simply higher for online students, making them more concentrated at both extremes of the distribution. But that doesn't help us figure out which group does better on average. To do that, we'll need to use statistics and test the hypothesis that the two populations have different average scores:
• Let µi be the average test score for all potential in-person students, and T̄i the average in our sample
• Let µo be the average test score for all potential online students, and T̄o the average in our sample
• How can we set up a hypothesis test to determine if there is a difference in outcomes between the two types of instruction?
H0 : µi = µo
H1 : µi ≠ µo
We can define a new variable equal to the difference in population means between the two groups:

d = µi − µo
We can then re-state the null hypothesis as:

H0 : d = 0

• Let d̄ be the difference between T̄i and T̄o; what will be the variance of d̄? Keep in mind:

Var(X − Y) = Var(X) + (−1)²Var(Y) + 2(−1)Cov(X, Y)

Var(d̄) = Var(T̄i − T̄o) = Var(T̄i) + (−1)²Var(T̄o) + 2(−1)Cov(T̄i, T̄o), where (−1)² = 1 and Cov(T̄i, T̄o) = 0 because the two samples are independent. So:

Var(d̄) = Var(T̄i − T̄o) = Var(T̄i) + Var(T̄o)

√Var(d̄) = √( Var(T̄i) + Var(T̄o) )
How can we estimate the standard deviation of d̄?

S²i = (1/(ni − 1)) Σⱼ (Tij − T̄i)²    S²o = (1/(no − 1)) Σⱼ (Toj − T̄o)²

sd̄ ≈ √( S²i/ni + S²o/no )

What will be our t-statistic?

t = (d̄ − 0) / √( S²i/ni + S²o/no )
> data_summary <- test_data1 %>% group_by(Class_Type) %>%
summarise_at(vars(Test_Score), funs(mean, var, length))
> data_summary
Class_Type mean var length
<chr> <dbl> <dbl> <int>
1 inperson 77.11 116.7252 100
2 online 69.48 250.8178 100
t = (77.11 − 69.48)/√( 250.8/100 + 116.7/100 ) = 3.98
Degrees of Freedom ≈ (s²i/ni + s²o/no)² / [ (s²i/ni)²/(ni − 1) + (s²o/no)²/(no − 1) ] ≈ 174.7
The cutoff for significance at the .05 level is 1.97, so we reject H0 and conclude that in-person instruction produces higher test scores. Finding the approximate degrees of freedom for a two-sample t test is a real pain in the neck, and often finding the exact degrees of freedom is not necessary. The approximated degrees of freedom will always be at least as large as the smaller of the two samples minus 1 (in this case, both samples have a size of 100). One way to make this simpler would be to check whether our t statistic of 3.98 is statistically significant with 99 degrees of freedom. The cutoff for statistical significance with 99 degrees of freedom is 1.984, so we would be able to reject the null hypothesis even if d̄ followed a t distribution with 99 degrees of freedom instead of 174. Given that, it's not really important to determine that the degrees of freedom are actually approximately 174.7 instead of 99, or that the critical value for rejecting H0 is 1.97 instead of 1.98. When the t statistic is 3.98, those details simply don't matter that much (although they might matter slightly more for constructing a 95% confidence interval of the difference in means between the populations).
The R programming language has a default function for doing t-tests:
> t.test(Test_Score ~ Class_Type, data = test_data1)
Welch Two Sample t-test
data: Test_Score by Class_Type
t = 3.9799, df = 174.74, p-value = 0.0001009
alternative hypothesis: true difference in means is not
equal to 0
95 percent confidence interval:
3.846268 11.413732
sample estimates:
mean in group inperson mean in group online
77.11 69.48
20.2 Second Example: Two-Sample t-tests
• Does the size of a firm matter for its profitability?
• We have a sample of twelve large companies (sales $679–$738 million) and twelve small companies (sales $25–$66 million)
Return on Equity
Large   Small
21      21
23      21
13      14
22      3
17      19
17      19
19      11
11      29
2       20
30      27
15      27
43      24
• Let µL be the average ROE for large companies and µS be the average ROEfor small companies.
• We want to test if µL = µS, or if companies of a certain size tend to be moreprofitable
H0 : µL = µS
• L̄ = 18.6 is the sample average for large companies and S̄ = 21.9 is the sample average for small companies.
• s²L = 116 is the sample variance for large companies and s²S = 35.76 is the sample variance for small companies.

t = (L̄ − S̄)/√( s²L/12 + s²S/12 ) = (18.6 − 21.9)/√( 116/12 + 35.76/12 ) = −.928
Computing the degrees of freedom produces an estimate of approximately 17. What do we conclude about firm size and profitability?
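A quick R check of the t statistic and its two-sided p-value, using the sample statistics above with 17 degrees of freedom:

```r
t_stat <- (18.6 - 21.9) / sqrt(116/12 + 35.76/12)
t_stat                    # about -.93
2 * pt(t_stat, df = 17)   # two-sided p-value, well above .05
```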
Chapter 21
Lecture 21: April 18th, 2019
21.1 Two-sample t test with equal variances
Suppose you are looking at the difference in means between two populations, X and Y. You have two samples:
Y1, Y2, . . . , Ym
X1, X2, . . . , Xn
The mean of population Y is µY and the mean of population X is µX. Both populations share the same variance, σ². What will be the variance of

d̄ = X̄ − Ȳ

Var(d̄) = Var(X̄) + Var(Ȳ) = σ²/n + σ²/m

Var(d̄) = σ² × (1/n + 1/m)

We need to find a way to estimate σ² using our samples from both Y and X. The unbiased estimator for the variance of two populations that do not necessarily have the same mean is:

S²p = ( Σᵢ (Xi − X̄)² + Σᵢ (Yi − Ȳ)² ) / (n + m − 2)
How do we know if this is a good estimator? We need to determine if it is unbiased. Since

Σᵢ (Xi − X̄)² = (n − 1)S²x
we can re-write the estimator as:

S²p = ( (n − 1)S²x + (m − 1)S²y ) / (n + m − 2)

E[S²p] = E[ ( (n − 1)S²x + (m − 1)S²y ) / (n + m − 2) ] = (n − 1)σ²/(n + m − 2) + (m − 1)σ²/(n + m − 2) = σ²
So S²p is an unbiased estimator of the variance of X and Y. We can then use this to estimate the variance of d̄:

S²d̄ = S²p × (1/n + 1/m)

We can then use this to find the t-statistic when testing the null hypothesis that d = 0:

t = (Ȳ − X̄ − 0)/Sd̄ = (Ȳ − X̄)/( Sp × √(1/n + 1/m) )
When both distributions have equal variance, the degrees of freedom are simply n + m − 2, where n is the number of observations in the first sample and m is the number of observations in the second sample.
21.2 Example: Two-sample t-test with equal variance
Suppose we are still working with the populations X and Y that each have their own mean but share the same variance. We draw a sample of 6 observations from X and a sample of 5 observations from Y. The following is the data:
Population
                  X       Y
                  55.4    37.7
                  52.9    38.5
                  57.1    55.4
                  57.8    43.2
                  64.2    38.2
                  65.3
Sample Mean       58.78   42.6
Sample Variance   24.33   56.1
What is our best estimate of the common variance of X and Y? Remember that:

S²p = (5 × 24.33 + 4 × 56.1)/(6 + 5 − 2) = 38.45
So our best estimate of the variance of d̄ would be:

S²d̄ = S²p × (1/n + 1/m) = 38.45 × (1/6 + 1/5) = 14.1
So if we wanted to test the null hypothesis
H0 : µX = µY
We would then get:

t = (X̄ − Ȳ)/√( S²p × (1/n + 1/m) ) = (58.78 − 42.6)/√14.1 = 4.3
What are the degrees of freedom for this t statistic?
DF = n+m− 2 = 6 + 5− 2 = 9
The critical value will be 2.26, and 4.3 is substantially above that, so we reject the null hypothesis.
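The pooled-variance test can be reproduced in R; `t.test` with `var.equal = TRUE` runs the same calculation:

```r
x <- c(55.4, 52.9, 57.1, 57.8, 64.2, 65.3)
y <- c(37.7, 38.5, 55.4, 43.2, 38.2)
sp2 <- (5 * var(x) + 4 * var(y)) / (6 + 5 - 2)   # pooled variance, about 38.45
(mean(x) - mean(y)) / sqrt(sp2 * (1/6 + 1/5))    # t, about 4.3
t.test(x, y, var.equal = TRUE)                   # same t statistic, df = 9
```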
21.2.1 Two-sample test Using the Binomial Distribution
• In the 19th century, the rate of death due to surgery was extremely high
• The primary reason was that surgery exposed patients to infection, and doctors at the time did not accept the germ theory of disease, so they did not take any efforts to sterilize the environment where surgeries would take place.
• Joseph Lister, a British surgeon, speculated that the infections caused by surgeries may be due to bacteria.
• To test this hypothesis, he experimented with using carbolic acid, which is lethal to bacteria, but safe for large animals.
• He performed 40 amputations in which he used carbolic acid to sterilize the operating environment, and 35 amputations in which he did not.
           Carbolic Acid       No Acid
           Survived   Died     Survived   Died
           34         6        19         16

Let pc be the probability of a patient surviving surgery when carbolic acid is used, and pn be the probability without it.
H0 : pc = p = pn H1 : pc > pn
If H0 is true, and the likelihood of surviving surgery is the same regardless of whether carbolic acid is used, then the variance of each patient's outcome is also equal (p(1 − p)). We can estimate this variance by finding an estimate of p using both groups.

p̂ = (34 + 19)/(40 + 35) = .707 → sp = √( .707 × (1 − .707) )

z = (34/40 − 19/35)/( √(.707 × (1 − .707)) × √(1/40 + 1/35) ) = (.85 − .543)/.1053 = 2.92
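The same arithmetic in R:

```r
p_hat <- (34 + 19) / (40 + 35)        # pooled survival rate, about .707
se <- sqrt(p_hat * (1 - p_hat)) * sqrt(1/40 + 1/35)
z <- (34/40 - 19/35) / se
z              # about 2.9, matching the 2.92 above (which used rounded .707)
1 - pnorm(z)   # one-sided p-value, about .002
```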
Chapter 22
Lecture 22
22.1 Testing Hypotheses about Variance
• Up until now, we have constructed hypotheses about the mean of various populations.
• Sometimes we may have hypotheses about the variance of one population, or about whether the variance of one population is greater or lower than the variance of another population.
• Example: Suppose we are interested in the level of risk in a class of investments and how it compares to the level of risk in another class of investments. While average returns are obviously important, to answer this question we would need to test hypotheses about the variance of returns.
• In order to learn how to do this type of test, we need to learn about another probability distribution, the χ² (chi-squared)
The z distribution is a normal distribution with µ = 0 and σ2 = 1
22.1.1 χ2 distribution
A χ² distribution is the result of taking k independent draws from the z distribution, squaring each of them, and adding them up:

χ²k = z²1 + z²2 + z²3 + · · · + z²k
The distribution has only one parameter (k), and it has a mean of k and a variance of 2k. Recall that

s² = (1/(n − 1)) Σᵢ (Xi − X̄)²
It can be shown that:

(n − 1)s²/σ² = Σᵢ (Xi − X̄)²/σ²

has a χ²n−1 distribution.
Suppose we are drawing 20 observations from a distribution and our null hypothesis is:

H0 : σ² = 8

We get a sample variance of 16. How can we tell if we should reject our H0?
(n − 1)s²/σ² has a χ²n−1 distribution, so

(20 − 1) × 16 / 8 = 38
> 1 - pchisq(38, df = 19)
[1] 0.005934709
[Figure: χ² density with 19 degrees of freedom; the probability above 38 is ≈ .0059]
[Table A.3: Upper and lower percentiles of χ² distributions, reproduced from the textbook's statistical tables]
22.1.2 χ2 test Example
The Global Rock Fund claims it has an investment strategy that allows it to have less volatility in its returns than its benchmark, the Lipper Average. Below are the returns for the Global Rock Fund over 19 years; over the same period, the Lipper Average had a standard deviation in its performance of 11.67%.
Year   % Return
1989   15.32
1990   1.62
1991   28.43
1992   11.91
1993   20.71
1994   −2.15
1995   23.29
1996   15.96
1997   11.12
1998   0.37
1999   27.43
2000   8.57
2001   1.88
2002   −7.96
2003   35.98
2004   14.27
2005   10.33
2006   15.94
2007   16.71

The sample standard deviation is 11.28.

χ² = (19 − 1) × 11.28² / 11.67² = 16.82
We compare this to the critical values for a χ2 statistic with 18 degrees of freedom.
• For a one-sided test with α = .05, we have a critical value of 9.39: 5% of the probability density in a χ²18 distribution is below 9.39.
• For a two-sided test with α = .05, we have critical values of 8.231 and 31.526: 2.5% of the probability density of the χ²18 distribution is below 8.231 and 2.5% is above 31.526.
Either way, 16.82 does not fall below the lower critical value, so we cannot reject the null hypothesis that the fund's variance matches the benchmark's.
22.2 Testing for Equality of Variances Using Samples from Two Populations
• When comparing two populations, we may want to know whether they have the same variance
• If we are testing for a difference in means, we may want to know if we can use a pooled variance, or whether we should estimate the variances separately.
• We may simply want to know if the variance in one population is higher or lower than in the other (i.e., evaluating the risk of two potential investments)
• Testing equality of variances requires learning about a new probability distribution, the F-distribution
• The F-distribution is the probability distribution that results from the ratio of two different χ² random variables, each divided by its degrees of freedom:

Fdf1,df2 = (χ²df1/df1) / (χ²df2/df2)

The degrees of freedom of the χ² random variable in the numerator is called the numerator degrees of freedom; the degrees of freedom of the χ² random variable in the denominator is called the denominator degrees of freedom.
[Figure: density of the F distribution with 10 numerator degrees of freedom and 5 denominator degrees of freedom]
22.2.1 Example
Suppose we have 10 observations from each of two populations, X and Y. We want to test whether the two populations they are drawn from have the same variance:

H0 : σ²X = σ²Y    H1 : σ²X ≠ σ²Y

We do not observe the population variances, but we do observe the sample variances from each of these two samples, which are the following:

s²x = .21    s²y = .36
How can we test whether this is a statistically significant difference in variances, or a difference just due to random chance? Suppose H0 is true:

F9,9 = [ ((10 − 1)s²x/σ²X) / (10 − 1) ] / [ ((10 − 1)s²y/σ²Y) / (10 − 1) ] = (s²x/σ²)/(s²y/σ²) = s²x/s²y

We can construct an F statistic by taking the ratio of the sample variances. If the null hypothesis is true, it should follow an F distribution:

F9,9 = s²x/s²y = .21/.36 = .583
Based on the F-table for a distribution with 9 numerator and 9 denominator degrees of freedom, 95% of the probability density in that F distribution falls between .248 and 4.03. Therefore, our F statistic of .583 is not at all surprising if we assume H0 is true, and this is not sufficient evidence for us to reject the null hypothesis at the α = .05 level.
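The two-sided cutoffs come straight from qf in R:

```r
F_stat <- .21 / .36
F_stat                                # about .583
qf(c(.025, .975), df1 = 9, df2 = 9)   # two-sided cutoffs, about .248 and 4.03
```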
[Table A.4: Percentiles of F distributions, reproduced from the textbook's statistical tables]
22.2.2 Example: F -test
Suppose we are interested in determining which is a more reliable method of commuting in NYC. We try commuting both ways by bike and by subway for a full week (10 commutes total for each method).
     subway   biking
1    31.0     34.5
2    37.6     37.3
3    45.5     36.6
4    32.3     30.3
5    40.5     31.9
6    41.4     33.4
7    22.6     31.5
8    27.3     37.1
9    34.0     37.6
10   29.9     31.1
How do we evaluate which method is better? Things to consider:
• Is one of the two methods shorter?
• Is one of the two methods more reliable?
• Suppose you can only afford to arrive at work after 9 AM 1% of the time. Which method of transportation would allow you to leave later?
Calculate sample statistics for each transportation method separately:
                            Subway   Biking
Sample Mean                 34.19    34.13
Sample Standard Deviation   7.02     2.85
First, test whether one method has better or worse average travel time:
H0 : µsubway = µbiking    H1 : µsubway ≠ µbiking
t = (X̄ − Ȳ − 0)/√( s²x/nx + s²y/ny ) = (34.19 − 34.13)/√( 7.02²/10 + 2.85²/10 ) = .025
Regardless of the degrees of freedom or level of significance, this t statistic is far too low to reject the null hypothesis. Now let's test if the two populations have different variances:

H0 : σ²subway = σ²biking    H1 : σ²subway ≠ σ²biking
F = S²x/S²y = 7.02²/2.85² = 6.07
The cutoffs for 5% significance in a two-sided test are .248 and 4.03, so we reject the null hypothesis at the 5% level. Suppose we ran the same experiment for 5 weeks (25 observations of each method), and the sample variances were the same for each one. Would we then be able to reject the null hypothesis?

Next, find when you would have to leave by each method to arrive late only 1% of the time (t.005,df=9 = 3.25):

Subway: 34.19 + 3.25 × 7.02 ≈ 57 minutes before 9 AM = 8:03 AM
Biking: 34.13 + 3.25 × 2.85 ≈ 43 minutes before 9 AM = 8:17 AM
Chapter 23
Lecture 23
23.1 Statistics with Jointly Distributed Random
Variables
Recall that two random variables X and Y are considered jointly distributed if they are both functions of the same experiment.
• An experiment: randomly choose an individual from the population
• A random variable: that person's hourly wage in dollars, denoted by W
• Another random variable: that person's years of completed education, denoted by E
E and W are jointly distributed because they are both determined by the same random process, namely which person is randomly chosen from the population.
• We have focused on using statistics to estimate a characteristic (the mean or variance) of a single random variable.
• In the last two classes we have used hypothesis testing to determine whether two populations have the same mean or different means, but for each population we have still just looked at one variable at a time.
• Frequently, economists want to know how two variables move together
If X and Y are jointly distributed random variables, their covariance is:
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
Loosely, the covariance describes how two jointly distributed variables move together.

X is         E[Y|X] is    Covariance is
Above E[X]   Above E[Y]   Positive
Above E[X]   Below E[Y]   Negative
Below E[X]   Below E[Y]   Positive
Below E[X]   Above E[Y]   Negative
Anything     E[Y]         0
Let L be the life expectancy of a country, and G the log of its GDP per capita.
[Scatterplot: “Income and Life Expectancy by Country” — life expectancy (about 40 to 80 years) plotted against log(GDP per capita) (about 6 to 11), with countries colored by continent (Africa, Americas, Asia, Europe, Oceania)]
Is Cov(L,G) positive, negative, or close to zero?
Some useful properties of covariance:
1. Cov(X, Y ) = σXY = E[XY ] − E[X]E[Y ]
2. Cov(aX, Y ) = aCov(X, Y )
3. Cov(a,X) = 0 if a is a constant
4. Cov(X + Y, Z) = Cov(X,Z) + Cov(Y, Z)
5. Cov(X,X) = Var(X)
We can use a slightly modified method of moments estimator:
sxy = (1/(n − 1)) Σᵢ (Xi − X̄)(Yi − Ȳ)
> CPS %>% mutate(ed_bar = mean(educ),
earn_bar = mean(earn)) %>%
+ mutate(covary = (educ - ed_bar) * (earn - earn_bar)) %>%
+ summarise(sum(covary) / (length(covary) - 1))
sum(covary)/(length(covary) - 1)
1 7.855475
> with(CPS, cov(educ, earn))
[1] 7.855475
• So education and earnings have a covariance of 7.85 in our sample, and that is our best estimate of their covariance in the population as a whole.
• Does that mean they are strongly related? Weakly related? What would happen to the covariance if all dollar amounts were measured in pennies instead? Recall that:

Cov(aX, Y ) = aCov(X, Y )

Cov(earn in pennies, educ) = 100 × Cov(earn in $, educ)

• So the covariance can tell us if a relationship is positive or negative, but not how strong it is, because its size depends on the variables' units.
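A simulated illustration of this point (the CPS microdata itself isn't reproduced in these notes, so the variables below are made up for demonstration):

```r
set.seed(1)
educ <- rnorm(1000, mean = 13, sd = 3)          # hypothetical years of education
earn <- 2 + 1.5 * educ + rnorm(1000, sd = 8)    # hypothetical hourly earnings
cov(educ, earn * 100) / cov(educ, earn)  # scaling earnings scales covariance: 100
cor(educ, earn * 100) - cor(educ, earn)  # but leaves correlation unchanged: 0
```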
23.1.1 Correlation
• Correlation is a way of measuring the strength of the relationship between two variables, regardless of units
Correlation Coefficient: ρxy = σxy/(σxσy)
• The range of ρ goes from −1 to 1 regardless of the units of X and Y.
• The closer ρ is to 1 or −1, the stronger the relationship between the two variables.
• If X and Y are independent, their correlation will be zero.
[Scatterplot: simulated x, y data with correlation ≈ .95]
[Scatterplot: simulated x, y data with correlation ≈ −.75]
[Scatterplot: simulated x, y data with correlation ≈ .006]
23.1.2 Correlation Coefficient
ρxy = σxy/(σxσy)
So how do we estimate the correlation coefficient?
• Use the method of moments:

ρ̂ = sxy/(sxsy)
> with(CPS, cov(educ, earn)/sd(earn)/ sd(educ))
[1] 0.3610493
> with(CPS, cor(educ, earn))
[1] 0.3610493