
Upload: lethuan

Post on 06-May-2018


~ ECON 227D1 ~ FINAL EXAM COURSE PACK

FALL 2017

www.sleepingpolarbear.ca

CONCEIVED IN NORTHERN IRELAND BY DIABOLICAL ELVES

Basic Definitions

• A distribution is a group of numbers that are being interpreted.
    o For example, the following is a distribution: [11, 13, 19, 23, 34, 47, 61]
    o Synonyms: Data, Data Set

• A value is a specific number from our distribution.
    o For example, 11 is a value from our distribution.
    o Synonyms: Score, Observation

• Data summary involves taking an entire distribution (for example, the GPAs of 200 randomly selected McGill students) and summarizing this distribution with just a few different values.

o The purpose of data summary is to describe the whole set of scores to someone with these few specific values, so that, without reading the entire data set, they can have a pretty good idea of what it looks like.

o The two main ways data are summarized are measures of central tendency and measures of variation.

Sample vs Population

• Population refers to the entire group that we are interested in measuring with respect to the variable in question.
• Sample refers to a subset of this group of interest.
• For example, we may be interested in the IQ of current McGill undergrads.
    o The population would be all McGill undergrad students. All 27,000 of them (according to Wikipedia).
    o A sample would be, say, 100 randomly selected students from the entire undergrad student body.
• The population mean is denoted as μ.
• The population standard deviation is denoted as σ.
• The sample mean is denoted as X̄.
• The sample standard deviation is denoted as S.
    o The population and sample mean are calculated the same way:

$$\bar{X} = \frac{\sum X}{N} \qquad \mu = \frac{\sum X}{N}$$

    o There is a minor difference in how sample and population standard deviation are calculated:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \qquad \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
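The N − 1 versus N distinction can be checked numerically. The sketch below is only an illustration: it assumes Python's standard `statistics` module and reuses the example distribution from the Basic Definitions section. `stdev` divides by N − 1 (sample) and `pstdev` divides by N (population).

```python
import math
import statistics

# Example distribution from the Basic Definitions section.
data = [11, 13, 19, 23, 34, 47, 61]
n = len(data)

mean = statistics.mean(data)             # same formula for sample and population
sample_sd = statistics.stdev(data)       # S: divides the squared deviations by N - 1
population_sd = statistics.pstdev(data)  # sigma: divides the squared deviations by N

print("mean:", mean)
print("sample standard deviation S:", sample_sd)
print("population standard deviation sigma:", population_sd)

# The two only differ by the factor sqrt(N / (N - 1)).
print("ratio S / sigma:", sample_sd / population_sd, "vs", math.sqrt(n / (n - 1)))
```

The sample value is always slightly larger than the population value, and the gap shrinks as N grows.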

You do not have to constantly worry about whether we are dealing with a sample or population in each example:

(1) Whether we are dealing with a sample or population, everything is calculated the same way, with the exception of standard deviation.

(2) In any given formula, you can replace X̄ and S with μ and σ or vice versa. This will always be fine.
• For example, as we will soon see, the formula for calculating the Z-score of a specific value from a distribution is as follows:

$$\text{Population: } Z(X) = \frac{X - \mu}{\sigma} \quad \longleftrightarrow \quad \text{Sample: } Z(X) = \frac{X - \bar{X}}{S}$$

• Therefore, even if you did not know whether you were dealing with a population or a sample, you would calculate the Z-score for a particular value of X in exactly the same way: subtract the mean you are given and divide by the standard deviation you are given.

(3) Unless explicitly stated that we are dealing with a population, you can safely assume we are dealing with a sample and calculate standard deviation accordingly.

Measures of Central Tendency

• Measures of central tendency let us know where most of our values are centered or clustered around.
• The three most common ones are mean, median and mode.

Mean

• The Mean or Average is obtained by dividing the sum of all values in our distribution (ΣX) by the number of values in our distribution (N).

$$\text{Sample: } \bar{X} = \frac{\sum X}{N} \qquad \text{Population: } \mu = \frac{\sum X}{N}$$

Median

• The Median is the middle value in our distribution. The median is greater than half the values and less than half the values in our distribution.
    o If we have an odd number of values (N = 5, for example), the median will be an actual value from our distribution (the 3rd value in this case).
    o If we have an even number of values (N = 6, for example), the median will be the average of the two middle values (the average of the 3rd and 4th values in this case).
• The median is also known as the 50th percentile.
    o A value’s percentile is the percentage of values which it is greater than. The median is greater than half, or 50%, of the values in its distribution (and smaller than the other half).

Example: Find the median in the following distribution: 29, 22, 23, 56, 37, 28, 33

➢ First, arrange all values in ascending order:

[22, 23, 28, 29, 33, 37, 56]

➢ Next, calculate the sample size and find the rank of the median:

Sample Size = N = 7

$$\text{Median Rank} = K = \frac{N + 1}{2} = \frac{7 + 1}{2} = 4$$

➢ Finally, find the Kth value in our distribution:

[22, 23, 28, 29, 33, 37, 56]

The 4th value in our distribution, starting from the smallest, is 29.

Median = 29

• Suppose we had an even number of values in our distribution. Let’s add 77 as the final value:

[22, 23, 28, 29, 33, 37, 56, 77]

Sample Size = N = 8

$$\text{Median Rank} = K = \frac{N + 1}{2} = \frac{8 + 1}{2} = 4.5$$

• In this case, K = 4.5 tells us that to find the median we must take the average of the 4th and 5th values: 29 and 33.

$$\text{Median} = \frac{29 + 33}{2} = 31$$
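The median-rank procedure can be sketched in a few lines of Python. The function name `median_by_rank` is a hypothetical helper made up for this illustration. It follows the K = (N + 1) / 2 rule and is compared against the standard library's `statistics.median`.

```python
import statistics

def median_by_rank(values):
    """Find the median using the rank K = (N + 1) / 2."""
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    k = (n + 1) / 2                   # step 2: rank of the median
    if k.is_integer():
        return ordered[int(k) - 1]    # odd N: the Kth value (lists are 0-indexed)
    lower = ordered[int(k) - 1]       # even N: the value just below rank K
    upper = ordered[int(k)]           # ...and the value just above rank K
    return (lower + upper) / 2        # step 3: average the two middle values

odd_case = [29, 22, 23, 56, 37, 28, 33]
even_case = odd_case + [77]

print(median_by_rank(odd_case), statistics.median(odd_case))
print(median_by_rank(even_case), statistics.median(even_case))
```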

• That’s it!

Mean vs Median

• When there are extreme outliers (values that are significantly less than or greater than most other values), the median is often preferred as a measure of central tendency.
• This is because the mean is affected by outliers but the median is not.
• For example, suppose we were interested in the salaries of students in their first year out of McGill.
• We randomly sample 10 such students, and their salaries (in thousands) are as follows:

[32, 36, 38, 44, 47, 48, 55, 65, 77, 675]

    o Median Salary in this group = 47.5; Mean Salary = 111.7
    o 9 of 10 students have salaries between $32,000 and $77,000.
    o The Median ($47,500) in this case is therefore a pretty accurate measure of central tendency.
    o The Mean ($111,700) in this case is a very misleading measure of central tendency.
    o The extreme outlier of $675,000 (a student who got rich starting her own business) has “pulled the mean upwards”. The mean is sensitive to outliers; the median is unaffected by outliers.
• If we were to replace the student whose salary was $675,000 with one whose salary was $90,000:

[32, 36, 38, 44, 47, 48, 55, 65, 77, 90]

    o The Median would still = 47.5
    o The Mean would now = 53.2
    o The Median has stayed the same at $47,500, while the Mean has fallen from $111,700 to $53,200!
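A quick check of the salary example, assuming Python's standard `statistics` module, shows how the outlier moves the mean but not the median. The variable names are just labels chosen for this sketch.

```python
import statistics

# Salaries in thousands of dollars, taken from the example above.
with_outlier = [32, 36, 38, 44, 47, 48, 55, 65, 77, 675]
without_outlier = [32, 36, 38, 44, 47, 48, 55, 65, 77, 90]

for label, salaries in (("with 675", with_outlier), ("with 90", without_outlier)):
    mean = statistics.mean(salaries)
    median = statistics.median(salaries)
    print(f"{label}: mean = {mean:.1f}, median = {median:.1f}")
```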

Mode

• The Mode is the most common value in our distribution.
• For example, consider the following data set: [14, 16, 23, 27, 27, 32, 35, 35, 35, 43, 68]
    o The most common value in this distribution is 35, which occurs three times.
    o Therefore, Mode = 35.

Measures of Variation

• Measures of variation let us know the general spread within our distribution.
• In other words, they indicate how far apart values tend to be from one another: whether they are relatively close together (17, 17, 18, 19, 21) or far apart (98, 225, 436, 879, 7473).

Standard Deviation & Variance

• The standard deviation tells us, roughly, the average distance of each value from the mean.
• The variance is equal to the standard deviation squared.

$$\text{Sample Standard Deviation} = S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$$

$$\text{Population Standard Deviation} = \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$

$$\text{Sample Variance} = S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$

$$\text{Population Variance} = \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
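Each formula above has two forms: a definitional one built from deviations and a computational one built from ΣX and ΣX². The sketch below checks that the two give the same sample variance. The function names are hypothetical helpers written for this illustration, and the two data sets are the "close together" and "far apart" examples from this section.

```python
import math
import statistics

def sample_variance_definitional(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_computational(values):
    """(sum of X^2 minus (sum of X)^2 / N), divided by N - 1."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

close_together = [17, 17, 18, 19, 21]
far_apart = [98, 225, 436, 879, 7473]

for data in (close_together, far_apart):
    v_def = sample_variance_definitional(data)
    v_comp = sample_variance_computational(data)
    # The standard deviation is the square root of the variance.
    print(data, "variance:", v_def, v_comp, "standard deviation:", math.sqrt(v_def))
```

For the population versions, the only change would be dividing by N instead of N − 1.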

INTRO TO PROBABILITY

Some Notation to Start

A → event A happens
P(A) → probability that event A happens
A^C → event A does not happen
P(A^C) → probability that event A does not happen

The probability of something happening means the “chances” or “likelihood” of it happening.

The probability of something happening is always somewhere between 0 and 1.
0 → 0% → Impossible. It will never, ever happen.
1 → 100% → Guaranteed. It will happen every single time.

“Percentage/Proportion of” and “Probability” mean the same thing.
60% of McGill students are female → Probability a randomly selected McGill student is female is 60% → P(Female) = 0.6

The probability of some event happening + the probability of that event NOT happening = 1
• In other words, P(A) + P(A^C) = 1
• By re-arranging the terms, we also get:

P(A) = 1 – P(A^C) → the probability of something happening = 1 – the probability of it not happening
P(A^C) = 1 – P(A) → the probability of something not happening = 1 – the probability of it happening

P(Rain Tomorrow) + P(No Rain Tomorrow) = 1
P(Ben Affleck had eggs for breakfast today) + P(Ben Affleck didn’t have eggs for breakfast today) = 1

• We do not need to know either probability in order to know that their sum is equal to 1.
• This is because…

The sum of the probabilities of all possible outcomes in a scenario always equals 1.

• When we roll a die, for example, there are six possible outcomes: it lands on 1, 2, 3, 4, 5 or 6.
• The probability it lands on each number = 1/6
• Therefore: P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = (1/6) • 6 = 1
• Example: 53% of McGill students are from Montreal, 12% are from Toronto, and 10% are from elsewhere in Canada. What % of McGill students are from outside of Canada?

P(MTL) + P(TO) + P(Elsewhere) + P(Outside) = 1
P(Outside) = 1 – P(MTL) – P(TO) – P(Elsewhere) = 1 – 0.53 – 0.12 – 0.10 = 0.25
25% of McGill students come from outside of Canada.
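The "probabilities of all outcomes sum to 1" rule can be illustrated with exact fractions. This is a small sketch using Python's `fractions.Fraction`. The dictionary names are arbitrary labels, not anything from the course pack.

```python
from fractions import Fraction

# A fair die: six outcomes, each with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
total = sum(die.values())
print("sum of die probabilities:", total)

# The McGill example: the three known shares, with the fourth found by subtraction.
known = {
    "MTL": Fraction(53, 100),
    "TO": Fraction(12, 100),
    "Elsewhere": Fraction(10, 100),
}
p_outside = 1 - sum(known.values())
print("P(Outside):", p_outside, "=", float(p_outside))
```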

HERE IS THE MOST IMPORTANT/BASIC DEFINITION OF PROBABILITY:

$$\text{PROBABILITY OF EVENT ``A'' HAPPENING} = \frac{\text{\# of ways ``A'' can happen}}{\text{total \# of possible outcomes}}$$

• Example: A jar with 4 red and 6 black marbles. A marble is chosen at random. What is the probability of it being red?

4 red marbles → 4 ways of choosing a red marble → # ways A can happen = 4
4 red + 6 black marbles → 10 total marbles to choose from → total # of possible outcomes = 10
P(Red) = 4/10

P(B|A) = probability that B happens/is true, given (taking into account) that A happened/is true

• For example, let’s say that 70% of all McGill students like lollipops but only 40% of ECON students like lollipops.
• Let A = McGill student likes lollipops & let B = Student is in ECON

P(A) = 0.7 → The probability that a randomly selected McGill student likes lollipops is 0.7
P(A|B) = 0.4 → The probability that a randomly selected McGill student likes lollipops, taking into account that he/she is in ECON, is 0.4

The Two Big Rules

Rule 1: The probability of A and B both happening/being true = P(A∩B) = P(A) • P(B|A)

Time to go on a slight tangent (we will be back to Rule 1 in a moment…)

Statistical Independence

• Event A and Event B are statistically independent if knowing that A happened/is true does not affect the probability of B happening/being true.
• For example, let’s run an experiment: flip a coin and roll a die.
• A = coin lands on heads, B = die lands on 4
• Knowing whether or not the coin lands on heads does not affect the probability that the die lands on 4:

P(B) = 1/6
P(B|A) = 1/6

• Therefore, the coin landing on heads and the die landing on 4 are statistically independent.

If A and B are statistically independent, then P(B|A) = P(B)
Therefore, if A and B are independent, then P(A∩B) = P(A) • P(B)
A bit later we will use the above equation when testing for statistical independence.

OK, back to Rule 1…

Example #1: I randomly select 2 cards from a 52-card deck: A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, each of spades ♠, hearts ♥, diamonds ♦ and clubs ♣. What is the probability that they are both diamonds?

Let A = 1st card is a diamond, and B = 2nd card is a diamond

P(1st is a diamond) = P(A) = 13/52
P(2nd is a diamond given that 1st is a diamond) = P(B|A) = 12/51
A & B are not independent (you do not need to state this when answering such questions)
P(1st card is a diamond AND 2nd card is a diamond) = P(A) • P(B|A) = (13/52)(12/51) = 0.0588
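Rule 1 can be cross-checked against the basic "ways A can happen / total outcomes" definition. The sketch below is one illustrative way to do it, not a required method: it multiplies P(A) • P(B|A) with exact fractions, then counts every ordered two-card draw from a deck built in code. The tuple representation of a card is an assumption made for this example.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in ranks]  # 52 cards

# Rule 1: P(A and B) = P(A) * P(B|A)
p_a = Fraction(13, 52)          # 1st card is a diamond
p_b_given_a = Fraction(12, 51)  # 2nd is a diamond, given the 1st was
p_both = p_a * p_b_given_a

# Cross-check by counting all ordered draws of 2 distinct cards.
draws = list(permutations(deck, 2))
both_diamonds = sum(
    1 for first, second in draws
    if first[1] == "diamonds" and second[1] == "diamonds"
)
p_both_by_counting = Fraction(both_diamonds, len(draws))

print(p_both, p_both_by_counting, round(float(p_both), 4))
```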

Example #2: I flip a coin 3 times. What is the probability that it lands on heads all 3 times?

A = heads first time, B = heads second time, C = heads third time

P(1st is heads) = 1/2
P(2nd is heads given that 1st was heads) = P(2nd is heads) = 1/2
P(3rd is heads given that 1st & 2nd were heads) = P(3rd is heads) = 1/2
P(heads all 3 times) = (1/2)(1/2)(1/2) = 1/8 = 0.125

CONDITIONAL PROBABILITY

• We are dealing with a conditional probability question when a population is being described along 2 different dimensions/variables.
    o In other words, there are two things we want to know about each member of the population.
    o For example, imagine the population of interest is McGill students. For each student, I may want to know (1) Nationality and (2) Level of Religiosity.
    o As you will see later, this is in contrast to the Binomial, Hypergeometric, Poisson/Exponential, and Normal Distributions, in which each population member is only being described along one dimension.
• Furthermore, in such a question both variables are categorical.
    o In other words, there are only a few different possible “values” or “groups” into which each member of the population can fall.
    o In the above example, I may choose to categorize each student’s nationality as either (a) Canadian, (b) American, or (c) International, and each student’s level of religiosity as either (a) Religious, (b) Agnostic, or (c) Atheist.

Steps to solving conditional probability questions:

(1) Determine the population being described, the two variables of interest, and the categories for each variable.

(2) Draw a Joint Probability Table (JPT), arbitrarily placing one of the variables along the horizontal dimension and the other along the vertical dimension.

(3) Fill in all the cells we can based on the information provided in the question (usually we can fill in the entire table).

(4) Solve each question using the data we have in our Joint Probability Table.

The dean of McGill has reported the following data regarding student demographics: 70% of McGill students are Canadian, 10% are American, and 20% are International. Furthermore, 20% of McGill students are religious, while 30% are agnostic and 50% atheist. Among Canadian students, 15% are religious and 25% are agnostic. Among International students 42.5% are religious and 47.5% are agnostic.

Step 1

Population: McGill Students
Variables: Nationality (Canadian, American, International) & Religiosity (Religious, Agnostic, Atheist)

Step 2

            Canadian    American    International    Total
Religious
Agnostic
Atheist
Total                                                1

How to read & use this table:

• Each cell contains a probability that is in reference to the entire population.
    o For example, the cell where Religious & Canadian intersect tells us the percentage of all McGill students who are both religious and Canadian (the probability that a randomly selected McGill student is a religious Canadian).
    o In other words, this cell tells us P(Religious ∩ Canadian).
    o This cell does not tell us the percentage of Canadian students who are religious, nor does it tell us the percentage of religious students who are Canadian.
    o Once the table is filled, there is a way to determine the above probabilities, but the answer won’t be directly in the table.
    o This is because % of Canadian students who are religious and % of religious students who are Canadian are statements about a subset of the population, not the entire population.
• It is essential to add in the Total column & row for the two variables in question. Sometimes KMack will provide the JPT in the question, but without the totals. Add these in; the table is useless without them.
    o The % of all McGill students who are Canadian, for example, denoted as P(Canadian), falls into the cell that intersects Canadian and Total.


Filling in the table

Statements regarding the entire population

70% of McGill students are Canadian, 10% are American, and 20% are International:
P(Canadian) = 0.7, P(American) = 0.1, P(International) = 0.2 → these go in the “Total” cells

20% of McGill students are religious, while 30% are agnostic and 50% atheist:
P(Religious) = 0.2, P(Agnostic) = 0.3, P(Atheist) = 0.5 → these also go in the “Total” cells

            Canadian    American    International    Total
Religious                                            0.2
Agnostic                                             0.3
Atheist                                              0.5
Total       0.7         0.1         0.2              1
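The remaining cells are not filled in at this point in the text. One plausible way to finish the table, sketched here as an assumption rather than as the course pack's own worked solution, is to turn each "among Canadian/International students" statement into a joint probability with Rule 1, P(A∩B) = P(A) • P(B|A), and then get every other cell by subtracting from the row and column totals. The dictionary layout and names are made up for this illustration.

```python
nationalities = ["Canadian", "American", "International"]
beliefs = ["Religious", "Agnostic", "Atheist"]

# Totals stated for the whole population.
p_nationality = {"Canadian": 0.7, "American": 0.1, "International": 0.2}
p_belief = {"Religious": 0.2, "Agnostic": 0.3, "Atheist": 0.5}

# Conditional statements from the question: P(belief | nationality).
p_belief_given = {
    "Canadian": {"Religious": 0.15, "Agnostic": 0.25},
    "International": {"Religious": 0.425, "Agnostic": 0.475},
}

joint = {}  # joint[(belief, nationality)] = P(belief AND nationality)

# Rule 1 for the cells with a stated conditional; Atheist by subtraction from the column total.
for nat, conditionals in p_belief_given.items():
    for belief, p in conditionals.items():
        joint[(belief, nat)] = p_nationality[nat] * p
    joint[("Atheist", nat)] = p_nationality[nat] - sum(
        joint[(belief, nat)] for belief in conditionals
    )

# The American column follows by subtraction from each row total.
for belief in beliefs:
    joint[(belief, "American")] = (
        p_belief[belief] - joint[(belief, "Canadian")] - joint[(belief, "International")]
    )

for belief in beliefs:
    row = [round(joint[(belief, nat)], 4) for nat in nationalities]
    print(belief, row, "Total:", p_belief[belief])
```

Dividing a cell by its column total recovers the conditional probability, for example P(Religious ∩ Canadian) / P(Canadian) gives back 0.15.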

Discrete Versus Continuous Random Variables

A discrete random variable is a variable for which there are a finite (countable) number of possible values.

• Example: Over the next week, the number of days during which there is at least some rain.
    o X = the number of rainy days over the next week
    o X has 8 different possible values: 0, 1, 2, 3, 4, 5, 6, or 7.
• Example: Flip a coin 4 times and count the number of times the coin lands on tails.
    o X = number of times the coin lands on tails.
    o X has 5 different possible values: 0, 1, 2, 3, 4.

A continuous random variable is a variable for which there are an infinite number of possible values.

• Example: Weight of randomly selected McGill undergrad students.
    o X = the weight (in pounds) of a randomly selected student
    o X has an infinite number of possible values.
    o Not only that, but it is not even possible to list a single possible value.
    o “What do you mean?”, you may be wondering. “Someone could weigh 150 pounds. X = 150. THERE, I just listed a possible value of X. In your face!”
    o To which I would respond: “Really? Someone could weigh EXACTLY 150 pounds? This would mean their weight in pounds is 150.0000000000000000000000000000…………forever!!! Impossible!”
• For continuous random variables, the probability of any specific value is zero.
• For continuous random variables, we can only generate probabilities for intervals between two values.
    o For example, we may want to know the probability that someone weighs:
    o Below 100 pounds → P(X < 100)
    o Between 140 and 160 pounds → P(140 < X < 160)
    o More than 115 pounds → P(X > 115)

Binomial Probability Distribution

We are dealing with a binomial distribution when the following four conditions are met:

(1) A specific number of trials (n) are being conducted as part of an experiment
    o Experiment: Flip a coin 7 times & observe # of heads
    o Trial: Flip a coin and observe whether it lands on heads or tails
    o Number of trials in our experiment: 7

(2) There are two possible outcomes in each trial (we arbitrarily label one success and the other failure)
    o Flip a coin → heads or tails
    o Look out the window to check for rain → raining or not raining
    o Roll a die to see if it lands on 4 → lands on 4, or lands on any other number (1, 2, 3, 5, or 6)

(3) Independence: The probability of success in a given trial is not affected by the outcome in a previous trial
    o P(Heads on 2nd trial) = 0.5 regardless of whether we got Heads or Tails in the 1st trial

(4) The probability of success (p) and therefore of failure (q, which equals 1 – p) are known
    o Let Heads = Success and Tails = Failure → p = P(Heads), q = P(Tails)
    o P(H) = 0.5, P(T) = 1 – 0.5 = 0.5
    o Therefore, p = 0.5, q = 0.5
    o Note that p + q must always = 1 since they cover all possible outcomes

• In a binomial probability distribution question, we are asked to calculate the probability of a specific number of successes (x) occurring out of a certain (greater or equal) number of trials (n), using the following equation:

$$P(x) = C^{n}_{x}\, p^{x} q^{n-x}$$

where $C^{n}_{x} = \frac{n!}{x!\,(n-x)!}$ is the number of ways to choose which x of the n trials are the successes.

    o We use the binomial equation when we need to calculate the probability of x successes out of n trials, without knowing which trials contain the successes.
    o For example, if we are asked to calculate the probability of a die landing on “3” exactly twice when rolling the die 12 times, it is not specified (nor do we care) where the two 3s will occur amongst the 12 trials.
    o The two “3”s could occur on the 1st & 2nd trials, or they could occur on the 5th & 11th trials. It’s the same to us.
    o In cases like this, the binomial equation is necessary, since there are many possible ways to achieve two successes in the twelve trials.

• If x = n (all trials are successes) or x = 0 (all trials are failures), then we know the outcome of each trial in the experiment.
    o In the above two cases, we can use the binomial equation if we wish, but it is not necessary.
    o Example: If we flip a coin 5 times, find the probability of it landing on heads all 5 times, and find the probability of it not landing on heads at all.

$$P(\text{H all 5 times}) = P(\text{H 1st time}) \cdot P(\text{H 2nd time}) \cdot P(\text{H 3rd time}) \cdot P(\text{H 4th time}) \cdot P(\text{H 5th time}) = \left(\frac{1}{2}\right)^{5}$$

$$P(\text{No heads}) = P(\text{All Tails}) = P(\text{Tails all 5 times}) = \left(\frac{1}{2}\right)^{5}$$
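To see why the C term is needed when the successes can land on any trials, it can help to list every possible sequence. The sketch below is an illustration only: it enumerates all 2^5 outcomes of 5 coin flips with `itertools.product`, counts how many contain exactly x heads, and compares that count with `math.comb(5, x)`. The variable names are arbitrary.

```python
from itertools import product
from math import comb

n = 5
sequences = list(product("HT", repeat=n))  # all 32 equally likely sequences
p_each_sequence = (1 / 2) ** n             # every specific sequence has probability (1/2)^5

ways_by_enumeration = {}
for x in range(n + 1):
    # Count the sequences that contain exactly x heads, in any positions.
    ways_by_enumeration[x] = sum(1 for seq in sequences if seq.count("H") == x)
    probability = ways_by_enumeration[x] * p_each_sequence
    print(f"x = {x}: {ways_by_enumeration[x]} ways (C = {comb(n, x)}), P(x) = {probability}")
```

Only x = 0 and x = 5 have a single way to occur, which is why those two cases can be solved by straight multiplication.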

HOW TO SOLVE ALL THE BINOMIAL QUESTIONS

On each day in June there is a 20% chance of rain, regardless of what the weather has been on previous days.

(A) What is the probability that, between June 1st and June 10th, it rains exactly 3 times?

    o We know this is a binomial distribution question because the four conditions are met:
        ⇒ Experiment: Check the weather each day for 10 days and count the # of rainy days
        ⇒ There are 10 trials (n = 10), each with 2 possible outcomes: Rain (success) or No Rain (failure)
        ⇒ Trials are independent: If it rains on June 1st, P(Rain June 2nd) = 0.2. If it does not rain on June 1st, P(Rain June 2nd) = 0.2
        ⇒ Probability of success is known: P(Rain) = 0.2, P(No Rain) = 1 – 0.2 = 0.8

Recall that $P(x) = C^{n}_{x}\, p^{x} q^{n-x}$

$$P(\text{3 rainy days out of 10}) = P(\text{3 successes in 10 trials}) = C^{10}_{3}\,(0.2)^{3}(0.8)^{7} = 0.2013$$

(B) What is the probability that, over the first 5 days of the month, it rains at least 3 times?

P(at least 3 rainy days) = P(3 or 4 or 5 rainy days) = P(3 rainy days) + P(4 rainy days) + P(5 rainy days) = P(3) + P(4) + P(5)

$$P(3) = C^{5}_{3}\,(0.2)^{3}(0.8)^{2} = 0.0512$$

$$P(4) = C^{5}_{4}\,(0.2)^{4}(0.8)^{1} = 0.0064$$

$$P(5) = C^{5}_{5}\,(0.2)^{5}(0.8)^{0} = 0.00032$$

$$P(3) + P(4) + P(5) = 0.0512 + 0.0064 + 0.00032 = 0.05792$$
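Both parts of the rain question can be reproduced with a short function. This is a minimal sketch: `binomial_pmf` and `at_least` are hypothetical helper names chosen for this example, and `math.comb` supplies the C term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials, each with success probability p)."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

def at_least(k, n, p):
    """P(k or more successes in n trials): add up P(k), P(k + 1), ..., P(n)."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

# (A) Exactly 3 rainy days out of 10, with P(Rain) = 0.2 each day.
p_part_a = binomial_pmf(3, 10, 0.2)
print(round(p_part_a, 4))   # 0.2013

# (B) At least 3 rainy days over the first 5 days.
p_part_b = at_least(3, 5, 0.2)
print(round(p_part_b, 5))   # 0.05792
```

An "at most" question would work the same way, summing from 0 up to k instead.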