Week 3 Chapters 5, 7, 12

Uploaded by remy on 22-Mar-2016

Chapter 5: Outliers, Fences, Box Plots

Outliers (p.95)

• An outlier is a value that is located very far away from almost all of the other values.

• An observation that is unusually large or small relative to the other values in a data set is called an outlier.

Outliers occur when:
1. A value is observed, recorded, or entered into the computer incorrectly.
2. The value is correct and represents a rare event.

Detecting Outliers

Determine the fences. Fences are the cutoff points for outliers:

Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)

where IQR = Q3 - Q1 is the interquartile range.

• If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.

Example: The following data represent income (in thousands of dollars) for a sample of 12 students from Cornell University, 5 years after graduation.

35 29 44 72 34 64 41 50 54 104 39 58

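Cont. Example

Sorted data: 29 34 35 39 41 44 50 54 58 64 72 104

Min = 29, Max = 104
Median = (44 + 50)/2 = 47
Q1 = 37 (median of the lower six values: (35 + 39)/2)
Q3 = 61 (median of the upper six values: (58 + 64)/2)
IQR = Q3 - Q1 = 61 - 37 = 24

Lower Fence = Q1 - 1.5 x IQR = 37 - 1.5(24) = 1
Upper Fence = Q3 + 1.5 x IQR = 61 + 1.5(24) = 97

Outlier: 104 (it is greater than the upper fence of 97; no value is below the lower fence of 1)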

[Figure: box plot of the income data drawn on an axis from 0 to 120, with the lower fence (LF), Q1, median (M), Q3, and upper fence (UF) marked. The "box plot" consists of a box from Q1 to Q3 and "whiskers" that extend to the highest and lowest values within LF and UF (here 29 and 72); any outliers (here 104) are marked separately.]
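The fence rule and box-plot construction for this example can be checked with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, which reproduces Q1 = 37 and Q3 = 61 here; software such as Minitab may use a different quartile rule and report slightly different values. The function names `quartiles` and `fences` are illustrative, not from the course materials.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' method.

    For an odd number of values, the middle value is left out of both halves
    (an assumption; the example above has an even number of values).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return median(lower), median(xs), median(upper)

def fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

income = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]  # thousands of dollars
q1, m, q3 = quartiles(income)
lf, uf = fences(income)
outliers = [x for x in income if x < lf or x > uf]
inside = [x for x in income if lf <= x <= uf]
whiskers = (min(inside), max(inside))  # whisker ends: extreme values within the fences

print(q1, m, q3)   # 37.0 47.0 61.0
print(lf, uf)      # 1.0 97.0
print(outliers)    # [104]
print(whiskers)    # (29, 72)
```

The slides use the multiplier 1.5; the `k` parameter only makes that constant explicit.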

Box Plots and Skewness (p.91)

[Figure: Shape & Box Plot. Right-skewed, left-skewed, and symmetric distributions, each paired with its box plot showing Q1, the median, and Q3.]

NOTE: Statisticians often draw box plots vertically (this may happen on your test). Remember: Q1 is at the bottom of the box and Q3 is at the top.

Which of the vertical box plots is skewed right? (Answer: #3)
Which of the vertical box plots has the highest median? (Answer: #1)
Which of the vertical box plots has the biggest range? (Answer: all the same)
Which of the vertical box plots has the lowest IQR? (Answer: #2)
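One rough way to read skewness from a box plot is to compare the two parts of the box: if the median sits closer to Q1 (the upper part is longer), the plot suggests right skew, and vice versa. The sketch below is a rule of thumb only, not a formal test; the function name `box_shape` and the 10% tolerance are arbitrary choices for illustration, and it ignores the whiskers, which also carry shape information.

```python
from statistics import median

def box_shape(data, tol=0.1):
    """Rule-of-thumb skewness label based only on the box (Q1, median, Q3).

    Compares the distance from the median up to Q3 with the distance down to Q1.
    Differences smaller than tol * IQR are treated as 'roughly symmetric'.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Quartiles as medians of the lower and upper halves (middle value excluded if n is odd)
    q1, m, q3 = median(xs[:half]), median(xs), median(xs[half + n % 2:])
    upper, lower = q3 - m, m - q1
    iqr = q3 - q1
    if iqr == 0 or abs(upper - lower) <= tol * iqr:
        return "roughly symmetric"
    return "right-skewed" if upper > lower else "left-skewed"

income = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]
print(box_shape(income))  # right-skewed
```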

Chapter 7: Scatterplots, Association, and Correlation

Scatter plot

• A scatter plot is the most common display for comparing two quantitative variables.

• Just by looking at one, you can see patterns, trends, and relationships.

Direction of the relationship

A pattern that runs from the upper left to the lower right is said to be negative: as one variable increases, the other tends to decrease.

A pattern running the other way (from the lower left to the upper right) is called positive.

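Strength of relationship

Strength: how much scatter there is around the overall pattern. The more scatter, the weaker the relationship.

[Figure: example scatterplots of a weak relationship and a strong relationship.]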

Correlation coefficient

• The correlation coefficient (r) is a measure of the relationship between two quantitative variables. It determines the degree of (linear) association.

• The correlation coefficient always lies between -1 and +1. A value of -1 indicates perfect negative correlation, +1 indicates perfect positive correlation, and 0 indicates no linear association.
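The slides describe r without giving a formula. Below is a minimal sketch using the standard textbook definition, r = sum(z_x * z_y)/(n - 1), where z_x and z_y are z-scores computed with sample standard deviations. The paired data are made up purely to show the two extreme values and one weak positive case.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for two paired quantitative variables."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    # r is the sum of the products of z-scores, divided by n - 1
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up paired data, for illustration only
x = [1, 2, 3, 4, 5]
print(round(correlation(x, [2, 4, 6, 8, 10]), 3))  # 1.0 (perfect positive)
print(round(correlation(x, [10, 8, 6, 4, 2]), 3))  # -1.0 (perfect negative)
print(round(correlation(x, [3, 1, 4, 1, 5]), 3))   # 0.354 (weak positive)
```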

Scatterplots of Paired Data

Chapter 12: Sample Surveys

Sample vs. Population

“We’d like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. So we settle for examining a smaller group of individuals—a sample—selected from the population”

--Page 303 & 304

Sample Survey: “…ask questions of a small group of people in the hope of learning something about the entire population.”

Example: You’re bringing pizza to a party, and you have two options: Pizza Hut or Papa John’s. Instead of calling all 100 friends who might show up, you decide to call 5 and ask for their preference.

Biased Survey: “Sampling methods that tend to over- or underemphasize some characteristics of the population”

Example: If you want to know the proportion of Americans who consider themselves Republican, it would be a bad idea to survey people in Utah alone. Recent polls show that Utah is the most Republican state in the country, so a Utah-only sample would overemphasize Republicans.

Randomizing: “[Protects us from bias by] making sure that, on average, the sample looks like the rest of the population”

Example: It’s final exam week at Cornell University, and 1,200 calculus students are taking a standardized test in the library at 4pm. You want to sample 30 students to find out whether the test was easier or harder than expected. Which is the better idea (with respect to eliminating bias): sampling the first 30 students to finish, or randomly choosing the students for the survey? Why?

4 Types of Random Samples

Type 1: Simple Random Sample (SRS)

When choosing a sample of size n from a given population, the sample is called a simple random sample if every possible sample of size n has an equal chance to be selected.

In an SRS, every person is equally likely to be chosen.

However, if you choose a sample such that everyone is equally likely to be chosen, it is not necessarily an SRS.

Example 1: Suppose you want to select a sample of 100 students from a school where there are 100 males and 100 females. Choose your sample this way: flip a coin; if heads, choose all the males; if tails, choose all the females. Every person has an equal chance of being selected (50%), BUT THIS IS NOT AN SRS! Why?
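A quick simulation makes the point of Example 1 concrete. It assumes a hypothetical school whose students are labeled 0 to 199 (0 to 99 male, 100 to 199 female). Each student ends up in the sample about half the time, yet the coin-flip scheme can only ever produce two different samples, whereas an SRS of size 100 gives every one of the C(200, 100) possible samples the same chance.

```python
import math
import random

# Hypothetical school: students 0-99 are male, 100-199 are female.
males = frozenset(range(100))
females = frozenset(range(100, 200))

def coin_flip_sample(rng):
    """Example 1's scheme: heads -> all the males, tails -> all the females."""
    return males if rng.random() < 0.5 else females

rng = random.Random(1)  # seeded so the run is reproducible
samples = [coin_flip_sample(rng) for _ in range(2000)]

# Each student is selected about 50% of the time...
share_student_0 = sum(0 in s for s in samples) / len(samples)
print(f"student 0 selected in {share_student_0:.0%} of runs")

# ...but only 2 distinct samples ever occur,
print(len(set(samples)))  # 2

# while an SRS of size 100 could produce any of this many samples:
print(math.comb(200, 100))
```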

Example 2: A better way to sample 100 students from a school with 200 students: use Minitab to randomly assign each student a unique number from 1 to 200, then include the students numbered 1 through 100 in your sample.
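The same procedure can be sketched in Python instead of Minitab (a swapped-in tool; the course itself uses Minitab). Shuffling the roster gives each student a unique random position, and keeping the first 100 positions yields an SRS; the standard library's `random.sample` draws an SRS directly. The roster of IDs 1 to 200 is hypothetical.

```python
import random

# Hypothetical roster of 200 students, identified by ID 1..200.
students = list(range(1, 201))

rng = random.Random(2016)  # seeded only so the example is reproducible

# Mirror Example 2: give every student a unique random position
# (a random permutation), then keep the students in the first 100 positions.
shuffled = students[:]
rng.shuffle(shuffled)
srs = sorted(shuffled[:100])

# The standard library can also draw an SRS of size 100 directly.
srs_direct = sorted(rng.sample(students, 100))

print(len(srs), len(set(srs)))  # 100 100
```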

Type 2: Stratified Sample

“First [slice the population] into homogeneous groups, called strata, before the sample is selected. Then SRS is used within each stratum; combine the selections from each stratum into one large sample.” -- page 310

Example from book: page 310 (football example)

Idea: Suppose we want to know how people at Akron University feel about using funds to support the football team, and men and women feel differently about using funds in this way. If the school is 60% men and 40% women, we want our sample to reflect this. So, if we are going to sample 100 people, we should separate the men and women into two strata, then randomly sample exactly 60 men (60% of 100) and 40 women (40% of 100) and combine them into one sample.
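A proportional stratified sample can be sketched as an SRS within each stratum, followed by combining the results. The population below (6,000 men and 4,000 women, labeled M0, M1, ... and W0, W1, ...) is hypothetical, chosen only to match the 60%/40% split in the idea above; `stratified_sample` is an illustrative name.

```python
import random

def stratified_sample(strata, total_n, rng):
    """Proportional stratified sample: take an SRS within each stratum, then combine.

    strata maps a stratum name to the list of its members.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # Each stratum's share of the sample matches its share of the population
        n_stratum = round(total_n * len(members) / population_size)
        sample.extend(rng.sample(members, n_stratum))  # SRS within the stratum
    return sample

# Hypothetical school of 10,000 people: 60% men, 40% women
strata = {
    "men": [f"M{i}" for i in range(6000)],
    "women": [f"W{i}" for i in range(4000)],
}
rng = random.Random(310)
sample = stratified_sample(strata, 100, rng)
print(len(sample))                             # 100
print(sum(p.startswith("M") for p in sample))  # 60
print(sum(p.startswith("W") for p in sample))  # 40
```

Rounding each stratum's share can make the total differ slightly from `total_n` when the percentages do not divide evenly; with a 60/40 split and a sample of 100 it is exact.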

Type 3: Cluster Sample

“Splitting the population into representative clusters can make sampling more practical. Then we could simply select one or a few clusters at random and [include all observations in these clusters in our sample].”

Example from book: page 311 (sentence length)

Idea: Suppose we want to know the average sentence length in the textbook. An SRS is complicated in this case, because we would have to list and number every sentence in the book individually. However, if we believe each page is “representative” of the entire book, then we can just choose a few pages at random and combine the sentences found on these pages into one sample. Each page is a “cluster” of “representative” sentences.
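A cluster sample can be sketched as choosing whole clusters at random and keeping every observation inside them. The "book" below is simulated (400 pages of random sentence lengths, in words) purely for illustration; it is not data from the textbook, and `cluster_sample` is an illustrative name.

```python
import random

# Simulated book: 400 pages, each a list of 10-20 sentence lengths (5-30 words).
# These numbers are made up for illustration only.
data_rng = random.Random(0)
book = [[data_rng.randint(5, 30) for _ in range(data_rng.randint(10, 20))]
        for _ in range(400)]

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every observation inside them."""
    chosen = rng.sample(clusters, n_clusters)
    return [obs for cluster in chosen for obs in cluster]

rng = random.Random(311)
sentences = cluster_sample(book, 5, rng)  # every sentence on 5 random pages
estimate = sum(sentences) / len(sentences)
print(f"{len(sentences)} sentences, average length {estimate:.1f} words")
```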

IN CLUSTER SAMPLE: divide the population into representative “clusters”. Main goal: make sampling easier.

IN STRATIFIED SAMPLE: divide the population into nonrepresentative (homogeneous) “strata”. Main goal: get a more representative sample.

Type 4: Systematic Sample

Some samples select individuals systematically: for example, choosing every 10th person on an alphabetical list, starting from a randomly chosen person among the first 10. This is not an SRS (why?), but it is still representative as long as the order of the list is not related to the variable(s) you’re measuring.
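A systematic sample can be sketched by picking a random starting point among the first k entries and then taking every k-th entry after it. The roster of 200 names is hypothetical, and `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(ordered_list, k, rng):
    """Every k-th entry, starting from a randomly chosen one of the first k."""
    start = rng.randrange(k)  # random start: 0, 1, ..., k - 1
    return ordered_list[start::k]

# Hypothetical alphabetical roster of 200 names
roster = sorted(f"Student{i:03d}" for i in range(200))
rng = random.Random(10)
sample = systematic_sample(roster, 10, rng)
print(len(sample))  # 20
```

Because selected entries are always exactly k positions apart, two neighbors on the list can never both be chosen, which is one reason a systematic sample is not an SRS.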