wednesday, august 11 (131 minutes) - stevewillott.comstevewillott.com/17-18 ap stats notes in word/1...


Upload: trantruc

Post on 31-Aug-2018


Name _____________________________

Chapter 1 Learning Objectives

| Learning Objective | Section | Related Example on Page(s) | Relevant Chapter Review Exercise(s) | Can I do this? |
|---|---|---|---|---|
| Identify the individuals and variables in a set of data. | Intro | 3 | R1.1 | |
| Classify variables as categorical or quantitative. | Intro | 3 | R1.1 | |
| Display categorical data with a bar graph. Decide whether it would be appropriate to make a pie chart. | 1.1 | 9 | R1.2, R1.3 | |
| Identify what makes some graphs of categorical data deceptive. | 1.1 | 10 | R1.3 | |
| Calculate and display the marginal distribution of a categorical variable from a two-way table. | 1.1 | 13 | R1.4 | |
| Calculate and display the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table. | 1.1 | 15 | R1.4 | |
| Describe the association between two categorical variables by comparing appropriate conditional distributions. | 1.1 | 17 | R1.5 | |
| Make and interpret dotplots and stemplots of quantitative data. | 1.2 | Dotplots: 25; Stemplots: 31 | R1.6 | |
| Describe the overall pattern (shape, center, and spread) of a distribution and identify any major departures from the pattern (outliers). | 1.2 | Dotplots: 26 | R1.6, R1.9 | |
| Identify the shape of a distribution from a graph as roughly symmetric or skewed. | 1.2 | 28 | R1.6, R1.7, R1.8, R1.9 | |
| Make and interpret histograms of quantitative data. | 1.2 | 33 | R1.7, R1.8 | |
| Compare distributions of quantitative data using dotplots, stemplots, or histograms. | 1.2 | 30 | R1.8, R1.10 | |
| Calculate measures of center (mean, median). | 1.3 | Mean: 49; Median: 52 | R1.6 | |
| Calculate and interpret measures of spread (range, IQR, standard deviation). | 1.3 | IQR: 55; Std. dev: 60 | R1.9 | |
| Choose the most appropriate measure of center and spread in a given setting. | 1.3 | 65 | R1.7 | |
| Identify outliers using the 1.5 × IQR rule. | 1.3 | 56 | R1.6, R1.7, R1.9 | |
| Make and interpret boxplots of quantitative data. | 1.3 | 57 | R1.7 | |
| Use appropriate graphs and numerical summaries to compare distributions of quantitative variables. | 1.3 | 65 | R1.8, R1.10 | |


1.1 Analyzing Categorical Data

Read 2–4

Fr/Soph/Jr/Sr

GPA

Email address

Name

Bus route

Phone number

Days absent

Address

Credits earned

Allergies

Current on immunizations

Exterior color

Mileage

Total car length

Number of cylinders

Cost

Model

VIN

Type of sound system

Size of fuel tank

What do we call these two kinds of variables? What’s the difference?

Why do people sometimes confuse the two kinds of variables?

What is a distribution? It tells us what values a variable takes and how often it takes each value.


Alternate Example: Willott’s music

Here is information about 12 randomly selected songs in Willott’s music library.

| Song Title | Artist | Album Year | Track Length | Genre | Tracks on the Album | Track Number |
|---|---|---|---|---|---|---|
| Double Dare | Bauhaus | 1980 | 4:54 | Gothic | 9 | 1 |
| Carpe Noctum | Tiesto | 2007 | 7:03 | Dance/Electronic | 12 | 4 |
| She Wolf | Shakira | 2009 | 3:10 | Latin | 12 | 1 |
| Come as You Are | Nirvana | 1991 | 3:39 | Alternative | 12 | 3 |
| The Heinrich Maneuver | Interpol | 2007 | 3:35 | Alternative | 11 | 4 |
| Shake It Out | Florence + The Machine | 2011 | 4:38 | Alternative | 12 | 2 |
| My Songs Know What You Did in the Dark (Light Em Up) | Fall Out Boy | 2013 | 3:07 | Alternative | 11 | 2 |
| Locked Out of Heaven | Bruno Mars | 2012 | 3:53 | Pop | 10 | 2 |
| Womanizer | Britney Spears | 2008 | 3:44 | Pop | 13 | 1 |
| Iceolate | Front Line Assembly | 1990 | 5:13 | Industrial | 10 | 7 |
| I Bet You Look Good On The Dancefloor | Arctic Monkeys | 2006 | 2:54 | Indie | 13 | 2 |
| Meat is Murder | The Smiths | 1985 | 6:06 | Alternative | 9 | 9 |

(a) Who are the individuals in this data set?

(b) What variables are measured? Identify each as categorical or quantitative. In what units were the

quantitative variables measured?

(c) Describe the individual in the first row.

Read 7–11

What's the difference between a data table, a frequency table, and a relative frequency table?

Data table: tells values of variables for individuals

Frequency table: tells distribution of 1 variable in table form

Relative frequency table: tells distribution of 1 variable as a %, decimal, or fraction

Which one was the previous example?
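As a rough illustration of the difference, the sketch below takes the Genre column from the 12 songs listed earlier and builds a frequency table and a relative frequency table from it. The function names are made up for this example.

```python
from collections import Counter

# Genre column copied from the 12 songs in Willott's music table.
genres = [
    "Gothic", "Dance/Electronic", "Latin", "Alternative", "Alternative",
    "Alternative", "Alternative", "Pop", "Pop", "Industrial", "Indie",
    "Alternative",
]

def frequency_table(values):
    """Count how many individuals fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Express each category's count as a proportion of all individuals."""
    total = len(values)
    return {category: count / total for category, count in Counter(values).items()}

freq = frequency_table(genres)
rel_freq = relative_frequency_table(genres)

# One row per genre: name, count, percent of the 12 songs.
for genre, count in freq.items():
    print(f"{genre:<18}{count:>3}{rel_freq[genre]:>8.1%}")
```

The list of genres by itself is still a column of the data table; only after counting does it become a distribution.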

When making pie charts and bar graphs, what do people often mess up?


| | Bar Graphs | Pie Charts |
|---|---|---|
| Pros | Quick & easy | Show part-whole relationships well |
| Cons | Part-whole relationships are hard to see | They’re hard to make by hand. Don't use when percents don't add up to 100%. |

Let's search "misleading graph" and see some examples.

Identify some particular problems many of these graphs share.

HW #11: page 7 (1, 3, 5, 7, 8), page 22 (11, 13, 15, 17, 18)

Read 12–18

Examples of:

…two-way table (2 variables are shown with counts or frequencies)

| | Senior | Non-senior |
|---|---|---|
| Boy | 8 | 3 |
| Girl | 15 | 4 |

…marginal distribution (totals for rows & columns; the distribution for each variable)

| | Senior | Non-senior | Totals |
|---|---|---|---|
| Boy | 8 | 3 | 11 |
| Girl | 15 | 4 | 19 |
| Totals | 23 | 7 | 30 |

…conditional distribution (the distribution of one variable, as percents, within each value of the other variable)

| | Senior | Non-senior |
|---|---|---|
| Boy | 35% | 43% |
| Girl | 65% | 57% |
| Totals | 100% | 100% |

How do we know which variable to condition on? Divide by the explanatory variable totals.

| | Senior | Non-senior | Totals |
|---|---|---|---|
| Boy | 73% | 27% | 100% |
| Girl | 79% | 21% | 100% |
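The percents in the two conditional tables come from dividing each count by either its column total or its row total. Here is a minimal sketch of that arithmetic using the Boy/Girl by Senior/Non-senior counts; the dictionary layout and function names are just one way to set it up.

```python
# Counts from the two-way table: rows are Boy/Girl, columns are Senior/Non-senior.
counts = {
    "Boy": {"Senior": 8, "Non-senior": 3},
    "Girl": {"Senior": 15, "Non-senior": 4},
}

def row_totals(table):
    """Marginal distribution of the row variable (as counts)."""
    return {row: sum(cells.values()) for row, cells in table.items()}

def column_totals(table):
    """Marginal distribution of the column variable (as counts)."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals

def condition_on_columns(table):
    """Divide each count by its column total (each column sums to 100%)."""
    col_tot = column_totals(table)
    return {row: {col: n / col_tot[col] for col, n in cells.items()}
            for row, cells in table.items()}

def condition_on_rows(table):
    """Divide each count by its row total (each row sums to 100%)."""
    row_tot = row_totals(table)
    return {row: {col: n / row_tot[row] for col, n in cells.items()}
            for row, cells in table.items()}

by_class = condition_on_columns(counts)   # 35% / 65% and 43% / 57%
by_gender = condition_on_rows(counts)     # 73% / 27% and 79% / 21%

for row in counts:
    print(row, {col: f"{p:.0%}" for col, p in by_gender[row].items()})
```

Which function to use depends on which variable is treated as explanatory: its totals go in the denominator.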

| | Died | Survived |
|---|---|---|
| Hospital A | | |
| Hospital B | | |


What is a segmented (or stacked) bar graph?

Use a segmented bar graph to compare conditional distributions, to look for differences, and to look for

patterns.

When knowing the value of one variable helps predict the value of the other, we say that the variables are

associated. Association appears in a segmented bar graph when we see big differences in the proportions. The

proportions may be “flipped” or reversed.

Careful! An association does NOT

automatically mean that there is a

cause-and-effect relationship.

The boy/girl senior/non-senior graphs

did not show much association.

Alternate Example: Horseshoe Crabs

Two members of the University of Florida at Gainesville Department of Zoology collected data on Horseshoe

Crabs on a Delaware beach during 4 days in the late spring of 1992. Based on the color of the shells, they

classified each crab as Young, Intermediate, or Old and whether the crabs could right themselves when flipped

on their backs or whether they were stranded for at least a certain period of time. Here are the results.

| | Young | Intermediate | Old | Total |
|---|---|---|---|---|
| Stranded | 214 | 384 | 295 | 893 |
| Not Stranded | 1668 | 1204 | 216 | 3088 |
| Total | 1882 | 1588 | 511 | 3981 |

(a) Explain what it would mean if there was no association between age and strandedness.

(b) Does there appear to be an association between age and strandedness in this sample? Justify.
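One way to approach part (b) is to condition on age and compare the proportion stranded in each age group with the overall proportion. The sketch below does only that arithmetic; treating age as the explanatory variable is an assumption of this example, and the justification in words is still up to you.

```python
# Counts from the horseshoe crab table, grouped by age.
crabs = {
    "Young": {"Stranded": 214, "Not Stranded": 1668},
    "Intermediate": {"Stranded": 384, "Not Stranded": 1204},
    "Old": {"Stranded": 295, "Not Stranded": 216},
}

def proportion_stranded(table):
    """Conditional proportion stranded within each age group."""
    return {age: cells["Stranded"] / sum(cells.values())
            for age, cells in table.items()}

stranded_by_age = proportion_stranded(crabs)

# Overall (marginal) proportion stranded, ignoring age.
total_stranded = sum(cells["Stranded"] for cells in crabs.values())
total_crabs = sum(sum(cells.values()) for cells in crabs.values())
overall = total_stranded / total_crabs

for age, p in stranded_by_age.items():
    print(f"{age:<13}{p:.1%}")
print(f"{'All crabs':<13}{overall:.1%}")
```

If there were no association, the three age-group percents would all be close to the overall percent.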


HW #12: page 22 (19, 21, 23, 25, 27–34)

And now, we change from categorical data to quantitative data…

1.2 Displaying Quantitative Data with Graphs

Elmer and Ethel have retired and want to move someplace warm. The couple is considering nine different cities.

The dotplots below show the distribution of average daily high temperatures in December, January, and

February for each of these cities. Help them pick a city by answering the questions below, based on the data

shown in the graph.

1. What is the typical high temperature for these months in Phoenix, Orlando, and San Juan? Which of those 3 cities is

most similar in this respect to Palm Springs? (Look for the center: the average, median, or typical value.)

2. Are daily high temperatures for these months more predictable in Palm Springs or in Orlando? (Look at the spread:

the variation, including the range.)

3. What might be unique to Atlanta, San Diego, and Honolulu? (Look for outliers: unusual values.)

4. What makes San Juan and San Diego somewhat similar to one another? Likewise, Palm Springs, Phoenix, and

Orlando are similar to one another in this way, but different from the first group. (Look at the shape: symmetry vs.

asymmetry.)

[Figure: "Average High Temperatures" dot plot with one row per city (Palm Springs, Atlanta, Phoenix, San Diego, Orlando, Miami, Key West, Honolulu, San Juan); horizontal axis from 60 to 90.]


Read 25–27

Notice that we are now looking at quantitative data!

How should we describe the distribution of a quantitative variable? Use “SOCS”

Center- Typical value, such as the mean or the median

Spread- Range for now (we'll also use standard deviation and interquartile range "IQR")

Outliers- Unusual values for now (we'll eventually use the "1.5 × IQR rule")

Shape- Address the graph's # of peaks and its symmetry

(unimodal, bimodal, multimodal, uniform, symmetric, asymmetric, skewed left, skewed right)

Read 27–29

Examples and descriptions of various shapes of distributions:

Unimodal Symmetric
Curve: Heights of adult women
Dotplot: Expected sums on 36 rolls of two 6-sided dice
Histogram: Length of growing seasons in St. Louis

Bimodal
Curve: Heights of men and women
Dotplot: Observed sums on 35 rolls of a 4-sided die and an 8-sided die
Histogram: Maximum angle of a sample of roller coasters

Unimodal Skewed Left
Curve: Heights of kids at a middle school dance
Dotplot: Time to finish a difficult test
Histogram: Heights in my extended family

Unimodal Skewed Right
Curve: Salaries of MLB players
Dotplot: Selling prices of homes in a new subdivision
Histogram: Scores on a multiple choice pre-test over completely new material

Uniform
Curve: Expected outcomes of spins of a spinner with equally-sized spaces numbered 1-10
Dotplot: Outcomes of 36 rolls of a 6-sided die
Histogram: Ages of students in a school district


Here is the number of calories per item for 16 convenience store sandwiches, along with a dotplot of the data.

360 430 440 440 440 450 450 460

470 480 480 490 490 490 500 510

Describe the shape, center, and spread of the distribution. Are there any outliers?
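To back up a description with numbers, the sketch below prints a rough text dotplot of the 16 calorie values and computes a center (mean and median) and a spread (range). The text dotplot steps through the axis in tens, which assumes every value is a multiple of 10, as it is here.

```python
from statistics import mean, median

calories = [360, 430, 440, 440, 440, 450, 450, 460,
            470, 480, 480, 490, 490, 490, 500, 510]

def text_dotplot(values, step=10):
    """One line per axis position, one '*' per observation at that position."""
    lines = []
    for position in range(min(values), max(values) + 1, step):
        lines.append(f"{position:>4} | " + "*" * values.count(position))
    return "\n".join(lines)

center_mean = mean(calories)
center_median = median(calories)
data_range = max(calories) - min(calories)

print(text_dotplot(calories))
print("mean:", center_mean, "median:", center_median, "range:", data_range)
```

The empty rows between 360 and 430 are what to look at when deciding whether 360 is an unusual value.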

Read 29–30

When asked to compare two distributions, be sure that you compare and don’t just describe!

Be sure that you use “less”, “more”, and “-er” words.

How does the annual energy consumption (kWh/year) compare for top-loading washing machines and front-

loading washers? The data below is from the Home Depot website. There are 26 front-loaders and 32 top-

loaders included.

Read 31–32

Caution! Remember to include a key when making a stemplot (stem-and-leaf-plot).

If you write "19 | 7", is that 197, 19.7, 1970, ...?


How do gas prices in St. Charles County compare to those in Madison County, where Alton, Illinois is located? A sample of gas prices was taken on several days in July 2015. Make a back-to-back stemplot and compare the distributions.

St. Charles Co.: 2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58, 2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61

Madison Co.: 2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71, 2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74
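For checking a hand-drawn plot, here is a minimal sketch of a back-to-back stemplot for these two samples. The choice of stems (the first two digits, so 25 stands for 2.5x) and the layout are assumptions of this example, not the only reasonable ones.

```python
st_charles = [2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58,
              2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61]
madison = [2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71,
           2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74]

def stems_and_leaves(prices):
    """Map each stem (first two digits) to its sorted leaves (last digit)."""
    plot = {}
    for price in prices:
        hundredths = round(price * 100)  # round to dodge floating-point error
        stem, leaf = divmod(hundredths, 10)
        plot.setdefault(stem, []).append(leaf)
    return {stem: sorted(leaves) for stem, leaves in plot.items()}

def back_to_back(left, right):
    """Shared stems in the middle; left leaves grow outward to the left."""
    left_plot, right_plot = stems_and_leaves(left), stems_and_leaves(right)
    all_stems = list(left_plot) + list(right_plot)
    width = max(len(leaves) for leaves in left_plot.values())
    lines = []
    for stem in range(min(all_stems), max(all_stems) + 1):
        left_leaves = "".join(str(d) for d in reversed(left_plot.get(stem, [])))
        right_leaves = "".join(str(d) for d in right_plot.get(stem, []))
        lines.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}")
    lines.append("Key: 25 | 6 means 2.56")
    return "\n".join(lines)

print(back_to_back(st_charles, madison))
```

The last printed line is the key, which the notes above flag as required on every stemplot.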

HW #13: page 41 (37, 39, 43, 45, 47)

1.2 Histograms

The following table presents the total number of triples (3B) for the 30 MLB teams in the 2014 regular season. Make a dotplot to display the distribution of triples for the season. Then, use your dotplot to make a histogram of the distribution.

| Team | 3B | Team | 3B | Team | 3B |
|---|---|---|---|---|---|
| Arizona | 47 | Pittsburgh | 30 | Toronto | 24 |
| San Francisco | 42 | San Diego | 30 | Tampa Bay | 24 |
| Colorado | 41 | Kansas City | 29 | Cleveland | 23 |
| LA Dodgers | 38 | Milwaukee | 28 | Atlanta | 22 |
| Miami | 36 | Texas | 28 | St. Louis | 21 |
| Oakland | 33 | Minnesota | 27 | Boston | 20 |
| Chicago Sox | 32 | Washington | 27 | Cincinnati | 20 |
| Seattle | 32 | Philadelphia | 27 | Houston | 19 |
| LA Angels | 31 | Detroit | 26 | NY Mets | 19 |
| Chicago Cubs | 31 | NY Yankees | 26 | Baltimore | 16 |

Read 33–36

When you make a histogram...

...you can turn a dotplot into a histogram.

... be consistent with "fence sitters".

... be consistent with spacing and bin width.
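The sketch below counts the 2014 triples data into equal-width bins, which is the tally behind a histogram. The starting point (15) and bin width (5) are choices made for this example, and "fence sitters" such as 20 or 30 are always placed in the bin that starts at that value.

```python
# Triples (3B) for the 30 MLB teams, read down the three columns of the table.
triples = [47, 42, 41, 38, 36, 33, 32, 32, 31, 31,
           30, 30, 29, 28, 28, 27, 27, 27, 26, 26,
           24, 24, 23, 22, 21, 20, 20, 19, 19, 16]

def bin_counts(values, start, width):
    """Tally values into bins [start, start + width), [start + width, ...).

    Assumes start is no larger than the smallest value. A value sitting on
    a boundary always goes into the bin that begins at that boundary.
    """
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

bins = bin_counts(triples, start=15, width=5)

for low, high, count in bins:
    print(f"{low:>2} to <{high:<3} {'#' * count}")
```

Changing `width` changes the look of the histogram, so it is worth trying a couple of values before settling on one.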


Read 38–41

When might we want a relative frequency histogram rather than a frequency histogram?

…to see part-whole relationships or to compare 2 groups

HW #14: page 45 (51, 53, 55, 59–62)

1.3 Describing Quantitative Data with Numbers

Read 48–50

x̄ is a statistic; "x bar" is the sample mean. μ is a parameter; "mu" is the population mean.

When adding a very large or very small data value to a data set (or changing a data value to something very

large or very small) does not change the value of a statistic very much, or at all, we say that the statistic is

resistant.

The mean is not a resistant measure of center. Adding an extreme value, or altering a value to make it extreme,

will change the value of the mean quite a bit. Think about what happens to the average age of people in the

classroom when Mr. Willott walks in.

The mean is the balancing point.

Approximately where will the mean be located, when looking at a histogram or dotplot?

Read 51–53

The median is a resistant measure of center. Adding an extreme value, or altering a value to make it extreme,

will not change the value of the median much, if at all. Think about what happens to the median age of people

in the classroom when Mr. Willott walks in.
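To see resistance in numbers, here is a small sketch with made-up ages. Both the seven student ages and the teacher's age of 50 are hypothetical values chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical data: seven student ages and one (made-up) teacher age.
student_ages = [16, 16, 17, 17, 17, 18, 18]
teacher_age = 50

def center_summary(ages):
    """Return the mean and median of a list of ages."""
    return {"mean": mean(ages), "median": median(ages)}

before = center_summary(student_ages)
after = center_summary(student_ages + [teacher_age])

print("before:", before)
print("after: ", after)
```

With these values the mean moves from 17 to more than 21, while the median stays at 17.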

If we know the shape of a distribution, as shown below, then where are the mean and the median located in

relation to one another?

roughly symmetric exactly symmetric skewed

[Figure: "Average High Temperatures" dot plot of StL_winter_Avg_High_Temps; horizontal axis from 36 to 50.]


Read 53–55

The range = highest data value minus lowest data value. The range is a single number and it is not a resistant

measure of spread. An extreme value will affect the value of the range. Think about what happens to the range

of ages of people in the classroom when Mr. Willott walks in.

The median divides an ordered list of data into two equal groups.

The quartiles divide an ordered list of data into four equal groups.

The interquartile range (IQR) is the spread of the middle 50% of the data. The IQR is a resistant measure of

spread. Think about what happens to the range of the middle 50% of ages of people in the classroom when Mr.

Willott walks in.

Here are data on the amount of fat (in grams) in 9 different Taco Bell menu items. Calculate the median,

quartiles, and IQR.

Read 57–58

What is the 1.5 IQR Rule for identifying outliers?

Illustration by

Kelly Boles

| Item | Fat (g) |
|---|---|
| Crunchy Taco | 10 |
| Nachos Supreme | 24 |
| Cheese Quesadilla | 26 |
| Chicken Quesadilla | 27 |
| Mexican Pizza | 31 |
| Taco Salad (steak) | 37 |
| Nachos BellGrande | 39 |
| XXL Grilled Stuft Burrito – Beef | 41 |
| Taco Salad (original) | 42 |


How many fat grams would qualify as an outlier for the Taco Bell items?

Are there outliers among the 9 Taco Bell items?
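The sketch below works through the Taco Bell fat data: median, quartiles, IQR, and the two cutoffs from the 1.5 × IQR rule. It assumes the convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out when the number of values is odd; calculators and software sometimes use other conventions.

```python
from statistics import median

fat_grams = [10, 24, 26, 27, 31, 37, 39, 41, 42]

def quartiles(values):
    """Return (Q1, median, Q3) using the 'median of each half' convention."""
    data = sorted(values)
    half = len(data) // 2          # size of each half; middle value is skipped if n is odd
    lower, upper = data[:half], data[-half:]
    return median(lower), median(data), median(upper)

def outlier_fences(values):
    """Cutoffs from the 1.5 x IQR rule: below the first or above the second is an outlier."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

q1, med, q3 = quartiles(fat_grams)
iqr = q3 - q1
low_fence, high_fence = outlier_fences(fat_grams)
outliers = [x for x in fat_grams if x < low_fence or x > high_fence]

print("Q1:", q1, "median:", med, "Q3:", q3, "IQR:", iqr)
print("outlier if below", low_fence, "or above", high_fence)
print("outliers:", outliers)
```

Any item with fewer than `low_fence` or more than `high_fence` grams of fat would count as an outlier under this rule.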

Here are data for the calories for 12 McDonald’s menu items. Are there any outliers?

Read 56–58

The five-number summary: Minimum, Q1, Median, Q3, Maximum

A boxplot is a graph that is related to the five-number summary.

Draw a boxplot for the Taco Bell data. Check yours against the one that the graphing calculator makes.

Here are parallel boxplots for the heights of baseball players for 5 of the 2005 MLB teams. Compare these

distributions.

| Menu item | Calories |
|---|---|
| 32 oz. Chocolate Shake | 1160 |
| Big Breakfast® | 740 |
| Big Mac® | 540 |
| Sausage Biscuit with Egg | 510 |
| McRib® | 500 |
| 10 pc. McNuggets® | 460 |
| Double Cheeseburger | 440 |
| Quarter Pounder® | 410 |
| Filet-O-Fish® | 380 |
| McChicken® | 360 |
| Large Caramel Latte | 330 |
| Large Vanilla Iced Coffee | 270 |
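Here is a sketch of the five-number summary and the 1.5 × IQR check for the 12 McDonald's calorie values, which are the numbers a boxplot is drawn from. As an assumption of this example, quartiles are taken as the medians of the lower and upper halves of the ordered data.

```python
from statistics import median

calories = [1160, 740, 540, 510, 500, 460, 440, 410, 380, 360, 330, 270]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    half = len(data) // 2  # middle value is skipped when n is odd
    return (data[0], median(data[:half]), median(data),
            median(data[-half:]), data[-1])

def find_outliers(values):
    """Values beyond 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(values) if x < low or x > high]

summary = five_number_summary(calories)
outliers = find_outliers(calories)

# In a boxplot that marks outliers, whiskers stop at the most extreme
# values that are not outliers.
non_outliers = [x for x in calories if x not in outliers]
whiskers = (min(non_outliers), max(non_outliers))

print("five-number summary:", summary)
print("outliers:", outliers)
print("whiskers end at:", whiskers)
```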



HW #15: page 47 (65, 69–74), page 69 (79, 81, 83, 85, 86, 87, 89, 91, 93)

1.3 Standard Deviation

Arnold ran each afternoon for 5 days. His distances (in miles) were 10, 10, 10, 10, and 10.

Find the mean (or average) number of miles that Arnold ran each day. ____________________

Complete the table:

Table for Arnold's distances

| Distance | Difference from the mean | Square of difference from the mean |
|---|---|---|
| 10 | | |
| 10 | | |
| 10 | | |
| 10 | | |
| 10 | | |

Sum of squared differences:

Sum of squared differences divided by 4 (one less than the number of distances: 5 − 1 = 4):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Arnold ran. What are the units? ____________

The number above it is the variance for the distances. What are the units? ____________

Becky ran each afternoon for 5 days. Her distances (in miles) were 8, 9, 10, 11, and 12.

Find the mean (or average) number of miles that Becky ran each day. ____________________

Complete the table:

Table for Becky's distances

| Distance | Difference from the mean | Square of difference from the mean |
|---|---|---|
| 8 | | |
| 9 | | |
| 10 | | |
| 11 | | |
| 12 | | |

Sum of squared differences:

Sum of squared differences divided by 4 (one less than the number of distances: 5 − 1 = 4):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Becky ran. What are the units? ____________


The number above it is the variance for the distances. What are the units? ______________

Caleb ran each afternoon for 5 days. His distances (in miles) were 7, 9, 10, 11, and 13.

Find the mean (or average) number of miles that Caleb ran each day. ____________________

Complete the table:

Table for Caleb's distances

| Distance | Difference from the mean | Square of difference from the mean |
|---|---|---|
| 7 | | |
| 9 | | |
| 10 | | |
| 11 | | |
| 13 | | |

Sum of squared differences:

Sum of squared differences divided by 4 (one less than the number of distances: 5 − 1 = 4):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Caleb ran. What are the units? _____________

The number above it is the variance for the distances. What are the units? _________________

Donna ran each afternoon for 5 days. Her distances (in miles) were 3, 3, 4, 5, and 35.

Find the mean (or average) number of miles that Donna ran each day. ____________________

Complete the table:

Table for Donna's distances

| Distance | Difference from the mean | Square of difference from the mean |
|---|---|---|
| 3 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 35 | | |

Sum of squared differences:

Sum of squared differences divided by 4 (one less than the number of distances: 5 − 1 = 4):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Donna ran. What are the units? ___________

The number above it is the variance for the distances. What are the units? ____________
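After filling in the four tables by hand, the sketch below can be used to check the work. It follows the same steps as the tables: difference from the mean, squared difference, sum, divide by one less than the number of distances, then square root. The function names are invented for this example.

```python
from math import sqrt

runners = {
    "Arnold": [10, 10, 10, 10, 10],
    "Becky": [8, 9, 10, 11, 12],
    "Caleb": [7, 9, 10, 11, 13],
    "Donna": [3, 3, 4, 5, 35],
}

def deviation_table(distances):
    """Rows of (distance, difference from the mean, squared difference)."""
    mean = sum(distances) / len(distances)
    return [(x, x - mean, (x - mean) ** 2) for x in distances]

def sample_variance(distances):
    """Sum of squared differences divided by n - 1 (units: miles squared)."""
    squared = [sq for _, _, sq in deviation_table(distances)]
    return sum(squared) / (len(distances) - 1)

def sample_std_dev(distances):
    """Square root of the sample variance (units: miles)."""
    return sqrt(sample_variance(distances))

for name, distances in runners.items():
    print(f"{name:<7} variance = {sample_variance(distances):7.3f}"
          f"   std dev = {sample_std_dev(distances):6.3f}")
```

All four runners have the same mean, so the differences in the printed values come entirely from how spread out the distances are.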


The standard deviation measures the typical distance the data are from the mean.

The range, IQR, and standard deviation all measure variation or spread, but only the IQR is resistant.

Read 60–62

If s = 4, then s² = 16. If s² = 9, then s = 3. If σ² = 25, then σ = 5. If σ = 6, then σ² = 36.

Four important properties of the standard deviation:

Standard deviation ≥ 0. (0 means no variation, a large number means lots of variation.)

Standard deviation units are the same as the units for the data.

Standard deviation is not resistant.

Standard deviation measures spread around the mean.

[Figure: four graphs with standard deviations s = 5, s = 6.22, s = 9.52, and s = 10.7.]

A random sample of 5 students was asked how many minutes they spent listening to music outside school hours

the previous day. They responded: 20, 30, 60, 90, 120. Calculate and interpret the standard deviation.
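For checking the calculation, this sketch uses Python's built-in `statistics` module, whose `stdev` function divides by n − 1 as in the tables earlier in this section. The wording of the printed interpretation is only a suggested template.

```python
from statistics import mean, stdev

minutes = [20, 30, 60, 90, 120]

x_bar = mean(minutes)   # sample mean
s = stdev(minutes)      # sample standard deviation (divides by n - 1)

print(f"mean = {x_bar} minutes, s = {s:.1f} minutes")
print(f"The listening times typically vary from the mean of {x_bar} minutes "
      f"by about {s:.1f} minutes.")
```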

Read 63–66

Of mean, median, IQR, and standard deviation, which summary statistics will we typically use for each

situation?

| | Symmetric | Skewed |
|---|---|---|
| Center | | |
| Spread | | |

| Standard deviation | Variance |
|---|---|
| Square root of variance | Square of standard deviation |
| s = sample standard deviation | s² = sample variance |
| σ = population standard deviation | σ² = population variance |


HW #16: page 71 (95, 97, 99, 101–105, 107–110) FRAPPY! page 74

HW #17: page 76 Chapter Review Exercises

Review Chapter 1

HW #18: page 78 Chapter 1 AP Statistics Practice Test

Chapter 1 Test