Chapter 2 – Visualizations of Data: Answer Key

CK-12 Advanced Probability and Statistics Concepts

2.1 Histograms

Answers

1.

Number of Plastic Beverage Bottles per Week   Tally      Frequency
1                                             ||         2
2                                             |||| |     6
3                                             |||        3
4                                             ||         2
5                                             |||        3
6                                             |||| ||    7
7                                             |||| |     6
8                                             |          1

The number of students who replied “2” was 6. Thirty minus twenty-four is 6.

2. There is not enough information given to determine the answer.



3. a)

Number of Liters per Person   Frequency
[60, 70)                      4
[70, 80)                      3
[80, 90)                      0
[90, 100)                     1
[100, 110)                    3
[110, 120)                    2
[120, 130)                    1
[130, 140)                    0
[140, 150)                    0
[150, 160)                    1

b)

c) As the number of bottles increases, the frequency generally decreases. The data is unimodal and skewed right.



4. a)

Class     Frequency   Relative frequency (%)   Cumulative frequency   Relative cumulative frequency (%)
0-25      7           50                       7                      50
25-50     1           7.1                      8                      57.1
50-75     3           21.4                     11                     78.6
75-100    1           7.1                      12                     85.7
100-125   1           7.1                      13                     92.9
125-150   0           0                        13                     92.9
150-175   0           0                        13                     92.9
175-200   0           0                        13                     92.9
200-225   1           7.1                      14                     100

b) [Figure: "Relative Frequency". x-axis: Class; y-axis: Relative Frequency (% BTU)]

c) [Figure: x-axis: BTUs; y-axis: Relative Frequency (% BTUs)]



d) [Figure: ogive plot. x-axis: BTUs; y-axis: Cumulative Relative Frequency (%)]

e) The distribution is skewed to the right, meaning that recycling most of the materials saves a relatively small amount of energy. Only a few (aluminum cans, copper wire, and carpet) save a large amount of energy.

f) Relative frequency total = 99.8%. The total should be 100%, but there could be rounding errors in the calculations.

g) The horizontal section of the ogive plot means there was no data to input (frequency = 0), so the accumulated data did not change.

h) Most of the data in the chart is from 0–50 million tons of BTUs per ton in energy savings. This will be the steepest part of the ogive plot. This tells you that the steepest part contains most of the data.
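The relative and cumulative columns of the table in 4 a), and the 99.8% total discussed in 4 f), can be checked with a short script. This is only a sketch: the class frequencies are copied from the table, while the variable names and the choice to round to one decimal place are assumptions made for illustration.

```python
# Class labels and frequencies copied from the table in 4 a).
classes = ["0-25", "25-50", "50-75", "75-100", "100-125",
           "125-150", "150-175", "175-200", "200-225"]
frequencies = [7, 1, 3, 1, 1, 0, 0, 0, 1]

total = sum(frequencies)

# Relative frequency of each class, as a percentage rounded to one decimal.
relative = [round(100 * f / total, 1) for f in frequencies]

# Running total of the frequencies.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

# Cumulative frequency as a percentage of the total, rounded to one decimal.
relative_cumulative = [round(100 * c / total, 1) for c in cumulative]

# Adding the already-rounded percentages shows the rounding shortfall.
rounded_total = round(sum(relative), 1)

for row in zip(classes, frequencies, relative, cumulative, relative_cumulative):
    print(*row, sep="\t")
print("Sum of rounded relative frequencies:", rounded_total)
```

Summing the rounded percentages rather than the exact fractions is what produces a total slightly below 100%.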



5. a) The outliers are the first and last bars of the histogram. The first bar shows at least one CEO making a salary of $0 and the final bar shows at least one CEO making over $1000000.

b) The salary of $300000 appears the most often. Approximately 14 CEOs report having this salary.

c) Approximately 6 CEOs have a salary of $500000.

6. There is an error in the table. The first value of the eighth bin should be 66.

The data is grouped in bins such that the chosen interval is appropriate for the dataset. The histogram displays the center as being in the clustered region of 45 – 60. The histogram does not appear to be skewed and there are no obvious outliers. The data appears to represent a normal distribution.

7. A dataset that represents continuous data is easily represented using frequency tables, histograms or frequency polygons if the dataset is large. The range of the data must also be great enough to offer a large enough spread in order to create appropriate intervals or bin sizes.

8. A dataset that is large is best when representing data using frequency tables, histograms, frequency polygons or ogive plots. A frequency table helps to order the data but it does not present a visual representation that is easily interpreted. A histogram displays data spread uniformly over the entire interval. The shape of the histogram provides a great deal of information about the distribution of the data. The ogive plot has the cumulative frequency on its y-axis. This representation allows for the study of medians and quartiles.



9. When the distribution’s shape is strongly skewed or has extreme outliers, the mean is pulled toward the skewed end, making it not very representative of the typical center of the dataset. When the distribution displays a positive skew, the mean is greater than the median. When the distribution has a negative skew, the mean is less than the median. In a normal distribution, the mean is the center of the distribution.

10. Determine the range of the data (maximum value – minimum value). Decide how many classes you wish to display on your graph (usually 7 – 10 bins provide a good visual display of the distribution). When you have decided, divide the range by the number of bins to determine the number of values in each class (the class width).
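The procedure in answer 10 can be sketched in a few lines. The function name `class_width` and the sample values are hypothetical and are not taken from the exercises; in practice the result is often rounded up to a convenient number, which this sketch does not do.

```python
def class_width(values, num_classes):
    """Return the range of the data divided by the number of classes."""
    data_range = max(values) - min(values)
    return data_range / num_classes


# Hypothetical data, used only to illustrate the calculation.
sample = [12, 18, 25, 31, 40, 47, 52, 68, 75, 92]
width = class_width(sample, 8)
print(f"range = {max(sample) - min(sample)}, class width = {width}")
```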



2.2 Displaying Categorical Variables

Answers

1.

2.

Material                       Kilograms   Approx. % of Total Weight
Plastics                       6.21        23
Lead                           1.71        6.33
Aluminum                       3.83        14.18
Iron                           5.54        20.52
Copper                         2.12        7.85
Tin                            0.27        1
Zinc                           0.60        2.22
Nickel                         0.23        0.85
Barium                         0.05        0.185
Other Elements and chemicals   6.44        23.85

[Figure: bar graph "Weight of Materials in a Typical Desktop Computer". x-axis: Material; y-axis: Weight (kg)]



3.

4. Answers will vary. Bar graphs are easier to analyze here because there are so many categories.

5.

6.

7.

Grade   # Students   Approximate % of Total Grade
A       14           48.28
B       7            24.14
C       4            13.79
D       3            10.34
F       1            3.45

[Figure: bar graph "Grades for Statistics Class". x-axis: Grade (A, B, C, D, F)]

[Figure: pie chart "Percentage of Materials in a Typical Desktop Computer". Legend includes: Plastics, Lead, Aluminum, Iron, Copper, Tin, Zinc]

[Figure: pie chart "Approximate Percentage of Total Grade"]
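The percentage column in the grade table above can be reproduced by dividing each count by the total number of students. The sketch below assumes rounding to two decimal places; the variable names are illustrative only.

```python
# Student counts copied from the grade table above.
grade_counts = {"A": 14, "B": 7, "C": 4, "D": 3, "F": 1}

total_students = sum(grade_counts.values())

# Share of each grade as a percentage of all students, rounded to two decimals.
percentages = {
    grade: round(100 * count / total_students, 2)
    for grade, count in grade_counts.items()
}

for grade, pct in percentages.items():
    print(f"{grade}: {pct}%")
```

These percentages are the values a pie chart of the grades would display as slices.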



8. Answers will vary. Although bar graphs are easier to analyze, the relatively few categories make the pie chart easy to analyze as well.

9.

[Figure: bar graph "Income of Persons age 25+ versus Highest Level of Education". x-axis: Highest Level of Education; y-axis: Median Income (Thousands)]



10.

11.

12. Answers will vary. Bar graphs are easier to analyze here because there are so many categories.

13. The median is resistant to outliers. The median is used especially when the data is skewed.

Highest Level of Education    Median Income of Persons age 25+   Approximate Percentage of Total Income
High School                   $20,321                            4.96
High School Graduate          $26,505                            6.47
Some College                  $31,056                            7.58
Associate’s Degree            $35,009                            8.55
Bachelor’s Degree or Higher   $49,303                            12.04
Bachelor’s Degree             $43,143                            10.53
Master’s Degree               $52,390                            12.79
Professional Degree           $82,473                            20.13
Doctorate Degree              $69,432                            16.95

[Figure: pie chart "Percentage of Total Income". Legend includes: High School, High school graduate, Some college, Associate's degree, Bachelor's degree or higher]



2.3 Displaying Univariate Data

Answers

1. Dot plot:

2. The distribution is uniform. The center of the data is approximately 25 with data somewhat evenly spread from 5 through to 48.

3. Stem-and-leaf plot:

0 | 5 5
1 | 1 2 2 3 4 5 9 9
2 | 0 1 3 5 5 6 6 7 8 8 9
3 | 0 2 3 3 4 5 6 9
4 | 0 1 2 2 5 8

4. 27
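A stem-and-leaf plot keeps every data value, so summary statistics can be read straight from it. The sketch below expands the plot from answer 3 into a list and computes its median, which comes out equal to the value given in answer 4. It assumes that each stem is a tens digit and each leaf a ones digit, and that question 4 asks for the median; neither is stated in this key.

```python
import statistics

# Stems and leaves copied from the plot in answer 3
# (assumption: stem = tens digit, leaf = ones digit).
stem_and_leaf = {
    0: [5, 5],
    1: [1, 2, 2, 3, 4, 5, 9, 9],
    2: [0, 1, 3, 5, 5, 6, 6, 7, 8, 8, 9],
    3: [0, 2, 3, 3, 4, 5, 6, 9],
    4: [0, 1, 2, 2, 5, 8],
}

# Rebuild the raw data values from the plot.
values = [10 * stem + leaf
          for stem, leaves in stem_and_leaf.items()
          for leaf in leaves]

median = statistics.median(values)
print(f"n = {len(values)}, min = {min(values)}, max = {max(values)}, median = {median}")
```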

5. The distribution is left skewed with no outliers.

6. The distribution is left skewed with one outlier.

7. The distribution is symmetric with no apparent outliers.

8. The distribution is right skewed with no apparent outliers.

9. The first data set is symmetric with no apparent outliers. The second data set is symmetric with no apparent outliers. The third data set is bimodal. The fourth data set is evenly distributed.




10. The first data set is centered on 52 with a large peak at 52. The second data set is centered on 52 with a peak at 52. The third data set is centered at 52 but has peaks at 25 and 85. The fourth data set has no center, all peaks are even.

11. The first dot plot has the smallest standard deviation.

12. The third dot plot has the largest standard deviation.

13. Dot plots are useful with small data sets that use categorical data. When the data describes qualitative observations, measures of spread or shape are not used. These characteristics are used to describe dot plots only when the categories are numerical.

14. a) Stem-and-leaf plot

3 | 2 3 6 7 8
4 | 0 1 3 3 4 4 5 5 5 5 6 6 7 7 7 8 8 8 8 9
5 | 0 0 0 0 0 0 1 1 2 3 3 3 5 5 5 6 6 6 6 7 7 8 8 9
6 | 0 1 1 1 2 2 3 9 9
7 | 0 4

b) Dot Plot

c) The data set is symmetric.

d) Outliers could include 32, 33, and 74.

[Figure: dot plot. x-axis: Ages of CEOs]



15. The data set in this example is the measurement of pulse rate of 15 teenagers. If one of the teenagers had their pulse rate measured after running a five mile marathon, this measurement would be an outlier. If, however, all of the teenagers were in a five mile marathon and had their pulse rates taken at the finish line, there would be no outliers.

16. Yes. The outliers can be seen as they lie outside the main group of numbers.

17. When using a five-number summary, use the interquartile range (IQR) to determine whether a data set contains an outlier. The IQR is found by subtracting the first quartile value from the third quartile value. Then multiply the IQR by 1.5. If you subtract 1.5 x IQR from the first quartile value, any numbers less than this are outliers. If you add 1.5 x IQR to the third quartile value, any numbers more than this are outliers.
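The 1.5 x IQR rule in answer 17 can be sketched as two small functions. The function names, the pulse-rate values and the quartiles passed in are all hypothetical, chosen only to show the rule at work; the quartiles are supplied by hand because different textbooks and software compute them slightly differently.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs given the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values lying below the lower cutoff or above the upper cutoff."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical pulse rates for 15 people, with quartiles found by hand
# as the medians of the lower and upper halves of the sorted data.
pulse_rates = [62, 64, 66, 68, 70, 71, 72, 74, 75, 76, 78, 80, 82, 84, 130]
q1, q3 = 68, 80

print("fences:", outlier_fences(q1, q3))
print("outliers:", find_outliers(pulse_rates, q1, q3))
```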

18. a) Stem-and-leaf

5 | 5
6 | 1 7
7 | 5 5 6
8 | 0 0 1 4 5 7 8
9 | 0 3 3 4

b) Dot Plot

19. Web link does not exist.

[Figure: dot plot. x-axis: Exam Scores]



2.4 Displaying Bivariate Data

Answers

1. The independent variable is the explanatory variable and the dependent variable is the response variable. Therefore comparing the municipal waste to each state would have the explanatory variable as the state name and the response variable as the amount of waste. If comparing the percentage of each state in the union versus the amount of waste, the percentage would be the explanatory variable and the response variable would be the amount of waste.

2. 13 386 000 tons

3.

4. The direction is positive but there is a weak correlation between the two variables.

5. There is a decrease in the recycling rate of plastic bottles made from PET and an increase in the recycling rate of HDPE.

[Figure: "Percentage of State in Union vs Amount of Municipal Waste". x-axis: Percentage of State in Union; y-axis: Amount of Waste (thousand tons)]



6. The PET recycling rate went from about 33% to about 22%, a total decrease of about 10-12 percentage points from 1995 to 2001.

7. One explanation was that there was an increase in the use of HDPE in recycling containers and this type of recycled material is used more often in the production of plastic lumber, tables, roadside curbs, benches, truck cargo liners, trash receptacles, stationery (e.g. rulers) and other durable plastic products.

8. This change was the most rapid from the middle of 1995 to the middle of 1996.

9. Dot plots allow for the interpretation of shape, center, and spread but are only used for small sets of data. Stem and leaf plots are useful for seeing the shape of the distribution of data. Both of these plots are used for univariate data sets to determine if the data is symmetric or skewed, to see any gaps and spot outliers. A scatter plot is useful for determining trends in data and the correlation between the explanatory and response variables. Scatter plots are used for bivariate data sets to see the general relationship between the variables.

10. Median and IQR can be used to describe any set of data but are particularly useful for skewed data. When data is skewed or has extreme outliers, the mean is pulled toward the skewed end. This makes the mean not representative of the middle



2.5 Box-and-Whisker Plots

Answers

1.

Min X   Lower Quartile   Median   Upper Quartile   Max X
35      53               67.5     75.5             95

2. $IQR = Q_3 - Q_1 = 75.5 - 53 = 22.5$

$Q_1 - 1.5 \times IQR = 53 - 1.5(22.5) = 19.25$

$Q_3 + 1.5 \times IQR = 75.5 + 1.5(22.5) = 109.25$

There are no data values less than 19.25 and none greater than 109.25. Therefore, there are no outliers.

3. The third quarter of the data is more densely concentrated in a smaller area. 50% of the data is between 53 and 75.5. The data is very close to being symmetric although the data does skew slightly to the left.

4. The median of the data is 67.5. The mean should be pulled left in the direction of the skewness and thus be smaller than the median. The mean of the data is 65.7.

5.

Min X   Lower Quartile   Median   Upper Quartile   Max X
0       72               82       89               105

6. $IQR = Q_3 - Q_1 = 89 - 72 = 17$

$Q_1 - 1.5 \times IQR = 72 - 1.5(17) = 46.5$

$Q_3 + 1.5 \times IQR = 89 + 1.5(17) = 114.5$

There are three data values 0, 4, and 46 that are less than 46.5. These values are outliers for this data set. There are no data values greater than 114.5.

7. The data in the lower 25% are widely spread compared to the other sections of the graph. 50% of the data is between 72 and 89. The data is moderately symmetric although it does skew to the left.



8. The median of the data is 82. The mean should be pulled left in the direction of the skewness and should be considerably smaller than the median. The mean of the data is 75.4.

9.

The median of the data for Utah is higher than that of Idaho. This indicates that the reservoirs in Idaho are less full than those in Utah. The IQR of the dataset for Idaho is 22.5, spread out between a capacity of 53 and 75.5. The IQR of the dataset for Utah is 17, spread out between a capacity of 72 and 89. Therefore, the middle 50% of the capacity percentages in Utah are more concentrated than those of Idaho.
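The comparison in answer 9 can be laid out side by side in code. The sketch below stores the two five-number summaries from answers 1 and 5 (Idaho and Utah respectively, following the IQRs quoted in answer 9) and derives the IQR and the 1.5 x IQR cutoffs from them. The `FiveNumberSummary` class and its property names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FiveNumberSummary:
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        # Spread of the middle 50% of the data.
        return self.q3 - self.q1

    @property
    def fences(self):
        # Cutoffs beyond which a value counts as an outlier.
        return self.q1 - 1.5 * self.iqr, self.q3 + 1.5 * self.iqr


# Values copied from the summary tables in answers 1 and 5.
idaho = FiveNumberSummary(35, 53, 67.5, 75.5, 95)
utah = FiveNumberSummary(0, 72, 82, 89, 105)

for name, summary in (("Idaho", idaho), ("Utah", utah)):
    low, high = summary.fences
    print(f"{name}: median = {summary.median}, IQR = {summary.iqr}, "
          f"cutoffs = ({low}, {high})")
```

A smaller IQR means the box of the box-and-whisker plot is narrower, which is what "more concentrated" refers to.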



2.6 Effects on Box-and-Whisker Plots

Answers

1.

Min X   Lower Quartile   Median   Upper Quartile   Max X
3.12    3.22             3.282    3.393            3.528

2. $IQR = Q_3 - Q_1 = 3.393 - 3.22 = 0.173$

$Q_1 - 1.5 \times IQR = 3.22 - 1.5(0.173) = 2.9605$

$Q_3 + 1.5 \times IQR = 3.393 + 1.5(0.173) = 3.6525$

There are no outliers since there are no data values less than 2.9605 and none greater than 3.6525.

3.

4.

Min X   Lower Quartile   Median   Upper Quartile   Max X
.8242   .8506            .8670    .8963            .9320

The center and the measures of spread for the given dataset will decrease by a factor of 1/3.7854 or 0.2642. The boxplots for both datasets will have the same shape but the plot for US gallons will be stretched out more.
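The effect of the unit change can be checked by dividing the per-gallon five-number summary from answer 1 by 3.7854, the conversion factor quoted above. This is a minimal sketch; the variable names and the rounding to four decimal places are assumptions made to match the table in answer 4.

```python
LITERS_PER_US_GALLON = 3.7854  # conversion factor quoted in the answer above

# Five-number summary in US$ per gallon, copied from answer 1 of this section.
per_gallon = [3.12, 3.22, 3.282, 3.393, 3.528]

# Every value is rescaled by the same factor, so the summary keeps its order.
per_liter = [round(price / LITERS_PER_US_GALLON, 4) for price in per_gallon]

# A measure of spread such as the IQR is rescaled by that factor as well.
iqr_gallon = per_gallon[3] - per_gallon[1]
iqr_liter = iqr_gallon / LITERS_PER_US_GALLON

print("per liter:", per_liter)
print(f"IQR per gallon = {iqr_gallon:.4f}, IQR per liter = {iqr_liter:.4f}")
```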



5.

State           Average Price of a Gallon of Gas (US$)   Average Price of a Liter of Gas (US$)
Alaska          3.458                                    0.914
Washington      3.528                                    0.932
Idaho           3.26                                     0.861
Montana         3.22                                     0.851
North Dakota    3.282                                    0.867
Minnesota       3.12                                     0.824
Michigan        3.352                                    0.886
New York        3.393                                    0.896
Vermont         3.252                                    0.859
New Hampshire   3.152                                    0.833
Maine           3.309                                    0.874

6. This data was retrieved from the website http://fuelgaugereport.opisnet.com/sbsavg.html on July 5, 2014. The data will vary depending on the day.

State           Average Price of a Gallon of Gas (US$)
Alaska          4.121
Washington      4.120
Idaho           3.789
Montana         3.591
North Dakota    3.705
Minnesota       3.588
Michigan        3.869
New York        3.809
Vermont         3.756
New Hampshire   3.658
Maine           3.722

7. The prices are already in US$.



8. A dot plot and a stem-and-leaf plot are used when the dataset consists of a small number of values. A histogram and a box-and-whisker plot are used when the dataset is large.

9. Histograms and Box-and-Whisker Plots

10. The center of the distribution would change by the same scale factor. Calculations like the range, the IQR and the standard deviation will change proportionally by the same scale factor. The five-number summary would also change proportionally.

11. e

12. a) The median is 121.5

b) The lower quartile is 114. The upper quartile is 129.5

c) $IQR = Q_3 - Q_1 = 129.5 - 114 = 15.5$

d) $Q_1 - 1.5 \times IQR = 114 - 1.5(15.5) = 90.75$

$Q_3 + 1.5 \times IQR = 129.5 + 1.5(15.5) = 152.75$

There are no outliers since no data values are less than 90.75 or greater than 152.75.

e)
