l02: chapter 2 - weber state universityfaculty.weber.edu/brandonkoford/quant2600/l02statsi... ·...

Post on 12-Mar-2020


L02: Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations

Introduction

Summarizing categorical Data

◦ Looking at things like “Female/Male” “State of Birth”

and organizing the data

Frequency Distribution: Table showing the

number (frequency) of items in nonoverlapping

classes.

◦ Classes: Groups or categories

Example: Female clothing store is trying to

decide if it should spend money advertising on

or near campus. The store wants to understand

the “shape” of gender data on campus.

◦ What are the classes of data of interest?

Frequency and Distributions

Frequency: Count occurrences

◦ Frequency Distribution: Puts frequency info in a ______

Relative Frequency: Turn occurrences into a decimal

◦ Relative Frequency Distribution: Puts relative frequency info

into a _______

Percent Frequency: Turn occurrences into a percent

◦ Percent frequency distribution: puts percent frequency

information into a _______

Let’s add a couple of columns to our table to include

relative and percent frequency distributions.
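
The three distributions above can be sketched in a few lines of Python. This is only an illustration: the `sample` list is hypothetical (it is not data from the lecture), and `frequency_table` is a name made up for this sketch.

```python
from collections import Counter

def frequency_table(values):
    """Return {class: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(values)          # frequency: count occurrences per class
    n = len(values)
    return {c: (f, f / n, 100 * f / n) for c, f in counts.items()}

# Hypothetical sample of 10 students (illustration only).
sample = ["Female", "Male", "Female", "Female", "Male",
          "Female", "Male", "Female", "Female", "Male"]

table = frequency_table(sample)
for cls, (freq, rel, pct) in table.items():
    print(f"{cls:<8}{freq:>4}{rel:>8.2f}{pct:>6.0f}")
```

Each row of the printed table holds one class with its frequency, relative frequency, and percent frequency, which mirrors the extra columns added to the table in class.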

Graphical Representations

Bar Graph: Graphical device for showing

categorical data from a frequency distribution

◦ Horizontal axis _______

◦ Vertical axis some measure of _______

◦ Bars themselves are the _______ width

Pie Chart: Another graphical device for

showing frequency distributions

◦ Draw a circle, then use relative frequencies to divide

the circle for each class.

◦ Relative frequency of .25 → .25(360) = 90 degrees
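
The slice-angle rule can be sketched as follows. The class names and counts are hypothetical, and `pie_angles` is a helper name invented for this example.

```python
def pie_angles(frequencies):
    """Convert class frequencies to pie-slice angles in degrees."""
    total = sum(frequencies.values())
    # angle = relative frequency * 360
    return {c: f / total * 360 for c, f in frequencies.items()}

# Hypothetical classes A, B, C with relative frequencies .25, .50, .25.
angles = pie_angles({"A": 25, "B": 50, "C": 25})
for cls, degrees in angles.items():
    print(f"{cls}: {degrees:.0f} degrees")
```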

Summarizing Quantitative Data – Ch02 Section 2

Frequency Distribution

Relative Frequency Distribution

Percent Frequency Distribution

Histogram

Cumulative Distributions

Example

Rock Climbing Gym in Ogden

◦ Most people responding to advertising for the

climbing gym are between 22 and 23 years old

◦ Again we want to know if it should spend money on

advertising.

◦ What is different about the data of interest compared

to the female clothing store example?

◦ It is _______!

NOTE – With categorical data, classes are

determined by categories of the data

◦ With quantitative, researcher has to choose classes.

Not a precise science

Choosing Classes

Guidelines to choosing classes

◦ Use between 5 and 20 classes

◦ Data sets with more elements typically require more

classes.

◦ Choose classes so that you can see variation in the data

◦ If you have too many classes, some may contain only a few

data points.

◦ Choose classes so they don’t overlap.

◦ Takes some trial and error

What about this classroom would help us to choose classes

for the age variable?

Do I need 100 classes for age? Why not?

Do I need a 12-13 year old class? A 100 year old class?

In class experiment (Rock Climbing Gym, 22-

23 year olds)

Let’s try some age classes and see how it

works out for us.

Another option is to use this formula:

Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes

Let’s put the data on the board and see

what we get.

Human Frequency graph

Example: Hudson Auto Repair

The manager of Hudson Auto

would like to have a better

understanding of the cost

of parts used in the engine

tune-ups performed in the

shop. She examines 50

customer invoices for tune-ups. The costs of parts,

rounded to the nearest dollar, are listed on the next

slide.

Example: Hudson Auto Repair

Sample of Parts Cost($) for 50 Tune-ups

91 78 93 57 75 52 99 80 97 62

71 69 72 89 66 75 79 75 72 76

104 74 62 68 97 105 77 65 80 109

85 97 88 68 83 68 71 69 67 74

62 82 98 101 79 105 79 69 62 73

•Find the max and min in order to

form class widths

•Let’s do 6 classes.

Frequency Distribution For Hudson Auto Repair, if we choose six classes:

Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10 (rounded up)

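Parts Cost ($): Frequency, Relative Frequency, and Percent Frequency

Cost ($)   Freq.   Relative Freq.   Percent Freq.
50-59      2       0.04             4
60-69      13      0.26             26
70-79      16      0.32             32
80-89      7       0.14             14
90-99      7       0.14             14
100-109    5       0.10             10
Total      50      1.00             100

For the 50-59 class: relative frequency = 2/50 = .04; percent frequency = .04(100) = 4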
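
The table above can be reproduced with a short sketch. It assumes whole-dollar data and a first class starting at 50, as on the slide; `frequency_distribution` is a name chosen for this example only.

```python
import math

# Parts cost ($) for the 50 Hudson Auto tune-ups listed earlier.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def frequency_distribution(data, num_classes, lower_limit):
    """Return rows of (class label, freq, relative freq, percent freq)."""
    # Approximate class width, rounded up to a whole dollar.
    width = math.ceil((max(data) - min(data)) / num_classes)
    counts = [0] * num_classes
    for x in data:
        counts[(x - lower_limit) // width] += 1   # nonoverlapping classes
    n = len(data)
    rows = []
    for i, f in enumerate(counts):
        lo = lower_limit + i * width
        rows.append((f"{lo}-{lo + width - 1}", f, f / n, 100 * f / n))
    return rows

rows = frequency_distribution(parts_cost, num_classes=6, lower_limit=50)
for label, f, rel, pct in rows:
    print(f"{label:<9}{f:>3}{rel:>7.2f}{pct:>5.0f}")
```

Changing `num_classes` is a quick way to try the trial-and-error step from the class-choice guidelines, although `lower_limit` may need adjusting so that every value still falls in a class.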

• Only ____ of the parts costs are in the $50-59 class.

• The greatest percentage (______ or almost one-third)

of the parts costs are in the $70-79 class.

• _____of the parts costs are under $70.

•______of the parts costs are $100 or more.

What are the insights from the distributions?

Would it be a good idea to have a tune-up sale: “Every car $65”?

Relative Frequency and

Percent Frequency Distributions

Graphical Methods for Quantitative Data

After tabular methods for categorical data, what

did we do next?

Histogram: A graph that shows frequency

information for quantitative data

Above each class interval, a _______ that

represents a measure of the class’s ______

No natural separation between

rectangles/classes

◦ This is why “no overlapping classes” is important

Where do I get the frequency Info?

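Histogram

[Figure: histogram of the Hudson Auto data. Horizontal axis: Tune-up Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: frequency, scaled from 2 to 18. The axis labels are left blank on the slide.]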
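
To answer "where do I get the frequency info?": it comes straight from the frequency distribution. Here is a rough text-only sketch that draws the bars sideways. It is not a plotting-library call, and `text_histogram` is a name invented for this illustration.

```python
# Classes and frequencies taken from the Hudson Auto frequency distribution.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [2, 13, 16, 7, 7, 5]

def text_histogram(labels, freqs, symbol="#"):
    """Draw one bar per class; bar length equals the class frequency."""
    lines = []
    for label, f in zip(labels, freqs):
        lines.append(f"{label:>7} | {symbol * f} ({f})")
    return "\n".join(lines)

chart = text_histogram(classes, frequencies)
print(chart)
```

Turned on its side, the longest bar (70-79) sits just left of the middle, with a longer tail toward the higher costs.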

Histograms and Shape

Histograms help us see the shape of data

Skew – A measure of the asymmetry (lack of symmetry) of data

Symmetric: Left tail mirrors the right tail.

◦ Example: Heights and weights of people

[Figure: symmetric histogram; vertical axis: Relative Frequency, 0 to .35]

Skewed left

◦ The left tail is ________

◦ Example: Exam Scores

[Figure: histogram skewed left; vertical axis: Relative Frequency, 0 to .35]

Histogram and Shape

Skewed Right

◦ A long tail to the ________

◦ Example: Executive Salaries

[Figure: histogram skewed right; vertical axis: Relative Frequency, 0 to .35]

Cumulative Info: Tables and Graphs

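Cumulative frequency distribution: Table that shows the number of items with values less than or equal to the upper limit of each class

Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class

Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class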

Parts Cost ($): Frequency and Cumulative Frequency

Cost ($)   Freq.   Cost ($)   Cum. Freq.   Cum. Rel. Freq.   Cum. Percent Freq.
50-59      2       ≤ 59       2            0.04              4
60-69      13      ≤ 69       15           0.30              30
70-79      16      ≤ 79       31           0.62              62
80-89      7       ≤ 89       38           0.76              76
90-99      7       ≤ 99       45           0.90              90
100-109    5       ≤ 109      50           1.00              100
Total      50

For the ≤ 69 row: cumulative frequency = 2 + 13 = 15; cumulative relative frequency = 15/50 = .30; cumulative percent frequency = .30(100) = 30
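
The running totals in the table can be checked with a small sketch that uses Python's `itertools.accumulate`. The variable names are chosen for this example only.

```python
from itertools import accumulate

# Upper class limits and frequencies from the Hudson Auto table.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

cum_freq = list(accumulate(frequencies))      # running total of frequencies
n = cum_freq[-1]                              # last running total = sample size
cum_rel = [c / n for c in cum_freq]           # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]     # cumulative percent frequency

for limit, c, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<4}{c:>4}{r:>7.2f}{p:>6.0f}")
```

The pairs `(upper_limits[i], cum_freq[i])` are also the points one would plot when drawing a cumulative graph of these data.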

Graphical Representation of Cumulative Data

Ogive: A graph of the cumulative distribution.

◦ Horizontal Axis: Data values

◦ Vertical Axis

Cumulative frequencies, or

Cumulative Relative Frequencies, or

Cumulative percent frequencies

◦ The cumulative frequencies are plotted as a

point. Put point above highest value in class

◦ Points are connected by lines (connect the

dots)

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100

Stem-and-Leaf Display

Def: Graph that shows order and shape of

data

Like a histogram on its side, but shows

actual data values

•All digits on left side are

called a STEM

•Stems are listed in

ascending order

•The LEAF for each data

point goes on the right

hand side.

•For each stem, leaves are

arranged in ascending

order.

Example: Hudson Auto Repair

Sample of Parts Cost($) for 50 Tune-ups

91 78 93 57 75 52 99 80 97 62

71 69 72 89 66 75 79 75 72 76

104 74 62 68 97 105 77 65 80 109

85 97 88 68 83 68 71 69 67 74

62 82 98 101 79 105 79 69 62 73

•A data set we are familiar with:

Hudson Auto.

•Let’s make a stem-and-leaf display

Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each number to the left of the bar is a stem; each digit to the right of the bar is a leaf.
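
The display above can be rebuilt from the raw data with a short sketch. It assumes whole-number data with a leaf unit of 1, and `stem_and_leaf` is a name made up for this example.

```python
# Parts cost ($) for the 50 Hudson Auto tune-ups.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves."""
    display = {}
    for x in sorted(data):            # sorting first keeps stems and leaves ascending
        stem, leaf = divmod(x, 10)    # last digit is the leaf
        display.setdefault(stem, []).append(leaf)
    return display

display = stem_and_leaf(parts_cost)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Note that this sketch only lists stems that actually occur in the data; a stem with no leaves would be skipped rather than shown as an empty row.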

Stem-and-Leaf Display

A single digit is used to define each leaf

In the preceding example, the leaf unit

was 1

Leaf units may be 100, 10, 1, 0.1, and so

on.

Example: Leaf Unit = 0.1

Using the following data, create a stem-and-leaf display using a leaf unit of 0.1:

8.6 11.7 9.4 9.1 10.2 11.0 8.8

A stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | __ __
 9 | 1 4
10 | _
11 | 0 7

Example: Leaf Unit = 10

Using the following data values, create a stem-and-leaf diagram using a leaf unit of 10:

1806 1717 1974 1791 1682 1910 1838

A stem-and-leaf display of these data will be

Leaf Unit = 10
16 | _
17 | _ _
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
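
Both leaf-unit examples follow the same recipe: drop the digits below the leaf unit, then split off the last remaining digit as the leaf. Here is a hedged sketch of that recipe. `stem_and_leaf_with_unit` is a hypothetical helper, and it uses `Decimal` so that a value like 8.6 is not distorted by binary floating point.

```python
from decimal import Decimal

def stem_and_leaf_with_unit(data, leaf_unit):
    """Map stems to sorted single-digit leaves for a given leaf unit."""
    unit = Decimal(str(leaf_unit))
    display = {}
    for x in sorted(data):
        # Drop digits below the leaf unit (truncate, do not round up).
        scaled = int(Decimal(str(x)) // unit)
        stem, leaf = divmod(scaled, 10)
        display.setdefault(stem, []).append(leaf)
    return display

tenths = stem_and_leaf_with_unit([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], 0.1)
tens = stem_and_leaf_with_unit([1806, 1717, 1974, 1791, 1682, 1910, 1838], 10)

for name, display in (("Leaf Unit = 0.1", tenths), ("Leaf Unit = 10", tens)):
    print(name)
    for stem, leaves in display.items():
        print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```

Running it is one way to check the blanks filled in by hand on the two slides.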

Cross Tabulations and Scatter Diagrams

So far: One Variable at a Time

Often, managers are interested in knowing

the relationship between two variables at

a time

Crosstabulations and scatter diagrams allow

us to summarize two variables at a time

Interesting Examples

◦ Height and Salary: two quantitative variables

◦ Beauty and Salary: one categorical, one quantitative

◦ Art Quality and City: two categorical variables

Crosstabulation

Finger Lakes Homes – Number sold for each style and price for last two years

                          Home Style
Price Range   Colonial   Log   Split   A-Frame   Total
< $99,000     18         6     19      12        55
> $99,000     12         14    16      3         45
Total         30         20    35      15        100

Price Range is the quantitative variable; Home Style is the categorical variable.

Notice that the classes are in the left and top margins

Crosstabulation – Row Percentages

Row percentages answer the question: What percent of the homes that cost more than $99,000 were colonial, log, etc.?

                          Home Style
Price Range   Colonial   Log     Split   A-Frame   Total
< $99,000     32.73      10.91   34.55   21.82     100
> $99,000     26.67      31.11   35.56   6.67      100

(Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67
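
The row-percentage calculation can be sketched as follows: each cell is divided by its own row total. The dictionary layout and the name `row_percentages` are choices made for this example.

```python
# Finger Lakes Homes counts; columns are Colonial, Log, Split, A-Frame.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

def row_percentages(table):
    """Divide every cell by its row total and express it as a percent."""
    return {
        row: [round(100 * c / sum(cells), 2) for c in cells]
        for row, cells in table.items()
    }

row_pct = row_percentages(counts)
for row, pcts in row_pct.items():
    print(row, dict(zip(styles, pcts)))
```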

Crosstabulation – Column Percentages

Column percentages answer the question: What percent of colonial homes cost more than $99,000, etc.?

                          Home Style
Price Range   Colonial   Log      Split    A-Frame
< $99,000     60.00      30.00    54.29    80.00
> $99,000     40.00      70.00    45.71    20.00
Total         100        100      100      100

(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
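
Column percentages use the same counts but divide by the column totals instead. Here is a minimal sketch; `column_percentages` is a name invented for this illustration.

```python
# Finger Lakes Homes counts; columns are Colonial, Log, Split, A-Frame.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

def column_percentages(table):
    """Divide every cell by its column total and express it as a percent."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        row: [round(100 * c / t, 2) for c, t in zip(cells, col_totals)]
        for row, cells in table.items()
    }

col_pct = column_percentages(counts)
for row, pcts in col_pct.items():
    print(row, dict(zip(styles, pcts)))
```

Comparing this output with the row percentages shows why the two tables answer different questions even though they start from the same counts.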

Scatter Diagram

Def: A graphical presentation of the

relationship between two variables

One variable on horizontal axis

One variable on vertical axis

Variables are not grouped into classes

The general pattern of the plotted points gives

insight to the relationship between variables.

◦ Can everyone plot points on an X-Y plane?

A trendline approximates the relationship

Correlation does not equal causation!

Scatter Diagram and Trendline

◦ A _______ Relationship

◦ A _____ Relationship

◦ No Apparent Relationship

[Figures: three scatter diagrams of y against x with trendlines, one for each case above]

Example: Panthers Football Team

A football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions   y = Number of Points Scored
1                             14
3                             24
2                             18
1                             17
3                             30

[Figure: Scatter Diagram and Trendline. Horizontal axis: Number of Interceptions, 0 to 4. Vertical axis: Number of Points Scored, 0 to 35.]

Scatter Diagrams

Insights from Panthers

◦ Positive or negative relationship?

◦ More interceptions, more points?

◦ Perfectly linear relationship?
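
One way to explore these questions is to fit a trendline to the five Panthers games. The slides do not say how the trendline is drawn, so the ordinary least-squares line used here is an assumption, and `least_squares_line` is a name made up for this sketch.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

interceptions = [1, 3, 2, 1, 3]     # x = number of interceptions
points = [14, 24, 18, 17, 30]       # y = number of points scored

slope, intercept = least_squares_line(interceptions, points)
print(f"trendline: y = {intercept:.2f} + {slope:.2f}x")

# Residuals show how far each game falls from the line.
residuals = [y - (intercept + slope * x) for x, y in zip(interceptions, points)]
print([round(r, 2) for r in residuals])
```

The sign of `slope` indicates the direction of the relationship, and nonzero residuals mean the points do not all sit exactly on the line. With only five games, this describes the sample and says nothing about cause.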

Tabular and Graphical Procedures: A review

Data

Categorical Data

◦ Tabular Methods

• Frequency Distribution
• Relative Freq. Distribution
• Percent Freq. Distribution
• Crosstabulation

◦ Graphical Methods

• Bar Graph
• Pie Chart

Quantitative Data

◦ Tabular Methods

• Frequency Dist.
• Rel. Freq. Dist.
• % Freq. Dist.
• Cum. Freq. Dist.
• Cum. Rel. Freq. Distribution
• Cum. % Freq. Distribution
• Crosstabulation

◦ Graphical Methods

• Dot Plot
• Histogram
• Ogive
• Stem-and-Leaf Display
• Scatter Diagram
