ch-2. describing of data

Post on 21-Jul-2016


Chapter 2: Methods for Describing Sets of Data

Business Statistics

[Figure: bar chart titled "Our market share far exceeds all competitors!" with bars for Us, Y, and X; vertical axis ticks at 30%, 32%, 34%, 36%]

Data Presentation

Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram

Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram

Presenting Qualitative Data


Summary Table

1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both

Each row is a category; counts come from the tally.

Major        Count
Accounting   130
Economics    20
Management   50
Total        200
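As a minimal sketch of point 3, the counts in the table can be turned into percentages. The dictionary layout and variable names below are illustrative choices, not part of the slides.

```python
# Counts copied from the summary table of majors.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

total = sum(counts.values())  # 200 responses in all

# Relative frequency of each category, expressed as a percentage.
percents = {major: 100 * n / total for major, n in counts.items()}

for major, n in counts.items():
    print(f"{major:<12}{n:>6}{percents[major]:>8.1f}%")
print(f"{'Total':<12}{total:>6}{100.0:>8.1f}%")
```

The resulting shares (65%, 10%, 25%) are the same ones shown on the pie chart of majors.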


Bar Graph

[Figure: bar graph of frequency (vertical axis, 0 to 150) by major (Acct., Econ., Mgmt.)]

• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Zero point
• Percent used also
• Equal bar widths


Pie Chart

1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent)

[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%. The Econ. slice spans (360°)(10%) = 36°.]
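The angle rule can be sketched in a few lines. The function name `pie_angles` is hypothetical; the percentages are the ones shown for the majors example.

```python
def pie_angles(percents):
    """Return the central angle in degrees for each category.

    `percents` maps a category name to its share in percent.
    """
    return {name: 360 * pct / 100 for name, pct in percents.items()}


majors = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}
angles = pie_angles(majors)

for name, angle in angles.items():
    print(f"{name:<6} {angle:6.1f} degrees")
```

Because the percentages sum to 100, the angles sum to 360°.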


Pareto Diagram

Like a bar graph, but with the categories arranged by height in descending order from left to right.

[Figure: Pareto diagram of frequency (vertical axis, 0 to 150) by major, in the order Acct., Mgmt., Econ.]

• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Zero point
• Percent used also
• Equal bar widths
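The only step that separates a Pareto diagram from a bar graph is the ordering. A minimal sketch, assuming the counts from the majors example; the text bars are just a stand-in for a real chart.

```python
# Frequencies from the majors example, in their original (bar graph) order.
counts = {"Acct.": 130, "Econ.": 20, "Mgmt.": 50}

# Pareto order: categories sorted by height, tallest first.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for name, n in pareto_order:
    # One '#' per 10 observations, as a rough text rendering of the bar.
    print(f"{name:<6} {'#' * (n // 10)} {n}")
```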

Thinking Challenge

You’re an analyst for IRI. You want to show the market shares held by Web browsers in 2006. Construct a bar graph, pie chart, & Pareto diagram to describe the data.

Browser             Mkt. Share (%)
Firefox             14
Internet Explorer   81
Safari              4
Others              1

Bar Graph Solution

[Figure: bar graph of market share (%) by browser, in the order Firefox, Internet Explorer, Safari, Others; vertical axis 0% to 100%]

Pie Chart Solution

[Figure: pie chart of market share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]

Pareto Diagram Solution

[Figure: Pareto diagram of market share (%) by browser, in the order Internet Explorer, Firefox, Safari, Others; vertical axis 0% to 100%]

Presenting Quantitative Data


Stem-and-Leaf Display

1. Divide each observation into stem value and leaf value
• Stem value defines class
• Leaf value defines frequency (count)

2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

2 | 144677
3 | 028
4 | 1

For example, the observation 26 has stem 2 and leaf 6.
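The display can be reproduced with a short sketch. It assumes two-digit whole numbers, so the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integers by tens digit (stem); the units digits are the leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    return {stem: "".join(digits) for stem, digits in sorted(leaves.items())}


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

for stem, row in display.items():
    print(f"{stem} | {row}")
```

Note that this sketch skips stems with no observations; a fuller version would print them with an empty leaf row.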


Frequency Distribution Table Steps

1. Determine range
2. Select number of classes (usually between 5 & 15 inclusive)
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes

Formulas for the steps:

• Range: R = highest value – lowest value
• Number of classes: C = 1 + (10/3) log10 N (N = number of observations)
• Class interval: CI = R / C (rounded)
• Class limits/boundaries: lowest limit <= lowest value; highest limit >= highest value
• Class midpoint: CM = (lower limit + upper limit) / 2
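A rough sketch of the first three steps, assuming the logarithm is base 10. The inputs (lowest value 18, highest value 57, 30 observations) match the raw-data exercise in this deck; the function name is hypothetical, and rounding is left to the analyst as on the slide.

```python
import math


def class_plan(lowest, highest, n_obs):
    """Return (range, suggested number of classes, unrounded class width)."""
    r = highest - lowest                    # R = highest - lowest
    c = 1 + (10 / 3) * math.log10(n_obs)    # C = 1 + (10/3) log N
    ci = r / c                              # CI = R / C, before rounding
    return r, c, ci


value_range, n_classes, width = class_plan(18, 57, 30)
print(value_range, round(n_classes, 2), round(width, 2))
```

The unrounded results suggest about 6 classes of width 6 or 7; the final choice is a judgment call within the 5 to 15 guideline.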


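Histogram

Class         Freq.
15.5 – 25.5   3
25.5 – 35.5   5
35.5 – 45.5   2

[Figure: histogram of the classes above. Horizontal axis marks the lower boundaries 0, 15.5, 25.5, 35.5, 45.5, 55.5; vertical axis shows the count (0 to 5) and may instead show frequency, relative frequency, or percent; bars touch.]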
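The class frequencies behind the histogram can be checked with a small sketch. It assumes the data are the ten values from the stem-and-leaf example; half-unit boundaries mean no observation can fall exactly on a class edge.

```python
# Data from the stem-and-leaf example.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# Class boundaries from the histogram slide.
boundaries = [15.5, 25.5, 35.5, 45.5]

counts = []
for lower, upper in zip(boundaries, boundaries[1:]):
    # Count the observations that fall between consecutive boundaries.
    counts.append(sum(lower < x <= upper for x in data))

for (lower, upper), n in zip(zip(boundaries, boundaries[1:]), counts):
    print(f"{lower} – {upper}: {n}")
```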

Raw Data:

24, 26, 24, 21, 27, 27, 30, 41, 32, 38
20, 18, 42, 25, 57, 26, 35, 29, 34, 40
33, 21, 56, 45, 51, 23, 36, 54, 20, 19

Make a frequency distribution table!

Relative Frequency Distribution

Class     Frequency   %
18 – 23   7           23
24 – 29   8           27
30 – 35   5           17
36 – 41   4           13
42 – 47   2           7
48 – 53   1           3
54 – 59   3           10
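A sketch of step 6 (count observations and assign to classes) for the 30 raw values, using the class limits shown in the table. Percentages are rounded to whole numbers, which is an assumption about how the slide's figures were produced.

```python
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38,
       20, 18, 42, 25, 57, 26, 35, 29, 34, 40,
       33, 21, 56, 45, 51, 23, 36, 54, 20, 19]

# Inclusive class limits: (18, 23), (24, 29), ..., (54, 59).
classes = [(low, low + 5) for low in range(18, 60, 6)]

table = []
for low, high in classes:
    freq = sum(low <= x <= high for x in raw)
    pct = round(100 * freq / len(raw))
    table.append((low, high, freq, pct))

for low, high, freq, pct in table:
    print(f"{low} – {high}   {freq:>2}   {pct:>3}%")
```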

Numerical Data Properties

Standard Notation

Measure              Sample       Population
Mean                 $\bar{X}$    $\mu$
Standard Deviation   $S$          $\sigma$
Variance             $S^2$        $\sigma^2$
Size                 $n$          $N$

• Central Tendency (Location)
• Variation (Dispersion)
• Shape

Central Tendency: Mean, Median, Mode

Variation: Range, Interquartile Range, Variance, Standard Deviation

Relative Standing: Percentiles, Z–scores

Central Tendency


Mean

1. Measure of central tendency
2. Most common measure
3. Acts as ‘balance point’
4. Affected by extreme values (‘outliers’)
5. Formula (sample mean):

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

Mean Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$
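The same arithmetic as a minimal sketch; `sample_mean` is an illustrative name.

```python
def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
x_bar = sample_mean(data)
print(round(x_bar, 2))
```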


Median

1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence: Positioning Point $= \frac{n + 1}{2}$
4. Not affected by extreme values

Median Example (Odd-sized Sample)

Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered: 21.5 22.6 22.6 23.7 24.1
Position: 1 2 3 4 5

Positioning Point $= \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$

Median $= 22.6$

Median Example (Even-sized Sample)

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6

Positioning Point $= \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$

Median $= \frac{7.7 + 8.9}{2} = 8.30$
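Both cases can be sketched with one function that returns the positioning point alongside the median. The function name and the tuple return value are illustrative choices.

```python
def median_by_position(values):
    """Return (positioning point, median) for a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based positioning point
    if n % 2 == 1:
        # Odd n: the positioning point is a whole number; take that value.
        return position, ordered[int(position) - 1]
    # Even n: the point falls halfway between two positions; average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return position, (lower + upper) / 2


odd_result = median_by_position([24.1, 22.6, 21.5, 23.7, 22.6])
even_result = median_by_position([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
print(odd_result, even_result)
```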


Mode

1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative data

Mode Example

No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

One Mode (4.9)
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9

More Than 1 Mode (28 and 43)
Raw Data: 21 28 28 41 43 43
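The three cases can be sketched with a counter. Treating "every value occurs once" as "no mode" follows the first example; the function name is hypothetical and the input is assumed to be non-empty.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


no_mode = modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
one_mode = modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9])
two_modes = modes([21, 28, 28, 41, 43, 43])
print(no_mode, one_mode, two_modes)
```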

Thinking Challenge

You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. Describe the stock prices in terms of central tendency.

Mean

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$

Median

Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8

Positioning Point $= \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$

Median $= \frac{16 + 16}{2} = 16$

Mode

Raw Data: 17 16 21 18 13 16 12 11

Mode = 16
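These three answers can be cross-checked with Python's standard `statistics` module, shown here as one convenient option rather than the method used on the slides.

```python
import statistics

# Closing stock prices from the thinking challenge.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean_price = statistics.mean(prices)      # balance point
median_price = statistics.median(prices)  # average of the 4th and 5th ordered values
mode_price = statistics.mode(prices)      # most frequent value

print(mean_price, median_price, mode_price)
```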

Summary of Central Tendency Measures

Measure: Formula (Description)

Mean: $\sum X_i / n$ (Balance Point)
Median: Position $(n + 1)/2$ (Middle Value When Ordered)
Mode: none (Most Frequent)

Variation


Range

1. Measure of dispersion
2. Difference between largest & smallest observations: Range $= X_{largest} - X_{smallest}$
3. Ignores how data are distributed

[Figure: two data sets plotted on a scale from 7 to 10 with different distributions; both have Range = 10 – 7 = 3]


Variance & Standard Deviation

1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{X}$ or $\mu$)

[Figure: data plotted on an axis from 4 to 12 with the mean $\bar{X} = 8.3$ marked]

Sample Variance Formula

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$

n – 1 in denominator! (Use N if population variance.)

Standard Deviation Formula

$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$

Variance Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$

$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$
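The computation can be spelled out as a short sketch that follows the sample formula term by term; `sample_variance` is an illustrative name.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
s2 = sample_variance(data)
print(round(s2, 3))
```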

Thinking Challenge

You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.

What are the variance and standard deviation of the stock prices?

Variation Solution

Raw Data: 17 16 21 18 13 16 12 11

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$

$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$

Sample Standard Deviation

$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34$
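As a cross-check, the standard `statistics` module offers both the sample (n – 1) and population (N) versions. This is a sketch for comparison; the slides work the sample version by hand.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

sample_var = statistics.variance(prices)       # divides by n - 1
sample_sd = statistics.stdev(prices)           # square root of the sample variance
population_var = statistics.pvariance(prices)  # divides by N instead

print(round(sample_var, 2), round(sample_sd, 2), population_var)
```

The sum of squared deviations is 78, so the sample variance is 78/7 and the population variance is 78/8.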

Summary of Variation Measures

Measure: Formula (Description)

Range: $X_{largest} - X_{smallest}$ (Total Spread)
Standard Deviation (Sample): $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$ (Dispersion about Sample Mean)
Standard Deviation (Population): $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$ (Dispersion about Population Mean)
Variance (Sample): $\frac{\sum (X_i - \bar{X})^2}{n - 1}$ (Squared Dispersion about Sample Mean)

Interpreting Standard Deviation

Interpreting Standard Deviation: Chebyshev’s Theorem (applies to any shape data set)

• No useful information about the fraction of data in the interval $\bar{x} - s$ to $\bar{x} + s$

• At least 3/4 of the data lies in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$

• At least 8/9 of the data lies in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$

• In general, for k > 1, at least $1 - 1/k^2$ of the data lies in the interval $\bar{x} - ks$ to $\bar{x} + ks$

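[Figure: Chebyshev’s Theorem on a number line marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$. Within 1s: no useful information. Within 2s: at least 3/4 of the data. Within 3s: at least 8/9 of the data.]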

Chebyshev’s Theorem Example

Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34.

Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.

At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.

$\bar{x} = 15.5$, $s = 3.34$

$(\bar{x} - 2s, \bar{x} + 2s) = (15.5 - 2 \cdot 3.34, 15.5 + 2 \cdot 3.34) = (8.82, 22.18)$
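The general rule for any k > 1 can be sketched as below. The function name is hypothetical; the inputs are the mean and standard deviation from the example.

```python
def chebyshev_interval(mean, sd, k):
    """Return (minimum fraction of data, (low, high)) for mean ± k·sd."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    fraction = 1 - 1 / k**2
    return fraction, (mean - k * sd, mean + k * sd)


fraction, (low, high) = chebyshev_interval(15.5, 3.34, 2)
print(f"At least {fraction:.0%} of the data lies in ({low:.2f}, {high:.2f})")
```

With k = 3 the same function gives the 8/9 bound listed in the theorem.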

Interpreting Standard Deviation: Empirical Rule

Applies to data sets that are mound shaped and symmetric

• Approximately 68% of the measurements lie in the interval μ – σ to μ + σ
• Approximately 95% of the measurements lie in the interval μ – 2σ to μ + 2σ
• Approximately 99.7% of the measurements lie in the interval μ – 3σ to μ + 3σ

[Figure: Empirical Rule on a mound-shaped curve with the axis marked at μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ. Within 1σ: approximately 68% of the measurements. Within 2σ: approximately 95%. Within 3σ: approximately 99.7%.]

Empirical Rule Example

Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data is symmetric and mound shaped, calculate the percentage of the data that lie within the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.

• According to the Empirical Rule, approximately 68% of the data will lie in the interval $(\bar{x} - s, \bar{x} + s)$: (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)

• Approximately 95% of the data will lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$: (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)

• Approximately 99.7% of the data will lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$: (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
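The three intervals can be generated in one pass. This sketch assumes the data really are mound shaped and symmetric; the names are illustrative and the results are rounded to two decimals as on the slide.

```python
# Approximate share of the data within k standard deviations of the mean.
EMPIRICAL_SHARE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_intervals(mean, sd):
    """Map k to (low, high, approximate percent) for k = 1, 2, 3."""
    return {
        k: (round(mean - k * sd, 2), round(mean + k * sd, 2), share)
        for k, share in EMPIRICAL_SHARE.items()
    }


intervals = empirical_intervals(15.5, 3.34)
for k, (low, high, share) in intervals.items():
    print(f"about {share}% within {k} sd: ({low}, {high})")
```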

Numerical Measures of Relative Standing


Numerical Measures of Relative Standing: Percentiles

Describes the relative location of a measurement compared to the rest of the data

The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it

Median = 50th percentile

Percentile Example

You scored 560 on the GMAT exam. This score puts you in the 58th percentile.

What percentage of test takers scored lower than you did?
58% of test takers scored lower than 560.

What percentage of test takers scored higher than you did?
(100 – 58)% = 42% of test takers scored higher than 560.


Numerical Measures of Relative Standing: Z–Scores

• Describes the relative location of a measurement compared to the rest of the data

• Sample z–score: $z = \frac{x - \bar{x}}{s}$

• Population z–score: $z = \frac{x - \mu}{\sigma}$

• Measures the number of standard deviations away from the mean a data value is located

Z–Score Example

The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes.

Find the z–score for an item that took 20 minutes to assemble.

x = 20, μ = 22.5, σ = 2.5: $z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$

Find the z–score for an item that took 27.5 minutes to assemble.

x = 27.5, μ = 22.5, σ = 2.5: $z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$
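The z–score formula is a one-line computation. A minimal sketch with the assembly-time numbers; the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


fast_item = z_score(20, 22.5, 2.5)    # below the mean, so negative
slow_item = z_score(27.5, 22.5, 2.5)  # above the mean, so positive
print(fast_item, slow_item)
```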

Quartiles & Box Plots

Quartiles

1. Measure of noncentral tendency
2. Split ordered data into 4 quarters
3. Position of i-th quartile: Positioning Point of $Q_i = \frac{i \cdot (n + 1)}{4}$

[Figure: ordered data split into four quarters of 25% each by Q1, Q2, and Q3]

Quartile (Q1) Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6

$Q_1$ Position $= \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (6 + 1)}{4} = 1.75 \approx 2$

$Q_1 = 6.3$

Quartile (Q2) Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6

$Q_2$ Position $= \frac{2 \cdot (n + 1)}{4} = \frac{2 \cdot (6 + 1)}{4} = 3.5$

$Q_2 = \frac{7.7 + 8.9}{2} = 8.3$

Quartile (Q3) Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6

$Q_3$ Position $= \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (6 + 1)}{4} = 5.25 \approx 5$

$Q_3 = 10.3$
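The three worked examples suggest a rule: round the positioning point to the nearest whole position, and average the two neighbours when it falls exactly halfway. The sketch below implements that reading, which is an assumption inferred from these examples; other textbooks and software use different quartile conventions.

```python
import math


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using the positioning point i(n + 1)/4."""
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4  # 1-based positioning point
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Halfway between two positions: average the two neighbours.
        return (ordered[lower - 1] + ordered[lower]) / 2
    if fraction < 0.5:
        return ordered[lower - 1]  # round down to the nearer position
    return ordered[lower]          # round up to the nearer position


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, round(q2, 2), q3)
```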


Interquartile Range

1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles: Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values

Thinking Challenge

You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.

What are the quartiles, Q1 and Q3, and the interquartile range?

Quartile Solution*: Q1

Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8

$Q_1$ Position $= \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (8 + 1)}{4} \rightarrow 3$

$Q_1 = 13$

Quartile Solution*: Q3

Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8

$Q_3$ Position $= \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (8 + 1)}{4} = 6.75 \approx 7$

$Q_3 = 18$

Interquartile Range Solution*

Interquartile Range $= Q_3 - Q_1 = 18.0 - 13.0 = 5.0$

Box Plot

1. Graphical display of data using 5-number summary: Xsmallest, Q1, Median, Q3, Xlargest

[Figure: box plot on an axis from 4 to 12 marking Xsmallest, Q1, Median, Q3, and Xlargest]

Shape & Box Plot

[Figure: three box plots, for left-skewed, symmetric, and right-skewed data, each marking Q1, Median, and Q3]

Graphing Bivariate Relationships

• Describes a relationship between two quantitative variables
• Plot the data in a scattergram

[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]

Scattergram Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ (x)   Sales (Units) (y)
1          1
2          1
3          2
4          2
5          4

Draw a scattergram of the data.

[Figure: scattergram of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis, 0 to 5)]

Time Series Plot

Time Series Plot

• Used to graphically display data produced over time
• Shows trends and changes in the data over time
• Time recorded on the horizontal axis
• Measurements recorded on the vertical axis
• Points connected by straight lines

Time Series Plot Example

The following data shows the average retail price of regular gasoline in New York City for 8 weeks in 2006. Draw a time series plot for this data.

Date           Average Price
Oct 16, 2006   $2.219
Oct 23, 2006   $2.173
Oct 30, 2006   $2.177
Nov 6, 2006    $2.158
Nov 13, 2006   $2.185
Nov 20, 2006   $2.208
Nov 27, 2006   $2.236
Dec 4, 2006    $2.298

[Figure: time series plot of Price (vertical axis, 2.05 to 2.35) against Date (horizontal axis, weekly from 10/16 to 12/4)]

Distorting the Truth with Descriptive Techniques

Errors in Presenting Data

1. Using ‘chart junk’

2. No relative basis in comparing data batches

3. Compressing the vertical axis

4. No zero point on the vertical axis

‘Chart Junk’

[Figure: Bad Presentation and Good Presentation panels, both titled Minimum Wage. Data labels: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Graph axes: $ from 0 to 4 against the years 1960, 1970, 1980, 1990.]

No Relative Basis

[Figure: “A’s by Class” for FR, SO, JR, SR shown two ways. Bad Presentation: vertical axis in frequency, 0 to 300. Good Presentation: vertical axis in percent, 0% to 30%.]

Compressing Vertical Axis

[Figure: “Quarterly Sales” for Q1 to Q4 shown two ways. Bad Presentation: vertical axis ($) from 0 to 200. Good Presentation: vertical axis ($) from 0 to 50.]

No Zero Point on Vertical Axis

[Figure: “Monthly Sales” for J, M, M, J, S, N shown two ways. Bad Presentation: vertical axis ($) from 36 to 45, with no zero point. Good Presentation: vertical axis ($) from 0 to 60.]
