Transcript
Page 1: Statistics  100 Lecture Set  6

Statistics 100Lecture Set 6

Page 2: Statistics  100 Lecture Set  6

Re-cap

• Last day, looked at a variety of plots

• For categorical variables, most useful plots were bar charts and pie charts

• Looked at time plots for quantitative variables

• Key thing is to be able to quickly make a point using graphical techniques

Page 3: Statistics  100 Lecture Set  6

Re-cap

• Recall:

– A distribution of a variable tells us what values it takes on and how often it takes these values.

Page 4: Statistics  100 Lecture Set  6

Histograms

• Similar to a bar chart, would like to display main features of an empirical distribution (or data set)

• Histogram– Essentially a bar chart of values of data – Usually grouped to reduce “jitteriness” of picture– Groups are sometimes called “bins”

Page 5: Statistics  100 Lecture Set  6

Histogram

• Uses rectangles to show number (or percentage) of values in intervals

• Y-axis usually displays counts or percentages• X-Axis usually shows intervals• Rectangles are all the same width

6 8 10 12 14 16

02

46

8

Data

Cou

nts

Page 6: Statistics  100 Lecture Set  6

Example (discrete data)

• In a study of productivity, a large number of authors were classified according to the number of articles they published during a particular period of time.

Number of articles

Frequency

1 784 2 204 3 127 4 50 5 33 6 28 7 19 8 19 9 6

10 7 11 6 12 7 13 4 14 4 15 5 16 3 17 3

Page 7: Statistics  100 Lecture Set  6

Example (discrete data)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

020

040

060

080

0

Number of Articles

Cou

nt

Page 8: Statistics  100 Lecture Set  6

Example (continuous data)

• Experiment was conducted to investigate the muzzle velocity of a anti-personnel weapon (King, 1992)

• Sample of size 16 was taken and the muzzle velocity (MPH) recorded

279.9 294.1 260.7 277.9

285.5 298.6 265.1 305.3

264.9 262.9 256.0 255.3

276.3 278.2 283.6 250.0

Page 9: Statistics  100 Lecture Set  6

Constructing a Histogram – continuous data

• Find minimum and maximum values of the data

• Divide range of data into non-overlapping intervals of equal length

• Count number of observations in each interval

• Height of rectangle is number (or percentage) of observations falling in the interval

• How many categories?

Page 10: Statistics  100 Lecture Set  6

Example

• Experiment was conducted to investigate the muzzle velocity of a anti-personnel weapon (King, 1992)

• Sample of size 16 was taken and the muzzle velocity (MPH) recorded

279.9 294.1 260.7 277.9

285.5 298.6 265.1 305.3

264.9 262.9 256.0 255.3

276.3 278.2 283.6 250.0

Page 11: Statistics  100 Lecture Set  6

• What are the minimum and maximum values?

• How do we divide up the range of data?

• What happens if have too many intervals?

• Too Few intervals?

• Suppose have intervals from 240-250 and 250-260. In which interval is the data point 250 included?

Page 12: Statistics  100 Lecture Set  6

Intervals Frequency[240,250)[250,260)[260,270)[270,280)[280,290)[290,300)[300,310)

Page 13: Statistics  100 Lecture Set  6

Histogram of Muzzle Velocity

240 260 280 300 320

01

23

4

Muzzle Velocity (MPH)

Freq

uenc

y

Page 14: Statistics  100 Lecture Set  6

Interpreting histograms

• Gives an idea of:– Location of centre of the distribution

– How spread are the data

– Shape of the distribution• Symmetric• Skewed left• Skewed right

• Unimodal• Bimodal• Multimodal

– Outliers• Striking deviations from the overall pattern

Page 15: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Was out of 34 + a bonus question (n=344)

Page 16: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Too many bins?

Page 17: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Too few bins

Page 18: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Potential outlier?

Page 19: Statistics  100 Lecture Set  6

60 70 80 90 100

0

2

4

6

8

10

12

14

Exam 1 Grades

Count

Exam Grades from a Stats Class

Page 20: Statistics  100 Lecture Set  6

60 70 80 90 100

0

2

4

6

8

10

Exam 1 Grades

Count

Exam Grades for Undergraduates

Example

Page 21: Statistics  100 Lecture Set  6

60 70 80 90 100

0

2

4

6

8

Exam 1 Grades

Count

Exam Grades for Graduate Students

Example

Page 22: Statistics  100 Lecture Set  6

Numerical Summaries (Chapter 12)

• Graphic procedures visually describe data

• Numerical summaries can quickly capture main features

Page 23: Statistics  100 Lecture Set  6

Measures of Center

• Have sample of size n from some population,

• An important feature of a sample is its central value

• Most common measures of center - Mean & Median

nxxx ,...,,

21

Page 24: Statistics  100 Lecture Set  6

Sample Mean

• The sample mean is the average of a set of measurements

• The sample mean:

nix

x

n

i

1

Page 25: Statistics  100 Lecture Set  6

Sample Median

• Have a set of n measurements,

• Median (M) is point in the data that divides the data in half

• Viewed as the mid-point of the data

• To compute the median: 1. For sample size “n”, compute position = (n+1)/2

1. If position is a whole number, then M is the value at this position of the sorted data

2. If position falls between two numbers, then M is the value halfway between those two positions in the sorted data

nxxx ,...,,

21

Page 26: Statistics  100 Lecture Set  6

Example

• Finding the Median, M, when n is odd

• Example: Data = 7, 19, 4, 23, 18

Page 27: Statistics  100 Lecture Set  6

Muzzle Velocity Example

• Data (n=16)

279.9 294.1 260.7 277.9

285.5 298.6 265.1 305.3

264.9 262.9 256.0 255.3

276.3 278.2 283.6 250.0

Page 28: Statistics  100 Lecture Set  6

Muzzle Velocity Example

• Mean:

x = 279.9 + 294.1+ ...+ 250.016

= 274.6

Page 29: Statistics  100 Lecture Set  6

Muzzle Velocity Example

• Median:

Page 30: Statistics  100 Lecture Set  6

Sample Mean vs. Sample Median

• Sometimes sample median is better measure of center

• Sample median less sensitive to unusually large or small values called

• For symmetric distributions the relative location of the sample mean and median is

• For skewed distributions the relative locations are

Page 31: Statistics  100 Lecture Set  6

Other Measures of interest

• Maximum

• Minimum

Page 32: Statistics  100 Lecture Set  6

• Percentiles– A percentile of a distribution is a value that cuts off the

stated part of the distribution at or below that value, with the rest at or above that value.• 5th percentile: 5% of distribution is at or below this value and

95% is at or above this value.• 25th percentile: 25% at or below, 75% at or above• 50th percentile: 50% at or below, 50% at or above• 75th percentile: 75% at or below, 25% at or above• 90th percentile: 90% at or below, 10% at or above• 99th percentile: ___% at or below, ___% at or above

Page 33: Statistics  100 Lecture Set  6

• Percentiles– Can be applied to a population or to a sample

• Usually don’t know population– Use sample percentiles to estimate pop. percentiles

– Standardized tests often measured in percentiles– Birth statistics often measured in percentiles

• First daughter – 10th percentile weight– 25th percentile length– 95th percentile head circumference

Page 34: Statistics  100 Lecture Set  6

Important Percentiles

• First Quartile

• Second Quartile

• Third Quartile

Page 35: Statistics  100 Lecture Set  6

Computing the quartiles

• You know how to compute the median

• Q1 =

• Q3 =

Page 36: Statistics  100 Lecture Set  6

Example

• Finding the other quartiles1. For Q1, find the median of all values below M. 2. For Q3, find the median of all values above M.

• Example: 4, 7, 18, 19, 23, M=18– Q1:– Q3:

• Example: 4, 7, 12, 18, 19, 23, M=15– Q1:– Q3:

Page 37: Statistics  100 Lecture Set  6

• 5 number summary often reported:

– Min, Q1, Q2 (Median), Q3, and Max

– Summarizes both center and spread

– What proportion of data lie between Q1 and Q3?

Page 38: Statistics  100 Lecture Set  6

Box-Plot

• Displays 5-number summary graphically

• Box drawn spanning quartiles

• Line drawn in box for median

• Lines extend from box to max. and min values.

• Some programs draw whiskers only to 1.5*IQR above and below the quartiles

6065

7075

8085

Undergrads

Exa

m S

core

s

Page 39: Statistics  100 Lecture Set  6

• Can compare distributions using side-by-side box-plots

• What can you see from the plot?

Page 40: Statistics  100 Lecture Set  6

Example - Moisture Uptake

• There is a need to understand degradation of 3013 containers during long term storage

• Moisture uptake is considered a key factor in degradation due to corrosion

• Calcination removes moisture

• Calcination temperature requirements were written with very pure materials in mind, but the situation has evolved to include less pure materials, e.g. high in salts (Cl salts of particular concern)

• Calcination temperature may need to be reduced to accommodate salts.

• An experiment is to be conducted to see how the calcination temperature impacts the mean moisture uptake

Page 41: Statistics  100 Lecture Set  6

Working Example - Moisture Uptake

• Experiment Procedure: – Two calcination temperatures…wish to compare the mean uptake for each

temperature– Have 10 measurements per temperature treatment– The temperature treatments are randomly assigned to canisters

• Response: Rate of change in moisture uptake in a 48 hour period (maximum time to complete packaging)

Page 42: Statistics  100 Lecture Set  6

Box-Plot

Page 43: Statistics  100 Lecture Set  6

Other Common Measure of Spread: Sample Variance

• Sample variance of n observations:

• Units are in squared units of data

1

)(1

2

2

n

xxs

n

ii

Page 44: Statistics  100 Lecture Set  6

Sample Standard Deviation

• Sample standard deviation of n observations:

• Has same units as data

1

)(1

2

n

xxs

n

ii

Page 45: Statistics  100 Lecture Set  6

Exercise

• Compute the sample standard deviation and variance for the Muzzle Velocity Example

Page 46: Statistics  100 Lecture Set  6

Comments

• Variance and standard deviation are most useful when measure of center is

• As observations become more spread out, s : increases or decreases?

• Both measures sensitive to outliers

• 5 number summary is better than the mean and standard deviation for describing (i) skewed distributions; (ii) distributions with outliers

Page 47: Statistics  100 Lecture Set  6

Comments

• Standard deviation is zero when

• Measures spread relative to

Page 48: Statistics  100 Lecture Set  6

• Empirical Rule– If a distribution is bell-shaped and roughly symmetric, then

• About 2/3 of data will lie within ±1s of • About 95% of data will lie within ±2s of • Usually all data will lie within ±3s of

– So you can reconstruct a rough picture of the histogram from just two numbers

More interpretation of s

x

x

x

Page 49: Statistics  100 Lecture Set  6

Example

• Mid-term:

Page 50: Statistics  100 Lecture Set  6

Example

• Mid-term:

• Empirical rule tells us


Top Related