
Post on 16-Jan-2016


Statistics

Dealing With Uncertainty

Objectives

- Describe the difference between a sample and a population
- Learn to use descriptive statistics (data sorting, central tendency, etc.)
- Learn how to prepare and interpret histograms
- State what is meant by normal distribution and standard normal distribution
- Use Z-tables to compute probability

Statistics

“There are lies, d#$& lies, and then there’s statistics.”

Mark Twain

Statistics is...

- a standard method for
  - collecting, organizing, summarizing, presenting, and analyzing data
  - drawing conclusions
  - making decisions based upon the analyses of these data
- used extensively by engineers (e.g., quality control)

Populations and Samples

- Population - complete set of all of the possible instances of a particular object (e.g., the entire class)
- Sample - subset of the population (e.g., a team)

We use samples to draw conclusions about the parent population.

Why use samples?

- The population may be large (all people on earth, all stars in the sky).
- The population may be dangerous to observe (automobile wrecks, explosions, etc.).
- The population may be difficult to measure (subatomic particles).
- Measurement may destroy the sample (bolt strength).

Team Exercise: Sample Bias

To three significant figures, estimate the average age of the class based upon your team.

When would a team not be a representative sample of the class?

Measures of Central Tendency

If you wish to describe a population (or a sample) with a single number, what do you use?

- Mean - the arithmetic average
- Mode - the most likely (most common) value
- Median - the “middle” of the data set

What is the Mean?

The mean is the sum of all data values divided by the number of values.

Sample Mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Where:
- $\bar{x}$ is the sample mean
- $x_i$ are the data points
- $n$ is the sample size
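The sample mean can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` and the team ages are hypothetical and do not come from the slides.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical ages (in years) of one five-person team
team_ages = [18, 19, 19, 20, 22]
team_mean = sample_mean(team_ages)
print(f"{team_mean:.1f}")  # prints 19.6
```

The same function gives the population mean when it is handed every member of the population instead of a sample.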

Population Mean

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Where:
- $\mu$ is the population mean
- $x_i$ are the data points
- $N$ is the total number of observations in the population

What is the Mode?

Mode - the value that occurs the most often in discrete data (or data that have been grouped into discrete intervals).

Example: students in this class are most likely to get a grade of B.

Mode continued

Example of a grade distribution with mean C, mode B

[Figure: bar chart of the number of students (axis from 0 to 25) receiving each grade F, D, C, B, A]

What is the Median?

Median - for sorted data, the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points).

The median is useful to characterize data sets with a few extreme values that would distort the mean (e.g., house prices, family incomes).
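A short sketch of how one extreme value pulls the mean but not the median, using Python's standard `statistics` module. The house prices are made-up numbers chosen only to illustrate the point.

```python
import statistics

# Hypothetical house prices in thousands of dollars; the last one is an extreme value
prices = [150, 160, 170, 180, 1500]

mean_price = statistics.mean(prices)      # pulled upward by the 1500 outlier
median_price = statistics.median(prices)  # middle value of the sorted data

print("mean:", mean_price, "median:", median_price)

# With an even number of points the median averages the two middle values
even_median = statistics.median([1, 2, 3, 4])
print("median of four points:", even_median)
```

Here the mean lands well above four of the five prices, while the median stays at a typical value.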

What Is the Range?

Range - the difference between the lowest and highest values in the set.

Example: driving time to Houston is 2 hours +/- 15 minutes. Therefore...
- Minimum = 105 minutes
- Maximum = 135 minutes
- Range = 30 minutes

Standard Deviation

Gives a unique and unbiased estimate of the scatter in the data.

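Standard Deviation

Population:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

Variance = $\sigma^2$

Sample:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Variance = $s^2$

The term $(x_i - \bar{x})$ is the deviation of a data point from the mean.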

The Subtle Difference Between s and σ

N versus n-1: n-1 is needed to get a better estimate of the population σ from the sample s.

Note: for large n, the difference is trivial.
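The two divisors can be compared directly. The sketch below is an illustration only: the function names and the data set are hypothetical. It computes the population form (divide by N) and the sample form (divide by n-1) for the same numbers.

```python
import math


def population_std(values):
    """Standard deviation with divisor N (all members of the population are known)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_std(values):
    """Standard deviation with divisor n - 1 (estimate made from a sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data points are required")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean = 5
print(population_std(data))      # divisor N = 8
print(sample_std(data))          # divisor n - 1 = 7, slightly larger

# The ratio of the two results is sqrt(n / (n - 1)), which approaches 1 as n grows
big = data * 1000
print(sample_std(big) / population_std(big))
```

For the eight points above the two answers differ by about 7%; for the 8000-point set the difference is far below 0.1%.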

A Valuable Tool

Gauss invented standard deviation circa 1800 to explain the error observed in measured star positions.

Today it is used in everything from quality control to measuring financial risk.

Team Exercise

In your team’s bag of M&M candies, count
- the number of candies for each color
- the total number of candies in the bag

When you are done counting, have a representative from your team enter your data on the board.

Using Excel, enter the data gathered by the entire class.

Team Exercise (con’t)

For each color, and the total number of candies, determine the following:

- maximum
- minimum
- range
- mean
- mode
- median
- standard deviation
- variance
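The exercise asks for Excel, but the same eight quantities can be checked with Python's standard `statistics` module. The sketch below is an illustration only: the function name `summarize` and the candy counts are hypothetical, and the sample (n-1) forms of standard deviation and variance are used.

```python
import statistics


def summarize(counts):
    """Return the eight descriptive statistics requested in the exercise."""
    return {
        "maximum": max(counts),
        "minimum": min(counts),
        "range": max(counts) - min(counts),
        "mean": statistics.mean(counts),
        "mode": statistics.mode(counts),
        "median": statistics.median(counts),
        "standard deviation": statistics.stdev(counts),  # sample form, divisor n - 1
        "variance": statistics.variance(counts),         # sample form, divisor n - 1
    }


# Hypothetical number of red candies counted by seven teams
red_counts = [10, 12, 12, 9, 14, 12, 11]
summary = summarize(red_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Calling `summarize` once per color, and once for the bag totals, fills in the whole table.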

Individual Exercise: Histograms

Flip a coin EXACTLY ten times. Count the number of heads YOU get.

Report your result to the instructor who will post all the results on the board

Open Excel. Using the data from the entire class, create bar graphs showing the number of classmates who get one head, two heads, three heads, etc.
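The coin-flip exercise can also be simulated. The sketch below stands in for the classroom and for Excel; the class size of 30, the seed, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter


def count_heads(flips, rng):
    """Flip a fair coin `flips` times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(flips))


def class_histogram(students, flips=10, seed=0):
    """Return a list whose k-th entry is the number of students who got k heads."""
    rng = random.Random(seed)  # fixed seed so the simulated class is repeatable
    tally = Counter(count_heads(flips, rng) for _ in range(students))
    return [tally.get(k, 0) for k in range(flips + 1)]


hist = class_histogram(students=30)
for heads, freq in enumerate(hist):
    print(f"{heads:2d} heads | {'#' * freq}")
```

With more simulated students the text bar graph should look increasingly bell-shaped, centered near five heads.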

Data Distributions

The “shape” of the data is described by its frequency histogram.

Data that behave “normally” exhibit a “bell-shaped” curve, or the “normal” distribution.

Gauss found that star position errors tended to follow a “normal” distribution.

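The Normal Distribution

The normal distribution is sometimes called the “Gauss” curve.

$$\mathrm{RF} = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where RF is the relative frequency.

[Figure: bell-shaped curve of RF (relative frequency) versus x, centered on the mean]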

Standard Normal Distribution

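Define:

$$z = \frac{x - \mu}{\sigma}$$

Then

$$\mathrm{RF} = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2}$$

[Figure: standard normal curve, RF (0.0 to 0.5) plotted against z (-4.0 to 4.0); the area under the curve is 1.00]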
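As a minimal sketch, the two curves can be evaluated numerically. It assumes the usual forms: RF = exp(-z²/2) / (σ√(2π)) with z = (x - μ)/σ for the normal curve, and the same expression with μ = 0 and σ = 1 for the standard normal curve. The function names and the example values are hypothetical.

```python
import math


def normal_rf(x, mu, sigma):
    """Relative frequency of a normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def standard_normal_rf(z):
    """Relative frequency of the standard normal curve (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


peak = standard_normal_rf(0.0)
print(f"{peak:.4f}")  # height of the curve at z = 0, about 0.3989

# Hypothetical drive time: mean 120 min, standard deviation 15 min, observed 135 min
z = (135 - 120) / 15
print(z, normal_rf(135, 120, 15), standard_normal_rf(z) / 15)
```

The last line shows why the change of variable works: the height of any normal curve at x equals the standard normal height at z, divided by σ.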

Some handy things to know.

50% of the area lies on each side of the mid-point for any normal curve.

A standard normal distribution (SND) has a total area of 1.00.

“z-Tables” show the area under the standard normal distribution, and can be used to find the area between any two points on the z-axis.

Using Z Tables (Appendix C, p. 624)

Question: Find the area between z = -1.0 and z = 2.0.

- From table, for z = 1.0, area = 0.3413
- By symmetry, for z = -1.0, area = 0.3413
- From table, for z = 2.0, area = 0.4772
- Total area = 0.3413 + 0.4772 = 0.8185
- “Tails” area = 1.0 - 0.8185 = 0.1815
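The table lookup can be cross-checked numerically. This sketch uses the error function `math.erf` from Python's standard library rather than the printed table in Appendix C; the function names are hypothetical.

```python
import math


def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and |z| (the quantity a z-table lists)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))


def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2 (with z1 <= z2)."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(z2) - cdf(z1)


left = area_mean_to_z(-1.0)   # table entry for z = 1.0, by symmetry
right = area_mean_to_z(2.0)   # table entry for z = 2.0
total = area_between(-1.0, 2.0)
print(f"{left:.4f} {right:.4f} {total:.4f} {1.0 - total:.4f}")
```

The unrounded total is about 0.8186. The hand calculation gives 0.8185 because each table entry is rounded to four decimal places before the two are added.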

“Quick and Dirty” Estimates of μ and σ

μ ≈ (lowest + 4*mode + highest)/6

For a normal curve, 99.7% of the area is contained within ± 3σ from the mean.

Define “highest” = μ + 3σ
Define “lowest” = μ - 3σ
Therefore, σ ≈ (highest - lowest)/6

Example: Drive time to Houston

- Lowest = 1 h
- Most likely = 2 h
- Highest = 4 h (including a flat tire, etc.)

μ ≈ (1 + 4*2 + 4)/6 = 2.17 h (2 h 10 min)
σ ≈ (4 - 1)/6 = 0.5 h
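The arithmetic of the drive-time example can be checked with a small sketch. The function name `quick_estimates` is hypothetical; the two formulas are the ones given on the slide.

```python
def quick_estimates(lowest, most_likely, highest):
    """Return rough estimates (mu, sigma) from three judgment values."""
    mu = (lowest + 4 * most_likely + highest) / 6
    sigma = (highest - lowest) / 6
    return mu, sigma


# Drive time to Houston, in hours
mu, sigma = quick_estimates(lowest=1, most_likely=2, highest=4)
minutes = round(mu * 60)
print(f"mu = {mu:.2f} h ({minutes // 60} h {minutes % 60} min), sigma = {sigma} h")
```

Here 13/6 h is 130 minutes, i.e. 2 h 10 min, with a standard deviation of half an hour.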

This technique (Delphi) was used to plan the moon flights.

Team Exercise

You want to put a scale on your rubber-band car to relate a given scale setting to an expected distance traveled.

Design an experiment to establish a scale for your car.


Team Exercise continued

Some issues to consider:
- Sample size
- Range of distances
- Desired accuracy

Review

- Central tendency: mean, mode, median
- Scatter: range, variance, standard deviation
- Normal distribution
