chapter 1 1111----1111 overview overview what is ...an73773/slidesclass1f09.pdf · what is...

Slide 1

Chapter 1

Introduction to Statistics

1-1 Review and Preview

1-2 Statistical Thinking

1-3 Types of Data

1-4 Critical Thinking

1-5 Collecting Sample Data

Slide 2

1-1 Overview

What is Statistics about? In a Nutshell:

Slide 3

The Three Major Parts of Statistics

Producing Data

Exploratory Data Analysis

Inference

Slide 4

Producing Data (details are coming later)

• In statistics we need data to work with.
• Data can come from many sources: measurements, surveys, experiments, observational studies, etc.
• Weaknesses in data production account for most erroneous conclusions in statistical studies; therefore, the production of good data requires careful planning.

Slide 5

Exploratory Data Analysis (details are coming later)

• Once we have our data, we want to know what might be “in there”. Data have a story to tell, and our goal is to uncover that story.
• The statistical process called Exploratory Data Analysis (EDA) employs a variety of graphical and non-graphical techniques to maximize insight into a data set.

Slide 6

Inference (details are coming later)

• When you taste a spoonful of your coffee and conclude that it needs more sugar, that's an inference.
• If there's a lot of sugar sitting on the bottom because you were too sleepy to stir it, coffee from the surface won't be representative, and you'll end up with an incorrect inference (and a coffee with too much sugar). But if you stir your coffee thoroughly before you taste, your spoonful of data can tell you about the whole cup of coffee.

Slide 7

Copyright © 2004 Pearson Education, Inc.

• Data: observations (such as measurements, genders, survey responses) that have been collected.
• Population: the complete collection of all elements (scores, people, measurements, and so on) to be studied.
• Census: the collection of data from every member of the population.
• Sample: a sub-collection of elements drawn from a population.

Slide 8

Population Examples

• All runners in the 2009 L.A. Marathon
• All kindergarten kids in a school district
• All 16 oz. bottles of water manufactured by Evian
• All Milano cookies made by Pepperidge Farm
• All rats in the biology lab at CSUN
• All ships arriving at the Long Beach port on a particular day
• All purebred German Shepherd dogs in Los Angeles County
• All tires made by Goodyear
• All lattes made in a month at the Starbucks closest to your home
• All rainy days in Los Angeles

Slide 9

OK, we have a population. Then what?

We want to learn something about the population.

Let’s say our population of interest is ALL runners in the 2009 L.A. Marathon.

We might be interested in

• the average age of ALL runners
• the proportion of ALL runners who completed the marathon in under three hours
• the percent of female runners who completed the marathon
• the proportion of runners who are over the age of 50
• the mean time of ALL runners who completed the marathon

Slide 10

Or…

Let’s say our population of interest is ALL kindergarten kids in a school district.

We might be interested in

• the average vocabulary score of ALL kindergarten kids at the school
• the proportion of ALL kindergarten kids who live in a single-parent home
• the mean height of ALL kindergarten kids at the school
• the percent of ALL kindergarten kids who need speech therapy

Slide 11

Parameter of interest

• A parameter is a NUMBER describing some characteristic of a population. The goal of inference is to estimate this number or to test a hypothesized value of it.
• In this course we consider only two parameters of interest:
  • Mean
  • Proportion
• Notation:
  • Population mean: µ
  • Population proportion: p

BOTH are PARAMETERS!

Slide 12

Example

The director of personnel for a large firm has been assigned the task of developing a profile of the company’s 3500 managers.

• A couple of characteristics of interest are:
  • The average salary of ALL managers, µ
  • The proportion of ALL managers who have completed the management training program, p
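When every member of the population is on record, both parameters can be computed directly. The sketch below illustrates this with a hypothetical five-person population; the salaries, the training flags, and the function names are invented for illustration and are not data from the firm in the example.

```python
# Hypothetical mini-population: every manager is listed, so the results
# below are population parameters, not sample statistics.
managers = [
    {"salary": 82000, "trained": True},
    {"salary": 95000, "trained": False},
    {"salary": 78000, "trained": True},
    {"salary": 101000, "trained": True},
    {"salary": 88000, "trained": False},
]


def population_mean(values):
    """Mean over the whole population (the parameter mu)."""
    values = list(values)
    return sum(values) / len(values)


def population_proportion(flags):
    """Share of the population with the characteristic (the parameter p)."""
    flags = list(flags)
    return sum(1 for flag in flags if flag) / len(flags)


mu = population_mean(m["salary"] for m in managers)
p = population_proportion(m["trained"] for m in managers)
print(f"mu = {mu:.0f}, p = {p:.2f}")
```

With 3500 real records the same two functions would apply unchanged; inference is needed only when such a complete list is not available.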

Slide 13

Our Goal in Inference

If every population we were interested in were manageable in size, we could simply compute the population parameter directly. Then there would be no need for inference.

Slide 14

But...

• The population we are interested in is usually too big.
• Inference teaches you what to do in this case.
• Inference is mainly concerned with the rules or logic of how the results of a relatively small sample from a large population can be used to draw conclusions about the population.
• Let’s get back to our keywords.

Slide 15

Sample

• When the population is too big (e.g., all adults in the U.S.) to find our parameter of interest directly, we have to take a sample from the population. We can then use the sample result to draw conclusions about the population parameter. This process is called INFERENCE.

Slide 16

Important note about the sample


The sample must be randomly selected from the population. If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

Slide 17

Sample or not sample?

• Let’s say our population of interest is ALL the members of the U.S. Senate, and our parameters of interest are
  • the proportion of female Senators, p
  • the average age of ALL members, µ
• Since there are only 100 Senate members, the population is relatively small, so we don’t need to take a sample to estimate our parameters of interest. We can look at all of them, find what percent are female, and compute the average age of ALL members.

Slide 18

Sample or not sample?

• A health advocacy agency suspects that the mean level of acetaminophen (an active ingredient in pain relievers and cold medications) in pills manufactured by a certain company is not the advertised value.

It is impossible to measure the amount of acetaminophen in ALL the pills made by this company, and therefore the mean acetaminophen level in ALL pills remains unknown. The agency will need to take a sample of pills and measure the amount of acetaminophen in those pills.

Slide 19

Sample statistic

• A statistic is a NUMBER describing some characteristic of a sample.
• Notation:
  • Sample mean: x̄
  • Sample proportion: p̂

BOTH are STATISTICS!

Slide 20

Example

In a random sample of 200 people from the U.S., 12 people had blood type 0. That’s 6% of the sample. This number was a little surprising because we know that about 4% of all people in the U.S. have blood type 0.

• What is the population of interest?
  • All people in the U.S.
• What is the parameter of interest?
  • The proportion (percent) of ALL people in the U.S. with blood type 0, which is 4%. With notation: p = 4%
• What is the sample?
  • The 200 randomly selected people
• What is the statistic?
  • The proportion (percent) of the 200 people who had blood type 0, which is 6%. With notation: p̂ = 6%
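The arithmetic behind this example is short enough to check by hand. As a minimal sketch, the sample proportion (p-hat) is the count divided by the sample size; the variable names are chosen here for illustration.

```python
sample_size = 200      # people in the random sample (from the example)
type_count = 12        # people in the sample with the blood type
p_population = 0.04    # population proportion p stated in the example

# Sample proportion: a statistic, because it describes only the sample.
p_hat = type_count / sample_size

# Gap between the statistic and the parameter it estimates.
difference = p_hat - p_population

print(f"p-hat = {p_hat:.2%}, p = {p_population:.2%}, difference = {difference:+.2%}")
```

The two-percentage-point gap is the kind of sample-to-population discrepancy that inference is designed to interpret.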

Slide 21

Section 1-3 Types of Data

Slide 22

Population/Sample/Parameter/Statistic

The Basic Idea of Inference

[Diagram relating Population, Sample, Data Production, and Inference]

• Parameter: a numerical measurement describing some characteristic of a population
• Statistic: a numerical measurement describing some characteristic of a sample

Slide 23

Types of Data

[Diagram: Data divides into Categorical and Quantitative; Quantitative divides into Discrete and Continuous.]

Slide 24

Types of Data

• Once we have our random sample, we want to collect data from it.
• Some data sets consist of numbers:
  • Age in years
  • Height in inches
  • Weight in pounds
  • Distance traveled in miles
• Some data sets consist of non-numerical answers:
  • Eye color
  • Gender
  • Yes/no answers
  • Course grades

Slide 25

Data

Quantitative

• Some of these can be measured
• The average makes sense
  • The average distance, or average height makes sense

Categorical

• You can sort the data into “boxes” (for example, Male and Female)
• The average doesn’t make sense
  • The average eye color, or average gender doesn’t make sense
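The contrast can be seen in a few lines of code. In this minimal sketch the heights and eye colors are made-up values: an average is computed for the quantitative variable, while the categorical variable is sorted into "boxes" and counted.

```python
from collections import Counter
from statistics import mean

# Hypothetical data for five people.
heights_in = [64, 70, 68, 72, 66]                          # quantitative
eye_colors = ["brown", "blue", "brown", "green", "brown"]  # categorical

# Quantitative: the average is meaningful.
average_height = mean(heights_in)

# Categorical: count how many observations fall into each box instead.
color_counts = Counter(eye_colors)
most_common_color, most_common_count = color_counts.most_common(1)[0]

print(f"average height: {average_height} in")
print(f"eye color counts: {dict(color_counts)}")
print(f"most common color: {most_common_color} ({most_common_count} of {len(eye_colors)})")
```

Calling `mean(eye_colors)` would raise an error, which matches the point that an average eye color is not defined.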

Slide 26

Examples: Categorical data

• Gender
• Yes/no questions
• Color
• Satisfaction level
• Grade received at the end of the semester
• Type of car (small, midsize, full size)
• ZIP code
• Ethnicity

Slide 27

Examples: Quantitative data

• Weight
• Height
• Distance
• Time
• pH level
• Amount of money
• GPA
• Amount of chemical ingredient
• Age
• Pulse rate

Slide 28

Quantitative Data

Discrete: when the number of possible values is either finite or ‘countable’ (0, 1, 2, 3, . . .).

Example: The number of eggs that hens lay.

Continuous: when data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.

Example: The amount of milk that a cow produces; e.g. 2.34 gallons per day.

Slide 29


Levels of Measurement

Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides.

• Nominal level: categories only
• Ordinal level: categories with some order
• Interval level: meaningful differences but no natural starting point
• Ratio level: meaningful differences and a natural starting point

Slide 30


Nominal Level of Measurement

Characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).

Examples:

• Survey responses of yes, no, undecided
• Political affiliation

Slide 31


Ordinal Level of Measurement

Involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.

Examples:

• Course grades A, B, C, D, or F
• Ranks of colleges

Slide 32


Interval Level of Measurement

Like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present).

Examples:

• Years 1000, 2000, 1776, and 1492
• Body temperature
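A short numerical check shows why ratios are not meaningful at the interval level. The sketch below assumes temperatures recorded in degrees Celsius (the slides do not name a scale) and compares them with the same temperatures in kelvins, a scale that does have a natural zero.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (0 K is a true zero point)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # illustrative readings in degrees Celsius

# Differences agree on both scales, so differences are meaningful.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios disagree, so "twice as hot" is not a valid reading of 20 vs. 10 degrees C.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.2f} C vs {difference_k:.2f} K")
print(f"ratio: {ratio_c:.3f} on the Celsius scale vs {ratio_k:.3f} in kelvins")
```

Because the Celsius zero is an arbitrary reference point, the ratio of two Celsius readings changes when the scale changes, while the difference does not.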

Slide 33


Ratio Level of Measurement

The interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are meaningful.

Examples:

• Prices of college textbooks ($0 represents no cost)
• Distance traveled by cars (0 miles represents no distance traveled)

Slide 34

Section 1-4 Critical Thinking

Slide 35


Misuses of Statistics

• Bad Samples
• Small Samples
• Misleading Graphs
• Pictographs
• Distorted Percentages
• Loaded Questions
• Order of Questions
• Refusals
• Correlation & Causality
• Self Interest Study
• Precise Numbers
• Partial Pictures
• Deliberate Distortions

Slide 36

Section 1-5 Collecting Sample Data

Slide 37


Major Points

• If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
• Randomness typically plays a critical role in determining which data to collect.

Slide 38


Definitions

• Observational Study: observing and measuring specific characteristics without attempting to modify the subjects being studied
• Experiment: applying some treatment and then observing its effects on the subjects

Slide 39


Observational Study

• Cross-Sectional Study: data are observed, measured, and collected at one point in time.
• Retrospective Study: data are collected from the past by going back in time.
• Prospective Study: data are collected in the future from groups sharing common factors.

Slide 40


Confounding

Confounding occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors.

Try to plan the experiment so confounding does not occur!

Slide 41


Experiments: Controlling Effects of Variables

• Blinding: the subject does not know whether he or she is receiving a treatment or a placebo
• Blocks: groups of subjects with similar characteristics
• Completely Randomized Experimental Design: subjects are put into blocks through a process of random selection
• Rigorously Controlled Design: subjects are very carefully chosen
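The random-selection step of a completely randomized design can be sketched in code. This is a minimal illustration, assuming two hypothetical groups labelled "treatment" and "placebo" and ten made-up subject IDs; the function name and the seed parameter are not from the slides.

```python
import random


def randomly_assign(subjects, groups=("treatment", "placebo"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn.

    The shuffle is the only thing that decides who lands in which group,
    so neither the experimenter nor the subject influences the assignment.
    """
    rng = random.Random(seed)  # seed is optional, for reproducibility
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment


subjects = [f"subject_{i}" for i in range(1, 11)]  # hypothetical IDs
assignment = randomly_assign(subjects, seed=1)
for group, members in assignment.items():
    print(group, members)
```

Dealing the shuffled list out in turn keeps the group sizes equal (or within one of each other), which is a design choice of this sketch rather than a requirement stated in the slides.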

Slide 42


Replication and Sample Size

• Replication: repetition of an experiment when there are enough subjects to recognize the differences in different treatments
• Sample Size: use a sample size that is large enough to see the true nature of any effects, and obtain that sample using an appropriate method, such as one based on randomness

Slide 43


Methods of Sampling

GOOD SAMPLING METHODS

• Random
• Systematic
• Stratified
• Cluster

BIASED SAMPLING METHODS

• Convenience
• Voluntary response

Slide 44


Convenience Sampling (biased sampling method)

Use results that are easy to get.

Slide 45

Voluntary Response Sampling (biased sampling method)

Individuals choose to be involved.

Slide 46

Good sampling methods:

Probability or random sampling: Individuals are randomly selected. No one group should be over-represented.

Random samples rely on the absolute objectivity of random numbers. There are books and tables of random digits available for random sampling.

Sampling randomly gets rid of bias.

Slide 47


Definitions

• Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected
• Simple Random Sample (of size n): subjects are selected in such a way that every possible sample of the same size n has the same chance of being chosen
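In software, a simple random sample is usually drawn with a pseudorandom number generator rather than a printed table of random digits. The following is a minimal sketch using Python's standard `random` module; the population of 100 numbered members and the function name are assumptions made for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # seed is optional, for reproducibility
    return rng.sample(list(population), n)  # sampling without replacement


population = list(range(1, 101))  # hypothetical members numbered 1..100
srs = simple_random_sample(population, 10, seed=42)
print(sorted(srs))
```

Because `sample` draws without replacement, no member can appear twice in the result.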


Slide 49


Random Sampling

Selection so that each member has an equal chance of being selected.

Slide 50


Systematic Sampling

Select some starting point and then select every kth element in the population.
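Systematic sampling translates directly into list slicing. In this minimal sketch the starting point is chosen at random among the first k elements, which is a common convention but an assumption here, since the slide only says to select some starting point; the runner numbers are made up.

```python
import random


def systematic_sample(population, k, seed=None):
    """Pick a starting point among the first k items, then take every k-th item."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    items = list(population)
    start = random.Random(seed).randrange(k)  # index from 0 to k - 1
    return items[start::k]


runners = list(range(1, 101))  # hypothetical bib numbers 1..100
sample = systematic_sample(runners, k=10, seed=5)
print(sample)
```

With 100 runners and k = 10, every possible starting point yields exactly 10 runners spaced 10 apart.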

Slide 51


Stratified Sampling

Subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum).
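Stratified sampling can be sketched in two steps: group the population by stratum, then draw a simple random sample inside each group. The student list, the two class-year strata, and the equal per-stratum sample size below are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw n_per_stratum members at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)  # group by shared characteristic
    sample = []
    for name in sorted(strata):  # fixed order keeps seeded runs reproducible
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample


# Hypothetical population: 40 students, half freshmen and half seniors.
students = [(f"student_{i}", "freshman" if i % 2 else "senior") for i in range(1, 41)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=7)
print(sample)
```

Every stratum is guaranteed to be represented, which a simple random sample of the same total size does not promise.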

Slide 52


Cluster Sampling

Divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters.
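Cluster sampling differs from stratified sampling in where the randomness enters: whole clusters are chosen at random, and then everyone in a chosen cluster is included. The sketch below assumes a hypothetical population of eight classrooms with five kids each.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters clusters and keep ALL members of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 8 classrooms (clusters) of 5 kids each.
classrooms = {
    f"room_{r}": [f"room_{r}_kid_{k}" for k in range(1, 6)]
    for r in range(1, 9)
}
selected = cluster_sample(classrooms, 3, seed=3)
for room, kids in selected.items():
    print(room, kids)
```

No sampling happens inside a selected classroom; all of its members are taken.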

Slide 53


Definitions

• Sampling Error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations
• Nonsampling Error: sample data that are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly)

Slide 54

Caution about sampling surveys

• Nonresponse: People who feel they have something to hide or who don’t like their privacy being invaded probably won’t answer. Yet they are part of the population.
• Response bias: Fancy term for lying when you think you should not tell the truth. Like if your family doctor asks: “How much do you drink?” Or a survey of female students asking: “How many men do you date per week?” People also simply forget and often give erroneous answers to questions about the past.
• Wording effects: Questions worded like “Do you agree that it is awful that…” are prompting you to give a particular response.

Slide 55

Undercoverage

Undercoverage occurs when parts of the population are left out in the process of choosing the sample.

Because the U.S. Census goes “house to house,” homeless people are not represented. Illegal immigrants also avoid being counted. Geographical districts with a lot of undercoverage tend to be poor ones. Representatives from richer areas typically strongly oppose statistical adjustment of the census.

Historically, clinical trials have avoided including women in their studies because of their periods and the chance of pregnancy. This means that medical treatments were not appropriately tested for women. This problem is slowly being recognized and addressed.

Slide 56

Learning about populations from samples

The techniques of inferential statistics allow us to draw inferences or conclusions about a population from a sample.

• Your estimate of the population is only as good as your sampling design. Work hard to eliminate biases.
• Your sample is only an estimate: if you randomly sampled again, you would probably get a somewhat different result.
• The bigger the sample the better.
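A small simulation illustrates the last two points. It is a sketch under stated assumptions: the population is 10,000 made-up ages generated from a normal distribution, and the sample sizes (25 and 400) and the number of repetitions (200) are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 ages (simulated, not real data).
population = [rng.gauss(40, 12) for _ in range(10_000)]
mu = mean(population)  # the parameter we pretend not to know


def sample_means(n, repeats=200):
    """Draw `repeats` simple random samples of size n; return their means."""
    return [mean(rng.sample(population, n)) for _ in range(repeats)]


small = sample_means(25)
large = sample_means(400)

# Each repeated sample gives a somewhat different estimate of mu;
# the spread of those estimates shrinks as the sample size grows.
spread_small = pstdev(small)
spread_large = pstdev(large)

print(f"population mean: {mu:.2f}")
print(f"n = 25:  sample means range from {min(small):.2f} to {max(small):.2f}")
print(f"n = 400: sample means range from {min(large):.2f} to {max(large):.2f}")
print(f"spread of sample means: {spread_small:.2f} (n = 25) vs {spread_large:.2f} (n = 400)")
```

A larger sample reduces only this chance variation; it does not repair a biased sampling design.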
