Slide 1
Chapter 1
Introduction to Statistics
1-1 Review and Preview
1-2 Statistical Thinking
1-3 Types of Data
1-4 Critical Thinking
1-5 Collecting Sample Data
Slide 2
1-1 Overview
What is Statistics about? In a Nutshell:
Slide 3
The Three Major Parts of Statistics
Producing Data
Exploratory Data Analysis
Inference
Slide 4
Producing Data (details are coming later)
• In statistics we need data to work with.
• Data can come from many sources: measurements, surveys, experiments, observational studies, etc.
• Weaknesses in data production account for most erroneous conclusions in statistical studies, so producing good data requires careful planning.
Slide 5
Exploratory Data Analysis (details are coming later)
• Once we have our data, we want to know what might be “in there”. Data have a story to tell, and our goal is to uncover that story.
• The statistical process called Exploratory Data Analysis (EDA) employs a variety of graphical and non-graphical techniques to maximize insight into a data set.
Slide 6
Inference (details are coming later)
• When you taste a spoonful of your coffee and conclude that it needs more sugar, that's an inference.
• If there's a lot of sugar sitting on the bottom because you were too sleepy to stir it, coffee from the surface won't be representative, and you'll end up with an incorrect inference (and a coffee with too much sugar). But if you stir your coffee thoroughly before you taste, your spoonful of data can tell you about the whole cup of coffee.
Slide 7
Copyright © 2004 Pearson Education, Inc.
• Data: observations (such as measurements, genders, survey responses) that have been collected.
• Population: the complete collection of all elements (scores, people, measurements, and so on) to be studied.
• Census: the collection of data from every member of the population.
• Sample: a sub-collection of elements drawn from a population.
Slide 8
Population Examples
• All runners in the 2009 L.A. Marathon
• All kindergarten kids in a school district
• All 16 oz. bottles of water manufactured by Evian
• All Milano cookies made by Pepperidge Farm
• All rats in the biology lab at CSUN
• All ships arriving at the Long Beach port on a particular day
• All purebred German Shepherd dogs in Los Angeles County
• All tires made by Goodyear
• All lattes made in a month at the Starbucks closest to your home
• All rainy days in Los Angeles
Slide 9
OK, we have a population. Then what?
We want to learn something about the population.
Let’s say our population of interest is ALL runners in the 2009 L.A. Marathon.
We might be interested in
• the average age of ALL runners
• the proportion of ALL runners who completed the marathon in under three hours
• the percent of female runners who completed the marathon
• the proportion of runners who are over the age of 50
• the mean time of ALL runners who completed the marathon
Slide 10
Or…
Let’s say our population of interest is ALL kindergarten kids in a school district.
We might be interested in
• the average vocabulary score of ALL kindergarten kids in the district
• the proportion of ALL kindergarten kids who live in a single-parent home
• the mean height of ALL kindergarten kids in the district
• the percent of ALL kindergarten kids who need speech therapy
Slide 11
Parameter of interest
• A parameter is a NUMBER describing some characteristic of a population. The goal of inference is to estimate this number, or to test a hypothesized value of it.
• In this course we only consider two parameters of interest:
  • Mean
  • Proportion
• Notation:
  • Population mean: µ
  • Population proportion: p
BOTH are PARAMETERS!
Slide 12
Example
The director of personnel for a large firm has been assigned the task of developing a profile of the company’s 3500 managers.
• A couple of characteristics of interest are:
  • The average salary of ALL managers, µ
  • The proportion who have completed the management training program, p
Slide 13
Our Goal in Inference
If every population we are interested in were manageable in size, we could simply compute the population parameter directly. Then there would be no need for inference.
Slide 14
But...
• The population we are interested in is usually too big.
• Inference teaches you what to do in this case.
• Inference is mainly concerned with the rules or logic of how the results of a relatively small sample from a large population can be used to draw conclusions about that population.
• Let’s get back to our keywords.
Slide 15
Sample
• When the population is too big (e.g., all adults in the U.S.) to find our parameter of interest, we have to take a sample from the population. Then we can use the sample result to draw conclusions about the population parameter. This process is called INFERENCE.
Slide 16
Important note about the sample
The sample must be randomly selected from the population. If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
Slide 17
Sample or not sample?
• Let’s say our population of interest is ALL the members of the U.S. Senate, and our parameters of interest are
  • the proportion of female senators, p
  • the average age of ALL members, µ
• Since there are only 100 senators, the population is relatively small, so we don’t need to take a sample to estimate our parameters of interest. We can look at every member, find what percent of them are female, and find the average age of ALL members.
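When the whole population is available, the parameters can be computed directly, with no inference needed. The sketch below illustrates this with a made-up roster of 10 members; the ages and the names `population_mean` and `population_proportion` are assumptions for illustration, not real Senate data.

```python
# Hypothetical population: (age, is_female) for every one of 10 made-up members.
population = [
    (45, True), (62, False), (58, False), (71, True), (49, False),
    (66, False), (53, True), (77, False), (60, False), (59, True),
]

def population_mean(values):
    """Population mean (mu): sum of all values divided by the population size."""
    return sum(values) / len(values)

def population_proportion(flags):
    """Population proportion (p): fraction of members with the characteristic."""
    return sum(1 for flag in flags if flag) / len(flags)

# Every member is measured (a census), so these are parameters, not estimates.
mu = population_mean([age for age, _ in population])
p = population_proportion([is_female for _, is_female in population])

print(f"mu = {mu:.1f} years, p = {p:.2f}")
```

Because every member is included, µ and p here are exact for this toy population.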
Slide 18
Sample or not sample?
• A health advocacy agency suspects that the mean level of acetaminophen (an active ingredient in pain relievers and cold medications) in pills manufactured by a certain company is not the advertised value.
• It is impossible to measure the amount of acetaminophen in ALL the pills made by this company, and therefore the mean acetaminophen level in ALL pills remains unknown. The agency will need to take a sample of pills and measure the amount of acetaminophen in those pills.
Slide 19
Sample statistic
• A statistic is a NUMBER describing some characteristic of a sample.
• Notation:
  • Sample mean: x̄
  • Sample proportion: p̂
BOTH are STATISTICS!
Slide 20
Example
In a random sample of 200 people from the U.S., 12 people had blood type 0. That’s 6% of the sample. This number was a little surprising because we know that about 4% of all people in the U.S. have blood type 0.
• What is the population of interest?
  • All people in the U.S.
• What is the parameter of interest?
  • The proportion (percent) of ALL people in the U.S. with blood type 0, which is 4%. With notation: p = 4%
• What is the sample?
  • The 200 randomly selected people
• What is the statistic?
  • The proportion (percent) of the 200 people who had blood type 0, which is 6%. With notation: p̂ = 6%
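The arithmetic in this example can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the numbers are the ones given on the slide.

```python
sample_size = 200       # n: people in the random sample
count_with_type = 12    # people in the sample with the blood type of interest
p = 0.04                # population proportion (the parameter) stated in the example

p_hat = count_with_type / sample_size   # sample proportion (the statistic)
difference = p_hat - p                  # how far the statistic is from the parameter

print(f"p-hat = {p_hat:.0%}, p = {p:.0%}, difference = {difference:+.0%}")
```

The statistic p̂ depends on which 200 people happened to be selected, while the parameter p is a fixed property of the population.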
Slide 21
Section 1-3 Types of Data
Slide 22
Population/Sample/Parameter/Statistic
The Basic Idea of Inference
[Diagram: Population and Sample, linked by arrows labeled Data Production and Inference]
• Parameter: a numerical measurement describing some characteristic of a population
• Statistic: a numerical measurement describing some characteristic of a sample
Slide 23
Types of Data
[Diagram: Data divides into Categorical and Quantitative; Quantitative divides into Discrete and Continuous]
Slide 24
Types of Data
• Once we have our random sample, we want to collect data from its members.
• Some data sets consist of numbers:
  • Age in years
  • Height in inches
  • Weight in pounds
  • Distance traveled in miles
• Some data sets consist of non-numerical answers:
  • Eye color
  • Gender
  • Yes/no answers
  • Course grades
Slide 25
Data
Quantitative
• Some of these can be measured
• The average makes sense
  • The average distance or average height makes sense
Categorical
• You can sort the data into “boxes”
• The average doesn’t make sense
  • The average eye color or average gender doesn’t make sense
[Illustration: two boxes labeled Male and Female]
Slide 26
Examples: Categorical data
• Gender
• Yes/no questions
• Color
• Satisfaction level
• Grade received at the end of the semester
• Type of car (small, midsize, full size)
• ZIP code
• Ethnicity
Slide 27
Examples: Quantitative data
• Weight
• Height
• Distance
• Time
• pH level
• Amount of money
• GPA
• Amount of chemical ingredient
• Age
• Pulse rate
Slide 28
Quantitative Data
Discrete
When the number of possible values is either finite or ‘countable’: 0, 1, 2, 3, . . .
Example: The number of eggs that hens lay.
Continuous
When data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.
Example: The amount of milk that a cow produces; e.g., 2.34 gallons per day.
Slide 29
Levels of Measurement
Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides.
• Nominal level: categories only
• Ordinal level: categories with some order
• Interval level: meaningful differences but no natural starting point
• Ratio level: meaningful differences and a natural starting point
Slide 30
Nominal Level of Measurement
Characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).
Examples:
• Survey responses of yes, no, undecided
• Political affiliation
Slide 31
Ordinal Level of Measurement
Involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Examples:
• Course grades A, B, C, D, or F
• Ranks of colleges
Slide 32
Interval Level of Measurement
Like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present).
Examples:
• Years 1000, 2000, 1776, and 1492
• Body temperature
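A small numeric check can make the "no natural zero" idea concrete. The sketch below uses temperature in degrees Celsius and Fahrenheit (both interval scales); the specific temperatures are arbitrary illustrative values. Ratios of readings change when the unit changes, so they carry no meaning, while comparisons of differences survive the change of unit.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Three arbitrary temperatures in Celsius and the same temperatures in Fahrenheit.
a_c, b_c, c_c = 10.0, 20.0, 30.0
a_f, b_f, c_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c))

# Ratio of two readings: depends on the unit, so "twice as hot" is not meaningful.
ratio_c = b_c / a_c
ratio_f = b_f / a_f

# Comparing two differences: the same answer in either unit.
gap_ratio_c = (c_c - b_c) / (b_c - a_c)
gap_ratio_f = (c_f - b_f) / (b_f - a_f)

print(f"ratio of readings: {ratio_c:.2f} (C) vs {ratio_f:.2f} (F)")
print(f"ratio of gaps:     {gap_ratio_c:.2f} (C) vs {gap_ratio_f:.2f} (F)")
```

For a quantity with a natural zero, such as distance, the ratio of two values would stay the same under a change of unit (miles to kilometers, for instance).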
Slide 33
Ratio Level of Measurement
The interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are both meaningful.
Examples:
• Prices of college textbooks ($0 represents no cost)
• Distance traveled by cars (0 miles represents no distance traveled)
Slide 34
Section 1-4 Critical Thinking
Slide 35
Misuses of Statistics
• Bad Samples
• Small Samples
• Misleading Graphs
• Pictographs
• Distorted Percentages
• Loaded Questions
• Order of Questions
• Refusals
• Correlation & Causality
• Self-Interest Study
• Precise Numbers
• Partial Pictures
• Deliberate Distortions
Slide 36
Section 1-5 Collecting Sample Data
Slide 37
Major Points
• If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
• Randomness typically plays a critical role in determining which data to collect.
Slide 38
Definitions
• Observational Study: observing and measuring specific characteristics without attempting to modify the subjects being studied
• Experiment: applying some treatment and then observing its effects on the subjects
Slide 39
Observational Study
• Cross-Sectional Study: data are observed, measured, and collected at one point in time.
• Retrospective Study: data are collected from the past by going back in time.
• Prospective Study: data are collected in the future from groups sharing common factors.
Slide 40
Confounding
Confounding occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors.
Try to plan the experiment so confounding does not occur!
Slide 41
Experiments: Controlling Effects of Variables
• Blinding: the subject does not know whether he or she is receiving a treatment or a placebo
• Blocks: groups of subjects with similar characteristics
• Completely Randomized Experimental Design: subjects are put into blocks through a process of random selection
• Rigorously Controlled Design: subjects are very carefully chosen
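One way to picture the "process of random selection" in a completely randomized design is a random shuffle that decides which group each subject lands in. The sketch below is an illustration under assumptions: two groups (treatment and placebo), hypothetical subject labels, and a hypothetical helper named `randomly_assign`.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(subjects)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels, for illustration only.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
treatment, placebo = randomly_assign(subjects, seed=1)

print("treatment:", sorted(treatment))
print("placebo:  ", sorted(placebo))
```

Because chance alone decides the groups, neither the experimenter nor the subjects can steer particular kinds of subjects toward the treatment.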
Slide 42
Replication and Sample Size
• Replication: repetition of an experiment when there are enough subjects to recognize the differences from different treatments
• Sample Size: use a sample size that is large enough to see the true nature of any effects, and obtain that sample using an appropriate method, such as one based on randomness
Slide 43
Methods of Sampling
GOOD SAMPLING METHODS
• Random
• Systematic
• Stratified
• Cluster
BIASED SAMPLING METHODS
• Convenience
• Voluntary response
Slide 44
Convenience Sampling (biased sampling method)
Use results that are easy to get.
Slide 45
Voluntary Response Sampling (biased sampling method)
Individuals choose to be involved.
Slide 46
Good sampling methods:
Probability or random sampling: Individuals are randomly selected. No one group should be over-represented.
Random samples rely on the absolute objectivity of random numbers. There are books and tables of random digits available for random sampling.
Sampling randomly gets rid of bias.
Slide 47
Definitions
• Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected
• Simple Random Sample (of size n): subjects are selected in such a way that every possible sample of the same size n has the same chance of being chosen
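In place of a printed table of random digits, a pseudorandom number generator can draw a simple random sample. The sketch below uses Python's standard `random.Random.sample`, which selects without replacement so that every subset of size n is equally likely; the population of ID numbers 1 to 100 and the helper name `simple_random_sample` are assumptions for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every possible sample of size n is equally likely."""
    rng = random.Random(seed)   # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=42)

print(sorted(sample))
```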
Slide 49
Random Sampling
Selection so that each member has an equal chance of being selected.
Slide 50
Systematic Sampling
Select some starting point and then select every kth element in the population.
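Systematic sampling can be sketched in a few lines. The version below assumes the starting point is chosen at random among the first k elements, which is a common choice but not something the slide specifies; the population of ID numbers and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)    # index 0 .. k-1
    return population[start::k]

# Hypothetical population: ID numbers 1 through 100, in list order.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=7)

print(sample)
```

With 100 elements and k = 10, the sample always contains 10 elements spaced exactly 10 apart.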
Slide 51
Stratified Sampling
Subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum).
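A minimal sketch of stratified sampling follows, assuming a simple random sample of the same size is drawn from each stratum (other allocations are possible). The student population, the class-year strata, and the helper name `stratified_sample` are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group members by stratum, then draw a simple random sample from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for label in sorted(strata):   # fixed order keeps the result reproducible
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: 40 students as (student_id, class_year), 10 per year.
years = ("freshman", "sophomore", "junior", "senior")
population = [(i, years[i % 4]) for i in range(40)]

sample = stratified_sample(population, stratum_of=lambda m: m[1],
                           n_per_stratum=3, seed=0)
print(sample)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of the same total size does not guarantee.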
Slide 52
Cluster Sampling
Divide the population into sections (or clusters); randomly select some of those clusters; choose all members from the selected clusters.
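Cluster sampling differs from stratified sampling in that whole clusters are selected at random and then every member of a selected cluster is used. The sketch below assumes six hypothetical classrooms of five students each; the labels and the helper name `cluster_sample` are illustrative.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters, then keep ALL members of each chosen one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster labels
    members = [m for label in chosen for m in clusters[label]]
    return chosen, members

# Hypothetical population: six classrooms (clusters) with five students each.
clusters = {f"classroom_{c}": [f"{c}-{i}" for i in range(5)] for c in "ABCDEF"}

chosen, sample = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen)
print(sample)
```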
Slide 53
Definitions
• Sampling Error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations
• Nonsampling Error: sample data that are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly)
Slide 54
Caution about sampling surveys
• Nonresponse: People who feel they have something to hide or who don’t like their privacy being invaded probably won’t answer. Yet they are part of the population.
• Response bias: Fancy term for lying when you think you should not tell the truth. Like if your family doctor asks: “How much do you drink?” Or a survey of female students asking: “How many men do you date per week?” People also simply forget and often give erroneous answers to questions about the past.
• Wording effects: Questions worded like “Do you agree that it is awful that…” are prompting you to give a particular response.
Slide 55
• Undercoverage
Undercoverage occurs when parts of the population are left out in the process of choosing the sample.
Because the U.S. Census goes “house to house,” homeless people are not represented. Illegal immigrants also avoid being counted. Geographical districts with a lot of undercoverage tend to be poor ones. Representatives from richer areas typically strongly oppose statistical adjustment of the census.
Historically, clinical trials have avoided including women in their studies because of their periods and the chance of pregnancy. This means that medical treatments were not appropriately tested for women. This problem is slowly being recognized and addressed.
Slide 56
Learning about populations from samples
The techniques of inferential statistics allow us to draw inferences or conclusions about a population from a sample.
• Your estimate of the population is only as good as your sampling design. Work hard to eliminate biases.
• Your sample is only an estimate, and if you randomly sampled again, you would probably get a somewhat different result.
• The bigger the sample, the better.
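The last two points can be seen in a small simulation. The sketch below builds a made-up population of 10,000 ages, draws many random samples of two different sizes, and measures how much the sample means vary from one sample to the next; all names and numbers are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(2024)   # seeded so the simulation is reproducible

# Hypothetical population: 10,000 ages between 18 and 80.
population = [rng.randint(18, 80) for _ in range(10_000)]
mu = statistics.mean(population)   # the parameter, normally unknown

def sample_means(n, repeats=500):
    """Draw `repeats` simple random samples of size n; return each sample's mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

# Sampling error of a single sample: sample mean minus population mean.
one_error = statistics.mean(rng.sample(population, 10)) - mu

# Spread of the sample means around mu for small and large samples.
spread_small = statistics.pstdev(sample_means(n=10))
spread_large = statistics.pstdev(sample_means(n=400))

print(f"population mean: {mu:.2f}")
print(f"sampling error of one sample of 10: {one_error:+.2f}")
print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=400: {spread_large:.2f}")
```

Larger random samples give sample means that cluster more tightly around µ. A larger sample does not repair a biased sampling design, which is why the first point on the slide comes first.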