
Lecture 1
Chapter 1: Basic Statistical Concepts

M. George Akritas


Outline

- Why Statistics?
- Populations, Samples, and Census
- Some Sampling Concepts
  - Representative Samples
  - Simple Random and Stratified Sampling
  - Sampling With and Without Replacement
  - Non-representative Sampling


Example (Examples of Engineering/Scientific Studies)

- Comparing the compressive strength of two or more cement mixtures.
- Comparing the effectiveness of three cleaning products in removing four different types of stains.
- Predicting failure time on the basis of stress applied.
- Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents.
- Testing a manufacturer's claim regarding a product's quality.
- Studying the relation between salary increases and employee productivity in a large corporation.

What makes these studies challenging (and thus in need of Statistics) is their inherent, or intrinsic, variability:


- The compressive strength of different preparations of the same cement mixture will differ. The figure at http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements, in MPa (megapascals), of test cylinders 6 in. in diameter by 12 in. high, made with a water/cement ratio of 0.4 and measured on the 28th day after they were made.
- Under the same stress, two beams will fail at different times.
- The proportion of defective items of a certain product will differ from batch to batch.

Intrinsic variability renders the objectives of the case studies, as stated, ambiguous.


The objectives of the case studies can be made precise if stated in terms of averages or means.

- Comparing the average hardness of two different cement mixtures.
- Predicting the average failure time on the basis of stress applied.
- Estimating the average coefficient of thermal expansion.
- Estimating the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean" have a technical meaning, which can be made clear through the concepts of population and sample.


Definition

A population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method. Population members are called units.

Example (Examples of populations)

- All water samples that can be taken from a lake.
- All items of a certain manufactured product.
- All students enrolled in Big Ten universities during the 2007-08 academic year.
- Two types of cleaning products. (Each type corresponds to a population.)


The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest.

Example (Examples of characteristics)

- All water samples taken from a lake. Characteristics: mercury concentration; concentration of other pollutants.
- All items of a certain manufactured product (that have been, or will be, produced). Characteristic: proportion of defective items.
- All students enrolled in Big Ten universities during the 2007-08 academic year. Characteristics: favorite type of music; political affiliation.
- Two types of cleaning products. Characteristic: cleaning effectiveness.


- In the example where different beams (all of the same type) are exposed to different stress levels:
  - the characteristic of interest is the time to failure of a beam under each stress level, and
  - each stress level used in the study corresponds to a separate population, which consists of all beams that will be exposed to that stress level.
- This emphasizes that populations are defined not only by the units they consist of, but also by the method or treatment applied to these units.


- Full (i.e. population-level) understanding of a characteristic requires the examination of all population units, i.e. a census.
- For example, full understanding of the relation between salary and productivity of a corporation's employees requires obtaining these two characteristics from all employees.
- However,
  - taking a census can be time-consuming and expensive: the 2000 U.S. Census cost $6.5 billion, while the 2010 Census cost $13 billion.
  - Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e. not all members are available for examination.
- Because of the above, we typically settle for examining all units in a sample, which is a subset of the population.


Due to intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from those of the population. For example:

- The average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake.
- The proportion of a sample of 100 PSU students who favor the use of solar energy will differ from the corresponding proportion of all PSU students.
- The relation between chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.
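As a rough illustration of the point above, the following Python sketch compares a population-level average with the average of one sample. The 50 "bear weights" are synthetic values invented for this illustration, not data from the lecture, and the variable names are assumptions.

```python
import random
import statistics

# Hypothetical population: 50 synthetic bear weights (invented values, not real data).
rng = random.Random(1)
population = [round(rng.uniform(100, 400), 1) for _ in range(50)]

# Draw 10 distinct units at random from the population.
sample = rng.sample(population, 10)

population_mean = statistics.mean(population)  # population-level property
sample_mean = statistics.mean(sample)          # sample-level property

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
print(f"difference:      {sample_mean - population_mean:+.1f}")
```

Changing the seed changes which 10 units are drawn, and with it the size and sign of the difference.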


The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties.

[Figure: Population and sample relationships between chest girth and weight of black bears. Axes: Chest Girth, Weight.]


Sampling Variability

- Sample properties of the characteristic of interest also differ from sample to sample. For example:
  1. The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) be different from the corresponding number in a different sample of 20 US citizens.
  2. The average mercury concentrations in two sets of 25 water samples drawn from a lake will differ.
- The term sampling variability is used to describe such differences in the characteristic of interest from sample to sample.
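Sampling variability can be seen by drawing several samples from the same population and comparing their averages. The sketch below does this in Python for a hypothetical lake; the concentrations are synthetic (invented, in arbitrary units), and the number of sets drawn is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical lake: 10,000 possible water samples with synthetic mercury
# concentrations (invented values, arbitrary units).
lake = [rng.gauss(0.5, 0.1) for _ in range(10_000)]

# Draw five separate sets of 25 water samples and record each set's average.
sample_means = [statistics.mean(rng.sample(lake, 25)) for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"set {i}: average concentration = {m:.3f}")
print(f"lake-wide average = {statistics.mean(lake):.3f}")
```

Each set gives a slightly different average, and none needs to equal the lake-wide value.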


[Figure: Illustration of Sampling Variability. Axes: Chest Girth, Weight.]


- Population-level properties/attributes of the characteristic(s) of interest are called (population) parameters.
- Examples of parameters include averages, proportions, percentiles, and correlation coefficients.
- The corresponding sample properties/attributes of characteristics are called statistics. The term "sports statistics" comes from this terminology.
- Sample statistics approximate the corresponding population parameters but are, in general, not equal to them.
- Statistical inference deals with the uncertainty issues that arise in approximating parameters by statistics.
- The tools of statistical inference include point and interval estimation, hypothesis testing, and prediction.


Example (Examples of Estimation, Hypothesis Testing and Prediction)

- Estimation (point and interval) would be used in the task of estimating the coefficient of thermal expansion of a metal, or the air pollution level.
- Hypothesis testing would be used for deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer's claim regarding the quality of a product is false.
- Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter.


- For valid statistical inference the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students if the characteristic of interest is height.
- Typically it is hard to tell whether a sample is representative of the population. So we define a sample to be representative if it allows for valid statistical inference (a circular definition!).
- The only guarantee for that comes from the method used to select the sample (the sampling method).
- The good news is that several sampling methods guarantee representativeness.


Definition

A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected.

- To select a s.r.s. of size 10 from a population of 100 units, all of the 100!/(10!90!) possible samples of size 10 must be equally likely.
- In simple random sampling every member of the population has the same chance of being included in the sample. The converse, however, is not true.

Example

To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this a s.r.s.? (Does every student have the same chance of being included in the sample?)
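One way to explore the example is to enumerate the possible samples under each selection scheme. The Python sketch below is an illustration under assumed labels (M0..M19, F0..F19); the helper name `inclusion_probability` is hypothetical. It also evaluates the count 100!/(10!90!) mentioned earlier.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

# Number of distinct samples of size 10 from 100 units: 100!/(10! 90!).
print(comb(100, 10))

# Assumed labels: M0..M19 are the male students, F0..F19 the female students.
males = [f"M{i}" for i in range(20)]
females = [f"F{i}" for i in range(20)]
students = males + females

# Scheme A: simple random sample of size 2 (every pair equally likely).
srs_pairs = list(combinations(students, 2))
# Scheme B: one male and one female, each chosen at random.
mf_pairs = list(product(males, females))

def inclusion_probability(student, pairs):
    """Chance that `student` is in the sample when all `pairs` are equally likely."""
    return Fraction(sum(student in p for p in pairs), len(pairs))

print(inclusion_probability("M0", srs_pairs))  # 39 of 780 pairs
print(inclusion_probability("M0", mf_pairs))   # 20 of 400 pairs
print(len(srs_pairs), len(mf_pairs))           # possible samples under each scheme
```

Both schemes give every student the same inclusion chance of 1/20, but under scheme B a pair of two males (or two females) can never be selected. Not every sample of size 2 is equally likely, so scheme B is not a s.r.s.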


Another sampling method for obtaining a representative sample is called stratified sampling.

Definition

A stratified sample consists of simple random samples from each of a number of groups (which are non-overlapping and make up the entire population) called strata.

- Examples of strata include: ethnic groups, age groups, and production facilities.
- If the units in the different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.
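The definition above can be sketched in Python as follows. The facility names, stratum sizes, and the proportional allocation (10% of each stratum) are assumptions made for this illustration; the lecture does not prescribe how many units to take from each stratum. The function name `stratified_sample` is hypothetical.

```python
import random

rng = random.Random(3)

# Hypothetical strata: item IDs grouped by production facility (invented labels).
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of sizes[name] units from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Assumed allocation: sample sizes proportional to stratum sizes (10% of each).
sizes = {name: len(units) // 10 for name, units in strata.items()}
sample = stratified_sample(strata, sizes, rng)

for name, units in sample.items():
    print(name, len(units), units[:3])
```

Because each stratum is sampled separately, every facility is certain to appear in the sample, which a s.r.s. of the pooled items does not guarantee.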


How do we select a s.r.s. of size n from a population of N units?

- STEP 1: Assign to each unit a number from 1 to N.
- STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them.
- STEP 3: Select n slips of paper at random, one at a time.

Alternatively, the entire process can be performed in software like R. We will see this in the next lab session.
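The three steps above can be sketched in a few lines of code. The lecture defers the R version to the lab session; what follows is a Python stand-in, not the course's own code, and the function name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(N, n, rng=random):
    """Select n of the unit labels 1..N so that every subset of size n is equally likely."""
    labels = range(1, N + 1)      # STEP 1: number the units from 1 to N
    return rng.sample(labels, n)  # STEPS 2-3: shuffle the "urn" and draw n slips

rng = random.Random(4)  # fixed seed so the illustration is reproducible
print(simple_random_sample(100, 10, rng))
```

`random.sample` draws without replacement, so no label can appear twice in the result.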


- Sampling without replacement simply means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: once a unit's slip of paper is drawn, it is not placed back into the urn.
- Sampling with replacement means that after a unit's slip of paper is chosen, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, . . . , 6.
- Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view.
- When a population is very large, sampling with and without replacement are practically equivalent.
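The two schemes, and the reason they nearly coincide for large populations, can be illustrated with a short Python sketch. The helper `prob_no_repeats` is a hypothetical name; it computes the chance that n draws with replacement happen to contain no repeated unit, which is (N/N)((N-1)/N)...((N-n+1)/N).

```python
import random
from math import prod

rng = random.Random(5)
die = [1, 2, 3, 4, 5, 6]

# Without replacement: each unit appears at most once (n cannot exceed N).
print(rng.sample(die, 4))
# With replacement: a unit may appear anywhere between 0 and n times.
print(rng.choices(die, k=4))

def prob_no_repeats(N, n):
    """P(n draws with replacement from N units contain no repeated unit)."""
    return prod((N - i) / N for i in range(n))

# As N grows, a with-replacement sample almost never repeats a unit, so the
# two schemes behave nearly identically.
for N in (100, 10_000, 1_000_000):
    print(N, round(prob_no_repeats(N, 10), 4))
```

For a sample of size 10, repeats are fairly common when N = 100 but become very rare once N reaches the millions.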


- Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias.
- Two examples of non-representative samples are self-selected and convenience samples.
- A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, often those who feel that things are running smoothly or who support an incumbent will (apathetically) not respond, whereas those activists who strongly desire change will voice their opinions.


- A convenience sample is a sample made up of the units that are most easily reached. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students, because your classes are composed mostly of students with the same major as you.
- A famous example of selection bias is the following.

Example (The Literary Digest poll of 1936)

The magazine had been extremely successful in predicting the results of US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Worth noting is that this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). Gallup, on the other hand, correctly predicted the outcome of that election by surveying only 50,000 people.
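The lesson of the example, that a large biased sample can be worse than a small s.r.s., can be imitated in a toy simulation. The Python sketch below does not model the actual 1936 poll: the electorate size, the 60% support level, and the mailing-list probabilities are all invented for illustration, and the function name `share_for_a` is hypothetical.

```python
import random

rng = random.Random(6)

# Hypothetical electorate of 200,000 voters; every number below is invented.
# 60% support candidate A. Only part of the electorate is on the mailing list,
# and list membership is more common among supporters of candidate B.
population = []
for _ in range(200_000):
    supports_a = rng.random() < 0.60
    on_list = rng.random() < (0.20 if supports_a else 0.60)
    population.append((supports_a, on_list))

def share_for_a(voters):
    """Proportion of the given voters who support candidate A."""
    return sum(supports_a for supports_a, _ in voters) / len(voters)

mailing_list = [v for v in population if v[1]]  # large but biased sample
small_srs = rng.sample(population, 1_000)       # small simple random sample

print(f"true share for A:          {share_for_a(population):.3f}")
print(f"mailing list (n={len(mailing_list)}): {share_for_a(mailing_list):.3f}")
print(f"s.r.s. (n=1000):           {share_for_a(small_srs):.3f}")
```

Under these assumptions the mailing-list sample is far larger, yet its estimate lands much further from the true share than the estimate from the s.r.s. of 1,000.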


- Go to the next lesson: http://www.stat.psu.edu/~mga/401/course.info/b.lect2.pdf
- Go to the Stat 401 home page: http://www.stat.psu.edu/~mga/401/course.info/
- http://www.stat.psu.edu/~mga
