Introduction to Statistics - University of Texas at El Paso - utminers.utep.edu/crboehmer/introduction...


Post on 25-Jul-2020



Introduction to Statistics

Measures of Central Tendency


Two Types of Statistics

• Descriptive statistics of a POPULATION. Relevant notation (Greek):
  – µ = mean
  – N = population size
  – ∑ = sum
• Inferential statistics of SAMPLES from a population. We assume that the sample reflects the population in an unbiased way. Relevant notation (Roman):
  – X̄ (X-bar) = mean
  – n = sample size
  – ∑ = sum


• Be careful though because you may want to use inferential statistics even when you are dealing with a whole population.

• Measurement error or missing data may mean that, if we treated a population as complete, our estimates could be inefficient.
  – It depends on the type of data and project.
  – Example: the Democratic Peace.


• Also, be careful about the phrase “descriptive statistics”. It is often used generically to mean measures of central tendency and dispersion, even when those measures serve inferential statistics.

• Another name is “summary statistics”, which are univariate:
  – Mean, Median, Mode, Range, Standard Deviation, Variance, Min, Max, etc.


Measures of Central Tendency

• These measures describe the center (the typical or average value) of the distribution of a set of scores or values in the data:
  – Mean
  – Median
  – Mode


What do you “Mean”?

The “mean” of some data is the average score or value, such as the average age of an MPA student or average weight of professors that like to eat donuts.

Inferential mean of a sample: $\bar{X} = \frac{\sum X}{n}$

Mean of a population: $\mu = \frac{\sum X}{N}$
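A minimal Python sketch of the mean formula, using the twelve donut-eating professors' weights (in pounds) that appear in the tables of this deck. The helper name `mean` is our own illustration; Python's standard library also provides `statistics.mean`. The arithmetic is identical for a sample (divide by n) and a population (divide by N); only the notation changes.

```python
# Weights (pounds) of the 12 donut-eating professors used in these slides.
weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]


def mean(values):
    """Sum of the scores divided by the number of scores: (sum of X) / n."""
    return sum(values) / len(values)


print(f"{mean(weights):.1f}")  # 194.6
```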


Problem of being “mean”

• The main problem associated with the mean value of some data is that it is sensitive to outliers.

• For example, the average weight of political science professors could be pulled far upward if one professor in the department weighed 600 pounds.


Donut-Eating Professors

Professor        Weight (with outliers)   Weight (without outliers)
Calzone          227                      227
Googles-Boop     199                      199
Queenie          132                      132
Boehmer          151                      151
Zingers          308                      308
Honkey-Doorey    251                      251
Levin            148                      148
Schnickerson     165                      165
Homer            610                      187
Pallitto         410                      189
Bopsey           213                      213
Schmuggles       165                      165
Mean             248.3                    194.6
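To see the outlier problem numerically, the sketch below recomputes both means in the table with Python's standard `statistics` module. It assumes the first weight column is the version in which Homer (610) and Pallitto (410) are the outliers. The median is computed as well, for comparison with the next slide: it moves too, but far less than the mean.

```python
import statistics

# Weights in pounds, in the table's row order (Calzone ... Schmuggles).
without_outliers = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]
# Same rows, but with Homer = 610 and Pallitto = 410 (the outlier column).
with_outliers = [227, 199, 132, 151, 308, 251, 148, 165, 610, 410, 213, 165]

mean_clean = statistics.mean(without_outliers)
mean_outlier = statistics.mean(with_outliers)
median_clean = statistics.median(without_outliers)
median_outlier = statistics.median(with_outliers)

# Two extreme values move the mean about three times as far as the median.
print(f"mean:   {mean_clean:.2f} -> {mean_outlier:.2f}")
print(f"median: {median_clean:.2f} -> {median_outlier:.2f}")
```

With these inputs the mean rises from about 194.58 to 248.25 pounds (248.3 on the slide after rounding), while the median rises only from 188 to 206.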


The Median (not the cement in the middle of the road)

• Because the mean can be sensitive to extreme values, the median is sometimes more useful and more representative of a typical value.

• The median is simply the middle value among the ranked scores of a variable (there is no standard algebraic formula for computing it).


What is the Median?

Professor        Weight
Calzone          227
Googles-Boop     199
Queenie          132
Boehmer          151
Zingers          308
Honkey-Doorey    251
Levin            148
Schnickerson     165
Homer            187
Pallitto         189
Bopsey           213
Schmuggles       165
Mean             194.6

Weight, rank-ordered: 308, 251, 227, 213, 199, 189, 187, 165, 165, 151, 148, 132

Rank-order the values and choose the middle value.

If the number of values is even, take the average of the two middle values. Here n = 12, so the median is (189 + 187) / 2 = 188.
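The rank-order procedure above translates directly into code. This is a minimal sketch with a helper name of our own (`median`); Python's `statistics.median` applies the same rule.

```python
def median(values):
    """Rank-order the values and return the middle one.

    With an even number of values, return the average of the two middle values.
    """
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2


weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]
print(median(weights))  # 188.0, the average of 187 and 189
```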


Percentiles

• The median is the 50th percentile. Starting from it, we can move up or down and rank the data as being above or below other thresholds.

• You may be familiar with percentiles from standardized tests: if you scored in the 90th percentile, your score was higher than 90% of the rest of the sample.
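Percentile rank has several competing definitions. The sketch below assumes the one stated on the slide: the percentage of the *other* observations that a score exceeds. `percentile_rank` is a hypothetical helper written for illustration, not a library function.

```python
def percentile_rank(score, values):
    """Percentage of the *other* observations that fall strictly below `score`.

    Assumes `score` is one of the observations; one copy of it is removed
    so that it is compared only with "the rest of the sample".
    """
    others = list(values)
    others.remove(score)  # raises ValueError if score is not in values
    below = sum(1 for v in others if v < score)
    return 100 * below / len(others)


weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]
print(f"{percentile_rank(227, weights):.1f}")  # 81.8 (Calzone outweighs 9 of 11)
print(f"{percentile_rank(308, weights):.1f}")  # 100.0 (Zingers outweighs everyone)
```

Other conventions (for example, counting ties as half, or including the score itself) give slightly different numbers, so it is worth checking which one a given test or software package uses.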


The Mode (hold the pie and the ala)(What does ‘ala’ taste like anyway??)

• The most frequent response or value for a variable.

• Multiple modes are possible: bimodal or multimodal.


Figuring the Mode

Professor        Weight
Calzone          227
Googles-Boop     199
Queenie          132
Boehmer          151
Zingers          308
Honkey-Doorey    251
Levin            148
Schnickerson     165
Homer            187
Pallitto         189
Bopsey           213
Schmuggles       165

What is the mode?

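Answer: 165 (Schnickerson and Schmuggles both weigh 165 pounds; no other weight appears more than once).

The mode is important descriptive information that may help inform your research and diagnose problems such as a lack of variability.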
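A short sketch of finding the mode, written to return every tied value so that bimodal or multimodal data are handled. `modes` is our own helper; Python 3.8+ also offers `statistics.multimode`, which returns the tied values in order of first appearance rather than sorted.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in ascending order.

    A single-element list means one mode; two or more means bimodal/multimodal.
    """
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]
print(modes(weights))          # [165]
print(modes([1, 1, 2, 2, 3]))  # [1, 2] -> bimodal
```

If every value occurs exactly once, this function returns all of them, which is one quick signal of the kind of data problem the slide mentions.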


Measures of Dispersion (not something you cast…)

• Measures of dispersion tell us about variability in the data. Like measures of central tendency, they are univariate.

• Basic question: how much do a variable's values differ, from the minimum to the maximum, and how far apart are the scores in between? We use:
  – Range
  – Standard Deviation
  – Variance


• Remember that, in order to glean information from data (i.e., to make an inference), we need to see variability in our variables.

• Measures of dispersion tell us how much our variables vary around the mean. If they do not vary, it is difficult to infer anything from the data. Dispersion is also known as the spread or range of variability.


The Range (no Buffalo roaming!!)

• $r = h - l$
  – where h is the highest value and l is the lowest value

• In other words, the range is the distance between the minimum and maximum values of a variable.

• Understanding this statistic is important in understanding your data, especially for management and diagnostic purposes.
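The range formula applied to the professors' weights, as a minimal sketch: the highest value is Zingers' 308 pounds and the lowest is Queenie's 132. The variable names `high` and `low` stand in for the slide's h and l.

```python
weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]

high = max(weights)  # h: highest value (Zingers, 308 pounds)
low = min(weights)   # l: lowest value (Queenie, 132 pounds)
r = high - low       # r = h - l
print(r)  # 176
```

Because it uses only the two most extreme values, the range is even more sensitive to a single outlier than the mean is.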


The Standard Deviation

• A standardized measure of how far scores typically lie from the mean.

• Very useful, and something you will often read about when researchers make predictions or other statements about the data.


Formula for Standard Deviation

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where:
– √ = square root
– ∑ = sum (sigma)
– X = score for each point in the data
– X̄ = mean of the scores for the variable
– n = sample size (number of observations or cases)


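Professor        X     X - mean   (X - mean) squared
Schmuggles       165   -29.6        875.2
Bopsey           213    18.4        339.2
Pallitto         189    -5.6         31.2
Homer            187    -7.6         57.5
Schnickerson     165   -29.6        875.2
Levin            148   -46.6       2170.0
Honkey-Doorey    251    56.4       3182.8
Zingers          308   113.4      12863.3
Boehmer          151   -43.6       1899.5
Queenie          132   -62.6       3916.7
Googles-Boop     199     4.4         19.5
Calzone          227    32.4       1050.8

Mean of X: 194.6
Sum of squared deviations: 27280.9
Sum of squared deviations ÷ (n - 1) = 27280.9 ÷ 11 = 2480.1 (the variance)
Square root of 2480.1 = 49.8 (the standard deviation)

We can see that the standard deviation equals about 49.8 pounds. The weight of Zingers is still likely skewing this calculation, both indirectly through the mean and directly through its large squared deviation (12863.3 of the 27280.9 total).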
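The worked table can be reproduced step by step. This Python sketch follows the standard deviation formula with the n - 1 denominator used on these slides: it prints one row per professor (X, X - mean, squared deviation) and then the summary values. Python's `statistics.stdev` should agree with the final result.

```python
import math

# (professor, weight in pounds) in the same order as the worked table.
data = [
    ("Schmuggles", 165), ("Bopsey", 213), ("Pallitto", 189), ("Homer", 187),
    ("Schnickerson", 165), ("Levin", 148), ("Honkey-Doorey", 251),
    ("Zingers", 308), ("Boehmer", 151), ("Queenie", 132),
    ("Googles-Boop", 199), ("Calzone", 227),
]

n = len(data)
x_bar = sum(x for _, x in data) / n  # mean of X

sum_sq = 0.0
for name, x in data:
    deviation = x - x_bar       # X - mean
    squared = deviation ** 2    # (X - mean) squared
    sum_sq += squared
    print(f"{name:<14}{x:>5}{deviation:>8.1f}{squared:>10.1f}")

variance = sum_sq / (n - 1)  # divide by n - 1 because this is a sample
sd = math.sqrt(variance)     # standard deviation S
print(f"mean={x_bar:.1f} sum_sq={sum_sq:.1f} variance={variance:.1f} sd={sd:.1f}")
# mean=194.6 sum_sq=27280.9 variance=2480.1 sd=49.8
```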


Example of S in use

• Boehmer-Sobek paper:
  – A one standard deviation increase in the value of an X variable changes the probability of Y occurring by some amount.


Table 2: Development and Relative Risk of Territorial Claim

Variable                     Probability*   % Change
Baseline                     0.0401
development                  0.0024         -94.3
pop density                  0.0332         -17.3
pop growth                   0.0469          16.8
Capability                   0.0813         102.5
Openness                     0.0393          -2
Capability and pop growth    0.0942         134.8

% Change in probability after a 1 SD change in the given X variable, holding the others at their means.
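The % Change column compares each predicted probability with the baseline. A minimal sketch of that arithmetic, assuming % Change = 100 × (probability - baseline) / baseline; `pct_change` is our own helper. Recomputed from the rounded probabilities printed in the table, the results land within a few tenths of a point of the published column (for example, about 102.7 rather than 102.5 for Capability), presumably because the published figures were computed from unrounded probabilities.

```python
BASELINE = 0.0401  # baseline probability from Table 2

# Probabilities after a 1 SD change, as printed (rounded) in Table 2.
scenarios = {
    "development": 0.0024,
    "pop density": 0.0332,
    "pop growth": 0.0469,
    "Capability": 0.0813,
    "Openness": 0.0393,
    "Capability and pop growth": 0.0942,
}


def pct_change(p_new, p_base):
    """Percentage change of p_new relative to the baseline probability p_base."""
    return 100 * (p_new - p_base) / p_base


for name, p in scenarios.items():
    print(f"{name:<27}{pct_change(p, BASELINE):>7.1f}")
```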


Let’s go to computers!

• Type the data into the Excel sheet.


Variance

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

• Note that this is the same equation as for the standard deviation, except that no square root is taken.

• The variance itself is not often reported directly in research; instead, it is a building block for other statistical methods.
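Because the variance is the standard deviation before the square root is taken, the two are easy to check against each other. The sketch below uses Python's standard `statistics` module and also contrasts the sample variance (denominator n - 1, as in the slide's formula) with the population variance (denominator N), echoing the sample/population notation from the start of the deck.

```python
import statistics

weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]

s2 = statistics.variance(weights)       # sample variance S^2: divides by n - 1
sigma2 = statistics.pvariance(weights)  # population variance: divides by N
s = statistics.stdev(weights)           # sample standard deviation S

print(f"{s2:.1f} {sigma2:.1f}")  # 2480.1 2273.4
print(f"{s:.1f}")                # 49.8, the square root of S^2
```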


Organizing and Graphing Data


Goal of Graphing?

1. Presentation of Descriptive Statistics

2. Presentation of Evidence

3. Some people understand subject matter better with visual aids

4. Provide a sense of the underlying data generating process (scatter-plots)


What is the Distribution?

• Gives us a picture of the variability and central tendency.

• Can also show the amount of skewness and kurtosis.
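The slides do not give a formula for skewness. As one illustration, assuming the common moment-based definition (our choice; statistical packages also offer variants with small-sample corrections), the donut data turn out to be positively skewed: Zingers' 308 pounds stretches the upper tail. `moment_skewness` is a hypothetical helper.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2 ** 1.5.

    m2 and m3 are the average squared and cubed deviations from the mean.
    Positive values indicate a longer right tail; negative, a longer left tail.
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]
print(f"{moment_skewness(weights):.2f}")  # 0.92: positive, a longer right tail
```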


Graphing Data: Types


Creating Frequencies

• We create frequencies by sorting data by value or category and then counting the cases that fall into each value or category.

• How often do certain scores occur? This is a basic descriptive data question.
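A minimal sketch of building a frequency table by class interval, using the weight classes that label the histograms in this deck. The class boundaries are taken from those labels; because the labels skip from 150 to 151 and so on, this assumes whole-pound weights (a value like 150.5 would fall into no class). `frequency_table` is our own helper.

```python
# Weight classes from the slides' histogram: (label, low, high), bounds inclusive.
# The open-ended "311+" class uses infinity as its upper bound.
CLASSES = [
    ("130-150", 130, 150), ("151-185", 151, 185), ("186-210", 186, 210),
    ("211-240", 211, 240), ("241-270", 241, 270), ("271-310", 271, 310),
    ("311+", 311, float("inf")),
]


def frequency_table(values, classes):
    """Count how many values fall into each (label, low, high) class."""
    return {label: sum(1 for v in values if low <= v <= high)
            for label, low, high in classes}


weights = [227, 199, 132, 151, 308, 251, 148, 165, 187, 189, 213, 165]
freq = frequency_table(weights, CLASSES)
for label, count in freq.items():
    # Count, plus the proportion that a pie or donut chart would show.
    print(f"{label:<8}{count:>3}{count / len(weights):>8.1%}")
```

For these data the counts come out to 2, 3, 3, 2, 1, 1 and 0 (12 in total).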


Ranking of Donut-eating Profs. (least to most)

132  Queenie
148  Levin
151  Boehmer
165  Schmuggles
165  Schnickerson
187  Homer
189  Pallitto
199  Googles-Boop
213  Bopsey
227  Calzone
251  Honkey-Doorey
308  Zingers


Here we have placed the professors into weight classes and depicted them with a histogram drawn in columns.

[Figure: column histogram, “Weight Class Intervals of Donut-Munching Professors”; x-axis: weight class (130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+); y-axis: Number (0 to 3.5)]


Here is the same histogram depicted as a bar graph.

[Figure: bar graph, “Weight Class Intervals of Donut-Munching Professors”; weight classes 130-150 through 311+ plotted against Number (0 to 3.5)]


Pie Charts:

[Figure: pie chart, “Proportions of Donut-Eating Professors by Weight Class”; slices: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+]


Actually, why not use a donut graph? Duh!

[Figure: donut chart, “Proportions of Donut-Eating Professors by Weight Class”; segments: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+]

See Excel for other options!!!!


Line Graphs: A Time Series

[Figure: time-series line graph with two series, Approval and Economic approval; x-axis: Month (1981 to 2001); y-axis: Approval (0 to 100)]


Scatter Plot (Two Variables)

[Figure: scatter plot, “Presidential Approval and Unemployment”; x-axis: Unemployment (0 to 12); y-axis: Approval (0 to 100); series: Approve]