plan statistical basics - unil



Quantitative approaches

Lesson 9:

Statistical basics


Plan

1. Types of analysis

2. Types of variables: nominal, ordinal, interval, metric

3. Measures of central tendency: mode, median, mean

4. Degrees of freedom

5. Measures of variability: variance and standard deviation

6. Central limit theorem and the normal distribution

7. A measure of unreliability: the standard error

8. Confidence intervals

9. Statistical tests

10. Other distributions and tests: T, F, Chi-square, Poisson, Binomial


Useful resources

http://onlinestatbook.com/rvls/index.html

Rice Virtual Lab in Statistics


1. Types of analysis


Types of analysis

- descriptive or inferential

- univariate, bivariate, multivariate


Descriptive vs. inferential analysis

"Descriptive analysis is about the data you have in hand. Inferential analysis involves making statements - inferences - about the world beyond the data you have in hand."

"When you say that the average age of a group of telephone survey respondents is 44.6 years, that's a descriptive analytic statement. When you say that there is a 95% statistical probability that the true mean of the population from which you drew your sample of respondents is between 42.5 and 47.5 years, that's an inferential statement. You infer something about the rest of the world from data in your sample."

(Bernard, 2000: 502)
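To make the distinction concrete, here is a minimal Python sketch using hypothetical respondent ages (not Bernard's data). The sample mean is a descriptive statement about the data in hand. The interval is an inferential statement about the population. The sketch assumes the normal critical value 1.96 for simplicity; with a sample this small, a t critical value would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical ages of telephone survey respondents (illustrative only, not Bernard's data).
ages = [38, 51, 44, 29, 47, 55, 41, 36, 49, 52, 44, 39]

# Descriptive: a statement about the data in hand.
sample_mean = statistics.mean(ages)

# Inferential: an approximate 95% confidence interval for the population mean.
# Assumption: the normal critical value 1.96 is used for simplicity.
standard_error = statistics.stdev(ages) / math.sqrt(len(ages))
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"Descriptive: mean age in the sample = {sample_mean:.2f} years")
print(f"Inferential: approx. 95% CI for the population mean = [{lower:.1f}, {upper:.1f}] years")
```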


Univariate, bivariate, multivariate

- univariate: uses 1 variable

- bivariate: uses 2 variables

- multivariate: uses 3 or more variables


Univariate, bivariate, multivariate

- "Univariate analysis involves getting to know data intimately by examining variables precisely and in detail. Bivariate analysis involves looking at associations between pairs of variables and trying to understand how those associations work. Multivariate analysis involves, among other things, understanding the effects of more than one independent variable at a time on a dependent variable."

(Bernard, 2000: 502)


Univariate, bivariate, multivariate: how to proceed

1. Look at the variables one by one: what are their range, mean, median, variance (is there any variance at all?) and distribution? (univariate)

2. Inspect associations between pairs of variables. How does the independent variable "influence" the dependent variable? (bivariate)

3. Look at the associations of several variables simultaneously. How do two or more independent variables influence a dependent variable at the same time? (multivariate)


Bivariate analysis: questions to ask

1. How big/important is the covariation? In other words, how much better could we predict the score of a dependent variable in our sample if we knew the score of some independent variable? Covariation coefficients answer this question.

2. Is the covariation statistically significant? Is it due to chance, or is it likely to exist in the overall population to which we want to generalize? Statistical tests answer this question.

3. What is its direction? (look at graphs)

4. What is its shape? Is it linear or non-linear? (look at graphs)
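One common covariation coefficient for two interval variables is Pearson's correlation coefficient r. The sketch below computes it from deviations around the means. The function name `pearson_r` and the data are hypothetical. The sign of r gives the direction of the association. Its square, r squared, is the share of the dependent variable's variance accounted for by a straight line on the independent variable, which is one way to express "how much better we could predict". Because r only captures linear association, the advice to look at graphs for the shape still applies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariation: sum of cross-products of the deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: years of schooling (independent) and income in thousands (dependent).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
income = [28, 31, 35, 33, 40, 45, 43, 52]
r = pearson_r(schooling, income)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```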


Multivariate analysis: questions to ask

1. How is a relationship between two variables changed if a third variable is controlled? (Multiple crosstabs, partial correlation, multiple regression, MANOVA)

2. How much of the overall variance of a dependent variable can be explained by several independent variables? What are the relative strengths of the different predictors (independent variables)? (Multiple regression)

3. Given a multitude of variables, which groups of variables tend to correlate with each other? (Factor analysis)

4. Which individuals tend to be similar with respect to selected variables? (Cluster analysis)
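Partial correlation is one of the techniques listed for question 1. The sketch below uses the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). This formula measures the linear association between x and y once the linear influence of z is removed from both. The function names and the children's data are hypothetical, and the formula assumes linear relationships.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical children's data: age may drive both shoe size and reading score.
age = [6, 7, 8, 9, 10, 11, 12, 13]
shoe_size = [28, 31, 30, 33, 35, 34, 38, 37]
reading = [10, 14, 15, 20, 22, 24, 29, 30]

print(f"r(shoe, reading)       = {pearson_r(shoe_size, reading):.2f}")
print(f"r(shoe, reading | age) = {partial_r(shoe_size, reading, age):.2f}")
```

Comparing the two printed values shows how much of the shoe-size/reading association is carried by age.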

2. Types of variables: nominal, ordinal, interval, metric

Levels of measurement and covariation (rows: level of the independent variable; columns: level of the dependent variable):

Independent \ Dependent | Nominal | Ordinal | Interval/ratio
Nominal | Crosstabs | (ANOVA, Crosstabs) | ANOVA (Means)
Ordinal | | (Corr/Regr, Crosstabs) | (Corr/Regr)
Interval/ratio | Logistic regression | (Corr/Regr) | Correlation, Regression

3. Measures of central tendency: mode, median, mean

Definitions: mode, median, mean

Mode = the value in the distribution of the variable that occurs most frequently

Median = the value in the distribution that has 50% of the values "to its right" and 50% of the values "to its left"

Mean = the sum of the values divided by n (the number of values)
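The three definitions translate directly into code. The functions below are a hypothetical hand-written sketch, not a library API. Two conventions go beyond the slide. First, `mode` returns a list, because several values can tie for the highest frequency. Second, for an even number of values, `median` takes the mean of the two middle values, which is the usual convention. Python's standard `statistics` module provides ready-made `mean`, `median` and `multimode` functions.

```python
from collections import Counter

def mode(values):
    """All values that occur most frequently (a distribution can have several modes)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

def median(values):
    """Middle value of the ordered data; for even n, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def mean(values):
    """Sum of the values divided by n."""
    return sum(values) / len(values)

print(mode([1, 2, 2, 3]), median([4, 1, 3, 2]), mean([1, 2, 3, 4]))
```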


Variables: nominal, ordinal, interval

1. Nominal variables have no inherent order.

example: party preference, male-female

2. Ordinal variables are ordered, but the distances between their values are not quantifiable (we cannot add or subtract them).

example: agree a lot, agree a bit, disagree a bit, disagree a lot

3. Interval variables are measured numerically; it makes sense to add or subtract their values.

example: height, weight, income, number of cars


Example: size of 11 dwarfs

Sizes of the 11 dwarfs (cm): 13, 7, 5, 12, 9, 15, 7, 11, 9, 7, 12

Sorted: 5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15

[Figure: the sorted sizes with the mode (7), the median (9) and the mean (9.7272) marked]


Calculating mean, mode, median

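$$\text{mean} = \bar{y} = \frac{\sum y}{n}$$

$$\bar{y} = \frac{5 + 7 + 7 + 7 + 9 + 9 + 11 + 12 + 12 + 13 + 15}{11} = \frac{107}{11} = 9.727273$$

median = 9: in the ordered list 5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15 it is the 6th of the 11 values, with 5 values on each side

mode = 7: in 5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15 the value 7 occurs three times, more often than any other value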
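The hand calculation can be checked with Python's standard `statistics` module, using the dwarf sizes from the example. This is a quick verification sketch; the printed mean is a float close to 9.727.

```python
import statistics

# Sizes of the 11 dwarfs from the example (cm), in their original order.
sizes = [13, 7, 5, 12, 9, 15, 7, 11, 9, 7, 12]

print(sorted(sizes))             # [5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15]
print(sum(sizes), len(sizes))    # 107 11
print(statistics.mean(sizes))    # 107 / 11, about 9.727
print(statistics.median(sizes))  # 9
print(statistics.mode(sizes))    # 7
```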


4. Degrees of freedom


Degrees of freedom: definition

Degrees of freedom (df) = the number of values in the calculation of an estimate (e.g. mean, variance, standard error) that are "free to vary".

Equivalently, degrees of freedom = the number of independent values that go into the estimate (n) minus the number of parameters estimated.


Degrees of freedom: example

We have 5 dwarfs; their mean size is 4.

What is the sum of their sizes? It must be 20, otherwise the mean could not be 4.

So now let’s think about each of the five dwarfs in turn.

We are free to choose the first four numbers, but not the last one. What is it?

(example adapted from Crawley 2005)

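[Figure: the sizes of the first four dwarfs are chosen one after the other (2, 6, 8, 3); the size of the fifth dwarf is marked "?"]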


Degrees of freedom: example

The last dwarf must have size 1: the first four sizes add up to 2 + 6 + 8 + 3 = 19, and the sum must be 20.

Therefore, we are not "free to choose this last number".

This means that we have df = 4 in this case.

Check: n minus the number of parameters to be estimated (here only the mean):

5 - 1 = 4

[Figure: the five dwarf sizes 2, 6, 8, 3 and 1]
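The dwarf example can be replayed in a few lines. `last_value` is a hypothetical helper written only for this illustration. Once the mean and n are fixed, the sum must equal n × mean, so after n - 1 free choices the last value is determined. That constraint is why the estimated mean costs one degree of freedom.

```python
def last_value(mean, n, chosen):
    """Hypothetical helper: the one value left once n - 1 values and the mean are fixed."""
    if len(chosen) != n - 1:
        raise ValueError("exactly n - 1 values can be chosen freely")
    # The sum of all n values must equal n * mean, so the last value is determined.
    return n * mean - sum(chosen)

chosen_sizes = [2, 6, 8, 3]  # freely chosen sizes of the first four dwarfs
n, mean_size = 5, 4
print(last_value(mean_size, n, chosen_sizes))  # 20 - 19 = 1

# One parameter (the mean) was estimated, so df = n - 1.
df = n - 1
print(df)  # 4
```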

5. Measures of variability: variance and standard deviation

Variance and standard deviation: definitions

Variance and standard deviation are measures of the "variability" of a variable, in other words, of how much its values "vary" around the mean.

Variance = the sum of the squared deviations of the individual values from the mean, divided by the degrees of freedom (n - 1)

Standard deviation = the square root of the variance.


Variance

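$$\text{mean} = \bar{y} = \frac{\sum y}{n}$$

$$\text{variance} = \frac{\text{sum of squares}}{\text{degrees of freedom}} = s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

$$\text{standard deviation} = s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$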
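The formulas above can be sketched directly in Python. The function names are hypothetical, and the data are the 11 dwarf sizes from the central-tendency example. The denominator is n - 1, the degrees of freedom, as in the definition of s squared. Python's `statistics.variance` and `statistics.stdev` use the same n - 1 denominator, whereas `statistics.pvariance` and `statistics.pstdev` divide by n.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean divided by the degrees of freedom (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    y_bar = sum(values) / n
    sum_of_squares = sum((y - y_bar) ** 2 for y in values)
    return sum_of_squares / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

# The 11 dwarfs from the central-tendency example (cm).
sizes = [5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15]
print(round(sample_variance(sizes), 3), round(sample_sd(sizes), 3))
```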


Example: Dwarfs in 3 gardens

[Figure: size of dwarfs in 3 gardens, with panels Garden A, Garden B and Garden C]

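Size of dwarfs in 3 gardens

Garden A | Garden B | Garden C
3 | 5 | 3
4 | 5 | 3
4 | 6 | 2
3 | 7 | 1
2 | 4 | 10
3 | 4 | 4
1 | 3 | 3
3 | 5 | 11
5 | 6 | 3
2 | 5 | 10

mean(A) = $\bar{y}_A = 3$

mean(B) = $\bar{y}_B = 5$

mean(C) = $\bar{y}_C = 5$

var(A) = $s_A^2 = 1.3$

var(B) = $s_B^2 = 1.3$

var(C) = $s_C^2 = 14.2$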
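The table's summary values can be reproduced with the standard `statistics` module. The output shows the point of the example: gardens A and B have the same spread but different means, while B and C have the same mean but very different spreads. `statistics.variance` divides by n - 1, matching the formula for s squared.

```python
import statistics

# Dwarf sizes in the three gardens (cm).
gardens = {
    "A": [3, 4, 4, 3, 2, 3, 1, 3, 5, 2],
    "B": [5, 5, 6, 7, 4, 4, 3, 5, 6, 5],
    "C": [3, 3, 2, 1, 10, 4, 3, 11, 3, 10],
}

for name, sizes in gardens.items():
    mean = statistics.mean(sizes)
    # statistics.variance divides by n - 1 (sample variance).
    var = statistics.variance(sizes)
    print(f"Garden {name}: mean = {mean}, variance = {var:.1f}")
```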

Computing variance of dwarfs in garden C

$$\mathrm{Var} = s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}; \quad \bar{y} = 5$$

$$\mathrm{Var}_C = \frac{(3-5)^2 + (3-5)^2 + (2-5)^2 + (1-5)^2 + (10-5)^2 + (4-5)^2 + (3-5)^2 + (11-5)^2 + (3-5)^2 + (10-5)^2}{10 - 1}$$

$$\mathrm{Var}_C = \frac{(-2)^2 + (-2)^2 + (-3)^2 + (-4)^2 + 5^2 + (-1)^2 + (-2)^2 + 6^2 + (-2)^2 + 5^2}{9}$$

$$\mathrm{Var}_C = \frac{4 + 4 + 9 + 16 + 25 + 1 + 4 + 36 + 4 + 25}{9}$$

$$\mathrm{Var}_C = \frac{128}{9} = 14.2$$

Computing variance of dwarfs in garden A

$$\mathrm{Var} = s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}; \quad \bar{y} = 3$$

$$\mathrm{Var}_A = \frac{(3-3)^2 + (4-3)^2 + (4-3)^2 + (3-3)^2 + (2-3)^2 + (3-3)^2 + (1-3)^2 + (3-3)^2 + (5-3)^2 + (2-3)^2}{10 - 1}$$

$$\mathrm{Var}_A = \frac{0^2 + 1^2 + 1^2 + 0^2 + (-1)^2 + 0^2 + (-2)^2 + 0^2 + 2^2 + (-1)^2}{9}$$

$$\mathrm{Var}_A = \frac{0 + 1 + 1 + 0 + 1 + 0 + 4 + 0 + 4 + 1}{9}$$

$$\mathrm{Var}_A = \frac{12}{9} = 1.3$$

Boxplots

Boxplot = graphical summary of the variability of a variable

[Figure: annotated boxplot showing the 25% quartile, the median (50% quartile) and the 75% quartile; the whiskers reach the lowest and highest data points that are not outliers or extreme values]

Outliers and extreme values in boxplots

Interquartile range = distance between the 25% and 75% quartiles (the height of the box)

Outliers = values that lie between 1.5 and 3 interquartile ranges beyond the edge of the box

Extreme values = values that lie more than 3 interquartile ranges beyond the edge of the box

In boxplots, outliers and extreme values are represented by circles beyond the whiskers.
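The 1.5 and 3 interquartile-range rules can be sketched in Python. The function `classify` and the example data are hypothetical. The sketch assumes quartiles from `statistics.quantiles` with its default ("exclusive") method. Other software may compute quartiles slightly differently, so borderline values can be classified differently there.

```python
import statistics

def classify(values):
    """Return (outliers, extreme_values) using the 1.5 x IQR and 3 x IQR rules.

    Assumption: quartiles come from statistics.quantiles with its default
    ('exclusive') method; other software may compute quartiles differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    outliers, extremes = [], []
    for v in values:
        # Distance beyond the nearer edge of the box (0 if inside the box).
        beyond = max(q1 - v, v - q3, 0)
        if beyond > 3 * iqr:
            extremes.append(v)
        elif beyond > 1.5 * iqr:
            outliers.append(v)
    return outliers, extremes

# Hypothetical data: Q1 = 2.75, Q3 = 8.25, IQR = 5.5 with this quartile method.
print(classify([1, 2, 3, 4, 5, 6, 7, 8, 9, 20]))   # ([20], [])
print(classify([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # ([], [100])
```

Applied to garden C, the sketch flags no value, because the box itself is wide (Q1 = 2.75, Q3 = 10).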

Showing differences between means and variances graphically with boxplots