Statistics. Intro to statistics Presentations More on how to do qualitative analysis Tutorial time

Statistics

Upload: adrian-parsons

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Statistics. Intro to statistics Presentations More on how to do qualitative analysis Tutorial time

Statistics

Page 2:

• Intro to statistics

• Presentations

• More on how to do qualitative analysis

• Tutorial time

Page 3:

Inferential statistics

Page 8:

Descriptive vs Inferential statistics

• Descriptive statistics like totals (how many people came?), percentages (what proportion of the total were adolescents?) and averages (how much did they enjoy it?) use numbers to describe things that happen.

• Inferential statistics infer or predict the differences and relationships between things. They also tell us how certain or confident we can be about the predictions.

Page 9:

Why statistics are important

Statistics are concerned with difference: how much does one feature of an environment differ from another?

Suicide rates/100,000 people

Page 10:

Why statistics are important

Relationships: how much does one feature of the environment change as another measure changes?

The response of the fear centre of white people to black faces, depending on their exposure to diversity as adolescents

Page 11:

The two tasks of statistics

Magnitude: What is the size of the difference or the strength of the relationship?

Reliability: To what degree can the measures of the magnitude of variables be replicated with other samples drawn from the same population?

Page 12:

Magnitude: what’s our measure?

Suicide rates/100,000 people

• Raw number?

• Some aggregate of numbers? Mean, median, mode?

Page 13:

Arithmetic mean or average

Mean ($M$ or $\bar{X}$) is the sum ($\sum X$) of all the sample values ($X_1 + X_2 + X_3 + \dots + X_N$) divided by the sample size ($N$). Mean/average = $\sum X / N$

Overall rating (A)   General (B)   A*B   Unitec (C)   A*C
2                    1             2     1
3                    0             0     2
4                    3             12    0
5                    4                   7
6                    3                   6
7                    12                  8
8                    38                  16
9                    28                  10
10                   57            ___   14           ___
N                    146                 64

Page 14:

Compute the mean

             General   Unitec
Total (ΣX)   1262      493
N            146       64
mean         8.64      7.70
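The totals and means above can be checked with a short script. This is a minimal sketch in Python (a language chosen for illustration, not one used in the slides); the dictionary layout and the helper name `frequency_mean` are assumptions, while the counts are copied from the rating table.

```python
# Frequency tables from the slide: rating -> number of respondents.
general = {2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 12, 8: 38, 9: 28, 10: 57}
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def frequency_mean(table):
    """Return (total, n, mean) for a {value: count} frequency table."""
    total = sum(value * count for value, count in table.items())  # sum of A*B
    n = sum(table.values())                                       # sample size N
    return total, n, total / n


for name, table in (("General", general), ("Unitec", unitec)):
    total, n, mean = frequency_mean(table)
    print(f"{name}: total={total}, N={n}, mean={mean:.2f}")
```

Multiplying each rating by its count and summing is the same as adding up every individual score, which is why the A*B and A*C columns give the totals.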

Page 15:

The median

• The median is the "middle" value of the sample. There are as many sample values above the sample median as below it.

• If the number (N) in the sample is odd, then the median = the value of the piece of data at the (N-1)/2+1 position of the sample ordered from smallest to largest value. E.g. if N=45, the median is the value of the data at the (45-1)/2+1 = 23rd position.

• If the sample size is even, then the median is defined as the average of the values at the N/2 and N/2+1 positions. If N=64, the median is the average of the values at the 64/2 (32nd) and the 64/2+1 (33rd) positions.
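The two position rules can be sketched in Python as follows. The helper names `expand` and `median` are assumptions made for illustration; the counts come from the rating tables in this deck.

```python
# Frequency tables from the slides: rating -> number of respondents.
general = {2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 12, 8: 38, 9: 28, 10: 57}
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def expand(table):
    """Turn a {value: count} table into a sorted list of individual scores."""
    return sorted(value for value, count in table.items() for _ in range(count))


def median(scores):
    """Median by the position rules on the slide."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: position (N-1)/2 + 1, which is zero-based index (N-1)/2.
        return ordered[(n - 1) // 2]
    # Even N: average of positions N/2 and N/2 + 1 (indices N/2 - 1 and N/2).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median(expand(general)), median(expand(unitec)))
```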

Page 16:

Other measures of central tendency

• The mode is the single most frequently occurring data value. If two or more values occur equally frequently, the data set is called bi-modal, tri-modal, etc.

• The midrange is the midpoint of the sample: the average of the smallest and largest data values in the sample (= (2+10)/2 = 6 for both groups).

• The geometric mean (log transformation) = 8.46 (General) and 7.38 (Unitec)

• The harmonic mean (inverse transformation) = 8.19 (General) and 6.94 (Unitec)

• Both these last measures give less weight to extremely high scores
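A minimal Python sketch of these four measures, assuming the rating tables are stored as `{rating: count}` dictionaries (the function names are illustrative, not from the slides):

```python
import math

# Frequency tables from the slides: rating -> number of respondents.
general = {2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 12, 8: 38, 9: 28, 10: 57}
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def mode(table):
    """Most frequently occurring value (first one found if there is a tie)."""
    return max(table, key=table.get)


def midrange(table):
    """Average of the smallest and largest values that actually occur."""
    present = [value for value, count in table.items() if count > 0]
    return (min(present) + max(present)) / 2


def geometric_mean(table):
    """Back-transformed mean of the logs of the scores."""
    n = sum(table.values())
    return math.exp(sum(count * math.log(value) for value, count in table.items()) / n)


def harmonic_mean(table):
    """Reciprocal of the mean of the reciprocals of the scores."""
    n = sum(table.values())
    return n / sum(count / value for value, count in table.items())


for name, table in (("General", general), ("Unitec", unitec)):
    print(name, mode(table), midrange(table),
          round(geometric_mean(table), 2), round(harmonic_mean(table), 2))
```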

Page 17:

Compute the median and mode

Overall rating   General   Unitec
2                1         1
3                0         2
4                3         0
5                4         7
6                3         6
7                12        8
8                38        16
9                28        10
10               57        14
N                146       64

Page 18:

Means, median, mode

General Unitec

N 146 64

mean 8.64 7.70

median 9 8

mode 10 8

geometric mean 8.46 7.38

harmonic mean 8.19 6.94

Page 20:

The underlying distribution of the data

[Figure: proportion of scores (vertical axis, 0 to 0.25) against overall adults OAP rating (horizontal axis, 2 to 14). Mean = 8.36, Median = 8.36, Mode = 8.36]

Page 21:

Normal distribution

Page 22:

Data that looks like a normal distribution

Page 23:

Three things we must know before we can say events are different

1. the difference in mean scores of two or more events

- the bigger the gap between means the greater the difference

2. the degree of variability in the data

- the less variability the better, as it suggests that differences between means are reliable

Page 24:

Variance and Standard Deviation

These are estimates of the spread of data. They are calculated by measuring the distance between each data point and the mean.

The variance ($s^2$) is the average of the squared deviations of each sample value from the mean: $s^2 = \sum (X - M)^2 / (N - 1)$

The standard deviation ($s$) is the square root of the variance.

Page 25:

Calculating the Variance (s²) and the Standard Deviation (s) for the Unitec sample

X (Overall rating)   n (Unitec)   (X-Mu)   (X-Mu)²*n
2                    1            -5.70    32.5
3                    2            -4.70    44.2
4                    0            -3.70    0.0
5                    7            -2.70    51.1
6                    6            -1.70    17.4
7                    8            -0.70    4.0
8                    16           0.30     1.4
9                    10           1.30     16.8
10                   14           2.30     73.9
N                    64                    241.4

Mean Unitec (Mu) = 7.70
Variance = 3.83
SD or s = 1.96
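The same calculation can be sketched in Python. The function name `sample_variance` and the dictionary layout are assumptions for illustration; the counts are the Unitec column of the table, and the divisor is N-1 as on the previous slide.

```python
import math

# Unitec frequency table from the slide: rating -> number of respondents.
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def sample_variance(table):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = sum(table.values())
    mean = sum(value * count for value, count in table.items()) / n
    squared_deviations = sum(count * (value - mean) ** 2 for value, count in table.items())
    return squared_deviations / (n - 1)


variance = sample_variance(unitec)
sd = math.sqrt(variance)
print(f"variance={variance:.2f}, sd={sd:.2f}")
```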

Page 26:

All normal distributions have similar properties. The percentage of the scores that is between one standard deviation (s) below the mean and one standard deviation above is always 68.26%.
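The 68.26% figure can be checked numerically. The sketch below uses the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k/√2); the function name `share_within` is an assumption for illustration.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


print(f"within 1 s:    {share_within(1):.4f}")
print(f"within 1.96 s: {share_within(1.96):.4f}")
```

The second line matters later in the deck: about 95% of a normal distribution lies within 1.96 standard deviations of the mean.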

Page 27:

Is there a difference between Unitec and General overall OAP rating scores?

Page 28:

Is there a significant difference between Unitec and General OAP rating scores?

Page 29:

Three things we must know before we can say events are different

3. The extent to which the sample is representative of the population from which it is drawn

- the bigger the sample the greater the likelihood that it represents the population from which it is drawn

- small samples have unstable means. Big samples have stable means.

Page 30:

Estimating difference

The measure of stability of the mean is the Standard Error of the Mean = standard deviation / the square root of the number in the sample.

So the stability of the mean is determined by the variability in the sample (this can be affected by the consistency of measurement) and the size of the sample.

The standard error of the mean (SEM) is the standard deviation of the distribution of sample means we would get if we were to measure the mean again and again.

Page 31:

Yes it’s significant. The mean of the smaller sample (Unitec) is not too variable. Its Standard Error of the Mean = 0.24, and 1.96 * SEM = 0.48, which is the half-width of the 95% confidence interval (7.70 ± 0.48, i.e. 7.22 to 8.18). The General mean (8.64) falls outside this confidence interval.
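A minimal Python sketch of this check, using the rounded values quoted in the slides (standard deviation 1.96, N = 64, means 7.70 and 8.64); the variable names are illustrative assumptions.

```python
import math

# Values quoted in the slides (rounded).
sd_unitec = 1.96
n_unitec = 64
mean_unitec = 7.70
mean_general = 8.64

# Standard error of the mean = standard deviation / sqrt(sample size).
sem = sd_unitec / math.sqrt(n_unitec)

# 95% confidence interval: mean plus or minus 1.96 standard errors.
half_width = 1.96 * sem
lower, upper = mean_unitec - half_width, mean_unitec + half_width

general_outside = not (lower <= mean_general <= upper)
print(f"SEM={sem:.3f}, 95% CI=({lower:.2f}, {upper:.2f}), General outside: {general_outside}")
```

This compares one group's mean with the other group's confidence interval, as the slide does. The t-test later in the deck also takes the variability of the General sample into account.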

Page 32:

Is the difference between means significant?

What is clear is that the mean of the General group is outside the area where there is a 95% chance that the mean for the Unitec group will fall, so it is likely that the General mean comes from a different population than the Unitec mean.

The convention is to say that if mean 2 falls outside the area (the confidence interval) where 95% of mean 1 scores are estimated to be, then mean 2 is significantly different from mean 1. We say the probability of mean 1 and mean 2 being the same is less than 0.05 (p<0.05) and the difference is significant.

Page 33:

The significance of significance

• Not an opinion

• A sign that very specific criteria have been met

• A standardised way of saying that:

- There is a difference between two groups: p<0.05;

- There is no difference between two groups: p>0.05;

- There is a predictable relationship between two groups: p<0.05; or

- There is no predictable relationship between two groups: p>0.05.

• A way of getting around the problem of variability

Page 34:

One and two tailed tests

If you argue for a one tailed test, saying the difference can only be in one direction, then you can add the 2.5% error from the side where no data is expected to the side where it is.

[Figure: normal curve marked at -1.96 and +1.96 standard deviations. 2-tailed test: 95% of distribution in the centre and 2.5% of distribution in each tail. 1-tailed test: the error is moved to one tail.]
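The cut-off points for the two kinds of test can be computed from the standard normal distribution. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05          # 5% error allowed in total
z = NormalDist()      # standard normal: mean 0, standard deviation 1

# Two-tailed: split the 5% into 2.5% in each tail.
two_tailed_cutoff = z.inv_cdf(1 - alpha / 2)

# One-tailed: put the whole 5% in the tail where a difference is predicted.
one_tailed_cutoff = z.inv_cdf(1 - alpha)

print(f"two-tailed: +/-{two_tailed_cutoff:.2f} standard deviations")
print(f"one-tailed: {one_tailed_cutoff:.2f} standard deviations")
```

The one-tailed cut-off is lower, so a given difference reaches significance more easily, which is why the direction has to be predicted before looking at the data.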

Page 35:

T-test result

t-Test: Two-Sample Assuming Unequal Variances

                      General adults   Unitec adults
Mean                  8.64             7.7
Variance              2.34             3.83
Observations          146              64
t Stat for p<0.05     3.41
p one-tail            0.00
t Critical one-tail   1.66
p two-tail            0.00
t Critical two-tail   1.98
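The t statistic in this output can be reproduced from the means, variances and sample sizes alone. The sketch below assumes the unequal-variances (Welch) form named in the table heading, with the Welch-Satterthwaite approximation for the degrees of freedom; the function name `welch_t` is illustrative, and the inputs are the rounded values from the table.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for a two-sample t-test assuming unequal variances."""
    se1 = var1 / n1   # squared standard error of mean 1
    se2 = var2 / n2   # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t(8.64, 2.34, 146, 7.70, 3.83, 64)
print(f"t={t:.2f}, df={df:.0f}")
```

The t value exceeds both critical values in the table (1.66 one-tail, 1.98 two-tail), which is what makes the difference significant.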

Page 36:

                      Massey   Unsworth Heights
Mean                  9.23     8.33
Variance              1.20     4.24
Observations          52       15
t Stat for p<0.05     1.62
p one-tail            0.06
t Critical one-tail   1.75
p two-tail            0.12
t Critical two-tail   2.12

                      male     female
Mean                  8.94     8.65
Variance              1.55     2.28
Observations          83       125
t Stat for p<0.05     1.52
p one-tail            0.07
t Critical one-tail   1.65
p two-tail            0.13
t Critical two-tail   1.97

Page 37:

Correlations and Chi-square

Page 38:

The correlation with the glacier went unnoticed.
The debate proceeded and receded with slow heated monotonous cold regularity
although never reversing at the same point of disagreement.

The correlation with the glacier went. . .
The weight of paper and opinion
now far-exceeding the frozen mountain, even at its zenith.
But no amount of FSC vellum could paper over the crevasse cracked argument.

The correlation with the glacier . . . .
The blue-green water vein bled
But no aerial artery replenished the source.
The constant melt etching the message
of increased bloodletting from the waning carcase

Page 39:

The correlation with the . . . . .
Lost in the science of the unknown.
The pre-historic signpost, scarred by graffiti,
slowly shrank and collapsed
Its incremental deficit matched by political will.

The correlation . . . . . .
We are, we were, the new dinosaurs,
like the sun-burnt beached berg
doomed for demise in the new non-ice age.
No-one will record its disappearance or ours.

The correlation with humanity went unnoticed.

Correlation by John S http://allpoetry.com/poem/9257026-Correlation-by-JohnS


Page 41:

Chi-square test - comparing OAP samples with the local populations

                  Massey   OAP
European          56%      49%
Māori             16%      28%
Pacific peoples   18%      13%
Asian + MELAA     16%      8%
Other ethnicity   9%       1%
Total             115%     100%
population        49413    300

The question: Is the Massey OAP sample representative of the cultural mix of the Massey population?

Page 42:

What would we predict?

                  Massey 2006 Census   OAP 2013
European          146                  148
Māori             42                   85
Pacific peoples   47                   39
Asian + MELAA     42                   25
Other ethnicity   23                   3
Total             300                  300

The Massey 2006 Census column (in red on the slide) gives the number of participants we would predict (we EXPECT) based on the percent in each category in the Massey population (2006). The OAP 2013 column (in blue on the slide) is what we got (we OBSERVED). Is the match sufficiently close?

Page 43:

The chi-square (χ²) test

Culture           O     E     O-E      (O-E)²    (O-E)²/E
European          148   146   1.91     3.66      0.03
Māori             85    42    43.26    1871.50   44.84
Pacific peoples   39    47    -7.96    63.31     1.35
Asian + MELAA     25    42    -16.74   280.20    6.71
Other ethnicity   3     23    -20.48   419.36    17.86
N=                300   300

chi-square (the sum of (O-E)²/E) = 70.79

(O = Observed (OAP), E = Expected (2006 Census, Massey))

Degrees of freedom = number of categories - 1 = 5 - 1 = 4
Value of chi-square (χ²) for p<0.05 = 9.49
Actual χ² (70.79) is greater than 9.49, therefore there is a significant difference between the OAP sample and the Massey population.
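The chi-square value can be reproduced from the census percentages and the observed counts. In this Python sketch the variable names are illustrative. Two things are assumptions beyond what the slides state. First, expected counts are obtained by rescaling the census percentages (which total 115%) to the sample of 300, which matches the E column. Second, the degrees of freedom follow the usual goodness-of-fit convention of number of categories minus one (5 - 1 = 4), for which the 5% critical value in standard tables is 9.488.

```python
# Massey 2006 Census percentages and OAP observed counts from the slides.
census_percent = {"European": 56, "Māori": 16, "Pacific peoples": 18,
                  "Asian + MELAA": 16, "Other ethnicity": 9}
observed = {"European": 148, "Māori": 85, "Pacific peoples": 39,
            "Asian + MELAA": 25, "Other ethnicity": 3}

n = sum(observed.values())                      # 300 participants
percent_total = sum(census_percent.values())    # 115, people can report more than one ethnicity

# Expected count in each category, rescaled so the expected counts add up to n.
expected = {k: p / percent_total * n for k, p in census_percent.items()}

# Chi-square: sum over categories of (O - E)^2 / E.
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1          # categories - 1 (assumed convention, see lead-in)
CRITICAL_05_DF4 = 9.488         # tabulated chi-square critical value for df=4, p=0.05

print(f"chi-square={chi_square:.2f}, df={df}, significant={chi_square > CRITICAL_05_DF4}")
```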

Page 44:

All the OAP samples differ significantly from their local population

                   chi-square (χ²)   df (categories-1)   value to reach significance p<0.05   outcome
Total              126.24            4                   9.49                                 significant
Massey             70.79             4                   9.49                                 significant
Glen Eden          25.09             4                   9.49                                 significant
Unsworth Heights   62.54             4                   9.49                                 significant
Avondale           39.71             4                   9.49                                 significant
Glendene           67.53             4                   9.49                                 significant

(df assumes the same five ethnicity categories in every area.)

If the sample has the same cultural mix as the general population, that helps us in the claim that the outcomes of the research can be generally applied.

Page 45:

r = 0.904, N = 33, p < 0.001

Page 46:

$r = \sum (X - M_X)(Y - M_Y) / (N \, s_X \, s_Y)$

X = GDP purchasing power in $'000s

Y = Better Life Index (0-10)

MX = Mean of X = 25.2 ($25,200)

MY = Mean of Y = 6.34

sX = Standard deviation of X = 7.02

sY = Standard deviation of Y = 1.44

r = correlation coefficient = +0.90

Is it significant? That depends on how big the sample is. For N=33, it is highly significant.

Correlations are calculated using means and standard deviations, and big samples are more reliable than small ones.
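The formula can be sketched in Python as below. The GDP and Better Life Index data behind the slide are not given in the transcript, so the six data pairs here are hypothetical and only show the mechanics. Because the formula divides by N, the sketch uses standard deviations that also divide by N; the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient: sum of (X-MX)(Y-MY) divided by N*sX*sY."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Standard deviations with divisor N, to match the N in the formula.
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cross_products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross_products / (n * sx * sy)


# Hypothetical data (NOT from the slides): GDP in $'000s and an index from 0 to 10.
gdp = [10, 15, 20, 25, 30, 35]
index = [4.1, 5.0, 5.6, 6.8, 7.1, 8.0]
r = pearson_r(gdp, index)
print(f"r={r:.2f}")
```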

Page 47:

Correlations vary from -1 to +1

Page 48:

• To what extent has today's experience (1=Hugely; 2=a good amount; 3=somewhat; 4=a little bit; 5=not at all):

• made your team more aware of what is available in this community?

• made your team feel more a part of this community?

• encouraged team members to use services/resources they have come in contact with?

• put team members in closer touch with neighbours or friends?

• helped team members make some new friends?

• given team members some ideas about changes they would like to make in their lives?

• made team members feel safer in this community?

• Overall rating: 10 = a wonderful day, 7-8 = mostly fun, 5-6 = good in parts, 3 = mostly boring, 1 = no fun at all. Where would you all rate the day?

Page 49:

1-tail: p<      0.05    0.025   0.01    0.005
2-tail: p<      0.1     0.05    0.02    0.01
DF=266 (=N-2)   0.102   0.121   0.144   0.159

N = 268, r = -0.52, p < 0.005

Page 50:

One or two tails? Have we made a prior prediction? Yes, that high engagement will create high satisfaction = 1 tailed test

What degrees of freedom? df = N-2 = 268-2 = 266

What level of significance should be chosen? It depends on the number of correlations. Here there is only one correlation, so p<0.05. Often there are hundreds, in which case a tougher criterion, such as p<0.01, should be chosen.

Where can we find the critical values of r? HERE
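As an alternative to looking up a table, the correlation can be converted to a t statistic. This sketch uses the standard relation t = r * sqrt(df / (1 - r²)) with df = N - 2, the usual convention for testing a correlation coefficient; the function name `t_from_r` is illustrative, and the 0.159 cut-off is the most demanding critical value quoted on the earlier slide.

```python
import math


def t_from_r(r, n):
    """Convert a correlation coefficient to (t, df) with df = N - 2."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


r, n = -0.52, 268
t, df = t_from_r(r, n)

# Largest critical value of r quoted on the slide (1-tail p<0.005).
critical_r = 0.159
exceeds_table_value = abs(r) > critical_r

print(f"t={t:.2f}, df={df}, |r| above critical value: {exceeds_table_value}")
```

The sign of r is negative because on the engagement questions 1 means "Hugely" while on the overall rating 10 means "a wonderful day", so high engagement and high satisfaction sit at opposite ends of the two scales.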

Page 51:

Correlation and regression

• Correlation quantifies the degree to which two random variables are related. Correlation does not fit a line through the data points. You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.

• Linear regression finds the best line that predicts the size of one variable (the dependent variable) when given another variable which is fixed (the independent variable). The coefficient of determination (r²) tells how much of the variability of the dependent variable is accounted for by the independent variable.
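A minimal least-squares sketch in Python shows the difference in practice: regression returns a line (slope and intercept) as well as r². The function name `fit_line` and the small data set are hypothetical, chosen so the result is easy to check by eye.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # r2 = 1 - (unexplained variation / total variation in y).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data lying exactly on the line y = 1 + 2x.
slope, intercept, r2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r2)
```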

Page 52:

A perfect relationship, but not a linear correlation

[Figure: scatter plot of y against x]
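One way to see how this can happen is a curve such as y = x² over values of x that are symmetric around zero. This is an assumed example, since the transcript does not say which relationship the slide plots. Here y is completely determined by x, yet the correlation coefficient is zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # y is perfectly predictable from x
r = pearson_r(xs, ys)
print(f"r={r:.2f}")
```

The rising half of the curve cancels the falling half, so a straight line explains none of the variation even though the relationship is exact.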

Page 53:

A powerful relationship, but not a correlation. What’s happening here?

Page 54:

Normality of the data and Homoscedasticity

Page 55:

r = 0.904, N = 33, p < 0.001

Page 57:

How correlation is used and misused

A - The Church Unlimited
B - causes people to want freebies

B - The Church Unlimited
A - Misery
C - Desire for Freebies

There are so many ways that events can influence each other that we have to take great care about claiming causal relationships between events.
