chahine hypothesis testing,

Hypothesis Testing

Foundations I
September 26, 2011

2011-2012

Objectives
• Differentiate three measures of central tendency, including their advantages and disadvantages
• Explain the rationale of hypothesis testing
• Define the null and alternative hypotheses
• Define and interpret: p value, test statistic, type I and II error, alpha, beta and statistical power
• Explain how statistical power and sample size are related and describe other factors influencing power

Levels of Measurement
• Categorical (nominal)
• Ordinal
• Interval
• Ratio

Categorical Data
• Non-ordered data
• Often represents different categories: sex, eye colour, genotypes, etc.
• An average would be meaningless
• More meaningful to talk about different categories, proportions, percentages or the mode
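
As a minimal sketch of how categorical data might be summarised, the Python snippet below counts categories and reports proportions and the mode. The eye-colour observations are hypothetical values invented for illustration; they do not come from the lecture.

```python
from collections import Counter

# Hypothetical eye-colour observations (illustrative only)
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

counts = Counter(eye_colours)
n = len(eye_colours)

# Proportion of observations falling in each category
proportions = {colour: count / n for colour, count in counts.items()}

# Mode: the most frequent category
mode_colour, mode_count = counts.most_common(1)[0]

for colour, p in proportions.items():
    print(f"{colour}: {counts[colour]} ({p:.1%})")
print("mode:", mode_colour)
```

Note that no mean is computed: averaging category labels has no interpretation.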

Ordinal Data
• Ordered data
• The distance between the data points may vary
• E.g., placement in a race, perceived level of pain, or a depression scale
• 7 is greater than 5 and 5 is greater than 3, but the difference between 7 and 5 may not be the same as the difference between 5 and 3
• An average is not meaningful here; finding the middle number (the median) may be more meaningful and more consistent

Interval Data
• Very similar to ordinal data, but the differences are consistent
• E.g., temperature in Celsius or Fahrenheit
• The difference between 20 and 30 is the same as the difference between 40 and 50
• Really well designed rating scales gather interval data
• Important to note that 0 is not meaningful in interval data: it does not indicate an absence of the quantity
• An average (mean) is meaningful unless the data are skewed

Ratio Data
• Very similar to interval data, except that 0 is meaningful
• E.g., tracking growth of bacteria, or the height and weight of babies
• Someone can be twice as tall as another person; however, we cannot say something is twice as hot or cold unless it is measured in Kelvin (in Kelvin, a temperature of 0 is meaningful)
• The average is very useful and many statistical procedures for ratio data are based on means; however, if the data are skewed, the median is more useful
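
The "twice as hot" point can be checked with a little arithmetic. The sketch below uses two temperatures chosen for illustration (20 and 10 degrees Celsius) and compares their ratio on the Celsius scale with the ratio after conversion to Kelvin. The helper name celsius_to_kelvin is an assumption made for this example.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

a_c, b_c = 20.0, 10.0  # two illustrative temperatures in Celsius

# Ratio on the interval scale: numerically 2.0, but not "twice as hot"
naive_ratio = a_c / b_c

# Ratio on the ratio scale, where 0 is a true zero
kelvin_ratio = celsius_to_kelvin(a_c) / celsius_to_kelvin(b_c)

# Differences are meaningful on both scales and agree
diff_c = a_c - b_c
diff_k = celsius_to_kelvin(a_c) - celsius_to_kelvin(b_c)

print(f"Celsius ratio: {naive_ratio:.3f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
print(f"Difference: {diff_c:.1f} C = {diff_k:.1f} K")
```

The difference of 10 degrees survives the change of scale, while the ratio of 2 does not, which is what separates interval from ratio data.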

Central Tendency
If you wanted to describe a population or a group of people using one or two numbers, you could say:
• On average, students in this class scored about 75% on the last exam….
• In this class, the most frequent eye colour is….
• In a small sub-sample of 10 students, the middle score on the exam was….

Mean, Median & Mode
• Depending on the type and quality of your data, the mean, median, or mode may be more suitable for describing the typical value (central tendency) of your data
• Statistical analyses such as analysis of variance, chi-square analysis and t-tests are based on different measures of central tendency
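
To see how the three measures can disagree, here is a short sketch using Python's standard statistics module. The exam scores are hypothetical and were chosen so that a single very low score skews the data.

```python
import statistics

# Hypothetical exam scores (percent); the single score of 20 skews the data
scores = [75, 78, 80, 75, 82, 79, 75, 20]

mean_score = statistics.mean(scores)      # pulled down by the outlier
median_score = statistics.median(scores)  # middle of the sorted scores
mode_score = statistics.mode(scores)      # most frequent score

print("mean:  ", mean_score)
print("median:", median_score)
print("mode:  ", mode_score)
```

With these values the mean falls below every score except the outlier, while the median and mode stay close to where most students scored.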

Descriptive vs. Inferential Statistics
• Descriptive statistics describe the sample or population, usually by providing values such as the range, maximum, minimum, central tendency and variance (the average squared deviation from the mean)
• Inferential statistics are often used when you do not have access to the entire population and want to make an inference about that population from a sample

A Conjecture…..
• After doing a great deal of reading, the dean of a well-known US medical school believed that, in general, students in medical programs have an average IQ of 135
• This is a conjecture about an entire population of undergraduate medical students

Hypothesis Testing: Step 1
We can test the dean’s conjecture…

Null Hypothesis - H0: µ = 135

Alternative Hypothesis - HA: µ ≠ 135

We test the conjecture or hypothesis by making it the null

Role of Software
• Computer programs such as SPSS, SAS, R, STATA, etc. have built-in algorithms to carry out what you might do by hand
• It is important to do this by hand initially, to understand what it means to reject, or fail to reject, the null hypothesis

Hypothesis Testing: Step 2
• Because we are not dealing with absolutes and we are making a prediction about a population, it is not exact.
• We need to select a criterion, or significance level, by which we can either reject or fail to reject the null hypothesis.
• Most often the criterion or significance level is set at .05
• The significance level is also referred to as α; the p-value calculated from the sample is compared against it
• At what point is the difference between the sample mean and 135 due not to chance but to a real difference?

Hypothesis Testing: Step 3
- We sample 10 students
- Area of acceptance is 95%
- Look up the two-tailed critical values on a t-score table for df = 10 - 1 = 9 (±2.262)
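
The table lookup can also be approximated numerically. The sketch below is one possible way to do so using only the Python standard library: it integrates the t density with Simpson's rule and finds the two-tailed critical value by bisection. The function names (t_pdf, t_cdf, t_critical) and the step counts are illustrative choices, not part of the lecture; statistical packages provide this value directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus the Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value: solve CDF(t) = 1 - alpha/2 by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 10                           # sample size from Step 3
crit = t_critical(0.05, n - 1)   # alpha = .05, df = 9
print(f"critical t (df = {n - 1}, alpha = .05, two-tailed): +/-{crit:.3f}")
```

For df = 9 and α = .05 this should agree with the tabled value of ±2.262 to three decimal places.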

Hypothesis Testing: Step 4
We need to randomly draw a sample of 10 students

115, 140, 133, 125, 120, 126, 136, 124, 132, 129

Mean = 128

Hypothesis Testing: Step 5
We need to calculate the Standard Deviation (SD) & Standard Error (SE)

How many of you have heard of standard deviation before?

How many people know what it means?

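Before SD we need to understand variance

Standard Deviation: can be thought of as an average deviation from the mean (the square root of the variance)
Standard Error: an estimate of the standard deviation of the sample mean (SD divided by the square root of the sample size), used in calculating the t-statistic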

IQ Score   Mean   Deviation Score (Mean - IQ)   Deviation Score Squared
115        128     13                           169
140        128    -12                           144
133        128     -5                            25
125        128      3                             9
120        128      8                            64
126        128      2                             4
136        128     -8                            64
124        128      4                            16
132        128     -4                            16
129        128     -1                             1

Sum                 0                           512

Sample Variance                                 56.88889
Standard Deviation                              7.542472
Standard Error                                  2.385139
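
The table values can be reproduced in a few lines. The following sketch uses the ten IQ scores from Step 4 and follows the table's convention of computing each deviation as mean minus score; the variable names are illustrative.

```python
import math

iq_scores = [115, 140, 133, 125, 120, 126, 136, 124, 132, 129]
n = len(iq_scores)

mean_iq = sum(iq_scores) / n

# Deviation scores: mean minus score, as in the table
deviations = [mean_iq - x for x in iq_scores]
sum_dev = sum(deviations)                  # deviations from the mean sum to 0
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations

sample_variance = sum_sq / (n - 1)         # divide by n - 1 for a sample
sd = math.sqrt(sample_variance)            # standard deviation
se = sd / math.sqrt(n)                     # standard error of the mean

print(f"mean = {mean_iq}, sum of squares = {sum_sq}")
print(f"variance = {sample_variance:.5f}, SD = {sd:.6f}, SE = {se:.6f}")
```

Dividing by n - 1 = 9 rather than n = 10 is what makes this a sample variance, and it is the divisor that yields the 56.88889 shown in the table.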

T-Test
• The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name)
• Gosset devised the t-test as a way to cheaply monitor the quality of stout
• He published the test in Biometrika in 1908

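Hypothesis Testing: Steps 6 & 7
T-statistic = (sample average - hypothesised mean) / standard error

t = (128 - 135) / 2.385
t = -2.935

Because |t| = 2.935 is larger than the critical value of 2.262, we reject the null hypothesis.

“The hypothesis that the mean IQ of the population is 135 was rejected, t = -2.935, df = 9, p ≤ .05.”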
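
Steps 4 to 7 can be strung together in a short script. The sketch below is a by-hand one-sample t-test on the ten IQ scores, using the tabled critical value of 2.262 for the decision; the variable names are illustrative.

```python
import math
import statistics

iq_scores = [115, 140, 133, 125, 120, 126, 136, 124, 132, 129]
mu_0 = 135           # hypothesised population mean (H0)
t_critical = 2.262   # two-tailed, alpha = .05, df = 9 (from the t table)

n = len(iq_scores)
sample_mean = statistics.mean(iq_scores)
standard_error = statistics.stdev(iq_scores) / math.sqrt(n)  # sample SD / sqrt(n)

t_stat = (sample_mean - mu_0) / standard_error

# Reject H0 when the statistic falls outside the +/- critical values
reject_null = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, df = {n - 1}")
print("reject H0" if reject_null else "fail to reject H0")
```

Statistical packages perform the same calculation and also report an exact p-value, which is one reason it helps to have worked through it by hand first.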

Type I and II Error
• Remember, in step 2 we asked how much of the difference between the means we will attribute to chance…
• Measurement is never exact; though some journals and papers vary, a significance level of .05 is typically used (meaning that, if the null hypothesis is true, there is a 5% chance of rejecting it in error)
• When we have rejected the null and it is actually true, this is a type I error or “false positive”
• When we have not rejected the null and it is actually false, this is a type II error or “false negative”
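
One way to make the type I error rate concrete is a small simulation. The sketch below repeatedly draws samples of 10 from a population in which the null hypothesis really is true (mean 135) and counts how often the t-test rejects it anyway. The population SD of 15, the normal distribution, the number of trials and the random seed are all assumptions made for this illustration.

```python
import math
import random

random.seed(0)

MU_0 = 135          # null hypothesis mean, which is true in this simulation
SIGMA = 15          # assumed population SD (hypothetical)
N = 10              # sample size
T_CRITICAL = 2.262  # two-tailed, alpha = .05, df = 9
TRIALS = 20000

def t_statistic(sample, mu_0):
    """One-sample t statistic: (mean - mu_0) / (SD / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu_0) / math.sqrt(variance / n)

false_positives = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if abs(t_statistic(sample, MU_0)) > T_CRITICAL:
        false_positives += 1  # rejected a true null: a type I error

type_i_rate = false_positives / TRIALS
print(f"estimated type I error rate: {type_i_rate:.3f}")
```

Under these assumptions the estimated rate should land close to the chosen significance level of .05.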

Power and Measures
• How much power does our prediction have? How much can we infer?
• It depends on sample size & the quality of the measure
• IQ, depression scales and cognitive ability are unobservable
• Growth of bacteria and cellular effects from medication are observable: a ruler can be put to them
• The more directly we can observe what we measure, the smaller the sample we will need
• The more accurate our inferences, the smaller the error we produce
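
The link between sample size and power can be illustrated by simulation. The sketch below assumes the true population mean is 128 while the null hypothesis says 135, and estimates how often the t-test correctly rejects the null for samples of 10, 20 and 30. The true mean, the population SD of 15, the normal distribution, the trial count and the seed are assumptions chosen for illustration; the critical values are the standard two-tailed t-table entries for df = 9, 19 and 29.

```python
import math
import random

random.seed(1)

MU_0 = 135        # null hypothesis mean
TRUE_MEAN = 128   # assumed true population mean (hypothetical)
SIGMA = 15        # assumed population SD (hypothetical)
TRIALS = 5000

# Two-tailed critical t values at alpha = .05 for df = n - 1 (t table)
T_CRITICAL = {10: 2.262, 20: 2.093, 30: 2.045}

def rejects_null(sample, mu_0, t_crit):
    """Run a one-sample t-test and report whether H0 is rejected."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t_stat = (mean - mu_0) / math.sqrt(variance / n)
    return abs(t_stat) > t_crit

def estimate_power(n):
    """Fraction of simulated samples of size n in which the false H0 is rejected."""
    hits = 0
    for _ in range(TRIALS):
        sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(n)]
        if rejects_null(sample, MU_0, T_CRITICAL[n]):
            hits += 1
    return hits / TRIALS

power = {n: estimate_power(n) for n in T_CRITICAL}
for n, p in power.items():
    print(f"n = {n}: estimated power = {p:.2f}")
```

Power here is 1 minus the type II error rate (beta), and under these assumptions it rises as the sample grows. Lowering SIGMA in the sketch, which stands in for a more precise measure, should raise power at every sample size.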

Contact
Dr. Saad Chahine

[email protected]
