1 experimental statistics - week 2 review: 2-sample t-tests paired t-tests thursday: meet in 15...

Post on 13-Jan-2016


1

Experimental Statistics - week 2

Review: 2-sample t-tests, paired t-tests

Thursday: Meet in 15 Clements!! Bring Cody and Smith book

2

p-Value

$H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$

Reject $H_0$ if $t < -t_\alpha$

Suppose t = -2.39 is observed from data for the test above.

Note: "Large negative values" of t make us believe the alternative is true.

p-value: the probability of an observation as extreme or more extreme than the one observed when the null is true

[Figure: t distribution with the area to the left of the observed value of t (-2.39) shaded and labeled "p-value"]

3

Note:

-- if the p-value is less than or equal to $\alpha$, then we reject the null at the $\alpha$ significance level

-- the p-value is the smallest level of significance at which the null hypothesis would be rejected
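As a rough illustration of the definition, a lower-tailed p-value can be computed in a SAS DATA step with the PROBT function, which returns P(T <= t) for a t distribution. The degrees of freedom below (df = 20) are an assumption made only for this sketch; the slides do not state the sample size behind t = -2.39.

```sas
* Sketch only: df = 20 is an assumed value, not taken from the slides;
DATA pval;
  t  = -2.39;            * observed value of t from the slide;
  df = 20;               * hypothetical degrees of freedom;
  p  = PROBT(t, df);     * lower-tail area, P(T <= t);
RUN;

PROC PRINT DATA=pval;
  TITLE 'Lower-tailed p-value for an observed t';
RUN;
```

Comparing the printed p with a chosen $\alpha$ gives the reject / do not reject decision described in the notes above.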

4

Find the p-values for Examples 1 and 2

5

6

Two Independent Samples

• Assumptions: Measurements from each population are

– Mutually independent
    - Independent within each sample
    - Independent between samples

– Normally distributed (or the Central Limit Theorem can be invoked)

• Analysis differs based on whether the 2 populations have the same standard deviation

7

Two Cases

• Population standard deviations equal
– Can obtain a better estimate of the common standard deviation by combining or "pooling" individual estimates

• Population standard deviations unequal
– Must estimate each standard deviation
– Very good approximate tests are available

If Unsure, Do Not Assume Equal Standard Deviations

8

Equal Population Standard Deviations

Test Statistic

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad s_p = \sqrt{s_p^2}$$
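The pooled 2-sample t statistic (pooled variance, df = n1 + n2 - 2) can also be computed by hand from summary statistics. The DATA step below is a minimal sketch; every number in it is hypothetical and chosen only for illustration, and the test shown is of $\mu_1 - \mu_2 = 0$.

```sas
* Sketch only: all summary statistics below are hypothetical;
DATA pooled;
  n1 = 10; ybar1 = 52.0; s1 = 2.0;
  n2 = 12; ybar2 = 54.5; s2 = 2.4;
  df  = n1 + n2 - 2;
  sp2 = ((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / df;    * pooled variance;
  sp  = SQRT(sp2);                                  * pooled standard deviation;
  t   = (ybar1 - ybar2) / (sp * SQRT(1/n1 + 1/n2)); * tests mu1 - mu2 = 0;
  p2  = 2 * (1 - PROBT(ABS(t), df));                * two-sided p-value;
RUN;

PROC PRINT DATA=pooled;
  TITLE 'Pooled 2-sample t from summary statistics';
RUN;
```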

9

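Behrens-Fisher Problem

If $\sigma_1 \neq \sigma_2$, then

$$t' = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

does not follow an exact t distribution.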

10

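Satterthwaite's Approximate t Statistic

If $\sigma_1 \neq \sigma_2$, then

$$t' = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \approx t \quad \text{(i.e. approximate t)}$$

with approximate t df

$$df = \frac{(a + b)^2}{\frac{a^2}{n_1 - 1} + \frac{b^2}{n_2 - 1}}, \qquad a = \frac{s_1^2}{n_1}, \quad b = \frac{s_2^2}{n_2}$$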
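A minimal sketch of the Satterthwaite calculation from summary statistics follows, with a = s1²/n1 and b = s2²/n2 and approximate df = (a + b)² / (a²/(n1 - 1) + b²/(n2 - 1)). All numbers are hypothetical and chosen only to show unequal standard deviations; the resulting df is generally not an integer.

```sas
* Sketch only: all summary statistics below are hypothetical;
DATA satt;
  n1 = 10; ybar1 = 52.0; s1 = 2.0;
  n2 = 12; ybar2 = 54.5; s2 = 4.8;
  a  = s1**2 / n1;
  b  = s2**2 / n2;
  t  = (ybar1 - ybar2) / SQRT(a + b);                 * tests mu1 - mu2 = 0;
  df = (a + b)**2 / (a**2/(n1 - 1) + b**2/(n2 - 1));  * approximate t df;
  p2 = 2 * (1 - PROBT(ABS(t), df));                   * two-sided p-value;
RUN;

PROC PRINT DATA=satt;
  TITLE 'Satterthwaite approximate t from summary statistics';
RUN;
```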

11

Often-Recommended Strategy for Tests on Means

Test whether $\sigma_1 = \sigma_2$ (F-test)
– If the hypothesis is not rejected, use the 2-sample t statistic, assuming equal standard deviations
– If the hypothesis is rejected, use Satterthwaite's approximate t statistic

NOTE: This is Not a good strategy
– the F-test is highly susceptible to non-normality

Recommended Strategy:
– If uncertain about whether the standard deviations are equal, use Satterthwaite's approximate t statistic

12

Example 3: Comparing the Mean Breaking Strengths of 2 Plastics

Plastic A:

Plastic B:

.= , s.=y , = n AAA 3332835

Assumptions:
– Mutually independent measurements
– Normal distributions for measurements from each type of plastic

.= , s.=y , = n AAA 9472640

Question: Is there a difference between the 2 plastics in terms of mean breaking strength?

13

Example 3 - solution

14

15

New diet – Is it effective?

Design:

50 people: randomly assign 25 to go on the diet and 25 to eat normally for the next month.

Assess results by comparing weights at the end of 1 month.

Diet: $\bar{X}_D$, $S_D$          No Diet: $\bar{X}_{ND}$, $S_{ND}$

Run a 2-sample t-test using the guidelines we have discussed.

Is this a good design?

16

Better Design:

Randomly select subjects and measure them before and after 1 month on the diet.

Subject   Before   After   Difference
1         150      147       3
2         210      195      15
:         :        :         :
n         187      190      -3

Procedure: Calculate differences, and analyze differences using a 1-sample test

"Paired t-Test"
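The procedure (calculate differences, then run a 1-sample test on them) might be sketched in SAS as follows. Only the three rows visible on the slide are entered; the elided subjects are not filled in, so the output is illustrative rather than a real analysis. The data set name diet and the variable names are assumptions made for this sketch.

```sas
* Sketch only: uses just the three rows shown on the slide;
DATA diet;
  INPUT subject $ before after;
  diff = before - after;     * positive difference means weight lost;
  DATALINES;
1 150 147
2 210 195
n 187 190
;

* T and PRT request the t statistic and two-sided p-value for H0: mean difference = 0;
PROC MEANS DATA=diet N MEAN STDERR T PRT;
  VAR diff;
  TITLE 'Paired t-test as a 1-sample test on differences';
RUN;
```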

17

Example 4: International Gymnastics Judging

Question: Do judges from a contestant's country rate their own contestant higher than do foreign judges?

i.e. test $H_0: \mu_N = \mu_F$ vs. $H_a: \mu_N > \mu_F$

Data:

Contestant        1    2    3    4    5    6    7    8    9   10   11   12
Native Judge    6.8  4.5  8.0  7.2  8.7  4.5  6.6  5.8  6.0  8.8  8.7  4.4
Foreign Judges  6.7  4.3  8.1  7.2  8.3  4.6  5.4  5.9  6.1  9.1  8.7  4.3
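One way to set up the Example 4 analysis in SAS is with the PAIRED statement of PROC TTEST, sketched below using the judging scores from the slide. The data set and variable names (judges, native, foreign) are assumptions, and the SIDES=U option, which requests the upper-tailed alternative, assumes SAS 9.2 or later.

```sas
* Sketch: paired t-test on the Example 4 scores;
DATA judges;
  INPUT contestant native foreign;
  DATALINES;
1 6.8 6.7
2 4.5 4.3
3 8.0 8.1
4 7.2 7.2
5 8.7 8.3
6 4.5 4.6
7 6.6 5.4
8 5.8 5.9
9 6.0 6.1
10 8.8 9.1
11 8.7 8.7
12 4.4 4.3
;

* PAIRED analyzes the differences native - foreign;
PROC TTEST DATA=judges SIDES=U;
  PAIRED native*foreign;
  TITLE 'Gymnastics judging: paired t-test';
RUN;
```

Without SIDES=U the procedure reports a two-sided p-value, which would need to be converted by hand for the one-sided question.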

18

Example 4 solution

19

Introduction to SAS Programming Language

21

Fertilizer Data

A researcher studies the effect of two fertilizer brands on the growth of plants. Thirteen plants are grown under identical conditions, except that 7 plants are randomly selected to receive Brand 1 and the remaining 6 are fertilized using Brand 2. The data for this experiment are as follows, where the outcome measurement is the height of the plant after 3 weeks of growth (you may assume the heights to be normally distributed):

Brand 1    Brand 2
51.0 cm    54.0 cm
53.3       56.1
55.6       52.1
51.0       56.4
55.5       54.0
53.0       52.9
52.1

22

The Fertilizer data set as SAS needs to see it

A 51.0
A 53.3
A 55.6
A 51.0
A 55.5
A 53.0
A 52.1
B 54.0
B 56.1
B 52.1
B 56.4
B 54.0
B 52.9

23

SAS file for FERTILIZER data

Case 1: Data within SAS FILE:

DATA one;
  INPUT brand $ height;
  DATALINES;
A 51.0
A 53.3
. . .
B 54.0
B 52.9
;
PROC TTEST;
  CLASS brand;
  VAR height;
  TITLE 'Fertilizer Data – 2-sample t-test';
RUN;

24

Brief Discussion of Components of the SAS File:

DATA Step

  DATA STATEMENT - the first DATA statement names the data set whose variables are defined in the INPUT statement -- in the above, we create data set 'one'

   INPUT STATEMENT - 2 forms

1.  Freefield - can be used when data values are separated by 1 or more blanks

       INPUT   NAME $  AGE SEX $   SCORE;          ($ indicates character variable)

  2.  Formatted - data occur in fixed columns

       INPUT    NAME $ 1-20  AGE 22-24  SEX  $ 26   SCORE 28-30;  

DATALINES STATEMENT - indicates that the records that follow in the file contain the actual data; the semicolon after the last data line marks the end of the data
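To make the formatted (fixed-column) INPUT form concrete, here is a small sketch using the column positions from the example statement above. The two records are hypothetical. Note that the names contain a blank, which freefield input would split into two values; with column input the whole of columns 1-20 is read as NAME.

```sas
* Sketch only: the two data records are hypothetical;
* Columns: NAME 1-20, AGE 22-24, SEX 26, SCORE 28-30;
DATA people;
  INPUT NAME $ 1-20  AGE 22-24  SEX $ 26  SCORE 28-30;
  DATALINES;
Pat Smith            34  F 87
Lee Jones            41  M 92
;

PROC PRINT DATA=people;
  TITLE 'Records read with formatted (column) input';
RUN;
```

The data lines must start in column 1 so that each value falls in the columns named on the INPUT statement.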

25

SPECIFYING THE ANALYSIS -- PROC STATEMENTS

GENERAL FORM

PROC xxxxx;                        implies procedure is to be run on the most recently created data set
PROC xxxxx DATA = data set name;

Note: I did not have to specify DATA=one in the above example

Example PROCs:

PROC REG - regression analysis
PROC ANOVA - analysis of variance
PROC GLM - general linear model
PROC MEANS - basic statistics, t-test for H0: $\mu = 0$
PROC PLOT - plotting
PROC TTEST - t-tests
PROC UNIVARIATE - descriptive stats, box-plots, etc.
PROC BOXPLOT - boxplots

26

PROC TTEST

• Proc TTEST data = fn ;

Class … ; (specify the classification variable)

Var … / options; (specify the variable for which the means are compared)

Run;

27

SAS Syntax

• Every command MUST end with a semicolon
– Commands can continue over two or more lines

• Variable names are 1-32 characters (1-8 in SAS Version 6 and earlier): letters, numerals, and underscores, beginning with a letter or underscore; no blanks or special characters
– Note: values for character variables can exceed 8 characters

• Comments
– Begin with *, end with ;

28

Titles and Labels

• TITLE '…' ;
– Up to 10 title lines: TITLE 'include your title here';
– Can be placed in Data Steps or Procs

• LABEL name = '…' ;
– Can be in a DATA STEP or PROC PRINT
– Include ALL labels, then a single ;

Note: For class assignments, place descriptive titles and labels on the output.
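As one possible way to follow this advice, the sketch below adds labels and two title lines to the fertilizer data from the earlier slides. The label and title wording is illustrative only. The LABEL statement lists all labels and ends with a single semicolon, and the LABEL option on PROC PRINT is what makes the labels appear as column headings.

```sas
* Sketch: fertilizer data with descriptive titles and labels;
DATA one;
  INPUT brand $ height;
  LABEL brand  = 'Fertilizer brand'
        height = 'Plant height after 3 weeks (cm)';
  DATALINES;
A 51.0
A 53.3
A 55.6
A 51.0
A 55.5
A 53.0
A 52.1
B 54.0
B 56.1
B 52.1
B 56.4
B 54.0
B 52.9
;

PROC PRINT DATA=one LABEL;
  TITLE  'Fertilizer Data';
  TITLE2 'Listing with descriptive labels';
RUN;
```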

29

Case 2: Data in External File:

FILENAME f1 'complete directory/file specification';

FILENAME f1 'fertilizer.data';
DATA one;
  INFILE f1;
  INPUT brand $ height;
PROC TTEST;
  CLASS brand;
  VAR height;
  TITLE 'Fertilizer Data – 2-sample t-test';
RUN;

30

PC SAS on Campus

Library

BIC

Student Center

http://support.sas.com/rnd/le/index.html

SAS Learning Edition $125
