statsdirect instructions 2009


Upload: ahmadnazib1

Post on 23-Nov-2014


Page 1: StatsDirect Instructions 2009

Data Analysis Using StatsDirect

A. Investigating probability

1. Open a new StatsDirect workbook

To start StatsDirect go to Start > All Programs > Core > Statistics > StatsDirect2006

When StatsDirect opens, click Cancel.

2. Use the z score to calculate a probability

The mean FEV1 in 57 male medical students is 4.06 litres with a standard deviation of 0.67 litres.

You need to calculate:

$z = \frac{x - \text{mean}}{\text{SD}} = \frac{3.57 - 4.06}{0.67}$

In StatsDirect, click Tools > Calculator

Type (3.57-4.06)/0.67 into the Expression to evaluate box. Click Calculate

Then click Save > Close. Click Yes to copy saved results to report.

You need to convert the answer for z into a percentage to find the probability:

Click Analysis > Distributions > Normal

Type the z value you’ve calculated into the Normal deviate (z) box

Click Calculate. The lower tail P value gives the probability of a medical student having an FEV1 of 3.57 litres or less.

Click Save > Close > Select to send the result of the analysis to the report.
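
As a cross-check outside StatsDirect, the z score and lower tail probability from step 2 can be sketched in Python. This is only an illustration, assuming Python 3.8 or later (whose standard library provides `statistics.NormalDist`); the variable names are invented for this example and are not part of StatsDirect.

```python
from statistics import NormalDist

mean_fev1 = 4.06  # litres, from the handout
sd_fev1 = 0.67    # litres, from the handout
x = 3.57          # FEV1 value of interest, litres

# z score: how many standard deviations x lies from the mean
z = (x - mean_fev1) / sd_fev1

# Lower tail P: area under the standard normal curve to the left of z
lower_tail_p = NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"lower tail P = {lower_tail_p:.4f} ({lower_tail_p:.1%})")
```

Changing `x` to another FEV1 value repeats the calculation for a different cut-off.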

Constructing a formula in StatsDirect:

The formula will use one or more of these mathematical operators:

^ exponentiation (raise to the power of)

* multiplication

/ division

+ addition

- subtraction

Question: What is the probability (as a percentage) that the student will have an FEV1 of 4.55 litres or less?

What is the probability of one of the students having an FEV1 of 3.57 litres or less?


3. Use the z score to calculate the interquartile range of the FEV1 values (the middle half of the values)

First calculate the z value for the lower end of the interquartile range (the point below which 25% of the values lie):

Click Analysis > Distributions > Normal

Type 0.25 in the lower tail P value box and click Calculate

Click Save > Close > Select

Now convert the z value into the FEV1 value:

Click Tools > Calculator and type the values for mean + z*SD into the Expression to evaluate box, using the z value for the lower 25% together with the mean and standard deviation from the z score equation in step 2.

Remember to save your result to the report.

The result is the FEV1 in litres at the lower end of the interquartile range (i.e. 25% of the medical students will have this FEV1 or lower).

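Now calculate the FEV1 value for the upper 75%:

Click Tools > Calculator and type the values for mean - lower 25% + mean into the Expression to evaluate box. Because the normal distribution is symmetrical, the upper end of the interquartile range lies as far above the mean as the lower end lies below it.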
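
The interquartile range calculation can also be sketched in Python as a check on the StatsDirect result. This assumes Python 3.8 or later and that FEV1 is treated as normally distributed with the mean and standard deviation given in step 2; the variable names are illustrative only.

```python
from statistics import NormalDist

mean_fev1 = 4.06  # litres, from the handout
sd_fev1 = 0.67    # litres, from the handout

# z values that cut off the lower 25% and the lower 75% of a standard normal
z_lower = NormalDist().inv_cdf(0.25)
z_upper = NormalDist().inv_cdf(0.75)  # equal to -z_lower by symmetry

# Convert each z value back to an FEV1 value: x = mean + z * SD
lower_quartile = mean_fev1 + z_lower * sd_fev1
upper_quartile = mean_fev1 + z_upper * sd_fev1

print(f"z for lower 25% = {z_lower:.4f}")
print(f"interquartile range = {lower_quartile:.2f} to {upper_quartile:.2f} litres")
```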

4. Use the z score to find the number of asthmatics with obstructive airway disease

The lung function in a sample of 34 similarly aged male asthmatics was investigated. The mean FEV1 was 3.78 litres with a standard deviation of 0.74 litres. The obstructive airway FEV1 is set at 2.85 litres or less.

http://www-users.york.ac.uk/~mb55/intro/refint.htm

a) Use the equation above to work out the proportion of asthmatics with an FEV1 that indicates an obstructive airway in this group.

b) What number of the asthmatics in the group would this represent?
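
Parts a) and b) can be checked with a short Python sketch. It assumes Python 3.8 or later and that FEV1 in the asthmatic group follows a normal distribution; the variable names are invented for the example.

```python
from statistics import NormalDist

n_asthmatics = 34
mean_fev1 = 3.78  # litres, from the handout
sd_fev1 = 0.74    # litres, from the handout
cutoff = 2.85     # litres; FEV1 at or below this indicates an obstructive airway

# a) proportion of the group expected at or below the cut-off
z = (cutoff - mean_fev1) / sd_fev1
proportion = NormalDist().cdf(z)

# b) expected number of asthmatics this proportion represents
expected_count = proportion * n_asthmatics

print(f"z = {z:.4f}")
print(f"proportion = {proportion:.4f}")
print(f"expected number = {expected_count:.2f} of {n_asthmatics}")
```

The expected number is not a whole person, so it needs rounding when reported as a count.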

Question: What is the interquartile range for the FEV1 values?


B. Investigating normal distribution and confidence intervals

Here you are going to work with data from your lung function practical class, to investigate height and peak expiratory flow rate. The sex has been coded: male = M; female = F.

Go to Blackboard > Discussions > Lung function practical and download the file Lung function 2009.sdw to My Documents.

1. Plotting the data

In StatsDirect, click on File > Open File > Look in > My Documents > Lung function 2009.sdw. Save the workbook to My Documents.

Plot a histogram for height: click on B at the top of the height column then click on Graphics > Histogram. If you approve the title, click OK or change the title. Click OK on Histogram Bin Setup then Yes to overlay a normal curve, OK on Histogram unless you want to change the axis title. Select output destination.

The data should now look something like this:

Warning!!! If you try to print the data file when any part of the workbook is highlighted this will jam up the printer.

Split the data for height and PEFR by sex: Click Data > Grouping > Split. Select column A (sex) and click OK. To select the data you want to split click on column B (Height). Click OK. You will now be asked if it is 2 groups. Click OK. Put the output in a new sheet. Split the data for PEFR and put it on the same new sheet.
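
What Data > Grouping > Split does can be pictured with a small Python sketch. The rows below are invented for illustration and are not the class data; the column order (sex, height, PEFR) is an assumption based on the description of columns A and B.

```python
from collections import defaultdict

# Hypothetical rows: (sex, height, PEFR). These values are made up.
rows = [
    ("M", 178, 560),
    ("F", 165, 420),
    ("F", 160, 400),
    ("M", 182, 600),
    ("F", 170, 450),
]

# Split each measurement into one list per sex code, as the Split
# command produces one new column per group
height_by_sex = defaultdict(list)
pefr_by_sex = defaultdict(list)
for sex, height, pefr in rows:
    height_by_sex[sex].append(height)
    pefr_by_sex[sex].append(pefr)

for sex in sorted(height_by_sex):
    print(sex, "height:", height_by_sex[sex], "PEFR:", pefr_by_sex[sex])
```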

Plot separate histograms for male and female height.

Questions:

- Do you think the histograms show that the data for height are normally distributed?

- Are there any outliers? Could these prevent the data showing a normal distribution?

2. Is there a difference in male and female PEFR?

What test should you use to compare the means of male and female PEFR?

Clues: Is the data parametric or non-parametric; which Student’s t test should you use? (Look at the audio lecture in Case 3 for help here.)

Click Analysis and select the correct test then highlight the column containing the PEFR values for females, hold down the control key and highlight the column for male PEFR. Click OK and the test will be carried out. Click Select to send the results to the report.
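
The arithmetic behind an unpaired t test can be sketched in Python using only the standard library. The PEFR values are invented, and the unequal-variance version uses the Welch-Satterthwaite approximation for the degrees of freedom, which is an assumption about how StatsDirect handles that case. The P values are left out because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical PEFR values, not the class data
female_pefr = [400, 420, 380, 410, 390]
male_pefr = [520, 560, 500, 540, 580]

n_f, n_m = len(female_pefr), len(male_pefr)
var_f, var_m = variance(female_pefr), variance(male_pefr)
diff = mean(female_pefr) - mean(male_pefr)

# Assuming equal variances: pool the two sample variances
pooled_var = ((n_f - 1) * var_f + (n_m - 1) * var_m) / (n_f + n_m - 2)
se_pooled = sqrt(pooled_var * (1 / n_f + 1 / n_m))
t_pooled = diff / se_pooled
df_pooled = n_f + n_m - 2

# Assuming unequal variances: Welch-Satterthwaite degrees of freedom
se_welch = sqrt(var_f / n_f + var_m / n_m)
t_welch = diff / se_welch
df_welch = (var_f / n_f + var_m / n_m) ** 2 / (
    (var_f / n_f) ** 2 / (n_f - 1) + (var_m / n_m) ** 2 / (n_m - 1)
)

# Ratio of larger to smaller variance, the quantity a variance comparison examines
variance_ratio = max(var_f, var_m) / min(var_f, var_m)

print(f"equal variances:   t = {t_pooled:.4f}, df = {df_pooled}")
print(f"unequal variances: t = {t_welch:.4f}, df = {df_welch:.2f}")
print(f"variance ratio = {variance_ratio:.2f}")
```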

The report should look like this:

Mean of PEFR~Sex=F =
Mean of PEFR~Sex=M =

Assuming equal variances
Combined standard error =
df =
t =
One sided P
Two sided P
95% confidence interval for difference between means =
Power (for 5% significance) >

Assuming unequal variances
Combined standard error =
df =
t(d) =
One sided P
Two sided P
95% confidence interval for difference between means =
Power (for 5% significance) >

Comparison of variances

How can you tell if there is a significant difference in male and female PEFR?

The first two lines of the report show the means, followed by 2 alternative outputs of the data:

o Assuming equal variance
o Assuming unequal variance

Choose the output after looking at the bottom of the report, Comparison of variances, which will show one of those below:

Question:

What does the 95% confidence interval for the difference between means tell you?

3. Are height and PEFR related?

Click on Graphics > Scatter. Click OK to the default value 1.

Click on the male PEFR column for the y axis, click OK

Click on the male height column for the x axis, click OK

In Chart settings, change the title to Male height v PEFR, click OK, and send it to the report.

Your scatter plot should look like this:

Do the same for the female data

4. Correlation and regression

Correlation is a method to establish whether there is a relationship between the two variables (eg height and PEFR).

Regression determines the nature of that relationship.

Click Analysis > Regression and correlation > Simple Linear & Correlation.

Input:
o Outcome variable = PEFR column
o Predictor variable = height column

Click Plot Regression, then click Plot, to produce the regression line on a scatter plot.
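
To show what the Simple Linear & Correlation analysis computes, here is a minimal least-squares sketch in Python. The heights and PEFR values are invented for illustration, and the formulas are the textbook ones rather than anything taken from StatsDirect itself.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, not the class data
heights = [160, 165, 170, 175, 180]  # predictor (x)
pefr = [480, 510, 500, 560, 570]     # outcome (y)

mean_x, mean_y = mean(heights), mean(pefr)

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in pefr)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, pefr))

# Regression describes the nature of the relationship (the fitted line)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation describes how strong the linear relationship is
r = sxy / sqrt(sxx * syy)

print(f"PEFR = {slope:.3f} * height + ({intercept:.3f})")
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
```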

Analysing the results:

Question: Are there any outliers in either male or female samples? What effect could they have on the result?


Plot the regression for both the male and female data of height v PEFR

5. Calculating the PEFR of a given height on the regression line

In the simple linear regression report for females, click on the button Interpolate x to y, type in the height 162 and click Calculate.

The predicted PEFR will appear in the output.

Repeat this to calculate the predicted PEFR for males of the same height.
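
Interpolating x to y simply substitutes the height into the fitted regression equation. The sketch below uses the slope and intercept from the example male regression report in this handout; your own class data will give different coefficients. The function name is made up for the illustration.

```python
# Coefficients from the handout's example report for males
slope = 4.255232
intercept = -172.162076


def predict_pefr(height_cm):
    """Return the PEFR predicted by the regression line for a given height."""
    return slope * height_cm + intercept


predicted = predict_pefr(162)
print(f"predicted PEFR at 162 cm = {predicted:.2f}")
```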

Simple linear regression (example)

Equation: PEFR~Sex=M = 4.255232 Height~Sex=M - 172.162076
Standard Error of slope = 1.071842
95% CI for population value of slope = 2.133586 to 6.376879
Correlation coefficient (r) = 0.337022 (r² = 0.113584)
95% CI for r (Fisher's z transformed) = 0.171569 to 0.483986
t with 123 DF = 3.970017
Two sided P = 0.0001
Power (for 5% significance) = 97.12%
Correlation coefficient is significantly different from zero

The slope measures the relationship between the variables.

Standard Error of slope measures the slope's variability.

The 95% CI for the slope indicates a linear relationship between the variables unless it includes zero (0), which would be consistent with no relationship.

Correlation coefficient (r) always lies between –1 and +1. Zero = no linear relationship; a minus sign = negative relationship.
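
The figures in the example report are linked by simple arithmetic, which the following Python sketch checks. It reads the standard error of the slope as 1.071842 and copies the other values from the example report; nothing here is new data.

```python
# Values copied from the example regression report for males
slope = 4.255232
se_slope = 1.071842
ci_low, ci_high = 2.133586, 6.376879
r = 0.337022

# The t statistic is the slope divided by its standard error
t = slope / se_slope

# A 95% CI that excludes zero points to a linear relationship
ci_excludes_zero = not (ci_low <= 0 <= ci_high)

# r squared is the proportion of variation in PEFR explained by height
r_squared = r ** 2

print(f"t = {t:.4f}")
print(f"95% CI excludes zero: {ci_excludes_zero}")
print(f"r squared = {r_squared:.6f}")
```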

Question: Why does the analysis calculate a two-sided probability (P) rather than a one-sided P?

Questions: Do the 95% confidence intervals indicate that there is a linear relationship between height and PEFR for the males and females?

What does the correlation coefficient tell us about the type of relationship? Is it a positive or a negative correlation?

Question:

Do the 162cm tall males and females in the lung function practical have identical predicted PEFR?


6. Are height and PEFR normally distributed?

Use the Shapiro-Wilk test to check for normal distribution

Click Analysis > Parametric > Shapiro-Wilk

Highlight the column, click OK, and click OK to send output to the report.

Check all your samples (male, female, height, PEFR) for evidence of non-normality

C. Comparing lung function before and after treatment

1. Does Salbutamol improve the peak expiratory flow rate?

Download and open the file Salbutamol 2009.sdw from Blackboard > Discussions > Lung function practical.

Split the PEFR before data into the three groups. Do the same for the PEFR after data and put them into the same worksheet.

Choose an appropriate t-test to compare the difference before and after treatment, and use it for each of the three groups. (After highlighting the first column, hold down the control key and click on the second column.)

Question: Did any of the samples show evidence of non-normality?

Shapiro-Wilk cannot tell you if a distribution is normal; it can only indicate that a sample does not have a normal distribution.

Student’s t-test can be used for non-normal distributions

Student’s t-test is very robust and can cope with non-normal distributions for independent, unpaired samples, unless there is a significant difference in the variance of the two samples. StatsDirect tests for this and gives an appropriate warning.

Questions: Look at your analysis of the difference in male and female PEFR.

1. Was there a significant difference in the variance?

2. If there was a difference, which test would StatsDirect suggest that you should try instead of the t-test?

3. What is the type of test that you would use if you could not use the t-test?

In your Asthma Treatment practical, the maximum PEFR values were recorded before and after the use of a salbutamol inhaler with or without a spacer, or a placebo.

The three groups have been coded:

- P: Placebo inhaler
- I: Salbutamol inhaler
- S: Salbutamol inhaler with spacer

2. Is there any difference in using salbutamol with or without the spacer?

First, you must subtract the before results from the after results:

o Click Data > Apply Function and highlight the after column and then the before column

o Type v1-v2 into the Apply function to data box.

o Do this for all three groups, P, S and I.
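
The Apply Function step with v1-v2 subtracts the second selected column from the first, row by row. A minimal Python sketch of the same operation is shown here, with invented before and after values for a single group.

```python
# Hypothetical PEFR readings for one treatment group, not the class data
before = [400, 380, 450, 420]
after = [430, 390, 445, 460]

# v1 - v2 with "after" selected first and "before" second
differences = [a - b for a, b in zip(after, before)]

print(differences)
```

A positive difference means PEFR rose after treatment; a negative one means it fell.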

Choose an appropriate t-test to compare the difference between salbutamol with spacer and the salbutamol inhaler alone.

Question:

Is there any evidence for a difference between the salbutamol with spacer and the salbutamol inhaler alone?

Your results should look something like this:

For differences between PEFR Before~Type=S and PEFR After~Type=S:
Mean of differences = -22.777778 (n = 36)
Standard deviation = 41.238755
Standard error = 6.873126
95% CI = -36.730965 to -8.824591
df = 35
t = -3.314035
One sided P = 0.0011
Two sided P = 0.0021
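
The standard error and t value in this paired output follow directly from the mean, standard deviation and sample size, as this short Python sketch shows. It uses only the figures printed above and assumes the differences were taken as Before minus After, as the report label suggests.

```python
from math import sqrt

# Summary figures copied from the example paired t-test output
mean_diff = -22.777778
sd_diff = 41.238755
n = 36

# Standard error of the mean difference
se = sd_diff / sqrt(n)

# Paired t statistic and its degrees of freedom
t = mean_diff / se
df = n - 1

print(f"standard error = {se:.6f}")
print(f"t = {t:.6f} with {df} df")
```

Under that assumption, a negative mean difference corresponds to a higher PEFR after treatment than before.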

Questions:

Is there any evidence that the salbutamol had an effect on PEFR?

What is the effect of using the placebo?