statistics - cengage learning

Page 1: Statistics - Cengage Learning

Minitab® Technology Manual to Accompany

Prepared by

Linda Myers Harrisburg Area Community College, Harrisburg, PA

Australia • Brazil • Mexico • Singapore • United Kingdom • United States

Statistics Learning from Data

Roxy Peck California Polytechnic State University,

San Luis Obispo, CA

© Cengage Learning. All rights reserved. No distribution allowed without express authorization.


Contents*

Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 15
Chapter 16

* Chapters 7, 8 and 14 have been omitted from this guide since they contain no material relevant to Minitab.


PREFACE

This Minitab Manual is a supplement to Statistics: Learning from Data by Roxy Peck. It is intended to help students perform the analyses described in that textbook using the Student Version of Minitab 14. There are step-by-step commands and screen shots for each problem. While some of the advanced functionality is restricted in the student version, it is sufficient for the purposes here. There are also Technology notes at the end of each chapter to help you with the commands used.

Dr. Linda Myers
Harrisburg Area Community College
Spring, 2013


CHAPTER ONE
Getting Started With Minitab

OVERVIEW

This chapter covers the basic structure and commands of Minitab for Windows Release 14. After reading this chapter you should be able to

1. Start Minitab
2. Identify the Main Menu Bar
3. Enter Data into Minitab
4. Save the Data File
5. Print the Session Window
6. Obtain Online Help
7. Exit Minitab

Minitab commands and software features are introduced where they are appropriate for the specific statistical analysis.

1.1 STARTING MINITAB

Minitab is a computer software program initially designed as a system to help in the teaching of statistics; over the years it has evolved into an excellent system for data analysis. To start Minitab, select Start > Programs > Minitab 14 for Windows > Minitab.



1.2 THE MAIN MENU

The main Minitab window contains numerous sub-windows. Two of them, the Worksheet window and the Session window, are shown above. A third window is the Project Manager, which contains folders that allow access to various parts of your project: the Session, History, Graphs, Report Pad, Related Documents and Worksheet folders. Across the top of the Minitab window is the menu bar, from which you open menus and choose commands. The Session and Worksheet windows are the most important and the most frequently used windows. The main menu bar contains selections common to most Windows applications and some selections specific to Minitab.

The File command contains options related to opening files, saving files, printing, and exiting Minitab. The Edit command contains options related to deleting, copying, and pasting.


The other selections on the menu bar, Data, Calc(ulate), Stat(istics), Graph, Editor and Tools, are specific to Minitab. The final two selections on the main menu bar, Window and Help, are found in most Windows applications. The Window command enables you to switch among windows, while the Help command enables you to get online help from Minitab.

1.3 ENTERING DATA

Minitab’s Worksheet window is like a spreadsheet in that it works with data in rows and columns. Typically, a column contains the data for one variable, with each individual observation in a row. Columns are designated as C1, C2, C3, ... and rows are numbered 1, 2, 3, .... The size of the worksheet is limited only by the memory available and the size of the hard drive. There are several ways to enter data into the Minitab Data window. You may read data from a file or type in the data.

Reading Data from a File

Follow these steps to read data from a file:

1. Start Minitab.
2. To open the file, select File > Open Worksheet from the menu. At the completion of this operation, all data in the current worksheet will be replaced with the data in the file.

When you select Open Worksheet the dialog box will open. Minitab allows you to open files from many different software packages. Minitab worksheets use the file extension .mtw and Minitab portable files use the extension .mtp. Choose the location and the type of file you want to open, then select the filename from the list and choose Open to open the file.

Typing Data into the Worksheet

Follow these steps to add data to the dataset:

1. Make the Worksheet window the active window. Position the cursor in the Worksheet window in the column and cell where you want the data located. For a new dataset, position the cursor in column 1, row 1.


2. Enter the data.
3. Correcting errors: If you enter an incorrect value, highlight the cell, retype the data entry and press {ENTER}. Do not delete the error; just type in the correct value. Deleting the data causes the entire column to move up one line!

Naming Columns in the Worksheet

Columns are generally used for different variables within the dataset. To name a column (variable) in the worksheet, position the cursor in the box at the top of the column, above row 1 and below the C# label. Type in the name you want to assign to the column. In version 14 of Minitab, column names may be longer than 8 characters.

1.4 SAVING A FILE

There are three basic components in a Minitab session: the worksheet (contained in the Worksheet window), the Session window and graphs. Saving graphs will be covered after you create your first graph.

Saving a Worksheet

Follow these steps to save a Minitab worksheet for the first time.

1. Choose File > Save Current Worksheet As...
2. Select the drive. Designate the correct drive and path for saving the file in the dialog box, then position the cursor in the box labeled File Name: and type in the filename. Minitab uses the same file naming conventions as Windows. Minitab worksheets use the file extension .mtw and Minitab portable files use the extension .mtp. Designate the drive as A: and enter the filename ex1_12a. Select Save.

Saving a Project

When you save your work as a project, you save all the information about your work. The contents of every window are saved, including the columns of data in each Worksheet window, the complete text in the Session window and History window, and each Graph window. You will want to save these results if it is necessary to examine the output at a later time or use the output in a document. Follow these steps to save a project.

1. Choose File > Save Project As...
2. Designate the correct drive and path for saving the file in the dialog box, then position the cursor in the box labeled File Name: and type in the filename for this project file.
3. Select drive A:, enter the filename ex1_12a.mpj and choose Save.

1.5 PRINTING THE SESSION WINDOW

Follow these steps to print a copy of the Session window.

1. Make the Session window the active window. Click on the title bar of the Session window to make it the active window.
2. Select the correct printer. If necessary, select File > Print Setup... to select the correct printer. After selecting the correct printer, select OK.
3. Print the Session window. Select File > Print Session Window. Choose OK to print the Session window.

1.6 OBTAINING ON-LINE HELP

Follow these steps to obtain on-line help.

1. Click with the mouse on Help > Help to bring up the dialog box.

2. Select the topic. Click on the text "Getting Started" and "Introduction to Minitab". The Help window displays basic information on using Minitab. Click on the Close option button on the top right of the Help window.


3. Using the Index. Click the Index tab to bring up the dialog box. Type "Graph Menu" in the Type in the keyword to find: text box. Click on the Display option button.
4. Exiting Help. To return to the Minitab session, click on the Close option button on the top right of the Help window.

1.7 EXITING MINITAB

To end a MINITAB session and exit the program, choose File from the menu bar and then choose Exit. A dialog box will appear, asking if you want to save the changes made to this worksheet. Click Yes or No. It is also possible to exit MINITAB by clicking the X in the upper right corner of the window.


CHAPTER TWO
Graphical Methods for Describing Data Distributions

OVERVIEW: Graphically representing data is one of the most helpful ways to become acquainted with the sample data. Graphical displays give a first feel for how the data are distributed and arranged. To get a pictorial representation of data, we must decide what type of graphic display would best present the data and their distribution. The type of display used depends, in large part, on the type of data and the idea to be presented. In this chapter you will use Minitab to present data graphically.

2.1 Bar graphs
2.2 Dotplots
2.3 Stem-and-leaf displays
2.4 Histograms
2.5 Scatterplots and Time Series plots

2.1 BAR GRAPHS

A bar chart is a graphical display of categorical data. Each category in the frequency distribution is represented by a bar or rectangle, and the display is constructed so that the area of each bar is proportional to the corresponding frequency or relative frequency. We will use Example 2.4 How Far Is Far Enough from the text to illustrate how to construct a bar chart using Minitab.

1. If you are working from a data file, select File > Open Worksheet. Select the appropriate file. Select Open. If not, you will have to enter the data.

2. Choose Graph > Bar Chart. From the Bars represent: drop down list box, select Values from a table. Fill in the dialog boxes as shown and select OK.


2.2 DOTPLOTS

Dotplots are a quick and efficient way to get a preliminary understanding of the distribution of your data. The dotplot display is not available in Excel, but the initial step of ranking the data can be done. We will use the data from Example 2.6 Graduation Rates. Input the data into columns as shown. Follow these steps to construct a dotplot for the graduation rates:

1. Open the worksheet by selecting File > Open Worksheet, or enter the data by hand.
2. Construct the dotplot by selecting Graph > Dotplot... Select Simple as the type of dotplot from the Multiple Y’s dialog choices. Place the columns containing the graduation rates in the Graph variables: text box. Choose OK. The graph will appear in its own window.


2.3 STEM AND LEAF DISPLAY

To illustrate the commands necessary to construct a stem-and-leaf display, let's use the data from Example 2.8 Going Wireless. Follow these steps to construct a basic stem-and-leaf display:

1. Open the worksheet by selecting File > Open Worksheet or enter the data by hand.
2. To construct the stem-and-leaf display, select Graph > Stem-and-Leaf. Fill in the dialog boxes as shown. Click OK.


By changing the increment level we get a nicer graph.
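For readers who want to see how a stem-and-leaf display is assembled, the following Python sketch groups values by stem. It is an illustration only, not Minitab's own algorithm: the function names and the sample values are made up, and it assumes non-negative whole numbers with the units digit used as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative whole numbers by stem, using the units digit as the leaf."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)


def format_display(groups):
    """Return one text line per stem, including stems that have no leaves."""
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)


# Hypothetical values for illustration; not the Example 2.8 data.
sample = [12, 15, 21, 23, 23, 38, 40]
print(format_display(stem_and_leaf(sample)))
```

Each printed line shows a stem, a separator, and the sorted leaves for that stem.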

2.4 HISTOGRAMS

Histograms are used for large sets of data and don’t work well for small data sets. We expect the histogram of a sample to be similar to that of the population. Histograms are constructed a bit differently depending on whether the variable of interest is discrete or continuous. We will use Example 2.10 Promiscuous Queen Bees to illustrate how to construct a histogram for a discrete data set. We divide the sample values into intervals called bins. Bars represent the number of observations falling within each bin (its frequency).

1. Open the worksheet or enter the data by hand.
2. Construct the relative frequency histogram by selecting Graph > Histogram... Select Simple histogram. Click OK. Fill in the dialog boxes as shown and click OK.
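The bar heights of a relative frequency histogram for discrete data are simply the proportion of observations at each value. A minimal Python sketch of that calculation follows; the function name and the data are hypothetical and are not taken from Example 2.10.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to the proportion of observations equal to it."""
    counts = Counter(values)
    n = len(values)
    return {value: counts[value] / n for value in sorted(counts)}


# Hypothetical discrete observations for illustration.
observations = [1, 2, 2, 3, 3, 3, 3, 4]
for value, proportion in relative_frequencies(observations).items():
    print(value, proportion)
```

The proportions always add up to 1, which is a quick check on any relative frequency table.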


Frequency Distributions for Continuous Numerical Data

We will use Example 2.13 Enrollments at Public Universities to illustrate constructing a histogram for continuous data in Minitab. The program will decide the size of the bins. Enter the data and select Graph > Histogram... Select Simple histogram. Click OK. Fill in the dialog boxes as shown and click OK.


2.5 SCATTER PLOTS AND TIME SERIES PLOTS

Scatterplots can be created in Excel by highlighting the columns of variables and selecting Insert > Scatter > Scatter with only Markers. The explanatory variable should be plotted on the x-axis and the response variable should be plotted on the y-axis. The highlighted columns should have the x variable on the left and the y variable on the right. We will use Example 2.17 Worth the Price You Pay to illustrate the process in Minitab. Follow these steps to construct a scatterplot of the data.

1. Select File > Open Worksheet, or enter the data by hand.
2. Create the scatterplot by selecting Graph > Scatterplot > Simple scatterplot > OK. Fill in the dialog boxes. You may enter titles and format the graph as you choose.


CHAPTER THREE: Numerical Methods for Describing Data Distributions

OVERVIEW: Numerical summaries that indicate where the center of a data set is located are called measures of central tendency. Measures of the center typically include the mean and median. Recall that the mean of a data set is the sum of the data divided by the number of observations, while the median is the middle value in an ordered data set and divides the data set into two equal parts. Numerical summaries that describe the spread of values about the center are called measures of variability or measures of dispersion. Measures of variability typically include the range and standard deviation. The range is the difference between the largest (maximum) and smallest (minimum) values in a data set. In this chapter you will use Minitab to:

3.1 Calculate measures of central tendency
3.2 Calculate measures of variability
3.3 Calculate descriptive statistics
3.4 Calculate Quartiles and IQR
3.5 Calculate the Five Number Summary and boxplots

3.1 MEASURES OF CENTRAL TENDENCY

We will use the data in Example 3.2 Baseball Salaries to illustrate the commands needed to calculate the mean and median of the data. Enter the data into the spreadsheet and select Calc > Column Statistics.


Select the column and function you want and click OK. The answer will appear in the session window.

To display the median, select Calc > Column Statistics > Median > OK.
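As a cross-check outside Minitab, the same two summaries can be computed with Python's standard statistics module. This is only a sketch: the salary values below are hypothetical and are not the Example 3.2 data.

```python
import statistics

# Hypothetical salaries (in millions of dollars); not the Example 3.2 data.
salaries = [2, 4, 4, 5, 10]

mean_salary = statistics.mean(salaries)      # sum of the values divided by their count
median_salary = statistics.median(salaries)  # middle value of the sorted data

print("Mean:", mean_salary)
print("Median:", median_salary)
```

Here the single large value pulls the mean above the median, which is typical of right-skewed data such as salaries.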


3.2 MEASURES OF VARIABILITY

Using the same data, we can calculate measures of variability by choosing the appropriate function. To calculate the standard deviation, select Calc > Column Statistics > Standard deviation > OK.
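The two measures of variability named in this chapter can be written out directly. The sketch below assumes the sample standard deviation (divisor n − 1); the function names and data are hypothetical.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation: square root of the sum of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("Standard deviation:", sample_std_dev(data))
print("Range:", data_range(data))
```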

3.3 DESCRIPTIVE STATISTICS

An easier way to display all the descriptive statistics is to use the Descriptive Statistics command. We will use Example 3.6 Thirsty Bats to illustrate the commands. Enter the data into the program. Select Stat > Basic Statistics > Display Descriptive Statistics. Click the Statistics button and choose the functions that you want displayed. This will produce descriptive statistics (N, Mean, Median, Standard Deviation, etc.) for each variable or column.


The Graphs option lets you display a histogram, a histogram with a normal curve, a dotplot, a boxplot, or a graphical summary of the variables.

3.4 QUARTILES and IQR

Quartiles are numerical summaries that serve as measures of location. The lower quartile (Q1) is the point such that 25% of the observations are below it. The median is the second quartile (Q2) and is the point such that 50% of the observations are below it. The upper quartile (Q3) is the point such that 75% of the observations are below it. The interquartile range (IQR) is the difference Q3 − Q1.
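Quartiles can be computed by more than one convention, so values from different programs or from hand calculation may differ slightly. The sketch below uses the "exclusive" method of Python's statistics.quantiles, which interpolates at position p(n + 1) in the sorted data; whether that matches a given Minitab release or the textbook's hand method for every data set is an assumption you should verify. The function name and data are hypothetical.

```python
import statistics


def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR), where IQR is the interquartile range Q3 - Q1."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q2, q3, q3 - q1


# Hypothetical data for illustration.
visits = [1, 2, 3, 4, 5, 6, 7]
print(quartiles_and_iqr(visits))
```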


Let’s look at Example 3.7 Number of Visits to a Class Web Site and determine the numerical summaries for the data set. This time we will include quartiles and the boxplot. Forty students were enrolled in a statistical reasoning course at a California college. The instructor made course materials, grades, and lecture notes available to students on a class web site, and course management software kept track of how often each student accessed any of these web pages. One month after the course began, the instructor requested a report on how many times each student had accessed a class web page. These same data are used in Examples 3.8 and 3.9. Follow these steps to calculate the numerical summaries for the data set:

1. Enter the data by hand or open the worksheet by selecting File > Open Worksheet.
2. Calculate the numerical summaries by selecting Stat > Basic Statistics > Display Descriptive Statistics. Enter the variable in the dialog box.
3. Click on the Statistics button and select the appropriate functions. Click OK, then OK again.

The answer will appear in the session window and the boxplot in its own window.


3.5 THE FIVE NUMBER SUMMARY AND BOXPLOTS

Similar procedures can be used to calculate the five number summary and construct a boxplot of the data. We will use Example 3.13 Video Game Practice Strategies to illustrate the steps. Follow these steps to calculate the numerical summaries for the data set:

1. Enter the data by hand or open the worksheet by selecting File > Open Worksheet.
2. Calculate the numerical summaries by selecting Stat > Basic Statistics > Display Descriptive Statistics. Enter the variable in the dialog box.
3. Click on the Statistics button and select the appropriate functions. Click OK.
4. Click on the Graphs button and check Boxplot. Click OK, then OK again.


The results will appear in the session window and the boxplot in its own window.
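The five number summary consists of the minimum, lower quartile, median, upper quartile, and maximum, which are the values a boxplot is drawn from. A short Python sketch follows, with a hypothetical function name and made-up data; the quartile convention (the "exclusive" method of statistics.quantiles) is an assumption and may differ from other software in small samples.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical data for illustration; not the Example 3.13 data.
scores = [1, 2, 3, 4, 5, 6, 7]
print(five_number_summary(scores))
```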


You can also construct side-by-side boxplots by following these steps.

1. Input the raw data for the first group into C1.
2. Input the raw data for the second group into C2.
3. Continue to input data for each group into a separate column.
4. Select Graph > Boxplot...
5. Highlight Simple under Multiple Y’s.
6. Click OK.
7. Double-click the column name of each column to be graphed to add it to the Graph Variables box.
8. Click OK.


CHAPTER FOUR
Describing Bivariate Numerical Data

OVERVIEW: A bivariate data set consists of measurements or observations on two variables, x and y. You can develop models, which express the relationships among various characteristics, to predict a characteristic of interest, called the response or dependent variable. The characteristics used to predict the response variable are called the independent or predictor variables. Minitab will perform simple linear correlation(s), linear regression and multiple regression. Both numerical and graphical presentations are available. In this chapter you will use Minitab to

4.1 Obtain a correlation coefficient between two variables
4.2 Fit a least squares regression line
4.3 Assess the fit of a line using a residual plot

4.1 THE CORRELATION COEFFICIENT

When investigating the relationship between two numerical variables, looking at a scatterplot of the data is the best place to start. A scatterplot of bivariate numerical data gives a visual impression of the relationship between two variables. In order to make precise statements and draw conclusions from data, we need to go beyond pictures. A correlation coefficient is a quantitative assessment of the strength of a linear relationship between the ordered pairs of data. We will use Example 4.3 Does It Pay to Pay More for a Bike Helmet? Enter the data into the program. Construct the scatterplot by selecting Graph > Scatterplot > Simple scatterplot > OK.


Select Stat > Basic Statistics > Correlation. Enter the variables in the text box and click OK.
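The Pearson correlation coefficient can be written out in a few lines, which may help in interpreting the number Minitab reports. The Python sketch below is illustrative only: the function name is hypothetical and the paired values are made up, not the Example 4.3 data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired data for illustration.
price = [1, 2, 3, 4]
rating = [2, 4, 6, 8]
print(pearson_r(price, rating))
```

Values of r near 1 or −1 indicate a strong linear relationship; values near 0 indicate a weak one.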


The results appear in the session window.

4.2 FIT A LEAST SQUARES REGRESSION LINE

The Regression command in Minitab fits a simple linear or polynomial (second or third order) regression model and plots a regression line through the data or the log10 of the data. The fitted line plot shows you how closely the actual data lie to the fitted regression line. In this section, you will obtain a fitted line plot to illustrate how the estimated relationship fits the data in a simple linear regression model. Given two variables x and y, the general objective of regression analysis is to use information about x to make predictions concerning y. The roles played by the two variables are reflected in the terminology: y is referred to as the dependent or response variable, while x is referred to as the independent, predictor, or explanatory variable. We can model the response variable as a linear function of the independent variable. The simple linear regression model is a straight line of the form y = a + bx, where a is the y-intercept (the value of y where the line crosses the y-axis) and b is the slope (the amount by which y changes when x increases by 1 unit). We will use Example 4.6 It May be a Pile of Debris to You, But It Is Home to a Mouse to illustrate the commands. Enter the data into the program. Select Stat > Regression > Regression.
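The intercept a and slope b of the least squares line y = a + bx have closed-form expressions, sketched below in Python. The function name and the data are hypothetical; they are not the Example 4.6 values.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data that lie exactly on y = 1 + 2x.
x_values = [1, 2, 3, 4]
y_values = [3, 5, 7, 9]
a, b = least_squares_line(x_values, y_values)
print(f"y = {a} + {b}x")
```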

[The text of pages 31 through 55 is missing from this transcript.]

Select the option button for Samples in different columns. Place Men in the First: text box. Place Women in the Second: text box. Select Options. Accept the 95.0 default value in the Confidence level: text box. Accept the 0.0 default value in the Test difference: text box. Choose the option of greater than in the Alternative: drop-down list box. Click OK, then OK again.


The results appear in the session window.
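The test statistic and degrees of freedom in two-sample t output can be reproduced by hand. The Python sketch below uses the unpooled (Welch) formulas, which is an assumption about the settings used here; Minitab, to our understanding, reports the degrees of freedom truncated to a whole number. The function name and data are hypothetical, and the P-value is not computed because it needs a t distribution that the standard library does not provide.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t statistic, approximate df) for an unpooled two-sample t test."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # squared standard error, sample 1
    v2 = statistics.variance(sample2) / n2  # squared standard error, sample 2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples for illustration; not the salary data.
group_1 = [10, 12, 14, 16]
group_2 = [5, 7, 9, 11]
t_stat, df = welch_t(group_1, group_2)
print("t =", round(t_stat, 2), "df =", math.floor(df))
```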

From the output, the value of the test statistic is 3.11 with df = 15, and the associated P-value is 0.004. Because the P-value (0.004) is less than the selected significance level (0.01), the null hypothesis is rejected. The sample data provide convincing evidence that the mean annual salary for male purchasing managers is greater than the mean annual salary for female purchasing managers.

13.2 PERFORM A HYPOTHESIS TEST ABOUT THE DIFFERENCE BETWEEN TWO POPULATION MEANS USING DEPENDENT SAMPLES

When samples are paired, hypotheses about µ1 − µ2 are translated into hypotheses about µd, the mean of the population of differences. This test is appropriate when the following conditions are met:

1. The samples are paired.
2. The n sample differences can be viewed as a random sample from a population of differences (or it is reasonable to regard the sample of differences as representative of the population of differences).
3. The number of sample differences is large (n > 30) or the population distribution of differences is approximately normal.

Example 13.4 Benefits of Ultrasound Revisited is a paired-samples t test. Because the samples are paired, the first thing to do is compute the sample differences. These are the before − after range of motion differences for the seven physical therapy patients in the sample. A negative difference means that the after measurement was larger, so range of motion increased after the ultrasound therapy. The sample data and the computed differences are shown in the accompanying table. Do these data provide evidence that the mean range of motion before ultrasound is less than the mean range of motion after ultrasound?


The population characteristics of interest are

µ1 = mean range of motion for physical therapy patients before ultrasound
µ2 = mean range of motion for physical therapy patients after ultrasound

Because the samples are paired, you should also define µd:

µd = µ1 − µ2 = mean difference in range of motion (before − after)

Translating the question of interest into hypotheses gives

H0: µd = 0
Ha: µd < 0

Enter the data in two columns labeled Before and After. Select Stat > Basic Statistics > Paired t.

Select the Samples in columns: option button. Enter Before in the First sample: text box. Enter After in the Second sample: text box. Select Options. Accept the default level of 95.0 in the Confidence level: text box. Accept the default value of 0.0 in the Test mean: text box. Because the alternative hypothesis is µd < 0, choose less than in the Alternative: drop-down list box. Click OK, then OK again.
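The paired t statistic is an ordinary one-sample t statistic computed on the differences. The Python sketch below shows the arithmetic; the function name is hypothetical and the before and after values are made up for illustration, not the Example 13.4 measurements.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, sd of differences, t statistic, df) for paired data."""
    differences = [b - a for b, a in zip(before, after)]  # before minus after
    n = len(differences)
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)
    t = d_bar / (s_d / math.sqrt(n))  # tests H0: mean difference = 0
    return d_bar, s_d, t, n - 1


# Hypothetical paired measurements for illustration.
before = [10, 12, 9, 11]
after = [12, 15, 10, 12]
d_bar, s_d, t_stat, df = paired_t(before, after)
print("mean difference =", d_bar, "t =", round(t_stat, 2), "df =", df)
```

A negative t statistic here reflects after measurements that tend to be larger than the before measurements.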


The results appear in the session window.

Because the P-value (0.02) is less than α (0.05), you reject H0. There is convincing evidence that the mean knee range of motion for physical therapy patients before ultrasound is less than the mean range of motion after ultrasound.

13.3 CONSTRUCT A CONFIDENCE INTERVAL TO ESTIMATE THE DIFFERENCE BETWEEN TWO POPULATION MEANS

We can also construct a confidence interval for the difference of two means. We will use Example 13.7 Freshman Year Weight Gain. The researchers studied a random sample of first-year students who lived on campus and a random sample of first-year students who lived off campus. Data on weight gain (in kg) during the first year, consistent with summary quantities given in the paper, are given below. A negative weight gain represents a weight loss. The researchers believed that the mean weight gain of students living on campus was higher than the mean weight gain for students living off campus and were interested in estimating the difference in means for these two groups. The answers to the four key questions are estimation, sample data, one numerical variable (weight gain), and two independently selected samples. This combination of answers suggests using a two-sample t confidence interval. You can now construct a 95% confidence interval for the difference in mean weight gain for students who live on campus and students who live off campus. You want to estimate µ1 − µ2 = mean difference in weight gain, where

µ1 = mean weight gain for first-year students living on campus
µ2 = mean weight gain for first-year students living off campus

Enter the data into the spreadsheet and select Stat > Basic Statistics > 2-Sample t. Select the Samples in different columns: option button. Place On Campus in the First sample: text box. Place Off Campus in the Second sample: text box. Select Options. Accept the default level of 95.0 in the Confidence level: text box. Accept the default value of 0.0 in the Test difference: text box. Accept the default option of not equal in the Alternative: drop-down list box. Click OK, then OK again.

The results appear in the session window.
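A two-sample t confidence interval has the form (difference in sample means) ± (t critical value) × (standard error). The Python sketch below shows that arithmetic under two assumptions: the unpooled standard error is used, and the t critical value is looked up in a t table and passed in, because the standard library has no t distribution. The function name and the data are hypothetical, not the Example 13.7 values.

```python
import math
import statistics


def two_sample_t_interval(sample1, sample2, t_critical):
    """Return (lower, upper) limits for the difference in means, sample1 minus sample2."""
    difference = statistics.mean(sample1) - statistics.mean(sample2)
    standard_error = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    margin = t_critical * standard_error
    return difference - margin, difference + margin


# Hypothetical weight gains for illustration.
on_campus = [10, 12, 14, 16]
off_campus = [5, 7, 9, 11]
# 2.447 is the t table value for 95% confidence with 6 df, the df these made-up samples give.
lower, upper = two_sample_t_interval(on_campus, off_campus, t_critical=2.447)
print(round(lower, 2), round(upper, 2))
```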


In Example 13.8 Benefits of Ultrasound One More Time we will use the data from Example 13.4. The conclusion was that there was convincing evidence that mean knee range of motion for physical therapy patients was greater after ultrasound than before ultrasound. Once you have reached this conclusion, it would also be of interest to estimate the increase in mean range of motion. The answers to the four key questions are estimation, sample data, one numerical variable (range of motion), and two paired samples. This combination leads to considering the paired-samples t confidence interval. The process for estimation problems can be used to construct a 95% confidence interval for the mean difference in range of motion. Enter the data into the spreadsheet and select Stat > Basic Statistics > Paired t.

Select the Samples in columns: option button. Place Before in the First sample: text box. Place After in the Second sample: text box. Select Options. Accept the default level of 95.0 in the Confidence level: text box. Accept the default value of 0.0 in the Test mean: text box. Accept the default option of not equal in the Alternative: drop-down list box. Click OK, then OK again.


The results appear in the session window.
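The paired interval is a one-sample t interval computed on the differences, which the following Python sketch makes explicit. It is an illustration only: the readings are invented (they are not the Example 13.4 data), the function name is hypothetical, the difference is taken as first minus second, and the critical value 2.447 is the two-sided 95% value from a standard t table for df = 6.

```python
import math
from statistics import mean, stdev

def paired_t_ci(first, second, t_star):
    """Return (lower, upper) for the mean of the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)                          # mean difference
    se = stdev(diffs) / math.sqrt(len(diffs))    # s_d / sqrt(n)
    return d_bar - t_star * se, d_bar + t_star * se

# Hypothetical range-of-motion readings for 7 patients (NOT the textbook data).
before = [40, 52, 47, 60, 55, 44, 38]
after = [43, 58, 46, 66, 57, 49, 45]

# 7 pairs give df = 7 - 1 = 6, so the table value 2.447 applies.
lower, upper = paired_t_ci(before, after, t_star=2.447)
print(f"95% CI for mean (before - after): ({lower:.2f}, {upper:.2f})")
```

An interval that lies entirely below zero for before minus after corresponds to an increase in mean range of motion after treatment.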


CHAPTER FIFTEEN
Learning from Categorical Data

OVERVIEW: Most of the techniques presented in earlier chapters are designed for numerical data. It is often the case, however, that information is collected on categorical variables such as political affiliation, sex, or college major. As with numerical data, categorical data sets can be univariate (consisting of observations on a single categorical variable), bivariate (observations on two categorical variables), or even multivariate. The use of the chi-square distribution is appropriate when the sample size is large enough for every expected count to be at least 5. If any of the expected counts are less than 5, categories can be combined in a sensible way to create acceptable expected cell counts. If you do this, remember to compute the number of degrees of freedom based on the reduced number of categories.

After reading this chapter you should be able to
15.1 Perform the chi-square goodness-of-fit test
15.2 Perform a test for independence of two categorical variables

15.1 PERFORM THE CHI-SQUARE GOODNESS-OF-FIT TEST

The chi-square goodness-of-fit test is a method you should consider when the answers to the four key questions are hypothesis testing, sample data, one categorical variable (with more than two categories), and one sample. Example 15.3 Tasty Dog Food Continued fits these criteria. Use the dog food taste data of Example 15.1 to test the hypothesis that the five different spreads (duck liver pate, Spam, dog food, pork liver pate, and liverwurst) are chosen equally often when people who have tasted all five spreads are asked to identify the one they think is the dog food.

Population characteristics of interest
p1 = proportion of all people who would choose duck liver pate as the dog food
p2 = proportion of all people who would choose Spam as the dog food
p3 = proportion of all people who would choose dog food as the dog food
p4 = proportion of all people who would choose pork liver pate as the dog food
p5 = proportion of all people who would choose liverwurst as the dog food


The question of interest (are the five spreads identified equally often as the one thought to be dog food?) results in a null hypothesis that specifies that all five category proportions are 0.20.

Hypotheses
Null hypothesis: H0: p1 = 0.20, p2 = 0.20, p3 = 0.20, p4 = 0.20, p5 = 0.20
Alternative hypothesis: Ha: At least one of the population proportions is not 0.20

The student version of Minitab 14 does not have the functionality to produce a chi-square goodness-of-fit test. Here is the output from the full version of Minitab.
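Separately from the Minitab output, and for readers whose version lacks this test, the goodness-of-fit arithmetic can be sketched in Python. This is an illustration under stated assumptions: the observed counts are invented (they are not the Example 15.1 data), both function names are hypothetical, and the P-value helper uses a closed form that is valid only when the degrees of freedom are even, as they are here (5 categories give df = 4).

```python
import math

def chi_square_gof(observed, proportions):
    """Return (expected counts, X2 statistic, df) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, stat, df

def chi_square_upper_tail_even_df(x, df):
    """Area to the right of x under a chi-square curve; even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Hypothetical counts for the five spreads (NOT the textbook data).
observed = [12, 9, 3, 10, 16]
null_proportions = [0.20] * 5        # H0: all five proportions are 0.20

expected, stat, df = chi_square_gof(observed, null_proportions)
p_value = chi_square_upper_tail_even_df(stat, df)

# The chi-square distribution is appropriate when every expected count is at least 5.
counts_large_enough = all(e >= 5 for e in expected)
print(f"X2 = {stat:.2f}, df = {df}, P-value = {p_value:.4f}, "
      f"expected counts >= 5: {counts_large_enough}")
```

For an odd number of degrees of freedom the closed form does not apply, and a chi-square table or statistical software is needed for the P-value.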

The Minitab answer is slightly different from the textbook answer because the Minitab calculation carries more significant digits. Because the P-value is less than the selected significance level, the null hypothesis is rejected. Based on these sample data, there is convincing evidence that the proportion identifying a spread as dog food is not the same for all five spreads.

When the purpose of a study is to compare two or more populations or treatments on the basis of a categorical variable, the question of interest is whether the category proportions are the same for all the populations or treatments. The test procedure uses a chi-square statistic to compare the observed counts to those that would be expected if there were no differences among populations or treatments. In Example 15.6 Risky Soccer?, the researchers compared collegiate soccer players, athletes in sports other than soccer, and a group of students who were not involved in collegiate sports on the basis of their history of head injuries.


Enter the data into the spreadsheet and calculate the expected counts.

Select Stat > Tables > Chi-Square Test (Table in Worksheet) > OK. The results appear in the session window.
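The expected counts called for in the worksheet step can also be computed outside Minitab. The Python sketch below shows one way to do it for a two-way table of counts. It is an illustration only: the table is invented (it is not the Example 15.6 data and its two columns are placeholder categories), and the function names are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Return (X2 statistic, df) for a two-way table of observed counts."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts (NOT the textbook data): one row per group,
# one column per category of the response.
table = [
    [30, 10],   # group 1
    [20, 20],   # group 2
    [10, 30],   # group 3
]

expected = expected_counts(table)
stat, df = chi_square_statistic(table)
print("expected counts:", expected)
print(f"X2 = {stat:.2f}, df = {df}")
```

The same two functions apply unchanged to a test of independence, because both tests compare observed counts with expected counts computed this way.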


A quick comparison of the observed and expected cell counts in Table 15.4 reveals some large discrepancies, suggesting that the proportions falling into the head injury categories may not be the same for all three groups.

15.2 TEST FOR INDEPENDENCE OF TWO CATEGORICAL VARIABLES

The X² test statistic and test procedure can also be used to investigate association between two categorical variables in a single population. When there is an association, knowing the value of one variable provides information about the value of the other variable. When there is no association between two categorical variables, they are said to be independent.

In Example 15.10 Stroke Mortality and Education, one of the questions of interest was whether there was an association between survival after a stroke and level of education. Medical records for a random sample of 2,333 residents of Vienna, Austria, who had suffered a stroke were used to classify each individual according to two variables: survival (survived, died) and level of education (no basic education, secondary school graduation, technical training/apprenticed, higher secondary school degree, university graduate). Expected cell counts (computed under the assumption of no association between survival and level of education) appear below the counts in the table.

The hypotheses of interest are
H0: Survival and level of education are independent.
Ha: Survival and level of education are not independent.

Enter the data into the spreadsheet and calculate the expected counts.
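For reference, the standard formulas behind this step can be written out as follows. The symbols O, E, r, and c are notation introduced here for illustration (observed count, expected count, number of rows, number of columns); the df value uses the 2 survival categories and 5 education levels described above.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}},
\qquad
X^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\text{df} = (r - 1)(c - 1) = (2 - 1)(5 - 1) = 4
```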

Select Stat > Tables > Chi-Square Test (Table in Worksheet) > OK.


The results appear in the session window.

Because the P-value is greater than 0.01, H0 is not rejected at the 0.01 significance level. There is not sufficient evidence to conclude that an association exists between level of education and survival.
