chapter 2 variables and their measurementpsy210.faculty.unlv.edu/chapters/02_hurlburt_ch_2.pdf2.1...

Variables and Their Measurement

2.1 Levels of Measurement

2.2 Continuous and Discrete VariablesReal Limits • Significant Figures • Rounding

2.3 SummationNotation • Computations

2.4 ConnectionsCumulative Review • Computers • Homework Tips

Exercises for Chapter 2

Personal Trainer LECTLET 2A: Variables and Their Measurement LAbS: Lab for Chapter 2 DATAGEN: Statistical Computational Package and Data Generator REViEwMASTER 2A RESouRCE 2X: Additional Exercises

Chapter

22

15

ch02.indd 15 6/7/17 4:02 PM

16 Chapter 2 Variables and Their Measurement

variable A characteristic that can take on several or many different values

measurement The procedure for assigning a value to a variable

Personal Trainer

Lectlets

Learning objectives❶ What is a variable?❷ What are the characteristics of the three measurement scales (nominal, ordinal,

and interval/ratio) used to measure variables? Why do we lump interval and ratio variables together?

❸ What is the difference between a continuous and a discrete variable?❹ What are the real limits of a measurement?❺ What are the three parts of the rule about rounding?❻ How is summation notation used to simplify the communication about sums?

A “variable” can take on any of several values. There are three kinds of variables (nominal, ordinal, and interval/ratio) based on their level of measurement.

Most statistical computation involves summing values; the shorthand indication for a sum is ∑.

Recall Pygmalion: Rosenthal and Jacobson (1968) led teachers to believe that the second-graders identified as “bloomers” would show IQ spurts but the other children would not spurt. There are three variables: the pretest IQ, the posttest IQ, and the intel-lectual growth (IQ gain). All three are interval/ratio variables.

Statistics can be thought of as the science of understanding data; data are the results of a series of measurements on one or more variables. Let us be clear about what these terms mean.

A variable is any characteristic of the world that can be measured and that can take on any of several or many different values. For example, height is a variable (defined as the number of inches a person is tall). It takes on the value 70 inches when John is measured and 64 inches when Mary is measured.

Values of a variable may sometimes be assigned arbitrarily; for example, biologi-cal sex is a variable to which we may assign the values 1 for female and 2 for male (or 2 for female and 1 for male, or 27 for female and 136 for male, or any other values we choose).

Measurement is the procedure that assigns values to the variable. For our purposes, it is enough to realize that a measurement rule must provide a unique and unambiguous result for every individual. Thus, in our sex example, although it makes no difference what values are assigned to female and male, the assignment must be the same for all females and for all males. It is not satisfactory measurement to begin by assigning 1 to females and then, halfway through our data collection, change our minds and begin assigning 27.

Click Lectlets and then 2A in the Personal Trainer for an audiovisual discussion of Sec tions 2.1 through 2.4.

variablemeasurementnominalordinalinterval/ratio∑ (SIG∙ma) = “sum of”Xi (EX∙sub∙EYE)∑(2X — 4)2 = “sum of

two X minus four that quantity squared”

DataGen (DAY∙ta∙jen)

ch02.indd 16 6/7/17 4:02 PM

2.1 Levels of Measurement 17

2.1 Levels of MeasurementStatisticians distinguish three kinds of variables and measurement levels: nominal, ordi nal, and interval/ratio. The main characteristics of these kinds of variables are given in Table 2.1.

The nominal scale of measurement classifies objects into categories based on some characteristic of the object. Examples of nominal measurements of people are male/female, Republican/Democrat/Independent/decline to state, fraternity member/nonmember, and Californian/Ohioan/Nebraskan/Alaskan/and so on. In each case, the measurement is the placing of an individual into one of the categories in the measure-ment scale. We require that all measurement operations be mutually exclusive; that is, for example, a person cannot be both a Republican and a Democrat or both an Ohioan and a Nebraskan.

The order of categories in a nominal variable is not important. For example, it doesn’t matter from a measurement standpoint whether we refer to the sex distinction as male/female or female/male. It may be useful to assign a numerical value to a nomi-nal category; we might let male = 1 and female = 2. Doing that does not change the fact that the order is irrelevant; it would have made just as much measurement sense to let male = 2 and female = 1.

The ordinal scale of measurement classifies objects into mutually exclusive cat-egories based on some characteristic of the object (as does the nominal level of measurement), and furthermore it requires that this classification have some inher-ent, logical order. An example of ordinal measurement of people is class standing (freshman/sophomore/junior/senior) because it is both mutually exclusive (you can’t be both a freshman and a sopho more) and ordered (sophomore is more advanced than freshman). Other examples are class rank (first in class/second/third/etc.), grade in course (A/B/C/D/F), and level of depression (not depressed/slightly depressed/moder-ately depressed/severely depressed).

We may assign a numerical value to these categories, and if we do, the order of these categories is important (unlike in nominal variables where order is irrelevant). For example, it does make sense to assign freshman = 1, sophomore = 2, junior = 3, and senior = 4, but it does not make sense to assign sophomore = 1, senior = 2, fresh-man = 3, and junior = 4. Order is inherent in ordinal variables, and the values of the variables must reflect that order.

nominal scale Classification of unordered variables

TABLE 2.1 Characteristics of nominal, ordinal, and interval/ratio levels of measurement

CharacteristicLevel of Measurement

Nominal Ordinal Interval/RatioCategories are mutually exclusive. Yes Yes Yes

Categories have logical order. No Yes Yes

Equal differences in characteristic imply equal differences in value.

No No Yes

ordinal scale Measurement of variables that have an inherent natural order

ch02.indd 17 6/7/17 4:02 PM


The interval/ratio level of measurement classifies objects into mutually exclusive categories based on some characteristic of the object (as do the nominal and ordinal levels of mea surement). It requires that this classification have some inherent, logi-cal order (as does the ordinal level of measurement). Furthermore, it requires that the width of all the categories be the same. An example of an interval/ratio variable is temperature as measured in degrees Celsius (°C). When we require that the intervals be equal, we mean, for example, that the temperature difference between 34°C and 35°C is the same as the difference between 77°C and 78°C. It follows that the distance between nonconsecutive measurements must also be equal if the measured differences are equal. For example, the temperature difference between 34°C and 37°C must be the same as that between 75°C and 78°C because both differences are 3 degrees.

BOX 2.1 RATIO LEvEL Of MEAsuREMENTSome statisticians divide the interval/ratio level of measurement into two separate levels, called the interval level and the ratio level. Their “interval” level is the same as what we have labeled “interval/ratio.”

The ratio level of measurement has all the characteristics of the interval level, and fur thermore it requires that the scale have a true zero point. Examples of ratio measurements are weight (expressed as number of pounds), height (number of inches), and time (number of minutes). A true zero point means that the thing being measured actually vanishes when the scale reads zero. Thus, for example, the vari-able weight measures the heaviness of an individual; when the weight is 0 pounds, there is in fact no weight.

Note that the existence of a true zero point on a scale does not imply the exis-tence of an individual whose measurement is zero. There are no 0-pound humans, for example. The existence of a true zero simply means that if there were an indi-vidual with no weight, then the weight scale would read 0.

Temperature as measured in degrees Celsius is not a ratio scale because 0°C does not mean the absence of heat; 0°C is instead a relatively arbitrary point, the freezing point of pure water at sea level. The true zero of the Celsius scale (the complete absence of heat) is actually –273°C. By contrast, the Kelvin scale of tem-perature has absolute zero temperature as the 0 K point of the scale (thus making the freezing point of water +273 K). The Kelvin scale is therefore a ratio scale.

For most statistical procedures (including all those described in this book), there is no difference between the interval and the ratio levels of measurement. Therefore, we will lump interval and ratio scales together, referring to either as an interval/ratio scale.

We must make the distinction among nominal, ordinal, and interval/ratio vari-ables because statistics that are appropriate for variables measured at one level of measurement may not be appropriate for variables measured at a different level. For example, if Mary is 64 inches tall and John is 70 inches tall, it is reasonable to say that their average height is 67 inches. However, suppose we code political party preference as Democrat = 1, Republican = 2, and Independent = 3 and that Mary is a Democrat and John is Inde pendent. If we average Mary’s and John’s

interval/ratio scale A measurement scale for ordered variables that has equal units of measurement

ratio scale An interval scale of measurement that has a true zero point

Nominal Ordinal Interval Ratio = NOIR as in “film noir”

c

This box may be omitted without loss of continuity.

c

ch02.indd 18 6/7/17 4:02 PM

2.2 Continuous and Discrete Variables 19

party preferences, we get (1 + 3)/2 = 4/2 = 2, which might lead us to the totally unreasonable conclusion that Mary and John’s average political party preference is Republican. This unreasonableness arises because the average is an appropriate statistic for interval/ratio variables such as height but not for nominal variables such as political preference. We will return to this discussion in Chapter 4.

2.2 Continuous and Discrete VariablesVariables at the ordinal or interval/ratio level of measurement can be classified as either continuous or discrete (all nominal variables are discrete). A continuous variable is one that has an infinite number of possible values between any two adjacent scale values. For example, height is a continuous variable because there is no limit to the refinement we can make in the measurement operation. If we measure more and more precisely, we can ascer tain that Mary’s height is 64.4 inches, or 64.37 inches, or 64.372 inches, or 64.3719 inches.

A discrete variable is one that has no possible intermediate values between two adja cent points. For example, the number of children a woman has is a discrete vari-able, with possible values of only whole numbers, not 1.5 or 2.993.

To restate: If, for any two values of a variable, you can imagine a meaningful interme diate value, then the variable is continuous. Otherwise, the variable is discrete.

Real LimitsWhen we say that Mary is 64 inches tall, we do not in general mean that she is exactly 64 inches tall; she could actually be 63.8 inches or 64.372 inches tall. All measure-ments made on continuous variables are only approximations because of the (theo-retically) infinite number of possible measured values. When we say that Mary is 64 inches tall, we really mean (if we are to be precise) that Mary is taller than 63.5 inches and shorter than 64.5 inches. We sometimes say, then, that Mary’s height is 64 ± .5 inches. The two values 63.5 and 64.5 are referred to as the real limits of the measured value of 64 inches. The real limits are the points that are half the measuring unit above and below the measured value. For example, if Mary’s time in the 100-yard dash is recorded as 10.6 seconds, the real limits of that measurement are 10.55 and 10.65 sec-onds, half the measuring unit (.1 second) below and above the measured value of 10.6 seconds. Equivalently, we can say that Mary’s time is 10.6 ± .05 seconds.

Significant FiguresStatistical computations on continuous variables often result in values that have more digits than the numbers in the original data. For example, if Mary is 60 inches tall and both Natasha and Carla are 67 inches tall, then the average height of these three women is (60 + 67 + 67)/3 = 64.66666666 . . . inches. Two questions should be answered about such a result: (1) How many of those digits should we report if that average height is to be the final result of our computations? (2) How many digits should we carry if that average is a subcomputation whose result will not be reported directly but will be used in some later computation?

continuous variable One with an infinite number of values between adjacentscale values

discrete variable One with no possible intermediate values between two adjacentpoints

real limits of a measurement The points that are half the measuring unit above and below the measured value

ch02.indd 19 6/7/17 4:02 PM


In the behavioral sciences, there is no generally accepted answer to either of these questions. I (and many others) have these two suggestions:

1. Report in the final answer two more significant figures than were reported in the original data. Thus, because our height data were originally reported as whole numbers, our final answer should be reported as a value with two deci-mal places—that is, as 64.67 inches. If the original data had been reported in tenths, we would report a final answer with three decimal places, and so on.

2. Carry as intermediate subcomputations at least three more significant figures than were originally reported. Thus, the average height used as an intermedi-ate value should be carried forward as 64.667 inches. It is simpler (and often better) to carry forward as many figures as your calculator will hold and round only when you report the final answer.

RoundingNow that we have determined how many digits to report, we must decide how to deter-mine the value of the final digit. Why, for example, did we write the final value of the average height as 64.67 and not as 64.66 inches?

The procedure for determining the final digit is called rounding and is generally accepted among all statisticians. The rule for rounding has three parts:

T Part 1 (remainder < 5). If the remainder (the digits beyond the last digit to be reported) is less than 500 . . . , simply discard the remainder. For example, if we wish to round 8.347 pounds to one decimal place, the remainder is 47, which is smaller than 500 . . . , so we discard the remainder and round 8.347 to 8.3 pounds.

T Part 2 (remainder > 5). If the remainder is greater than 500 . . . , add 1 to the last digit before the remainder and then discard the remainder. For example, if we wish to round 4.86524 miles to one decimal place, the remainder is 6524, which is larger than 500 . . . , so we increase the 8 by 1 and discard the remain-der, thus rounding 4.86524 to 4.9 miles.

T Part 3 (remainder = 5). If the remainder is exactly 500 . . . (that is, exactly halfway between two candidates for the final digit), then use the even final digit and discard the remainder. For example, if we wish to round 7.850 seconds to one decimal place, the remainder is 50, which is exactly equal to 500. . . . Thus, 7.850 is exactly halfway between 7.8 and 7.9, so we select the even final digit, 8. Thus 7.850 rounds down to 7.8 seconds (not up to 7.9 seconds as perhaps you learned in math class). Here’s another example of rounding rule part 3: if we wish to round 2.35 gallons to one decimal place, the remainder is 5, which is equal to 500. . . . 2.35 is exactly halfway between 2.3 and 2.4, so we round to the even final digit, 4. Thus, 2.35 rounds up to 2.4 gallons.

Some students find it easy to remember the following expression, which states part 3 of the rule in a different but equivalent way: “Even—leave it. Odd—up.” For example, suppose we wish to round 72.3500 feet to one decimal place. The remain-der is “500”—that is, exactly 5—so part 3 applies. Because the digit just before the

Rounding rule parts

Part 3 may not be what you learned in math class. If the remainder is 500, statisticians round down half the time and round up half the time.

c

Frequent mistake: rounding 3.4501 to 3.4. 3.4501 rounds up to 3.5 (part 3 does not apply because the remainder is not exactly 500 . . .).

c

rounding The procedure for abbreviating a number

ch02.indd 20 6/7/17 4:02 PM

2.2 Continuous and Discrete Variables 21

remainder is “3,” which is odd, we follow “Odd—up” and round up to 72.4 feet. On the other hand, we may wish to round 81.65 grams to one decimal place. The remainder is “5” and the digit just before the remainder is “6,” which is even, so we “Even—leave it.” We leave the 6 as it is and round to 81.6 grams.

Part 3 of the rule for rounding is necessary to prevent a systematic bias from enter-ing our data as a result of rounding. See Box 2.2 for an explanation.

BOX 2.2 HOw pART 3 Of THE ROuNdINg RuLE pREvENTs BIAsSuppose your friend Izzy Buyust tells you that part 3 of the rounding rule is too complicated: Why not just round all numbers ending in the digit 5 upward? Here’s how you can show Izzy that his procedure causes a statistical bias in his results. Add up the 100 numbers 0.1, 0.2, 0.3, . . . , 9.7, 9.8, 9.9, and 10.0. The sum is exactly 505. Now ask Izzy first to round each number using his procedure and then to sum the rounded values. His sum will be 510, not 505. When you round all numbers correctly (using part 3 of the rounding rule when necessary), your sum will be 505 as desired.

Why the discrepancy? When you and Izzy round, you both leave 10 numbers unchanged (1.0 rounds to 1, 2.0 rounds to 2, etc.). Izzy rounds 50 numbers upward (all those that end in .5, .6, .7, .8, and .9) but only 40 numbers downward (those that end in .1, .2, .3, and .4), and therefore his procedure biases the sum upward (to 510 instead of 505). You, on the other hand, use part 3 of the rounding rule, so you round 45 values upward (half of those that end in .5 as well as all those that end in .6, .7, .8, and .9) and 45 values downward (half of those that end in .5 as well as all those that end in .1, .2, .3, and .4), and your sum comes out to be 505. Thus, part 3 of the rounding rule lets you avoid Izzy’s bias.

Table 2.2 gives more examples of the rounding process.

TABLE 2.2 Examples of rounding

decimal places Rounding Ruledesired Original value Remainder Rounded value part

1 712.31 1 712.3 12 3.697 7 3.70 21 5.350 50 5.4 30 247.499 499 247 1 (not 3!)1 84.25 5 84.2 31 84.2501 501 84.3 22 .005 5 .00 30 800.501 501 801 2 (not 3!)

Note that the last column gives the rounding rule

c

ch02.indd 21 6/7/17 4:02 PM


2.3 SummationNotationIt is often convenient to assign a symbol to take the place of the name of a variable. Most often we call the variable X; if there are two variables, we call them X and Y; and generally we call a third variable Z. When it is convenient, we use other symbols; for example, we might call height H instead of X.

A data set is a collection of measurements on one or more variables. It is often con venient to assign an index variable, usually i, that refers to a particular measure-ment’s position in a data set. Table 2.3 shows the measurements on the variable height for five people. When we let i = 1, we refer to the first person, John. When we let i = 4, we refer to the fourth person, Carla. The order of data in a data set is usually arbitrary; we could arrange the people in alphabetical order, in which case i = 1 would refer to Carla. For sample data, we usually refer to the last value of i as n. Thus, n is the number of observations in the sample; n = 5 for the data in Table 2.3.

We often put a subscript on the symbol for the variable when we wish to refer to a specific data point rather than the variable in general. Thus, X1 (pronounced EX∙sub∙WON or just EX∙WON) refers to the first value of X; in Table 2.3 (assuming we are using X to denote height), X1 = 70 inches, John’s height, and X4 = 65 inches, Carla’s height. In general, Xi (pronounced EX∙sub∙EYE) refers to the ith value of X.

Many of the computations in statistics involve summing all the entries in a data set. To make communication about such sums more compact, statisticians have developed sum mation notation. For purposes of this discussion, we will assume that we have made two measurements, X and Y, on each of six participants, as shown in Table 2.4. We indicate the sum of all the values of the variable X as ∑Xi, which we read as “the sum of X sub i.”1 The character ∑ (pronounced “SIG∙ma”) is the Greek capital S (for “sum”; ∑ looks like an E to some students, but it’s really an S). For the data of Table 2.4, we can expand the indicated sum as follows:

X X X X X X Xi∑ = + + + + +

= + + + + +=

1 2 3 4 5 6

7 2 4 2 3 6

24

(2.1)

In cases where it is unambiguous, we may drop the i subscript, understanding that ∑X means ∑Xi. Occasionally, again when the situation is unambiguous, we use three dots (called “ellipses”) to indicate a sum; for example, we may write Equation (2.1) as ∑Xi = X1 + X2 + … + X6, where the ellipses (…) stand for the missing X3 + X4 + X5.

ComputationsThe calculations required in statistics (and throughout this book) often require us to perform some calculations on the variable before we perform the summation. Suppose, for the data in Table 2.4, we wish to compute ∑X2. That expression says, “Sum all the

TABLE 2.3 Heights of five people

person IndexHeight

(in.)John 1 70Mary 2 64Natasha 3 65Carla 4 65Sam 5 68

Xi The ith value of X

∑ Symbol meaning “the sum of ”

TABLE 2.4 Measurements on variables X and Y

i Xi Yi

1 7 52 2 123 4 44 2 105 3 96 6 9

1 Our summation notation ∑Xi is a simplification of the more complete notation sometimes used by statisti-cians ∑n

i = 1 Xi, which is read the sum of Xi for all values of i from i = 1 to n.” Because in this book sums will always be taken over all the values of the variable (that is, in this book sums always run from i = l to n), we can use the simpler notation without ambiguity.

Sum of values of a variable

Xi (EX∙sub∙EYE)∑ (SIG∙ma) = “sum of”ith (EYEth)

ch02.indd 22 6/7/17 4:02 PM

2.3 Summation 23

values of X2.” Note carefully that ∑X2 does not say “Sum all the values of X and then square the sum” (that would be [∑X ]2, which a very different thing). Thus, the first principle of summation is to be careful to observe what is being summed.

The computation of ∑X2 is shown in Table 2.5. Note that we put the original val-ues of X in the first column and then create a new column to hold the X2 values. The required sum is then obtained simply by adding down the new column: ∑X2 = 118.

Calculations may be even more complex. For example, ∑3X2 is obtained by first squaring the X values, as shown in the second column of Table 2.6, and then multi-plying each squared X value by 3, as shown in the third column. Then ∑3X2 = 354 is obtained by simply adding down that last column.

See Box 2.3 for a note about the organization of computations.

BOX 2.3 A NOTE REgARdINg COMpuTATIONsThe time you spend solving statistics problems can be divided into two parts: time spent performing the computations and time spent finding errors that you made in those computations. Many students tend to overlook the impact that error finding has on their total study time. There are two ways to minimize the error-finding time: Work carefully and without distraction in the first place, and organize your work so that errors are easy to locate. Plan on making errors! To err is human! Failing to plan for errors is also human but not very smart!

Organize your computations so that you are performing only one operation at a time. For example, in computing ∑3X2 for the values of X in Table 2.6, you might be tempted to square the numbers and multiply them in the same step, going through a mental process such as: “7 squared is 49 times 3 is 147, 2 squared is 4 times 3 is 12, 4 squared is 16 times 3 is 48, . . . .” That seems to be more effi-cient because it eliminates writing down a column of values (the second column of Table 2.6); however, alternating back and forth between squaring and multiplying substantially increases the likelihood of making a mistake. It is much better to per-form all the squarings first and then do all the multiplications.

You may be saying to yourself, “I can square and multiply in my head; I don’t need that extra step!” and you may be right most of the time. But such a statement ignores the fact that if you do make a mistake, the mistake will be much harder to find when you have combined two computational steps.

I highly recommend that computations be performed in this manner: (1) List the data in a column, (2) create a new column for every subcomputation, and (3) per-form a summation only by adding directly down one column.

Thus, to compute the sum ∑3X2, we (1) list the original data as shown in the first column of Table 2.6, (2) create a new column with the heading “X2” and perform the squarings, and another new column with the heading “3X2” and perform the multipli-cations, and then (3) obtain the sum by simply adding down that last column.

One operation per column leads to the lowest probability of making a mistake.

Let’s consider another example. Suppose we wish to evaluate the sum ∑(2X – 4)2 for these values of X: 7, 2, 4, 2, 3, and 6. We start by listing the data, as shown in the first col umn of Table 2.7. Then, to follow the suggestion in Box 2.3 that we perform

TABLE 2.5 Computing ∑X2

X X2

7 (7)2 = 492 (2)2 = 44 162 43 96 36

∑X2 = 118

TABLE 2.6 Computing ∑3X2

X X2 3X2

7 49 3(49) = 1472 4 13(4) = 124 16 482 4 123 9 276 36 108

∑3X2 = 354

∑(2X – 4)2 = “sum of two X minus four that quantity squared”

ch02.indd 23 6/7/17 4:02 PM


only one op eration at a time, we have to decide which operation to perform first. All calculations must follow the PEMDAS order-of-operations rules of basic algebra: First perform all operations inside parentheses; then perform exponentiation, then multiplication or division, and only then addition or subtraction. Therefore, we must begin with the expression inside the Parentheses, 2X – 4. In that expression, we must perform the Multiplication first, so we form a new column with the heading “2X” and do all the multiplications. Then we can do the Subtractions, creating a new column with the heading “2X – 4.” That finishes the expression inside the parentheses, so now we perform the Exponentiation outside the parentheses by creating a new column with the heading “(2X – 4)2.” That is the expression we wish to sum, so we add down that column to obtain ∑(2X – 4)2 = 184.

Note that ∑(2X – 4)2 is not equal to [∑(2X – 4)]2; that is, we do not obtain ∑(2X – 4)2 simply by adding down the third column of Table 2.7 and then squaring that result. Adding down the third column gives ∑(2X – 4) = 24, and squaring that result gives [∑(2X – 4)]2 = [24]2 = 576, which is not equal to ∑(2X – 4)2 = 184.

Occasionally we perform computations with two variables at a time. For exam-ple, ∑XY is obtained by first listing the values of X and Y as shown in Table 2.8. Then we create a new column headed “XY” and find ∑XY = 176 by simply adding down that column.

Note that ∑XY is not equal to ∑X ∑Y. This can be verified by noting that ∑X = 24, as might be obtained by adding down the first column of Table 2.8, and ∑Y = 49, as might be obtained by adding down the second column. ∑X ∑Y is thus 24(49) = 1176, which is not equal to ∑XY = 176.

We will use summation notation in all chapters in this textbook, and it is crucial that you become confidently skilled in performing such computations.

Click Algebra and then Summation in the Personal Trainer for practice in sum-mation notation following the suggestion in Box 2.3.

TABLE 2.7 Computing ∑(2X − 4)2

X 2X 2X–4 (2X–4)2

7 14 10 1002 4 0 04 8 4 162 4 0 03 6 2 46 12 8 64

∑(2X – 4)2 = 184

TABLE 2.8 Computing ∑XY

X Y XY7 5 352 12 244 4 162 10 203 9 276 9 54

∑XY = 176

Personal Trainer

Algebra

Connections2.4

Cumulative ReviewChapter 1 discussed statistics in general and described the Pygmalion in the classroom experiment that we will refer to frequently throughout this textbook. It also distinguished between populations (all the members of the group under consideration) and samples (some subset of that group), and between parameters (characteristics of populations) and statistics (characteristics of samples). It also defined the probability of an event as the number of outcomes favorable to that event divided by the total number of outcomes.

ComputersMany computer programs are available that perform statistical computations—far too many to be covered in any single book. This textbook provides examples from three such programs: Personal Trainer’s DataGen, IBM SPSS Statistics Version 24, and Microsoft Excel (2010 for Windows or 2011 for Macintosh).

When you have built up a column heading that matches the required sum, you can sum.

c

ch02.indd 24 6/7/17 4:02 PM

2.4 Connections 25

Personal Trainer’s DataGen, which I wrote, is designed especially for the student who is seeking to comprehend statistics. It allows easy access to all the computations typically encountered in introductory statistics so that the student can do as much or as little of a computation by hand as is desired. It is remarkably easy to use: Most students have no trouble getting it to do what they want within the first 5 minutes of operation. All statistics are presented auto matically; there are no “statistics” menus to select or buttons to press. DataGen is thus a program specialized for the acquisition of statistical comprehension by introductory statistics students. Because of this special-ization, however, DataGen is by no means a comprehensive program and thus does not perform many important (but more advanced) statistical analyses. DataGen operates in the Excel environment; that is, your computer must have Excel (Windows 2007 or later, Macintosh 2011 or later) installed for DataGen to operate.

IBM SPSS Statistics is a statistical package designed to analyze a wide variety of statistical data. There are many different versions of SPSS; the one we will use for examples is Version 24. The instructions for SPSS presented here apply to both the Windows and the Macintosh versions of SPSS. SPSS is extremely versatile, capa ble of performing many statistical tasks that go far beyond the needs of the introductory statistics student. The price for this versatility is that it is somewhat more difficult to use than DataGen.

Microsoft Excel is a versatile spreadsheet that can be made to perform many sta-tistical functions (not as many as SPSS, more than DataGen). It is a widely used data management tool in industry and universities. Like SPSS, its versatility makes it some-what more difficult to use than DataGen to explore statistical relationships.

DataGen is thus statistical explorational software; SPSS and Excel are statistical computational software. Exploration and computation overlap, but DataGen makes the exploration and comprehension of the basics of statistics transparent. It is much easier for the student to see what is hap pening in a particular analysis. When the student moves beyond introductory statistics, however, DataGen’s limitations become more apparent—and then SPSS, Excel, or another complete package is the better choice. However, I have found that students who start with DataGen usually find that the later transition to SPSS or Excel (or other complete statistics package) is easy.

Both the exploration/comprehension and the tool-acquisition arguments are persua sive, so this textbook will provide examples from DataGen, SPSS, and Excel.

Click ESTAT and then DataGen in the Personal Trainer. Then use DataGen to compute ∑X and ∑Y for the data in Table 2.4:

1. In the Personal Trainer, click DataGen (Win) if you are a Windows user or DataGen (Mac) if you are a Macintosh user. DataGen (Win) requires that you have Microsoft Office Excel 2007 or later installed on your Windows computer. DataGen (Mac) requires that you have Microsoft Office Excel 2011 or later installed on your Macintosh computer. If you have a choice, use the Windows version of DataGen because Macintosh Excel does not allow multi-ple windows to be open at the same time. That forces Macintosh users to close one window to look inside another, a substantial inconvenience in DataGen.

2. If Excel presents a message that says “Macros have been disabled” or some-thing similar, click Enable Content or Enable Macros or whatever similar option is presented.

Personal Trainer

DataGen

Your instructor may suggest using either DataGen, SPSS, or Excel.

c

ch02.indd 25 6/7/17 4:02 PM


3. Enter the data from Table 2.4 into the DataGen spreadsheet. To enter values going down a column, press Enter after each value. To enter values going across a row, press Tab after each value. You may also navigate with the arrow keys or the mouse. You do not need to enter the i values because ESTAT auto-matically provides them. Enter the six X values into the Variable 1 column; enter the Y values into Variable 2.

4. If you like, you may change the headings: Click anywhere in the Variable 1 column. Click Rename variable; enter X into the cell provided; click OK. Also change the name of Variable 2 to Y.

5. Note that the sum of each column appears automatically in the DataGen Descrip tive Statistics window on the line labeled “Sum of observations.”

Now use DataGen to compute ∑XY:

6. Highlight the first six cells of the Variable 3 column. Then click Edit variable.

7. Next to the Product button check the boxes marked 1 and 2. Then click Product.

8. Note that the Variable 3 column now holds the product of X and Y. Note also that the sum of these products (176) now automatically appears in the DataGen Descriptive Statistics window. (Macintosh users will have to close the Edit Variable window and then click Show Descriptive Statistics Window.)

You can vary steps 6 and 7 to find any sum or product of variables.

Use SPSS to compute ∑X and ∑Y:

1. Start SPSS by clicking Start > All Programs > IBM SPSS > IBM SPSS Statistics 24.

2. Click the Type in Data button and then OK.

3. Enter the data from Table 2.4 into the SPSS spreadsheet. To enter values going down a column, press Enter or Tab after each value. To enter values going across a row, press after each value. You do not need to enter the i values. Just enter two columns of data (X and Y).

4. Ask SPSS to show sums by clicking Analyze, then Descriptive Statistics, and then Frequencies... .

5. Request VAR00001 to be displayed by clicking . Then click VAR00002 and click .

6. Click Statistics... , check the Sum box, click Continue, and click OK.

7. Figure 2.1 shows the output, with my annotations, that will appear in the Outputl–IBM SPSS Statistics Viewer window; the sums are in the table called “Statistics.” You should see 24.00 as the sum of VAR00001 and 49.00 as the sum of VAR00002.

Now use SPSS to compute ∑XY:

8. Return to the Data Editor by clicking the Go To Data button (looks like a grid with a big star) on the button bar near the top of the Output1 screen.

SPSS

S

%a

ch02.indd 26 6/7/17 4:02 PM

2.4 Connections 27

9. Click var in the third column.

10. Click Transform and then click Compute Variable... .

11. Create a new variable label: Type “product” in the “Target Variable” cell.

12. Request the multiplication: Click , click *, click VAR00002, click , and then click OK.

13. Note that a new column is created that holds the product of X and Y. The val-ues should be 35.00 (= 7.00 × 5.00), 24.00, 16.00, 20.00, 27.00, and 54.00

Quite a few steps are involved in asking SPSS to perform this simple task. Fortunately, the effort expended here will make subsequent tasks easier. You can vary steps 9 and 10 to ask SPSS to create and evaluate any mathematical expression—an example of the versatility of SPSS.

Use Excel to compute ∑X and ∑Y.

1. Start Excel by clicking Start > All Programs > Microsoft Office > Microsoft Excel 2010. The instructions for Excel 2011 or later are the same.

2. Enter the data from Table 2.4 into the Excel spreadsheet, starting in the first row of the first column. To enter values going down a column, press Enter or

after each value. To enter values going across a row, press Tab or after each value. You do not need to enter the i values. Just enter two columns of data (X and Y).

3. Highlight the six values in column A. Click the Formulas tab; then click ∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum (24) appears in cell A7.

4. Highlight the six values in column B. Click ∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum (49) appears in cell B7.

Now use Excel to compute ∑XY.

5. Click on spreadsheet cell C1. Type =A1*B1. Don’t forget to type the equal sign at the beginning of this entry. Then press Enter. Note that the value 35 (= 7 × 5) appears in cell C1.

Excel

H0: m1 = m2

X

Statistics

VAR00001

Check that these are correct

SX SY

VAR00002

N Valid 6 6

Missing 0 0

Sum 24.00 49.00

fIguRE 2.1 Sample SPSS output: Descriptive statistics from Table 2.4

ch02.indd 27 6/7/17 4:02 PM


6. Click on spreadsheet cell C1. Grab the fill handle ( , the little square box that appears at the lower right corner of the C1 cell; when you point to it, the square becomes a cross) and drag it down until the first six cells in column C are highlighted. Then release the handle. Note that you have filled cells C1 through C6 with the formula that you typed in cell C1. For example, click cell C2; note in the text entry area just above the spreadsheet that =A2*B2 appears. Excel has applied the formula you typed in cell C1, which you wrote to apply to the values in the first row, to the values in the second row.

7. Highlight the six values in column C. Click ∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum (176) appears in cell C7. That is ∑XY.

Homework Tips1. Check the list of learning objectives at the beginning of this chapter. Do you under-

stand each one?

2. The hint given in Box 2.3 will in the long run save you a lot of computational time, and I strongly recommend it even if it seems at first like a waste of time. Almost all students are surprised and frustrated by how many computational errors they make. This procedure will help you avoid those costly errors.

3. Remember to include at least three extra significant figures in subcomputations.

4. Be perfectly clear that [∑(X – 10)]2 is not the same as ∑(X – 10)2. If we follow the advice of Box 2.3, computing [∑(X – 10)]2 requires the creation of one new column headed “X – 10.” By contrast, computing ∑(X – 10)2 requires the creation of two new columns, the first headed “X – 10” and the second headed “(X – 10)2.” For practice, click Algebra and then Summation in the Personal Trainer.

Click ReviewMaster and then Chapter 2 in the Personal Trainer for an electronic interactive review of the concepts in Chapter 2.

Click Labs and then Chapter 2 in the Personal Trainer for interactive practice of the skills in Chapter 2 and a quiz to test your understanding.

Personal Trainer

ReviewMaster

Personal Trainer

Labs

This is the most pervasive computational mistake in introductory statistics.

c

ch02.indd 28 6/7/17 4:02 PM

Chapter 2 Exercises 29

CHAPTER 2 E xErc i s EsSection A: Basic Exercises (Answers in Appendix D, page 575)

1. (a) Suppose I am interested in the effect of family size on life satisfaction. I code family size as 0 if four or fewer individuals live in the house-hold and 1 if more than four individuals live in the household. Does such a coding constitute a measurement of family size? If not, why not?

(b) Halfway through my study, I find that I am get-ting too many 0 family sizes, so I change my coding. I leave my original codings unchanged, but I now code family size as 0 if three or fewer individuals live in the household and 1 if more than three live in the household. Does such a procedure constitute measurement? If not, why not?

ExErcisE i workEd out(a) Measurement is a procedure that assigns a unique

and unambiguous value to each participant. This coding does that, so the answer is yes.

(b) A measurement operation must be the same for all participants, but here four-member households are treated inconsistently. Therefore the answer is no.

2. Ray Shoscale wants to know the level of measure-ment (nominal, ordinal, or interval/ratio) for the following variables. Help Ray out.

(a) Weights of people (measured in pounds) (b) The Likert scale (measured as 1 = strongly dis-

agree; 2 = disagree; 3 = neutral; 4 = agree; and 5 = strongly agree)

(c) Basketball players’ uniform numbers (0, 1, 2, . . . , 55)

(d) Clothing sizes (small, medium, large, extra large)

(e) Shades of nail polish (21 = “shameless rose,” 25 = “pink in the afternoon,” 33 = “orchid,” 41 = “berry rich,” 88 = “nouveau nude,” etc.)

(f) Volumes of bottles (measured in fluid ounces)

3. Which of the following variables are discrete and which are continuous?

(a) Distance (in miles) from one city to another

(b) Number of bar presses a rat makes (c) Number of students in your statistics class (d) Fuel consumption (miles per gallon)

4. Consider the sample data given in the table.

i X1 62 13 94 –55 66 07 –3

(a) What value does X4 have? (b) What is n? (c) Compute ∑X. (d) Create a column headed “X2” as described in

Box 2.3 and compute ∑X2. (e) Compute (∑X)2. (f) For these data, does ∑X2 = (∑X)2? (g) Create a column headed “X + 10” and compute

∑(X + 10). (h) For these data, does ∑(X + 10) = ∑X + 10? (i) Create a column headed “X2 – 5” and compute

∑(X2 – 5).

5. For the data of Exercise 4, consider the sum 3 6

4

2Xi −

∑ .

(a) How many columns are required for this computa tion, following the suggestions of Box 2.3? What are their headings?

(b) Compute the sum.

6. Use the data in the accompanying table. (a) Find X3 – Y3. (b) Compute ∑XY.

X Y3 24 47 13 11

ch02.indd 29 6/7/17 4:02 PM


7. For the data of Exercise 6, consider the sum

2 2( )X Y

X Y

−+

∑ .



8. For the data of Exercise 6, demonstrate each expres sion.

(a) ∑X2 ≠ (∑X )2

(b) ∑(X + 5) ≠ ∑X + 5 (c) ∑(X + 5)2 ≠ (∑X + 5)2

(d) ∑XY ≠ ∑X ∑Y

9. (a) For the data of Exercise 6, how many decimal places should you use in reporting the value of (∑X )/3 as a final answer?

(b) What is that value? Be sure to round correctly.

10. Round each value to two decimal places. (a) 14.637 (f) 3.315 (b) –14.637 (g) 3.325 (c) 2.152 (h) 3.335 (d) 3.40500 (i) 27.43475 (e) 3.40501

Section B: Supplementary Exercises 11. Use the sample data in the table.

i Xi

1 42 –53 24 –85 216 37 138 6

(a) What value does X2 have? (b) What is n? (c) Compute ∑X . (d) Create a column headed “X2” as described in

Box 2.3 and compute ∑X2. (e) Compute (∑X )2. (f) For these data, does ∑X2 = (∑X )2? (g) Create a column headed “X – 3” and compute

∑(X – 3).

(h) For these data, does ∑(X – 3) = ∑X – 3? (i) Create a column headed “X2 – 9” and compute

∑(X2 – 9).

12. For the data of Exercise 11, consider the sum ( )( )X Xi i− +

∑ 3 2 4

2.



13. Consider the data in the accompanying table. (a) Find X4 – Y4. (b) Compute ∑XY.

X Y 1.4 5 4.1 –2–3.8 1–6.0 –10 2.2 2

14. For the data of Exercise 13, consider the sum

∑ − ++

X Y X Y

X Y

(2 )( 2 ).



15. For the data of Exercise 13, demonstrate each expres sion.

(a) ∑X2 ≠ (∑X )2

(b) ∑(X + 5) ≠ ∑X + 5 (c) ∑(X + 5)2 ≠ (∑X + 5)2

(d) ∑XY ≠ ∑X ∑Y

16. (a) For the data of Exercise 13, how many decimal places should you use in reporting the value of (∑X )/7 as the final answer?

(b) What is that value? Be sure to round correctly.

17. Round each value to one decimal place. (a) 13.55 (f) 3.14159 (b) 13.65000 (g) .02 (c) –4.1472 (h) 6.66 (d) –3.5001 (i) 12.5 (e) –3.6001

ch02.indd 30 6/7/17 4:02 PM

Chapter 2 Exercises 31

Section C: Cumulative Review (Answers in Appendix D, page 576)

18. Suppose you take the numbers 1 through 100 and write each one on a poker chip; then you put all the poker chips in a bag and mix them thoroughly. You draw one chip from the bag. What is the probability that you draw a chip with the property given?

(a) A “34” (b) A “129” (c) An odd number (d) An even number

(e) Greater than or equal to 75 (f) Less than 10 (g) Contains “7” as one (or more) of its digits (h) Has digits that add to 10

19. (Exercise 18 continued) Suppose I tell you that the first digit is a “3” and it is a two-digit number. What is the probability that the chip is “34”?

20. (Exercise 18 continued) Suppose I tell you that the chip is an even number. What is the probability that the chip is “34”?

Click Resources and then 2X in the Personal Trainer for additional exercises.Personal Trainer

Resources

ch02.indd 31 6/7/17 4:02 PM

ch02.indd 32 6/7/17 4:02 PM

chapter 2 variables and their measurementpsy210.faculty.unlv.edu/chapters/02_hurlburt_ch_2.pdf2.1...

Documents