4/12/2015slide 1 we have seen that skewness affects the way we describe the central tendency and...


Upload: courtney-becknell

Post on 14-Dec-2015

04/18/23 Slide 1

• We have seen that skewness affects the way we describe the central tendency and variability of a quantitative variable: if a distribution is more skewed than the threshold of -1.0 to 1.0, we report the median and interquartile range rather than the mean and standard deviation.

• A major cause of skewed distributions is the presence of outliers – cases that have very small or very large scores relative to the other cases in the distribution.

• Outliers have a larger effect on the results of statistical analysis than other cases. One extreme outlier may change our view of central tendency and variability for the entire distribution.

04/18/23 Slide 2

• Outliers pose a dilemma for us in terms of our justification for either omitting them or retaining them in the analysis.

• It is easy to remove outliers that were data entry errors. It is more difficult to defend removing outliers when the scores represent accurate data.

• One response to the dilemma is to run the analysis with and without the outliers, and describe the difference. Sometimes it makes little difference and we can ignore the presence of the outliers.

• Another response to the dilemma is to re-express or transform the variable and see if the outliers are eliminated. If there are no outliers using the re-expressed data, we can run the analysis with the re-expressed data and draw our conclusions based on the results for the re-expressed variables.

04/18/23 Slide 3

• Two downsides to the strategy of re-expressing data are:

• the skepticism of audiences who already think we massage the numbers to produce the results we want, and

• the need to convert the results back to the original scale if we need to report numerical results.

• In this problem set, we will use a boxplot strategy for detecting outliers and examine the use of two of the possible transformations: the square and the logarithm.

• The Explore procedure in SPSS provides both the boxplot and the descriptive statistics needed to solve these problems.

• In the boxplot, two types of outliers are identified by symbols: circles for outliers, and stars for extreme (or far) outliers.

04/18/23 Slide 4

• A case is identified as an outlier (circle) if its value is less than or equal to the first quartile minus 1.5 times the interquartile range, or is greater than or equal to the third quartile plus 1.5 times the interquartile range.

• If the case has a value less than or equal to the first quartile minus 3 times the interquartile range, or greater than or equal to the third quartile plus 3 times the interquartile range, it is characterized as a far outlier (star).

• If outliers or far outliers are found for a variable, we will examine the behavior of the outliers when the variable is re-expressed by computing the logarithm of the values if the variable is skewed to the right. If the variable is negatively skewed, we will square the values and examine the effect on the outliers.
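The two boxplot rules on this slide can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the code SPSS runs; the function name classify_case and the example quartiles are assumptions made for the sketch.

```python
def classify_case(value, q1, q3):
    """Label a score using the boxplot rules stated on this slide."""
    iqr = q3 - q1  # interquartile range
    # Far outliers: 3 IQRs or more beyond either quartile.
    if value <= q1 - 3 * iqr or value >= q3 + 3 * iqr:
        return "far outlier"
    # Outliers: 1.5 IQRs or more beyond either quartile.
    if value <= q1 - 1.5 * iqr or value >= q3 + 1.5 * iqr:
        return "outlier"
    return "not an outlier"


# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so the IQR is 10.
for score in (15, 36, 50, -6):
    print(score, classify_case(score, q1=10, q3=20))
```

The far outlier test runs first because every far outlier also satisfies the ordinary outlier rule.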

Slide 5

The script for this week positions the boxplot under the histogram. In this chart, we see a number of circles at the right end of the distribution. These are outliers, and there are no far outliers in this distribution. As we would expect, this distribution has a skewness problem (skewness=1.19), as reported in the subtitle to the chart.

NOTE: the horizontal axis for the boxplot approximates the axis for the histogram, but is not exact.

Slide 6

The distribution for this variable shows one far outlier at the extreme right of the distribution.

Slide 7

Some distributions will show both outliers and far outliers. Our problems will state the number of outliers, and the number of far outliers as a subset of the total number of outliers.

Note that the chart shows the presence or absence of outliers, but does not necessarily provide an exact count, since a single outlier symbol might represent more than one case with that score.

Slide 8

The boxplots for some distributions will indicate that there are no outliers.

Slide 9

The boxplot for the distribution for this variable shows several outliers at the right end of the distribution. When the variable is positively skewed, the data values are re-expressed on the logarithmic scale.

When we re-express the values for the variable on a logarithmic scale, the boxplot does not indicate that there are any outliers.

Slide 10

If we re-express the data values using the wrong transformation, we actually increase the problem of outliers. The distribution was positively skewed, and we squared the data values, rather than converting to a log scale, resulting in more outliers.
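The effect of choosing the right or the wrong re-expression can be sketched with a small made-up data set. The scores below are hypothetical (they are not from the course data sets), and the helper names hinges and count_outliers are assumptions for this sketch; quartiles are taken as Tukey's hinges, the values a boxplot uses.

```python
import math
from statistics import median


def hinges(values):
    """Return Tukey's hinges (Q1, Q3): the medians of the lower and upper halves."""
    data = sorted(values)
    half = (len(data) + 1) // 2  # the middle value joins both halves when n is odd
    return median(data[:half]), median(data[-half:])


def count_outliers(values):
    """Count cases at or beyond 1.5 IQRs from the hinges."""
    q1, q3 = hinges(values)
    step = 1.5 * (q3 - q1)
    return sum(1 for v in values if v <= q1 - step or v >= q3 + step)


# Hypothetical, positively skewed scores.
scores = [1, 2, 2, 3, 3, 4, 5, 6, 10, 20]

print("original:", count_outliers(scores))
print("log10:   ", count_outliers([math.log10(v) for v in scores]))
print("squared: ", count_outliers([v ** 2 for v in scores]))
```

For these scores the original scale has one outlier, the log scale has none, and the squared scale has two, which mirrors the pattern described on these slides.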

Slide 11

The boxplot for the distribution for this variable shows several outliers at the low end of the scale. Since this variable is skewed to the left, we will re-express the data values as squares.

The boxplot of the squared values indicates that there are no outliers for the re-expressed data values.

Slide 12

If we re-express the data values using the wrong transformation, we actually increase the problem of outliers. The distribution was negatively skewed, and we applied a log transformation rather than the square transformation, resulting in more outliers.

Slide 13

Re-expressing the data values does not always remedy the problem of outliers. In the chart to the right, the logarithmic transformation appears to have had little impact.

Slide 14

Some variables have outliers at both ends of the distribution. The outliers at one end may offset the skewness in the other tail, but kurtosis will become a problem.

Neither the logarithmic nor the square transformation will remedy this distribution because each re-expression works on only one tail of the distribution.

04/18/23 Slide 15

• Re-expression changes the measuring scale for the variable by altering the distance between the values. All of the lines below represent the numbers 1 to 10, on a decimal, logarithmic, and squared scale.

• On our familiar decimal measuring scale, the distance between numbers is the same for all numbers.

• On a logarithmic scale, the distance between the numbers decreases as the numbers get larger.

• On a square scale, the distance between the numbers decreases as the values get smaller.

• All of the dots represent the same sequence of values from 1 to 10 on different measuring scales.

04/18/23 Slide 16

• The logarithmic transformation works by stretching the scale at the left end of the distribution and compressing the scale at the right end of the distribution.

• As shown in the diagram below, the numbers 1 to 5 (red dots) are converted to their log equivalents (blue dots).

The distance between the log points decreases as the values increase. The distance between the log of 4 and the log of 5 is less than the distance between the log of 1 and log of 2.

04/18/23 Slide 17

• Positive skewing is reduced because the distance between consecutive numbers on the log scale decreases as the size of the decimal number increases.

• For example, the difference between the log of 2 and the log of 3 is 0.176, larger than the difference between the log of 4 and log of 5, which is 0.097.

decimal scale    log scale    difference between consecutive values

      1            0.000
      2            0.301        0.301
      3            0.477        0.176
      4            0.602        0.125
      5            0.699        0.097
      6            0.778        0.079
      7            0.845        0.067
      8            0.903        0.058
      9            0.954        0.051
     10            1.000        0.046
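The log table on this slide can be reproduced with a short Python sketch. The function name log_gaps is an assumption for the example; math.log10 gives the base-10 logarithm, the same base as the LG10 function used later in SPSS.

```python
import math


def log_gaps(limit=10):
    """Return (n, log10 of n, gap to the previous log value) for n = 1..limit."""
    rows = []
    for n in range(1, limit + 1):
        # The first row has no previous value to compare against.
        gap = None if n == 1 else math.log10(n) - math.log10(n - 1)
        rows.append((n, math.log10(n), gap))
    return rows


for n, log_n, gap in log_gaps():
    gap_text = "" if gap is None else f"{gap:.3f}"
    print(f"{n:>2}  {log_n:.3f}  {gap_text}")
```

Each gap is smaller than the one before it, which is why large scores are pulled closer together on the log scale.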

04/18/23 Slide 18

• The square transformation works by compressing the scale at the left end of the distribution and stretching the scale at the right end of the distribution.

• As shown in the diagram below, the numbers 1 to 5 (red dots) are converted to their squared equivalents (blue dots).

The distance between the squared points increases as the values increase. The distance between the square of 4 and the square of 5 is larger than the distance between the square of 1 and square of 2.

04/18/23 Slide 19

• Negative skewing is reduced because the distance between consecutive numbers on the squared scale increases as the size of the decimal number increases.

• For example, the difference between the square of 2 and the square of 3 is 5.0, less than the difference between the square of 4 and square of 5, which is 9.0.

decimal scale    squared scale    difference between consecutive values

      1              1.000
      2              4.000           3.000
      3              9.000           5.000
      4             16.000           7.000
      5             25.000           9.000
      6             36.000          11.000
      7             49.000          13.000
      8             64.000          15.000
      9             81.000          17.000
     10            100.000          19.000
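The squared-scale table can be checked the same way. In this sketch (the function name square_gaps is an assumption), the gap between consecutive squares works out to 2n - 1, so it grows by 2 with every step.

```python
def square_gaps(limit=10):
    """Return (n, n squared, gap to the previous square) for n = 1..limit."""
    rows = []
    for n in range(1, limit + 1):
        # n**2 - (n - 1)**2 simplifies to 2n - 1; the first row has no gap.
        gap = None if n == 1 else n ** 2 - (n - 1) ** 2
        rows.append((n, n ** 2, gap))
    return rows


for n, squared, gap in square_gaps():
    print(n, squared, "" if gap is None else gap)
```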

04/18/23 Slide 20

• As long as we can reverse the transformation and get back to the original values, the transformations are legitimate.

• To make certain we can get back to the original values, we must make certain the numbers on all scales are mathematically defined as real numbers. Not every result is defined: for example, the logarithm of 0 and the square root of a negative number are not real numbers.

• To make certain we do not apply a transformation that we cannot reverse, we may need to add a constant to each number. If numbers are negative, we add the absolute value of the smallest value to each number. If the smallest value in the distribution is 0, we add 1 to each score in the distribution.

• Since we are starting out with transformations, the problem statement will tell you if you need to add a numeric constant when doing the transformations.
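A minimal sketch of a reversible log transformation with an added constant follows. The function names to_log and from_log are assumptions for the example; the point is only that the same constant added before taking the log must be subtracted again when converting back.

```python
import math


def to_log(value, constant=0.0):
    """Re-express a score as log10(value + constant)."""
    shifted = value + constant
    if shifted <= 0:
        raise ValueError("log10 is undefined for zero and negative numbers")
    return math.log10(shifted)


def from_log(log_value, constant=0.0):
    """Reverse to_log: raise 10 to the log value, then subtract the constant."""
    return 10 ** log_value - constant


# A score of 0 cannot be logged directly, so add 1 first (hypothetical example).
logged = to_log(0, constant=1)  # log10(1) = 0.0
print(logged, from_log(logged, constant=1))
```

Calling to_log(0) without a constant raises an error, which is the situation the added constant is meant to avoid.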

Slide 21

The introductory statement in the question indicates:

• The data set to use (2001WorldFactBook)

• The task to accomplish (checking for outliers)

• The SPSS procedure to use (Explore)

• The variable to use in the analysis: HIV-AIDS adult prevalence rate [hivaids]

Slide 22

These problems also contain a second paragraph of instructions that provides the formulas to use if our examination of outliers requires us to re-express or transform the variable.

Slide 23

The first statement concerns the number of valid and missing cases. To answer this question, we produce the descriptive statistics using the SPSS Explore procedure.

Slide 24

To compute the descriptive statistics and charts that we need to check for outliers, select the Descriptive Statistics > Explore command from the Analyze menu.

Slide 25

Move the variable for the analysis, hivaids, to the Dependent List list box.

Click on the Statistics button to select optional statistics.

Slide 26

The check box for Descriptives is already marked by default.

Mark the Percentiles check box. This will provide the upper and lower bounds for the interquartile range.

While there is a check box for Outliers, it lists the five largest scores and the five smallest scores, but does not tell us whether or not they are really outliers.

Click on the Continue button to close the dialog box.

Slide 27

Next, we click on the Plots button to obtain visual evidence of the presence of outliers in the distribution.

Slide 28

We accept the default for the Box plot, which provides us the output we need even though we are not using factor levels in this problem. We accept the default Stem-and-Leaf plot, and mark the check box for a Histogram as well.

We click on the Continue button to close the Plots dialog.

Slide 29

After returning to the Explore dialog box, click on the OK button to produce the output.

Slide 30

The 'Case Processing Summary' in the SPSS output showed the total number of valid cases to be 162 and the number of missing cases to be 56.

The SPSS output provides us with the answer to the question on sample size.

Slide 31

The 'Case Processing Summary' in the SPSS output showed the total number of valid cases to be 162 and the number of missing cases to be 56.

Click on the check box to mark the statement as correct.

Slide 32

The next two statements focus on the median and interquartile range as the center and spread of the data.

We are using the median and interquartile range because we are using the box plot strategy for identifying outliers. The median and interquartile range are key measures of box plots.

Slide 33

We use the table of descriptive statistics to obtain the value for the median: .2000 for this variable.

However, we do not use the table of descriptive statistics for the value of the interquartile range because this is not the value used in the box plot. The box plot is based on “Tukey’s hinges”, which use a slightly different calculation for the first and third quartiles and may therefore give a different value for the interquartile range.

Slide 34

The value for the first quartile (the 25th percentile) is 0.050. The value for the third quartile (the 75th percentile) is 2.010. The interquartile range is the difference between the two: 1.96 (2.010 – 0.050 = 1.96).

Note that the 75th percentile using the default weighted average calculation is slightly different (2.0150) from the 75th percentile calculated with Tukey’s Hinges.
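The difference between the two quartile calculations can be sketched in Python with made-up scores. Two assumptions are loud here: the helper tukey_hinges follows the usual definition of hinges (medians of the lower and upper halves, with the middle value shared when n is odd), and the "exclusive" method of statistics.quantiles, which interpolates at position p(n + 1), is taken as a stand-in for the weighted average calculation. Neither is copied from SPSS itself.

```python
from statistics import median, quantiles


def tukey_hinges(values):
    """Return (lower hinge, upper hinge) as medians of the two halves."""
    data = sorted(values)
    half = (len(data) + 1) // 2  # the middle value joins both halves when n is odd
    return median(data[:half]), median(data[-half:])


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical scores, not the hivaids data

# Quartiles interpolated at position p * (n + 1).
weighted_q1, _, weighted_q3 = quantiles(scores, n=4, method="exclusive")

print("Tukey's hinges:  ", tukey_hinges(scores))
print("weighted average:", (weighted_q1, weighted_q3))
```

For these nine scores the hinges are 3 and 7 while the interpolated quartiles are 2.5 and 7.5, so the two interquartile ranges differ (4 versus 5).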

Slide 35

From the SPSS output, we obtained a value of 0.20 for the median and 1.96 for the interquartile range. We mark the first check box in the pair as the correct answer.

Slide 36

The next pair of statements asks us to identify the direction of the skewing in the distribution of the variable.

Outliers almost always skew the distribution. The direction of the skewness is critical because it dictates which function we choose to re-express or transform the data.

Slide 37

The skewness for the distribution of "HIV-AIDS adult prevalence rate" [hivaids] is 3.19. Since this is greater than zero, we characterize it as positive skewing, or skewing to the right.

When the distribution is skewed to the right, the text recommends re-expressing the data as logarithms, square roots, or reciprocals. We will use logarithms in these problems.

When the distribution is skewed to the left, the text recommends re-expressing the data as squares.
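The choice of re-expression used in these problems can be written as a small decision rule. The function name suggest_transformation is an assumption for this sketch, and returning "none" for a skewness of exactly zero is a choice made here, not something the slides state.

```python
def suggest_transformation(skewness):
    """Pick the re-expression used in these problems from the sign of the skewness."""
    if skewness > 0:
        return "logarithm"  # skewed to the right
    if skewness < 0:
        return "square"     # skewed to the left
    return "none"           # assumption: no skew, nothing to correct


print(suggest_transformation(3.19))  # skewness reported for hivaids
```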

Slide 38

The skewness for the distribution of "HIV-AIDS adult prevalence rate" [hivaids] is 3.19. Since this is greater than zero, we characterize it as positive skewing or skewing to the right.

We mark the check box for the first statement as the correct response.

Slide 39

The next pair of statements asks us to identify how many outliers there are in the distribution, either that there are no outliers or that there are a specific number of outliers and far outliers.

Slide 40

The box plot provides us with the first evidence of outliers. The circles and asterisks above the whiskers of the box plot attest to the presence of outliers.

In the terminology of the text, the circles are outliers, and the asterisks are far outliers.

Slide 41

If the variable does not have outliers, neither circles nor asterisks will appear in the box plot.

This is a box plot for the variable Population below poverty line from the same data set.

Slide 42

While the box plot makes it obvious that there are outliers in this distribution, it is not possible to obtain the exact number because the points overlap and because a single circle or star may represent more than one case with that value.

Slide 43

The presence of outliers is also seen in the histogram of the distribution. However, it also does not make it easy to determine the exact number.

Slide 44

Our first task is to compute the values that would let us determine whether or not a case is an outlier.

The value for the first quartile (the 25th percentile) is 0.050. The value for the third quartile (the 75th percentile) is 2.010. The interquartile range is the difference between the two: 1.96 (2.010 – 0.050 = 1.96).

To be characterized as an outlier, a case would have to have:

• a value less than or equal to -2.89 (Q1 - 1.5 x IQR = 0.05 - 1.5 x 1.96 = -2.89), or

• a value greater than or equal to 4.95 (Q3 + 1.5 x IQR = 2.01 + 1.5 x 1.96 = 4.95)

Slide 45

Our second task is to compute the values that would let us determine whether or not a case is a far outlier.

The value for the first quartile (the 25th percentile) is 0.050. The value for the third quartile (the 75th percentile) is 2.010. The interquartile range is the difference between the two: 1.96 (2.010 – 0.050 = 1.96).

To be characterized as a far outlier in the distribution of "HIV-AIDS adult prevalence rate" [hivaids], a case would have to have:

• a value less than or equal to -5.83 (Q1 - 3 x IQR = 0.05 - 3 x 1.96 = -5.83), or

• a value greater than or equal to 7.89 (Q3 + 3 x IQR = 2.01 + 3 x 1.96 = 7.89)

The calculations may produce values that do not exist in the data set, e.g. -5.83. Since there can be no outliers at that value or smaller, it does not have any impact on our solution.
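The cut points for outliers and far outliers can be double-checked with a few lines of Python. The quartiles are the Tukey's hinges quoted above; the function name fences is an assumption for this sketch.

```python
def fences(q1, q3, multiplier):
    """Return the lower and upper cut points at `multiplier` IQRs from the quartiles."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


q1, q3 = 0.05, 2.01  # Tukey's hinges for hivaids from the Percentiles table

low, high = fences(q1, q3, 1.5)
far_low, far_high = fences(q1, q3, 3)

print(f"outlier cut points:     {low:.2f} and {high:.2f}")
print(f"far outlier cut points: {far_low:.2f} and {far_high:.2f}")
```

Rounded to two decimals, this gives -2.89 and 4.95 for outliers and -5.83 and 7.89 for far outliers, matching the hand calculations.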

Slide 46

We sort the cases in ascending order by the variable we are studying, so we can count the number of cases that fall in the outlier region.

Click the right mouse button on the column header for hivaids, and select Sort Ascending from the pop-up menu.

Slide 47

The entries at the top of the sorted column are missing values, indicated by the periods in the cells.

The lower bounds for both outliers and far outliers were negative numbers (-2.89 and -5.83). Since all of the values for hivaids are positive numbers there are no outliers in the lower range of values.

Slide 48

The upper bound for outliers was 4.95. After locating this value in the sorted column, we count the number of values greater than or equal to 4.95, as shown in the red border. There are 25 outliers.

The upper bound for far outliers was 7.89, outlined with the blue border. There are 18 far outliers.
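Counting cases at or above a cut point in a sorted column can be sketched as follows. The scores are hypothetical (the real hivaids column is not reproduced here) and the function name count_at_or_above is an assumption; bisect_left finds the position of the first value that is greater than or equal to the cut point.

```python
from bisect import bisect_left

# Hypothetical scores sorted in ascending order.
sorted_scores = [0.01, 0.05, 0.2, 2.0, 4.9, 4.95, 6.5, 7.89, 12.0, 20.0]


def count_at_or_above(sorted_values, cut_point):
    """Count values >= cut_point in an ascending list."""
    return len(sorted_values) - bisect_left(sorted_values, cut_point)


outliers = count_at_or_above(sorted_scores, 4.95)      # includes far outliers
far_outliers = count_at_or_above(sorted_scores, 7.89)  # subset of the outliers

print(outliers, far_outliers)
```

With these made-up scores there are 5 outliers, 3 of which are far outliers, so the far outliers are counted as a subset of the total.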

Slide 49

We counted 25 outliers and 18 far outliers in the data editor for hivaids.

We mark the second check box in the pair which concurs with our finding.

Slide 50

The first statement in the next pair asks about the impact of re-expressing or transforming the data as logarithms.

It predicts that the logarithmic transformation will eliminate both outliers and far outliers.

Slide 51

The formula for computing the log transformation was given in the second part of the instructions for the problem. We will create a new variable called LG_hivaids based on the LG10 function in SPSS.

In some problems a number (e.g. 2.14 or 6.0) will be included in the parentheses of the formula to make sure all of the values to be converted are greater than zero.

Slide 52

To compute the new variable, select the Compute command from the Transform menu.

Slide 53

In the Compute Variable dialog box, we type the name for the new variable, LG_hivaids, in the Target Variable text box.

Click on the Arithmetic function group so that the list of available functions appears in the Functions and Special Variables list box.

Slide 54

First, in the list of Functions and Special Variables, highlight Lg10 which computes logarithmic values using a base of 10.

Second, click on the up arrow button to paste the Lg10 function in the Numeric Expression text box.

Slide 55

Next, type the name of the variable to be transformed, hivaids, between the parentheses after the function name.

Finally, click on the OK button to compute the transformed variable.
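Outside SPSS, the same computation can be sketched in Python. The function name lg10_column and the sample scores are assumptions for this sketch, and carrying missing cases through as None is a choice made here to stand in for blank cells; math.log10 is a base-10 logarithm like Lg10.

```python
import math


def lg10_column(values, constant=0.0):
    """Return log10(value + constant) for each case, keeping missing cases as None."""
    return [None if v is None else math.log10(v + constant) for v in values]


hivaids = [None, 0.01, 0.2, 2.0]  # hypothetical scores; None marks a missing case
LG_hivaids = lg10_column(hivaids)

print([None if v is None else round(v, 3) for v in LG_hivaids])
```

The optional constant corresponds to the number that some problems place inside the parentheses of the formula.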

Slide 56

Scroll the data editor window to the right to see the transformed variable, LG_hivaids.

Note that I moved the hivaids variable to the right as well. It will not appear in this position in your data editor.

Slide 57

To calculate the descriptive statistics so we can identify outliers on the transformed variable, click on the Dialog Recall tool button.

Slide 58

In the pop-up menu for Dialog Recall, select the Explore item (the second to the last command we executed in SPSS).

Slide 59

Since we want the same statistics computed in the last Explore procedure, we only need to replace the variable hivaids with LG_hivaids.

Click on the OK button to produce the output.

Slide 60

The box plot for LG_hivaids shows no circles or asterisks, indicating that there are no outliers in this distribution.

Slide 61

Similarly, the histogram displays a distribution that is much less skewed.

Slide 62

We use the SPSS output in the Percentiles table to compute the cut points that would make a case an outlier.

To be characterized as an "outlier", a case would have to have:

• a logarithmic value less than or equal to -3.707 (Q1 - 1.5 x IQR = -1.301 - 1.5 x 1.604 = -3.707), or

• a logarithmic value greater than or equal to 2.71 (Q3 + 1.5 x IQR = 0.303 + 1.5 x 1.604 = 2.71)

The interquartile range is Q3 – Q1 = 0.3032 – (-1.30103) = 1.604.

Slide 63

To be characterized as a "far outlier" in the distribution of "HIV-AIDS adult prevalence rate" [hivaids], a case would have to have:

• a logarithmic value less than or equal to -6.114 (Q1 - 3 x IQR = -1.301 - 3 x 1.604 = -6.114), or

• a logarithmic value greater than or equal to 5.116 (Q3 + 3 x IQR = 0.303 + 3 x 1.604 = 5.116)

The interquartile range is Q3 – Q1 = 0.3032 – (-1.30103) = 1.604.

Slide 64

The smallest logarithmic value is -2.0, larger than the cut point for an outlier (-3.707) and larger than the cut point for a far outlier (-6.114).

Slide 65

The largest logarithmic value is 1.554, less than the cut point for an outlier (2.71) and less than the cut point for a far outlier (5.116).

When re-expressed as logarithms, the number of outliers in the distribution of "HIV-AIDS adult prevalence rate" was reduced from 25 to 0 and the number of far outliers was reduced from 18 to 0.
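The check on the re-expressed variable can be repeated in Python with the unrounded quartiles quoted above. This is a sketch of the arithmetic only; the variable names are assumptions.

```python
q1, q3 = -1.30103, 0.3032        # quartiles of LG_hivaids from the Percentiles table
smallest, largest = -2.0, 1.554  # extreme logarithmic values reported above

iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
far_low, far_high = q1 - 3 * iqr, q3 + 3 * iqr

# No case is an outlier if every value lies strictly between the cut points.
no_outliers = low < smallest and largest < high

print(f"IQR = {iqr:.3f}")
print(f"outlier cut points:     {low:.3f} and {high:.3f}")
print(f"far outlier cut points: {far_low:.3f} and {far_high:.3f}")
print("no outliers:", no_outliers)
```

Because the far outlier cut points lie outside the outlier cut points, a distribution with no outliers has no far outliers either.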

Slide 66

The logarithmic re-expression of the distribution of hivaids had no outliers and no far outliers, effectively reducing the number of both to 0.

We mark the check box for the statement.

Slide 67

The final statement asks about the impact of squaring the values of hivaids.

Since squaring is a transformation that works for negatively skewed distributions, and this distribution is positively skewed, we do not mark the check box.

Slide 68

Even though we do not need to re-express this variable as a square, an example of the commands in SPSS will be shown.

The formula for the square transformation is provided in the second paragraph to the problem.

Slide 69

To compute the new variable, select the Compute command from the Transform menu.

Slide 70

In the Compute Variable dialog box, we type the name for the new variable, SQ_hivaids, in the Target Variable text box.

The variable name follows the convention of prepending SQ_ (for square) to the original variable name.

There is no function for computing the square. Instead, we type the formula directly into the Numeric Expression text box.

Note the parentheses around the variable name. These could be optional in this problem, but when there is a constant involved, we need to make certain that the constant and the variable name are enclosed in parentheses.

After we have typed in the variable name and formula, we click on the OK button to compute the new variable.

Slide 71

The square of hivaids is added to the data set.

I increased the number of decimal places displayed so that the initial entries were not displayed as .000.

Slide 72

The feedback in BlackBoard shows that all of the check boxes we marked were correct.