handout 5 stat 115

Upload: liz-lypp

Post on 26-Nov-2015

  • Handouts Stat 115 Page 19

    Handout 5: Standardizing and Probability

    Learning Objectives

    - Convert raw scores to z-scores and vice versa
    - Solve area-under-the-curve problems
    - Demonstrate the relationship between histograms and probability density functions (PDFs)
    - Compute the probability of mutually exclusive events
    - Compute the probability of independent events
    - Define outcomes as common or rare

    Assignments

    Quiz 3 is on Canvas and is due at the start of class on Thursday.

    Background

    z-Scores

    A z-score is a single score expressed in standard deviation units. Z-scores are unitless numbers that tell you how far a score is from the mean, measured in standard deviations. If you know the mean and the standard deviation, you can convert raw scores into z-scores and back again.

    Common point of confusion: converting to z-scores does not change the shape of a frequency distribution; only the labels on the x-axis change. Why? Each raw score is associated with exactly one z-score, and each z-score with exactly one raw score. If you have two raw scores at the mean, you will have two z-scores at 0, so both histograms have the same shape.

    If you convert a variable into z-scores, you have standardized it. Standardizing sets the mean and standard deviation to fixed values: for z-scores, the mean is 0 and the standard deviation is 1. Raw scores are the original values of the variable, and a raw score at the mean always has a z-score of 0. Example: if a variable has a mean of 5 and a standard deviation of 1, then a z-score of 1 corresponds to a raw score of 6.
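A minimal Python sketch of standardizing. It assumes the scores are a whole population, so it uses the population standard deviation (`statistics.pstdev`). The raw scores are made up for illustration, and two of them sit at the mean. The function name `standardize` is our own.

```python
from collections import Counter
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to z-scores using the population mean and SD."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    return [(x - mu) / sigma for x in scores]

# Hypothetical raw scores (not from the handout); mean = 5, SD = 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(raw)

print(z)                    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))   # 0.0 1.0  (standardized: mean 0, SD 1)

# Each raw score maps to exactly one z-score, so the frequencies are
# identical; only the x-axis labels change and the histogram keeps its shape.
print(sorted(Counter(raw).values()) == sorted(Counter(z).values()))  # True
```

The two raw scores of 5 (the mean) both become z = 0, as the text describes.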

    What Do z-Scores Tell You?

    z-Scores tell you how many standard deviations a score is from the mean. A z-score of 0 is right on the mean: if I told you my test score in the class was z = 0, you would know I scored exactly at the mean. If I told you my test score was z = -2, you would know that I scored 2 standard deviations below the mean. On their own, z-scores are nothing more than standard deviation units. However, if the raw scores are normally distributed, you can draw further conclusions.

    Why Are Normally Distributed z-Scores Useful?

    When your data is normally distributed, the resulting z-scores are also normally distributed. This distribution has a name: the standard normal curve. For normally distributed data, a z-score tells you more than just the distance from the mean in standard deviation units.

    Because z-scores have no units, they can be compared with other z-scores even when the original variables are measured in different units. This lets you compare relative performance across two variables that don't share the same units.
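A short Python sketch of such a comparison. It computes z as (score - mean) / SD for two variables in different units. Both variables and all of their means and SDs are hypothetical, chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical data: a quiz score (points) and a bench press (kilograms).
quiz_z = z_score(80, mu=70, sigma=5)     # (80 - 70) / 5   = 2.0
lift_z = z_score(100, mu=90, sigma=20)   # (100 - 90) / 20 = 0.5

# Points and kilograms can't be compared directly, but z-scores can:
# relative to its own group, the quiz performance is the stronger one.
print(quiz_z, lift_z)                         # 2.0 0.5
print("quiz" if quiz_z > lift_z else "lift")  # quiz
```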

    Transforming a Raw Score into a z-Score

    Transforming a raw score into a z-score: $z = \frac{X - \mu}{\sigma}$. In words:

    1. Find the mean of all the scores in your population ($\mu$).
    2. Subtract $\mu$ from the score ($X$).
    3. Divide your result from step 2 by the standard deviation ($\sigma$) for the population.
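The steps above translate directly into a small Python function. This is a sketch, and the name `raw_to_z` is our own. It assumes step 1 is already done, so the population mean and SD are passed in. The inputs are the handout's own examples: a mean of 5 with an SD of 1, and the gestation example (mean 270, SD 15).

```python
def raw_to_z(x, mu, sigma):
    """z = (X - mu) / sigma, with mu and sigma already known (step 1)."""
    deviation = x - mu        # step 2: subtract the mean from the score
    return deviation / sigma  # step 3: divide by the standard deviation

print(raw_to_z(6, mu=5, sigma=1))                 # 1.0
print(round(raw_to_z(245, mu=270, sigma=15), 2))  # -1.67
print(raw_to_z(255, mu=270, sigma=15))            # -1.0
```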

    Transforming a z-Score into a Raw Score

    Transforming a z-score into a raw score: $X = \mu + (z)(\sigma)$. In words:

    1. Find the mean of all the scores in your population ($\mu$).
    2. Multiply the z-score by the standard deviation ($\sigma$) for the population.
    3. Add the results of steps 1 and 2 to get the raw score.
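The reverse direction can be sketched the same way. The function name `z_to_raw` is our own. The first call reproduces the handout's example, where a z-score of 1 with a mean of 5 and an SD of 1 gives 6.

```python
def z_to_raw(z, mu, sigma):
    """X = mu + (z)(sigma)."""
    return mu + z * sigma  # step 2: z times the SD; step 3: add the mean

print(z_to_raw(1, mu=5, sigma=1))        # 6
print(z_to_raw(-1.0, mu=270, sigma=15))  # 255.0
print(z_to_raw(0, mu=270, sigma=15))     # 270  (a z-score of 0 is the mean)
```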

    Areas under the Curve

    We can get even more specific: we can talk about the probability of selecting a particular score. In the graph above, if you were to select a participant at random, you would be more likely to get someone who slept 6 hours than someone who slept 10 hours. In normally distributed data (this is one way the normal distribution is special in statistics), you can calculate the probability of selecting a score within a particular range.

    Finding this information is based on the area under the curve. You could calculate the area under the curve using calculus, but we need not suffer through that in this course. Instead, provided we have normally distributed data, we can look up the area under the curve in a table (or use a calculator that has this table built in, like the TI-36X Pro).

    The total area under the curve is 1. To answer area under the curve problems:

    Identify the reference point(s) and the shaded area. For the question "Find the probability of a score greater than 6," the reference point is the raw score of 6, and the shaded area is everything above 6. For the question "Find the probability of a score between 8 and 16," the reference points are 8 and 16, and the shaded area lies between them.

    1. Sketch a normal curve.
    2. Mark the approximate location of the reference point(s) and the mean.
    3. Identify and shade the relevant area under the curve.
    4. Find the corresponding z-scores.
    5. Use the z-table or your calculator to find the area of the shaded portion.

    The table shows the area between the mean and a reference point, or between a reference point and the end of the distribution. The TI-36X Pro shows the area between any two reference points, or between a reference point and the end of the distribution.
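As a cross-check on steps 4 and 5, here is a minimal Python sketch. Python's `statistics.NormalDist` plays the role of the z-table or calculator. The two questions above do not state a mean or SD, so this sketch assumes a hypothetical population with a mean of 10 and an SD of 4.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal curve

def area_above(z):
    """Area from z to the upper end of the distribution."""
    return 1 - std_normal.cdf(z)

def area_between(z_low, z_high):
    """Area between two z-scores (what Normalcdf reports)."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Hypothetical population: mean 10, SD 4 (not given in the handout).
mu, sigma = 10, 4

# P(score > 6): reference point 6, shaded area above it (z = -1).
print(round(area_above((6 - mu) / sigma), 4))                       # 0.8413
# P(8 < score < 16): reference points 8 and 16 (z = -0.5 and 1.5).
print(round(area_between((8 - mu) / sigma, (16 - mu) / sigma), 4))  # 0.6247
```

Because the total area under the curve is 1, `area_between` over a very wide range returns essentially 1.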

    Example: The human gestation period is normally distributed with a mean of 270 days and a standard deviation of 15 days. What proportion of gestation periods will be between 245 and 255 days (Witte & Witte, 2010, p. 111)?

    Sketch a normal curve. The area of interest lies between 245 and 255 days, both of which are below the mean. The corresponding z-scores are (245 - 270)/15 = -1.67 and (255 - 270)/15 = -1.00.

    Using the table, you will need to find two areas and subtract them. The first is the area between -1.00 and the lower end of the distribution. The second is the area between -1.67 and the lower end of the distribution.

    Using the z-table, find these two areas (0.1587 and 0.0475) and subtract the second from the first to get 0.1112, or about 0.11 (11%).

    On the TI-36X Pro, you can find the area between z = -1.67 (the lower bound) and z = -1.00 (the upper bound). The Normalcdf function will tell you the area between any two z-scores on the standard normal curve (enter a mean of 0 and a sigma/SD of 1, the properties of the standard normal curve):

    Normalcdf: lower bound = -1.67, upper bound = -1.00, mean = 0, SD = 1
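The gestation example can be reproduced in Python as a sketch, with `statistics.NormalDist` standing in for the table and the TI-36X Pro. The first calculation mirrors the table approach with rounded z-scores. The second works directly on the raw-score scale.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

z_low = round((245 - 270) / 15, 2)   # -1.67
z_high = round((255 - 270) / 15, 2)  # -1.0

# Table approach: two lower-tail areas, larger minus smaller.
tail_to_high = std_normal.cdf(z_high)  # area from -1.00 down to the end
tail_to_low = std_normal.cdf(z_low)    # area from -1.67 down to the end
print(round(tail_to_high - tail_to_low, 2))  # 0.11

# Skipping the rounding and using the raw scores gives the same answer.
gestation = NormalDist(mu=270, sigma=15)
print(round(gestation.cdf(255) - gestation.cdf(245), 2))  # 0.11
```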

    1. Sketch a normal curve.
    2. Mark the approximate location of the reference point(s) and the mean.
    3. Identify and shade the relevant area under the curve.
    4. Use the z-table or your calculator to find the z-score associated with the area.
    5. Find the corresponding raw scores.

    The difference is in how you use the table or the calculator. With the table, you look inside column C (the top 20% of the area) for the value closest to the proportion .20; it corresponds to z = 0.84. On the calculator, you will need to use the invNormal function:

    invNormal: area = 0.8 (the proportion of the curve below the cutoff), mean = 0, SD = 1. The result is z = 0.84.
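A Python sketch of the same lookup, using `statistics.NormalDist.inv_cdf`. Like invNormal, it takes the area to the left of the cutoff. The last line applies step 5 to the gestation distribution (mean 270, SD 15). That pairing is our own illustration, not part of the handout's problem.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# "Top 20%" means 80% of the area lies below the cutoff,
# so ask for the z-score with 0.80 of the area to its left.
z_cut = std_normal.inv_cdf(0.80)
print(round(z_cut, 2))  # 0.84

# Step 5, illustrated with hypothetical use of the gestation distribution:
# raw score = mean + z * SD.
print(round(270 + z_cut * 15, 1))  # 282.6
```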

    This is a difficult skill to master because there are many variations of these problems, and using the z-table is counter-intuitive. The best way to learn this skill is to practice during our class meeting until you are comfortable with these problems.

    Key points to remember:

    - Every raw score from a population is associated with only one z-score, and vice versa.
    - Z-scores are standard deviation units.
    - We can draw many more conclusions from normally distributed populations than from non-normal distributions.
    - The PDF shows the same information as a histogram: how frequently scores occur.
    - Because the PDF shows all the scores in the population, the area under the curve between two scores represents the probability of selecting a score in that range at random.
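To illustrate the link between a histogram and the PDF, here is a minimal Python simulation sketch. It assumes the gestation distribution from the example (mean 270, SD 15) and uses `NormalDist.samples` with a fixed seed to stand in for a large population. It then compares the share of simulated scores between 245 and 255 days with the area under the PDF over the same range.

```python
from statistics import NormalDist

# Gestation distribution from the handout's example: mean 270, SD 15.
gestation = NormalDist(mu=270, sigma=15)

# Histogram side: simulate a large population (seed fixed for repeatability)
# and find the share of scores that fall between 245 and 255 days.
scores = gestation.samples(100_000, seed=115)
share = sum(245 <= s <= 255 for s in scores) / len(scores)

# PDF side: the area under the curve over the same range (about 0.111).
area = gestation.cdf(255) - gestation.cdf(245)

print(f"share of simulated scores: {share:.3f}")
print(f"area under the PDF:        {area:.3f}")
```

The two numbers should agree closely. The exact simulated share depends on the random draws, so no single value is claimed for it here.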