Statistics Review I
Class 13
CLASS OVERVIEW
Levels of Measurement
Measures of Centrality and Dispersion
* Centrality (mean, median, mode)
* Dispersion (range, variance, std. deviation, std. error)
* Z scores and Z distribution
Confidence Intervals
Exploring Data Sets
* Reasons
* Methods (histograms, features of distributions)
Dealing with Outliers
LEVELS OF MEASUREMENT
1. Categorical
2. Ordinal
3. Continuous
a. Interval
b. Ratio
c. Discrete
Categorical Variables
1. Refer to categories: human, cat, eggplant
2. All or none: Can’t be 1 third human, 2 thirds eggplant
3. Numbers serve as labels, not values: 1 = human, 2 = eggplant
“1” is not less than “2”; human is not less than eggplant
4. Common kinds of categorical variables: gender, race, major
5. Binary: only two values: Yes/No, Day/Night, present/absent
6. Non-Binary: Multiple values: animal, vegetable, mineral; Democrat, Republican, Independent.
7. Nominal: Values are known signifiers:
“Did Joey go potty? Yes? Was it Number 1 or Number 2?”
In some sports, numbers on jerseys represent player position; e.g. 1 = tackler, 2 = runner, etc.
Ordinal Variables
Numeric values refer to the ordering of things
Rankings: 1 = First place, 2 = second place
Chronology: 1 = occurred first, 2 = occurred second, etc.
Numeric values DO NOT indicate how much “1” differs from “2”
Bike race: 1st place (27.24); 2nd place (27.28); 3rd place (33.10)
Grant scores:
1. 99.89
2. 92.63
3. 89.76
4. 89.75
5. 88.84
6. 79.48
[Slide brackets the ranked scores into “winners” and “losers”]
CONTINUOUS VARIABLES
Interval: Most stat tests rely on interval data
Equal intervals represent equal differences
Discrete: Virtually the same as "interval," but there is a finite range of values, as in Likert scales.
“How happy are you with your cell phone service?”
1 = Not at all, 2 = Barely, 3 = Somewhat, 4 = Very, 5 = Greatly
Ratio: Ratios of values on scale are meaningful
Must have meaningful “0” point
Likert scale above is NOT ratio, b/c a rating of 4 does not mean twice as happy as a rating of 2
Temperature (Kelvin scale only), RT (reaction time), number of yawns in class ARE ratio
GUESS THAT VARIABLE
Example → Variable
1 = female, 2 = male → Categorical, binary
32.75 miles per gallon → Ratio
1 = slightly tired, 2 = moder. tired, 3 = very tired → Interval
352 Smith Hall → Categorical, non-binary
Top 4 Reasons to Learn Stats: 1. Necessary for career, 2. Source of serenity, 3. Great ice-breaker, 4. Fun for whole family → Ordinal
Distress and Disclosure: A Sample Experiment That Never Occurred!!!
Hyp: Increased anxiety leads to disclosure.
Ss see scary movie or neutral movie.
Ss asked to rate how scary they found the movie
Ss write about thoughts and feelings movie created.
Measures of Centrality
MODE Most frequent value, occurrence
MEDIAN Middle-most value; 50% above/below
MEAN Arithmetic average
Number of words written: 2, 2, 3, 5, 8
MODE = 2
MEDIAN = 3
MEAN = 4 [(2 + 2 + 3 + 5 + 8) / 5 = 4]
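The three measures can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are illustrative and not from the slides.

```python
from statistics import mean, median, mode

# Number of words written, taken from the slide.
words = [2, 2, 3, 5, 8]

word_mode = mode(words)      # most frequent value
word_median = median(words)  # middle value of the sorted scores
word_mean = mean(words)      # sum of the scores divided by how many there are

print(word_mode, word_median, word_mean)  # 2 3 4
```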
Relations Btwn Mean, Median, Mode
Number of words written?
N = 5: 1, 2, 2, 3, 8
N = 10: 1, 2, 3, 3, 3, 4, 5, 5, 6, 8
N = 20: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8
[Figures: three frequency histograms, one each for N = 5, N = 10, and N = 20; x-axis: $1 to $8; y-axis: frequency, 0 to 5]
N Mode Median Mean
5 2.0 2.0 3.2
10 3.0 3.5 4.0
20 4.0 4.0 4.35
How does change in N affect rel. btwn Mean, Median, and Mode?
If the true distribution is normal, then as sample size increases, the mean, median, and mode converge.
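To see how the three measures relate at each sample size, they can be recomputed directly from the three samples as listed. This is a sketch using the standard `statistics` module; the dictionary layout and names are illustrative assumptions.

```python
from statistics import mean, median, mode

# The three samples exactly as listed on the slide, keyed by N.
samples = {
    5: [1, 2, 2, 3, 8],
    10: [1, 2, 3, 3, 3, 4, 5, 5, 6, 8],
    20: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8],
}

# (mode, median, mean) for each sample size.
summary = {n: (mode(x), median(x), mean(x)) for n, x in samples.items()}

for n, (mo, md, mn) in summary.items():
    print(f"N = {n:2d}: mode = {mo}, median = {md}, mean = {mn}")
```

The gap between the mean and the other two measures is widest in the smallest sample, where the single score of 8 has the most pull.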
MEASURES OF DISPERSION
Range: Difference between highest score and lowest score.
N = 20: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8
Mode = 4.0, Median = 4.0, Mean = 4.35
8 – 1 = 7 = range
Deviation (from mean), AKA “Error”: Difference between individual score and mean
8 – 4.35 = 3.65 = 8’s deviation
Sum of Squared Errors (SS): Why? To get a meaningful index of average dispersion.
(1 – 4.35) + (2 – 4.35) ... + (7 – 4.35) + (8 – 4.35) = 0. Useless!
(1 – 4.35)² + (2 – 4.35)² ... + (7 – 4.35)² + (8 – 4.35)² = 58.55. Useful!
Variance and Standard Deviation
We need to get an estimate of average dispersion from mean, just like the mean gives an estimate of average score.
Variance = s² = average squared deviation in sample = SS / (N – 1)
s² = 58.55 / (20 – 1) = 3.082
Two problems with variance:
1) Units, based on sq’d deviations, are not relatable to actual scores.
2) Variance tends to be a large, unwieldy number.
Standard Deviation = s = √s² = sq. root of variance
s = √3.082 = 1.76
In a normal distribution:
1 SD above and below mean = about 68% of distribution
2 SD above and below mean = about 95% of distribution
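The chain from deviations to standard deviation can be traced step by step. This is a minimal sketch that recomputes each quantity from the N = 20 sample using only the Python standard library; the variable names are illustrative and not from the slides.

```python
from math import sqrt
from statistics import mean, stdev, variance

# The N = 20 sample from the slides.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8]

score_range = max(scores) - min(scores)   # highest score minus lowest score
m = mean(scores)
deviations = [x - m for x in scores]      # each score's deviation ("error")

sum_dev = sum(deviations)                 # cancels out to (about) zero
ss = sum(d ** 2 for d in deviations)      # sum of squared errors
var = ss / (len(scores) - 1)              # sample variance, SS / (N - 1)
sd = sqrt(var)                            # standard deviation

print(f"range = {score_range}, sum of deviations = {sum_dev:.2f}")
print(f"SS = {ss:.2f}, variance = {var:.3f}, SD = {sd:.2f}")
```

The hand-built values agree with `statistics.variance(scores)` and `statistics.stdev(scores)`, which use the same N – 1 denominator.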
Z Scores and Z Distribution
DV 1: “How anxious were you during movie?” (discrete data)
DV 2: Number of words written about movie. (ratio data)
DV Mean SD
DV 1 (anxiety) 4.23 2.71
DV 2 (words) 28.71 11.65
Issue: How do we compare anxiety with word production?
Z-score conversion: Effect is to convert different metrics into a common metric
Z = (X – X̄) / s
Sub. 24: anxious = 3; words = 22
Z_anxious = (3 – 4.23) / 2.71 = –.45
Z_words = (22 – 28.71) / 11.65 = –.58
SPSS: Descriptives, “Save standardized values as variables”
Z distribution is normal, mean = 0, SD = 1
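The conversion for Subject 24 can be reproduced in a few lines. This is a sketch only; the helper name `z_score` is an illustrative assumption, and the means and SDs are the ones reported on the slide.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Express a raw score as its distance from the mean in SD units."""
    return (x - mean) / sd

# Subject 24: anxiety rating of 3, 22 words written.
z_anxious = z_score(3, mean=4.23, sd=2.71)
z_words = z_score(22, mean=28.71, sd=11.65)

print(round(z_anxious, 2), round(z_words, 2))  # -0.45 -0.58
```

Both scores are now in the same units (SDs from the mean), so they can be compared directly: this subject was a little further below average on words written than on anxiety.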
Standard Error of the Mean
Sample mean (X̄) estimates true population mean (µ)
Many sample means from same population will vary.
Standard Error of the Mean (SE) = the standard deviation of sample means around the true mean, i.e., how much sample means typically vary.
If sample n ≥ 30, SE can be estimated based on s (std. deviation) and sample n.
Formula for SE: SE_X̄ = s / √n
Movie anxiety study: DV = reported anxiety; n = 43, s = 2.71
SE = 2.71 / √43 = 0.41
Note: SE is much smaller than SD. Why?
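The movie anxiety calculation can be sketched as follows. The function name `standard_error` is an illustrative assumption; the inputs are the s and n given on the slide.

```python
from math import sqrt

def standard_error(sd: float, n: int) -> float:
    """Estimate the standard error of the mean from a sample SD and sample size."""
    return sd / sqrt(n)

se = standard_error(sd=2.71, n=43)
print(round(se, 2))  # 0.41

# Same SD, larger samples: the SE keeps shrinking.
for n in (43, 100, 400):
    print(n, round(standard_error(2.71, n), 3))
```

Dividing by √n is what makes SE smaller than SD: individual scores vary a lot, but the mean of many scores varies much less.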
CONFIDENCE INTERVALS
Issue: How do we know if the sample mean is a good estimate of the true mean? In other words, how do we estimate a mean’s accuracy?
Confidence Intervals (CI) estimate accuracy of sample means.
CI shows boundary values (highest & lowest) w/n which true mean is likely to occur.
Conventional boundary captures true mean 95% of time.
Calculation: Lower boundary = X̄ – (1.96 * SE)
Upper boundary = X̄ + (1.96 * SE)
Movie anxiety study: X̄ = 4.23, SE = 0.41
Lower CI = 4.23 – (1.96 * 0.41) = 3.43
Upper CI = 4.23 + (1.96 * 0.41) = 5.03
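The movie anxiety interval can be reproduced with a short function. This is a sketch under the slide's assumptions (95% coverage, so a multiplier of 1.96); the function name `confidence_interval_95` is illustrative.

```python
def confidence_interval_95(mean: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) bounds: the mean minus and plus z standard errors."""
    margin = z * se
    return mean - margin, mean + margin

lower, upper = confidence_interval_95(mean=4.23, se=0.41)
print(round(lower, 2), round(upper, 2))  # 3.43 5.03
```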
GRAPHIC REPRESENTATION OF CI
[Figure: bar graph with error bars. Y-axis: Anxiety Rating (0 to 6). X-axis: Neutral Movie, Scary Movie. Bars: Alone, With Friend.]
Error bars overlap; means are likely from same distribution. Differences are not meaningful.
Error bars DON’T overlap; means are likely from different distributions. Differences are meaningful.
GRAPHICALLY EXPLORING DATA USING CENTRALITY AND DISPERSION
Why explore data?
1. Get a general sense or feel for your data.
2. Determine if distribution is normal, skewed, kurtotic, or multi-modal (more on this soon).
3. Identify outliers
4. Identify possible data entry errors
[Figure: the scores 12, 19, 17, 14, 17, 13, 17, 15 with one added entry, 147]
DATA BUGS ARE A HAZARD: KNOW WHAT'S IN YOUR DATA!
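A quick check shows how much damage one bad entry can do. This sketch assumes the 147 on the slide stands for a single mistyped score added to the eight listed values; that reading is an assumption, as are the variable names.

```python
from statistics import mean, median

clean = [12, 19, 17, 14, 17, 13, 17, 15]  # the eight scores on the slide
with_bug = clean + [147]                  # assumed: one mistyped entry

print(mean(clean), median(clean))                    # 15.5 16.0
print(round(mean(with_bug), 2), median(with_bug))    # 30.11 17
```

The mean nearly doubles while the median barely moves, which is one reason to look at both before running any tests.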
Normally Distributed Data Set
SPSS output: Note similarity between mean, median, mode
Skewed Distribution
Positive Skew: Possible "floor effect"
Negative Skew: Possible "ceiling effect"
Kurtosis
Positive kurtosis, “leptokurtic” (example: Neuroticism Measure)
Problems? "Normativity bias?" DV doesn't discriminate; IV wasn't impactful
Negative kurtosis, “platykurtic” (example: Drinks Per Week)
Problems? Distinctiveness bias? IV and/or DV too ambiguous; population too diverse
Bimodality
Note: What clues in “statistics” output that the distribution may be bimodal?
Bimodality suggests 2 (or more) populations
Multimodal: More than two modes.
Outliers
BOX AND WHISKER GRAPH
Top 25%
Upper Quartile
Median (50%)
Lower Quartile
Bottom 25%
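The cut points that a box and whisker graph draws can be computed directly. This is a sketch using `statistics.quantiles` on the N = 20 sample from the dispersion slides; reusing that sample here is an assumption for illustration. Quartile conventions differ between packages, so values from SPSS or other software may not match exactly.

```python
from statistics import median, quantiles

# The N = 20 sample used on the dispersion slides.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(scores, n=4)

print(f"lower quartile = {q1}, median = {q2}, upper quartile = {q3}")
print(f"whiskers run from {min(scores)} to {max(scores)}")
```

The box spans the lower to the upper quartile (the middle 50% of scores), and the line inside it is the median.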
BOX AND WHISKER GRAPH, AND DATA CHECKING
[Figures: box and whisker graphs used for detecting skew and for detecting outliers; outlying points are labeled by subject number]
DEALING WITH OUTLIERS
1. Check raw data: Entry problem? Coding problem?
2. Remove the outlier:
a. Must be at least 2.5 SD from the mean (some say 3 SD)
b. Must declare deletions in pubs.
c. Try to identify reason for outlier (e.g., other anomalous responses).
3. Transform data: Convert data to a metric that reduces deviation. (More on this in next slide.)
4. Change the score to a more conservative one (Field, 2009):
a. Next highest plus 1
b. 2 SD or 3 SD above (or below) the mean.
c. ISN’T THIS CHEATING? No (says Field) b/c retaining score biases outcome. Again, report this step in pubs.
5. Run more subjects!
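Two of these steps can be sketched in code: flagging scores beyond a cutoff distance from the mean, and the "next highest plus 1" replacement. The function names are hypothetical, the data reuse the scores from the data-bug slide for illustration, and the replacement helper only handles a high outlier. Treat this as a sketch of the idea, not a prescribed procedure.

```python
from statistics import mean, stdev

def flag_outliers(scores: list[float], cutoff: float = 2.5) -> list[float]:
    """Return the scores lying more than `cutoff` SDs from the sample mean."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) / s > cutoff]

def next_highest_plus_one(scores: list[float], outlier: float) -> list[float]:
    """Replace a high outlier with the next highest score plus 1."""
    rest = [x for x in scores if x != outlier]
    return [max(rest) + 1 if x == outlier else x for x in scores]

scores = [12, 19, 17, 14, 17, 13, 17, 15, 147]

flagged = flag_outliers(scores)                      # uses the 2.5 SD rule
adjusted = next_highest_plus_one(scores, flagged[0])

print(flagged)   # [147]
print(adjusted)  # [12, 19, 17, 14, 17, 13, 17, 15, 20]
```

Note that the outlier itself inflates the mean and SD used to judge it: here 147 clears the 2.5 SD cutoff but would not clear a 3 SD cutoff.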
Data Transformations
1. Log Transformation (log(X)): Converting scores into log X reduces positive skew; draws in scores on the far right side of distribution.
NOTE: This only works on sets where lowest value is greater than 0. Easy fix: add a constant to all values.
2. Square Root Transformation (√X): Sq. roots reduce large numbers more than small ones, so will pull in extreme outliers.
3. Reciprocal Transformation (1/X): Dividing 1 by each score reduces large values. BUT, remember that this effectively reverses valence, so that the highest scores become the lowest, and vice versa.
Fix: First, preliminary transform by changing each score to highest score minus the target score. Do it all at same time by 1/(X_highest – X). (Add a constant so the denominator is never 0 for the highest score.)
4. Correcting negative skew: All steps work on neg. skew, but first must reverse scores. Subtract each score from highest score. Then, re-reverse back to original scale after transform completed.
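The transformations above can be sketched with the standard `math` module. Both data sets below are hypothetical and are not from the slides. The natural log is used as an assumption since the slides do not specify a base, and the helper `reverse_scores` with its `constant` argument is an illustrative name.

```python
from math import log, sqrt

# Hypothetical positively skewed scores (not from the slides).
scores = [1, 2, 2, 3, 3, 4, 5, 9, 20]

log_scores = [log(x) for x in scores]    # natural log; every score must be > 0
sqrt_scores = [sqrt(x) for x in scores]  # shrinks large scores more than small ones
recip_scores = [1 / x for x in scores]   # order flips: the largest score becomes the smallest

def reverse_scores(xs: list[float], constant: float = 1) -> list[float]:
    """Subtract each score from the highest score; the constant keeps every value above 0."""
    top = max(xs)
    return [top - x + constant for x in xs]

# Hypothetical negatively skewed scores: reverse first, then transform.
neg_skewed = [1, 12, 16, 17, 18, 18, 19, 19, 20]
reversed_scores = reverse_scores(neg_skewed)
log_reversed = [log(x) for x in reversed_scores]

# Gap between the two highest scores, before and after each transform.
print(scores[-1] - scores[-2])
print(round(log_scores[-1] - log_scores[-2], 2))
print(round(sqrt_scores[-1] - sqrt_scores[-2], 2))
```

In each case the gap between the extreme score and its neighbor shrinks, which is how these transformations pull in the long tail.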