Research Methods & Design in Psychology
Lecture 3: Descriptives & Graphing
Lecturer: James Neill
Overview
• Univariate descriptives & graphs
• Non-parametric vs. parametric
• Non-normal distributions
• Properties of normal distributions
• Graphing relations b/w 2 and 3 variables
Empirical Approach to Research
A positivistic approach ASSUMES:
• The world is made up of bits of data which can be ‘measured’, ‘recorded’, & ‘analysed’
• Interpretation of data can lead to valid insights about how people think, feel and behave
What do we want to Describe?
Distributional properties of variables:
• Central tendency(ies)
• Shape
• Spread / Dispersion
Basic Univariate Descriptive Statistics
Central tendency
• Mode
• Median
• Mean
Spread
• Interquartile Range
• Range
• Standard Deviation
• Variance
Shape
• Skewness
• Kurtosis
Basic Univariate Graphs
• Bar Graph – Pie Chart
• Stem & Leaf Plot
• Boxplot
• Histogram
Measures of Central Tendency
• Statistics to represent the ‘centre’ of a distribution
– Mode (most frequent)
– Median (50th percentile)
– Mean (average)
• Choice of measure dependent on
– Type of data
– Shape of distribution (esp. skewness)
Measures of Central Tendency
           Mode   Median   Mean
Nominal    X
Ordinal    X      X
Interval   X      X        X
Ratio      X?     X        X
Measures of Dispersion
• Measures of deviation from the central tendency
• Non-parametric / non-normal: range, percentiles, min, max
• Parametric: SD & properties of the normal distribution
Measures of Dispersion
           Range, Min/Max   Percentiles   SD
Nominal
Ordinal    X
Interval   X                X             X?
Ratio      X                X             X
Describing Nominal Data
• Frequencies
– Most frequent?
– Least frequent?
– Percentages?
• Bar graphs
– Examine comparative heights of bars
– Shape is arbitrary
• Consider whether to use freqs or %s
Frequencies
• Number of individuals obtaining each score on a variable
• Frequency tables
• Graphically (bar chart, pie chart)
• Can also present as %
Frequency table for sex
SEX
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   female   14          70.0      70.0            70.0
        male     6           30.0      30.0            100.0
        Total    20          100.0     100.0
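A frequency table of this kind can be rebuilt from raw category labels. The following is a minimal Python sketch, assuming the 20 cases reported on the slide (14 female, 6 male); the variable names are illustrative and the printed layout only approximates the statistical package's output.

```python
from collections import Counter

# Rebuild the slide's sample: 14 females and 6 males (N = 20)
sex = ["female"] * 14 + ["male"] * 6

counts = Counter(sex)
n = sum(counts.values())

# One row per category: (label, frequency, percent, cumulative percent)
rows = []
cumulative = 0.0
for category, freq in counts.most_common():
    percent = 100 * freq / n
    cumulative += percent
    rows.append((category, freq, percent, cumulative))

for category, freq, percent, cum in rows:
    print(f"{category:<8}{freq:>4}{percent:>8.1f}{cum:>8.1f}")
print(f"{'Total':<8}{n:>4}{100.0:>8.1f}")
```

Percentages are usually easier to compare across samples of different sizes; raw frequencies keep the sample size visible.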
Bar chart for frequency by sex
[Figure: bar chart; x-axis SEX (male, female), y-axis Frequency]
Pie chart for frequency by sex
[Figure: pie chart; SEX (male, female)]
Bar chart: Do you believe in God?
[Figure: bar chart; x-axis Do you believe in God? (Yes, Sort of, No), y-axis Count]
Bar chart for cost by state
Bar chart vs. Radar Chart
[Figure: Bar Chart of Sorted Factor Effect Sizes Time 1 to 2; x-axis Factors (Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence), y-axis Effect size]
Bar chart vs. Radar Chart
[Figure: Radar Chart of Factor Effect Sizes Time 1 to 2; spokes Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence]
Mode
• Most common score - highest point in a distribution
• Suitable for all types of data including nominal (may not be useful for ratio)
• Before using, check frequencies and bar graph to see whether it is an accurate and useful statistic.
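Checking the frequencies first matters because the mode need not be unique. A small Python sketch, using made-up responses (not the lecture's data) to the "Do you believe in God?" item:

```python
from collections import Counter
import statistics

# Hypothetical responses, for illustration only
responses = ["Yes", "No", "Sort of", "Yes", "No", "Yes", "No"]

freqs = Counter(responses)               # frequency of each category
modes = statistics.multimode(responses)  # every category tied for most frequent

print(freqs)
print(modes)
```

Here two categories tie for most frequent, so reporting a single mode would misrepresent the distribution.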
Describing Ordinal Data
• Conveys order but not distance (e.g., ranks)
• Descriptives as for nominal (i.e., frequencies, mode)
• Also maybe median – if accurate/useful
• Maybe IQR, min. & max.
• Bar graphs, pie charts, & stem-&-leaf plots
Stem & Leaf Plot
• Useful for ordinal, interval and ratio data
• Alternative to histogram
Box & whisker
• Useful for interval and ratio data
• Represents min., max., median and quartiles
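The five values a box & whisker plot represents can be computed directly. This is a sketch with hypothetical scores; it assumes Python's `statistics.quantiles` with `method="inclusive"`, and other packages use slightly different quartile conventions, so results may not match them exactly.

```python
import statistics

# Hypothetical scores, for illustration only
scores = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# Quartiles (25th, 50th, 75th percentiles) under the "inclusive" convention
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

summary = {
    "min": min(scores),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(scores),
}
iqr = q3 - q1  # interquartile range: the length of the box

print(summary, iqr)
```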
Describing Interval Data
• Conveys order and distance, but no true zero (0 pt is arbitrary).
• Interval data are often discrete, but are commonly treated as ratio/continuous (especially for > 5 intervals)
• Distribution (shape)
• Central tendency (mode, median)
• Dispersion (min, max, range)
• Can also use M & SD if treating as continuous
Describing Ratio Data
• Numbers convey order and distance, true zero point - can talk meaningfully about ratios.
• Continuous
• Distribution (shape – skewness, kurtosis)
• Central tendency (median, mean)
• Dispersion (min, max, range, SD)
Univariate data plot for a ratio variable
The Four Moments of a Normal Distribution
[Figure: bell curve annotated with Mean, SD, Skew and Kurt]
The Four Moments of a Normal Distribution
Four mathematical qualities (parameters) allow one to describe a continuous distribution which at least roughly follows a bell curve shape:
• 1st = mean (central tendency)
• 2nd = SD (dispersion)
• 3rd = skewness (lean / tail)
• 4th = kurtosis (peakedness / flatness)
Mean (1st moment)
• Average score
• Mean = ΣX / N
• Use for ratio data or interval (if treating it as continuous).
• Influenced by extreme scores (outliers)
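The formula and the influence of outliers can be shown in a few lines. A minimal Python sketch with hypothetical scores (not from the lecture):

```python
def mean(scores):
    """Mean = sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

# Hypothetical scores, for illustration only
typical = [4, 5, 5, 6, 5]
with_outlier = typical + [35]  # same scores plus one extreme value

print(mean(typical))       # 5.0
print(mean(with_outlier))  # 10.0
```

A single extreme score doubles the mean here, even though five of the six scores are unchanged.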
Standard Deviation (2nd moment)
• SD = square root of Variance = √[ Σ(X − X̄)² / (N − 1) ]
• Standard Error (SE) = SD / square root of N
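These formulas can be worked through step by step. A sketch in Python with hypothetical scores; the N − 1 divisor follows the slide, and `statistics.stdev` (which also divides by N − 1) is used only as a cross-check.

```python
import math
import statistics

# Hypothetical scores, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Variance: sum of squared deviations from the mean, divided by N - 1
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = math.sqrt(variance)  # standard deviation
se = sd / math.sqrt(n)    # standard error of the mean

# Cross-check against the standard library's sample SD
assert math.isclose(sd, statistics.stdev(scores))

print(round(variance, 3), round(sd, 3), round(se, 3))  # 4.571 2.138 0.756
```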
Skewness (3rd moment)
• Lean of distribution
• +ve = tail to right
• -ve = tail to left
• Can be caused by an outlier
• Can be caused by ceiling or floor effects
• Can be accurate (e.g., the number of cars owned per person)
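The sign of skewness can be checked numerically. This sketch assumes the simple moment-based definition (third central moment divided by the 1.5 power of the second); statistical packages often apply a small-sample adjustment, so their values may differ slightly. The data are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data, for illustration only
cars_owned = [0, 0, 1, 1, 1, 1, 2, 2, 3, 6]  # long tail to the right
mirrored = [6 - x for x in cars_owned]        # same shape, tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(cars_owned) > 0)  # True: positive skew
print(skewness(mirrored) < 0)    # True: negative skew
print(skewness(symmetric))       # 0.0
```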
Skewness (3rd moment)
• Negative skew (ceiling effect)
• Positive skew (floor effect)
Kurtosis (4th moment)
• Flatness or peakedness of distribution
• +ve = peaked
• -ve = flattened
• Be aware that by altering the X and Y axes, any distribution can be made to look more peaked or more flat – so add a normal curve to the histogram to help judge kurtosis
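Because visual judgement depends on axis scaling, a numeric check is useful. This sketch assumes the moment-based excess kurtosis (fourth central moment over the squared second, minus 3, so a normal distribution scores 0); packages may use an adjusted formula. The data are hypothetical.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (normal distribution = 0)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data, for illustration only
peaked = [5] * 10 + [1, 9]  # most scores at the centre, two far out
flat = list(range(1, 11))   # scores spread evenly from 1 to 10

print(excess_kurtosis(peaked) > 0)  # True: leptokurtic
print(excess_kurtosis(flat) < 0)    # True: platykurtic
```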
Kurtosis (4th moment)
Red = Positive (leptokurtic)
Blue = Negative (platykurtic)
Key Areas under the Curve for Normal Distributions
• For normal distributions, approx.:
– +/- 1 SD ~ 68%
– +/- 2 SD ~ 95%
– +/- 3 SD ~ 99.7%
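These areas can be verified from the normal cumulative distribution function. A short Python sketch using the standard library's `statistics.NormalDist`; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a normal distribution within +/- k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {100 * area_within(k):.1f}%")
# +/- 1 SD: 68.3%
# +/- 2 SD: 95.4%
# +/- 3 SD: 99.7%
```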
Areas under the normal curve
Types of Non-normal Distribution
• Bi-modal
• Multi-modal
• Positively skewed
• Negatively skewed
• Flat (platykurtic)
• Peaked (leptokurtic)
Non-normal distributions
Rules of Thumb in Judging Severity of Skewness & Kurtosis
• View histogram with normal curve
• Deal with outliers
• Skewness / kurtosis < -1 or > 1
• Skewness / kurtosis significance tests
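The +/- 1 rule of thumb is easy to apply to values reported by a statistics package. A minimal sketch; the function name and the example values are hypothetical, and the cutoff of 1 is the slide's rule of thumb rather than a formal test.

```python
def departs_from_normal(skew, kurt, cutoff=1.0):
    """Flag a variable whose skewness or kurtosis lies outside +/- cutoff."""
    return abs(skew) > cutoff or abs(kurt) > cutoff

# Hypothetical skewness / kurtosis values, for illustration only
print(departs_from_normal(0.4, -0.7))  # False: both within +/- 1
print(departs_from_normal(1.8, 0.2))   # True: skewness exceeds 1
```

A flag is a prompt to inspect the histogram and outliers, not a verdict on its own.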
Histogram of weight
[Figure: histogram; x-axis WEIGHT (40.0 to 110.0), y-axis Frequency; Mean = 69.6, Std. Dev = 17.10, N = 20]
Histogram of daily calorie intake
Histogram of fertility
Example ‘normal’ distribution 1
[Figure: histogram; x-axis Die (0 to 140), y-axis Frequency; Mean = 81.21, Std. Dev. = 18.228, N = 188]
Example ‘normal’ distribution 2
[Figure: bar chart; x-axis Femininity-Masculinity (Very masculine, Fairly masculine, Androgynous, Fairly feminine, Very feminine), y-axis Count]
Example ‘normal’ distribution 3
[Figure: bar chart for Gender: male; x-axis Femininity-Masculinity (Very masculine, Fairly masculine, Androgynous, Fairly feminine), y-axis Count]
Example ‘normal’ distribution 4
[Figure: bar chart for Gender: female; x-axis Femininity-Masculinity (Very masculine, Fairly masculine, Androgynous, Fairly feminine, Very feminine), y-axis Count]
Example ‘normal’ distribution 5
[Figure: histogram; x-axis Exercise (mins/day) (0 to 250), y-axis Frequency]
Skewed Distributions & the Mode, Median & Mean
• +vely skewed: mode < median < mean
• Symmetrical (normal): mean = median = mode
• -vely skewed: mean < median < mode
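The ordering can be seen with a small example. This Python sketch uses hypothetical scores chosen to be clearly skewed; the ordering is a common pattern rather than a guarantee for every skewed data set.

```python
import statistics

def centre(scores):
    """Return (mode, median, mean) for a list of scores."""
    return (
        statistics.mode(scores),
        statistics.median(scores),
        statistics.mean(scores),
    )

# Hypothetical scores, for illustration only
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # tail to the right
negative_skew = [16 - x for x in positive_skew]  # mirror image, tail to the left

print(centre(positive_skew))  # mode 2 < median 3 < mean 4.6
print(centre(negative_skew))  # mean 11.4 < median 13 < mode 14
```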
Effects of skew on measures of central tendency
More on Graphing
(Visualising Data)
Edward Tufte
Graphs:
• Reveal data
• Communicate complex ideas with clarity, precision, and efficiency
Tufte's Guidelines 1
• Show the data
• Substance rather than method
• Avoid distortion
• Present many numbers in a small space
• Make large data sets coherent
Tufte's Guidelines 2
• Encourage eye to make comparisons
• Reveal data at several levels
• Purpose: Description, exploration, tabulation, decoration
• Closely integrated with statistical and verbal descriptions
Tufte’s Graphical Integrity 1
• Some lapses intentional, some not
• Lie Factor = (size of effect in graph) / (size of effect in data)
• Misleading uses of area
• Misleading uses of perspective
• Leaving out important context
• Lack of taste and aesthetics
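The Lie Factor can be computed once "size of effect" is defined. This sketch assumes effect size is measured as relative change between two values; the function names and all numbers are hypothetical.

```python
def relative_change(first, second):
    """Change from first to second, as a proportion of first."""
    return (second - first) / first

def lie_factor(graph_first, graph_second, data_first, data_second):
    """Size of effect shown in the graph divided by size of effect in the data."""
    return relative_change(graph_first, graph_second) / relative_change(
        data_first, data_second
    )

# Hypothetical example: the data rise from 10 to 15 (a 50% increase),
# but the bars are drawn 2 cm and 8 cm tall (a 300% increase).
factor = lie_factor(2.0, 8.0, 10.0, 15.0)
print(factor)  # 6.0
```

A value near 1 means the graph represents the data faithfully; here the graph exaggerates the change sixfold.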
Tufte's Graphical Integrity 2
• Trade-off between amount of information, simplicity, and accuracy
• “It is often hard to judge what users will find intuitive and how [a visualization] will support a particular task” (Tweedie et al.)
Chart scale
Cleveland’s Hierarchy
Volume
Food Aid Received by Developing Countries
[Figure: bar chart; x-axis countries (Burkina Faso, Ethiopia, Mozambique, Kenya, Morocco, Bangladesh, India, Pakistan, Egypt), y-axis $ million in food aid (1988), 0 to 350]
Percentage of Doctors Devoted Solely to Family Practice in California 1964-1990
Distortive Variations in Scale
Restricted Scales
Example Graphs Depicting the Relationship between Two Variables (Bivariate)
People Histogram
Separate Graphs
Example Graphs Depicting the Relationship between Three Variables (Multivariate)
Clustered bar chart
19th vs. 20th century causes of death
Demographic distribution of age
Where partners first met
Line graph
Causes of Mortality
Bivariate Normality
Examples of More Complex Graphs
Sea Temperature
Inferential Statistical Analysis Decision Making Tree
Links
• Presenting Data – Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html
• A Periodic Table of Visualisation Methods - http://www.visual-literacy.org/periodic_table/periodic_table.html
• Gallery of Data Visualization
• Univariate Data Analysis – The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm
• Pitfalls of Data Analysis – http://www.vims.edu/~david/pitfalls/pitfalls.htm
• Statistics for the Life Sciences – http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts.html