Graphs and graphical presentation
Peter Shaw
Your most important concept for the day:
A graph is the best way to communicate numerical information to people. Bar nothing.
Always graph data if you want to understand them or explain them to others.
[Figure: "The distribution of pH values in ponds on Wimbledon Common": pH (6 to 7) for pond # 1 to 6, annotated "Shows maxima and minima".]
Rules for any graph:
1: Clearly labelled axes, units where appropriate
2: A title
3: Explanations of symbols
The most common fault: you use the PC stats package to plot the graph for you. Eh? Come on, are you seriously expecting me to draw them by hand when the PC does it for me?!
Well, actually, the number of times that an SPSS- or Excel-generated graph is acceptable as thesis quality first time around is close to zero. Common errors are stupid axis ranges (weight or height starting at negative values), default variable names (VAR001 tells me nothing!), and glorious Technicolor (which becomes illegible when photocopied).
Re-edit them to give big bold black symbols and sensible ranges. In several cases I don't bother re-editing: instead I paste the graph into PowerPoint and redraw it by hand there. All the diagrams in my book were redrawn in PowerPoint this way after I despaired of ever getting a useful graph out of SPSS!
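Choosing sensible axis ranges can also be done systematically. The Python sketch below is only an illustration, not how SPSS or Excel choose their ranges. It assumes that ticks should fall on "nice" steps of 1, 2 or 5 × 10^k, and that a quantity which cannot be negative (weight, height, density) should have its axis start at zero. The function names are hypothetical.

```python
import math


def nice_step(span, target_ticks=5):
    """Round span / target_ticks up to a 'nice' tick step of 1, 2 or 5 x 10^k."""
    raw = span / target_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5, 10):
        step = multiple * magnitude
        if step >= raw:
            return step
    return 10 * magnitude  # safety net; the loop above always returns first


def sensible_axis(values, nonnegative=True, target_ticks=5):
    """Return (lower, upper, step) for an axis covering all the values.

    With nonnegative=True (weights, heights, densities, counts) the axis
    starts at zero rather than at some arbitrary negative value.
    """
    lo, hi = min(values), max(values)
    if nonnegative:
        if lo < 0:
            raise ValueError("nonnegative=True but the data contain negative values")
        lo = 0
    span = (hi - lo) or 1  # avoid a zero span when all values are equal
    step = nice_step(span, target_ticks)
    lower = math.floor(lo / step) * step
    upper = math.ceil(hi / step) * step
    return lower, upper, step


print(sensible_axis([12.5, 47.0, 88.0]))  # → (0, 100, 20)
print(sensible_axis([-250, 480], nonnegative=False))  # → (-400, 600, 200)
```

The first call gives the 0, 20, 40 … 100 mm scale used in the redrawn litter-depth graph. The `nonnegative` flag is the important design choice: it stops a package from "helpfully" extending the axis below zero for data that can never be negative.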
[Figure: SPSS default error-bar chart: "Mean +- 2 SE litterdepth" (10 to 70) against actual_distance (.00, 2.00, 4.00, 8.00, 16.00, 32.00), N = 8 at each distance.]
SPSS gives you this…
[Figure: SPSS default line chart: "Mean litterdepth" (20 to 50) against actual_distance (.00 to 32.00).]
Or this…
[Figure: hand-redrawn version: Litter depth, mm (0 to 100) against Distance, m (0, 2, 4, 8, 16, 32).]
And the graph I really wanted…
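The first SPSS chart plots the mean ± 2 standard errors (SE) of litter depth at each distance. Whichever tool draws the final graph, you should know what those bars are. Here is a minimal Python sketch using made-up depths, not the data behind these charts. SE is the sample standard deviation divided by √n. For large samples, mean ± 2 SE is roughly a 95% confidence interval for the mean; for small samples like N = 8, a t-based interval is a little wider.

```python
import math
from statistics import mean, stdev


def mean_and_2se(values):
    """Return (mean, mean - 2 SE, mean + 2 SE) for one group of measurements."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))  # SE = sample SD / sqrt(n)
    return m, m - 2 * se, m + 2 * se


# Hypothetical litter depths (mm) at three distances (m); not the lecture's data.
litter_depth_mm = {
    0: [62, 70, 55, 66, 71, 58, 64, 60],
    8: [48, 52, 41, 57, 50, 45, 53, 49],
    32: [22, 30, 25, 19, 28, 24, 27, 21],
}
for distance, depths in litter_depth_mm.items():
    m, lower, upper = mean_and_2se(depths)
    print(f"{distance:>2} m: mean {m:.1f} mm, error bar {lower:.1f} to {upper:.1f} mm")
```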
[Figure: DCA ordination diagram: 1st DCA axis (eigenvalue = 0.375) against 2nd DCA axis (eigenvalue = 0.111). Key: sample groups for 1986-1987, 1988-1990, 1992-1996 and 1997-2001, plus species positions (codes Cc, Co, C3, C5, Am, Ar, Lp, Pi, Bs, Gr, Sv, Sl, Lh).]
Another graph hand-drawn in PowerPoint. This is an ordination diagram; more on these later in the course.
Types of graph: there are many types, and no law stops you from inventing a new format.
My aim for today is to show you the theory and practice of the commoner types of graph.
Then I will get you used to plotting them in your head, to model the behaviour of different patterns within your data (rest assured that this is very quick and easy).
Then we head for the PCs to do them ourselves.
Bar charts
[Figure: bar chart of number of individuals caught (0 to 100) in pond 1 and pond 2.]
These are useful for showing how properties differ between sites/classes, but work best when you have only one number (a total, average or other) per class.
[Figure: two bar charts against site age, years (2, 5, 30, 40). Left, "Early successional Collembola on PFA sites": mean Collembola density m⁻² (0 to 3000) of Hypogastrura vernalis and Cryptopygus thermophilus. Right, "Late successional Collembola of PFA sites": Collembola m⁻² (0 to 4000) of Tullbergia macrochaeta, Lepidocyrtus lanuginosus, Isotomodes productus and Friesea mirabilis.]
Successional patterns in Collembola colonising an industrial waste.
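Before drawing a bar chart you need to collapse the raw records to one number per class. Here is a small Python sketch with hypothetical catches from two ponds. The helper names are made up for illustration, and the text bars only stand in for a real plotting package.

```python
from collections import defaultdict
from statistics import mean


def one_value_per_class(records, summary=mean):
    """Collapse (class, value) records to one summary number per class."""
    groups = defaultdict(list)
    for cls, value in records:
        groups[cls].append(value)
    return {cls: summary(values) for cls, values in groups.items()}


def text_bar_chart(summary_by_class, width=40):
    """Crude text bar chart: bar length is proportional to each class's value."""
    top = max(summary_by_class.values())
    lines = []
    for cls, value in summary_by_class.items():
        n = round(width * value / top) if top else 0
        lines.append(f"{cls:<8} {'#' * n} {value:g}")
    return "\n".join(lines)


# Hypothetical catches (not real data): three traps in each of two ponds.
catches = [("pond 1", 30), ("pond 1", 40), ("pond 1", 20),
           ("pond 2", 10), ("pond 2", 15), ("pond 2", 5)]
totals = one_value_per_class(catches, summary=sum)
print(text_bar_chart(totals))
```

Passing `summary=sum` gives totals per pond. The default `mean` gives averages. Either way, each bar then represents exactly one number.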
Boxplots
• These are underrated, but extremely helpful tools for examining the distribution of data.
• They have the big advantage over bar charts that they show the range of values in the data.
[Figure: anatomy of a boxplot on a 0 to 100 scale, labelling the lowest value, 25th centile, median, 75th centile and highest value.]
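You can compute the five values marked on a boxplot with Python's standard `statistics` module. The sketch below uses made-up data. Packages use slightly different centile rules: this one uses the `"inclusive"` method of `statistics.quantiles`, available from Python 3.8, so SPSS's box edges may differ a little for small samples. Many packages, SPSS among them, also draw whiskers only out to a limit and plot more extreme points individually as outliers, rather than running the whiskers to the highest and lowest values.

```python
from statistics import quantiles


def five_number_summary(data):
    """Lowest value, 25th centile, median, 75th centile, highest value."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")  # sorts internally
    return min(data), q1, median, q3, max(data)


print(five_number_summary([2, 4, 4, 5, 7, 9, 10, 12]))  # → (2, 4.0, 6.0, 9.25, 12)
```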
[Figure: two boxplots by SITE (coppice, conifers, cleared), N = 12 per site. Left, "Collembola density in 3 habitats in Colyford wood": Collembola m⁻² (0 to 14000). Right, "Species richness in 3 habitats in Colyford wood": species richness per sample (0 to 14).]
Here we have an example of boxplots in action, describing soil insects in 3 areas of a wood in Devon (ancient oak, modern conifer, and newly cleared).
Scatterplots
[Figure: scatterplot of pH against depth.]
These are very commonly used and powerful tools. The Y axis (going up) is always assumed to depend on the X variable.
Think hard before putting any marks on here! Generally you should fit a single best-fit line if the correlation is significant (p < 0.05); otherwise leave it alone.
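The rule "fit a line only if p < 0.05" can be sketched in plain Python. This is an illustration under stated assumptions, not what SPSS does. The line is an ordinary least-squares fit of Y on X. The p-value comes from a permutation test on Pearson's r: shuffle Y many times and count how often a shuffled correlation is at least as strong as the real one. All data and function names here are invented for the example.

```python
import random


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def least_squares_line(x, y):
    """(intercept, slope) of the ordinary least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def permutation_p(x, y, n_perm=9999, seed=1):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def best_fit_if_significant(x, y, alpha=0.05):
    """((intercept, slope), p) if p < alpha, otherwise (None, p): leave it alone."""
    p = permutation_p(x, y)
    return (least_squares_line(x, y) if p < alpha else None), p


x = list(range(1, 9))
strong = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # invented, close to y = 2x
flat = [3, 5, 4, 6, 6, 4, 5, 3]                        # invented, no trend at all
print(best_fit_if_significant(x, strong))
print(best_fit_if_significant(x, flat))  # → (None, 1.0)
```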
NEVER go dot-to-dot!! Unless, that is, you are absolutely sure that interpolation is valid.
[Figure: "Lichen cover on tombstone" against year, drawn dot-to-dot. This is WRONG.]
[Figure: "Height of 1 child" against year, drawn dot-to-dot. This is OK.]
[Figure: "Species richness against % fine fraction (2-0.5 mm)": species richness (0 to 8) against % in fraction 2-0.5 mm (0 to 30), annotated "Rs = 0.61**, Spp = 1.2 + 0.09*%fine".]
A sample scattergraph with best-fit line.
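The "Rs" on this graph is presumably Spearman's rank correlation coefficient. That is simply Pearson's r calculated on the ranks of the data, so it picks up any monotonic (steadily rising or falling) trend, not just a straight line. Below is a Python sketch in which tied values share the average of their ranks. The data in the usage lines are invented.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


print(average_ranks([10, 20, 20, 30]))              # → [1.0, 2.5, 2.5, 4.0]
print(spearman_rs([1, 2, 3, 4], [1, 4, 9, 16]))     # → 1.0 (curved but monotonic)
print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```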
A hybrid scatter-graph with error bars. You may want to consider the validity of joining the points up, but it can be justified.
Scatterplots, contd
Beware the false axis!
Why is this graph meaningless?
[Figure: weight of leaf plotted against bag number (1 to 10).]
Pie charts
These are good for showing the proportional composition of communities, but not so good for comparing samples of different sizes.
[Figure: pie chart with segments CATSEAR, GRASS, ULEX and DFLEX.]
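A pie chart turns counts into proportions of 360°. The Python sketch below uses invented counts for the four plants named in the chart. It also shows the weakness mentioned above: a sample ten times larger produces an identical pie, so sample size is invisible.

```python
def pie_slices(counts):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {name: (100 * c / total, 360 * c / total) for name, c in counts.items()}


# Invented counts, not survey data.
small_sample = {"GRASS": 50, "ULEX": 25, "CATSEAR": 15, "DFLEX": 10}
large_sample = {name: 10 * c for name, c in small_sample.items()}
print(pie_slices(small_sample)["GRASS"])                    # → (50.0, 180.0)
print(pie_slices(small_sample) == pie_slices(large_sample))  # → True
```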
P-P graphs
These are used to decide whether data are normally distributed.
If the plotted points lie on the green line (the line Y = X), the data appear to follow the Normal (Gaussian) distribution.
Here we see the same data before and after a logarithmic transformation.
[Figure: two Normal P-P plots, expected cumulative probability (0 to 1) against observed cumulative probability (0 to 1): "Normal P-P Plot of LOI" (raw data) and "Normal P-P Plot of LOGLOI" (log-transformed data).]
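The points on a Normal P-P plot pair each value's observed cumulative probability with the cumulative probability a fitted Normal curve predicts for it. Here is a Python sketch under two assumptions. First, the observed value uses the simple plotting position (i - 0.5)/n; SPSS and other packages offer several such formulas. Second, the Normal curve takes the sample's own mean and standard deviation. The data are invented: a strongly right-skewed set, before and after a log10 transformation. The logged version sits much closer to the Y = X line.

```python
import math
from statistics import NormalDist


def pp_points(data):
    """(observed, expected) cumulative probabilities for a Normal P-P plot.

    Observed: plotting position (i - 0.5) / n for the i-th smallest value.
    Expected: Normal CDF using the sample's own mean and standard deviation.
    """
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist.from_samples(xs)
    return [((i - 0.5) / n, fitted.cdf(x)) for i, x in enumerate(xs, start=1)]


def max_departure(points):
    """Largest vertical distance of any point from the line Y = X."""
    return max(abs(expected - observed) for observed, expected in points)


# Invented right-skewed data (each value double the last), then log10-transformed.
raw = [1, 2, 4, 8, 16, 32, 64, 128]
logged = [math.log10(v) for v in raw]
print(round(max_departure(pp_points(raw)), 3))
print(round(max_departure(pp_points(logged)), 3))
```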
Kite diagrams
These are mainly used to show how communities of 3-10 entities vary along an axis (time, or a spatial gradient such as downstream from a pollution source). They are good for ecological studies, less so for physical data.
[Figure: kite diagram of total counts for each species (Species A, Species B, Species C) against age, years (0 to 10).]
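A kite diagram draws each species as a band centred on its own line. At each point along the axis, the band's width, split equally either side of the centre, is proportional to the count there. Below is a rough text-only Python sketch with invented counts for three species along an age axis. A real kite diagram would be drawn as filled polygons in a graphics package.

```python
def kite_half_widths(counts_by_species, max_half_width=5):
    """Scale counts so the largest count anywhere gets max_half_width each side."""
    peak = max(max(counts) for counts in counts_by_species.values())
    # Note: Python's round() sends exact halves to the nearest even number.
    return {species: [round(max_half_width * c / peak) for c in counts]
            for species, counts in counts_by_species.items()}


def kite_rows(positions, counts_by_species, max_half_width=5):
    """One text row per axis position; each species' band is centred in its column."""
    widths = kite_half_widths(counts_by_species, max_half_width)
    col = 2 * max_half_width + 1  # the widest possible band fits the column exactly
    rows = []
    for i, pos in enumerate(positions):
        cells = []
        for species, counts in counts_by_species.items():
            half = widths[species][i]
            band = "#" * (2 * half + 1) if counts[i] else ""  # absent species: blank
            cells.append(band.center(col))
        rows.append(f"{pos:>3} | " + " | ".join(cells))
    return rows


# Invented counts along an age axis (years); not real survey data.
ages = [0, 2, 4, 6, 8, 10]
counts = {
    "Species A": [10, 8, 5, 2, 0, 0],
    "Species B": [0, 3, 8, 10, 6, 2],
    "Species C": [0, 0, 1, 4, 8, 10],
}
for row in kite_rows(ages, counts):
    print(row)
```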
I want to give you the secret to good results:
• The secret to a successful exercise in data collection is to plan (ie visualise) the final presentation BEFORE you start to collect the data!
• This does NOT mean you plot the graph then make up the data!! It means that you consider what patterns might arise in your data and how best to portray these on a graph. That lets you plan what data you will need to collect, and drives the whole project along.
[Diagram: Fieldwork, PCs + GRAPHS.]
[Figure: boxplot of SPP against COVER (Wood/Bark, Woodchips, Bark); the SPP axis runs from -2 to 8, and outliers carry numbered labels.]
A student wants to measure the pH values of ponds on Wimbledon Common, already planning the talk that they will give a week later. They want to show a graph like this:
[Figure: "The distribution of pH values in ponds on Wimbledon Common": pH (6 to 7) for pond # 1 to 6, annotated "Shows maxima and minima".]
They know that there are 6 accessible ponds to visit and want to be able to talk about all of them. They work out how long they have for each pond, and collect 4 measurements from each.
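The student's planning is simple arithmetic. Here is a Python sketch with an invented time budget: four hours in the field, 15 minutes to reach each pond and about 6 minutes per pH reading. These numbers happen to give four readings per pond; swap in your own. The function name is hypothetical.

```python
def measurements_per_site(total_minutes, n_sites, travel_minutes_per_site,
                          minutes_per_measurement):
    """How many measurements fit at each site within a fixed field-day budget."""
    minutes_per_site = total_minutes // n_sites
    working_minutes = minutes_per_site - travel_minutes_per_site
    if working_minutes < minutes_per_measurement:
        raise ValueError("not enough time for even one measurement per site")
    return working_minutes // minutes_per_measurement


# Invented budget: 240 min, 6 ponds, 15 min travel per pond, 6 min per reading.
print(measurements_per_site(240, 6, 15, 6))  # → 4
```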
A quick boxplot exercise
Imagine that you are to undertake research on the Common, measuring properties of two ponds.
Produce a boxplot chart comparing them between sites under TWO scenarios:
1: There is significant variation between the sites (at least one is different).
2: There is a little variation between sites, but only due to random noise.
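If you want to check your sketches, you can simulate the two scenarios. The Python sketch below uses invented numbers: 30 pH readings per pond and measurement noise with SD 0.1. True pH is 6.0 vs 7.0 in scenario 1 and 6.5 for both ponds in scenario 2. Whether the interquartile boxes overlap is used as a rough visual comparison, not a formal test.

```python
import random
from statistics import quantiles


def box(values):
    """(25th centile, median, 75th centile), as drawn by a boxplot."""
    return tuple(quantiles(values, n=4, method="inclusive"))


def boxes_overlap(a, b):
    """True if the interquartile boxes of samples a and b overlap at all."""
    q1a, _, q3a = box(a)
    q1b, _, q3b = box(b)
    return q1a <= q3b and q1b <= q3a


def simulate_pond(true_ph, rng, n=30, noise_sd=0.1):
    """Invented pH readings: the pond's true pH plus random measurement noise."""
    return [rng.gauss(true_ph, noise_sd) for _ in range(n)]


rng = random.Random(42)
# Scenario 1: the ponds genuinely differ (invented true pH 6.0 vs 7.0).
differ = (simulate_pond(6.0, rng), simulate_pond(7.0, rng))
# Scenario 2: same true pH; any difference is just random noise.
noise_only = (simulate_pond(6.5, rng), simulate_pond(6.5, rng))
for label, (a, b) in [("differ", differ), ("noise only", noise_only)]:
    print(label, [round(q, 2) for q in box(a)], [round(q, 2) for q in box(b)],
          "boxes overlap:", boxes_overlap(a, b))
```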
Now produce a scatter graph showing how two variables are related. Let’s plot yield of vegetation against dose of fertiliser added. Again plot 2 scenarios:
1: How you imagine the data would work out if the two variables are significantly related (correlated in the jargon)
2: What you might find if the fertiliser turned out to be a waste of money.
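The two scatter-graph scenarios can be simulated in the same way. This is a Python sketch with invented doses and yields. In scenario 1 yield rises with an invented slope of 0.5 per unit dose. In scenario 2 yield is just noise around 5. The correlation coefficient summarises how tightly the points follow a straight line.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
dose = [float(d) for d in range(20)]  # invented fertiliser doses, arbitrary units
# Scenario 1: yield rises with dose (invented slope 0.5) plus random noise.
yield_related = [2.0 + 0.5 * d + rng.gauss(0, 0.5) for d in dose]
# Scenario 2: the fertiliser is a waste of money; yield is noise around 5.
yield_wasted = [5.0 + rng.gauss(0, 0.5) for _ in dose]
print("related:", round(pearson_r(dose, yield_related), 2))
print("waste of money:", round(pearson_r(dose, yield_wasted), 2))
```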