progpal2011
DESCRIPTION
My talk for ProgPal 2011, Leicester. Very rough. Just a few slides recycled and thrown together. Enjoy!
TRANSCRIPT
Congruence between Cranial and Postcranial Characters
in Vertebrate Systematics
Ross Mounce and Matthew Wills
Introduction
Total Evidence sensu Carnap (1950)
The strongest test of a phylogenetic hypothesis is
provided by the comparison of multiple lines of
independent evidence
Example Data Partitions
Molecular data
+
Morphological data
Example Data Partitions
Molecular data
[nuclear genes] [mitochondrial genes] [coding and non-coding]
+
Morphological data
Morphological Data Partitions: a few examples
Cranial | Postcranial
(for Vertebrata, this study)
Genital | Non-genital
(for insects, Song & Bucheli, Cladistics, 2010)
'Hard parts' | 'Soft parts'
(in prep.)
Motivation
“It is commonly believed that there are differences in the evolutionary lability of the crania, dentition, and postcrania of mammals”
(Sanchez-Villagra & Williams, 1998)
“...postcranial characters either from the vertebral column or fins are considerably less used in phylogenetic analyses of lower actinopterygians”
(Arratia, Acta Zoologica 2009)
Generally, there are more cranial characters in real data matrices
> 60% of vertebrate characters are cranial*
* Based upon a sample of 70 vertebrate (only) data matrices published between 2001 and 2010, excluding matrices that were either 100% cranial or 100% postcranial
Testing for a difference in signal
Future work: Character Compatibility tests (sensu Meacham & Estabrook, 1985)
ILD - Incongruence Length Difference (Mickevich & Farris, 1983)
TILD - Topological ILD (W. Wheeler, 1999)
IRD - Incongruence Relationship Difference (Mounce & Wills)
PDC - Partition Distance Correlation (Mounce, Gerber, Wills)
pCI - Partition Consistency Index (steps relative to partition tree only)
HER - Homoplasy Excess Ratio (Archie, 1989)
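Two of the statistics listed above reduce to short formulas once tree lengths are known. This is a minimal sketch, assuming characters are given as strings of states (taxon order Out, A to F), that the minimum possible steps for a character is its number of observed states minus one, and that HER follows Archie's (1989) HER = (MPR - L) / (MPR - MINL), where MPR is the mean MP length of matrices whose states have been shuffled among taxa within each character. Tree lengths and MPR need a tree search, so they are passed in as plain numbers; all helper names are hypothetical. Reading pCI as this same CI computed on the partition's own MP tree is an interpretation of the one-line description above, not a confirmed definition.

```python
import random


def minimum_steps(character):
    """Fewest changes any tree needs for one character: distinct states - 1."""
    return len(set(character)) - 1


def consistency_index(characters, tree_length):
    """Ensemble CI = MINL / L for the given characters on one tree.
    (Assumed reading of pCI: pass the length on the partition's own MP tree.)"""
    return sum(minimum_steps(ch) for ch in characters) / tree_length


def permute_within_characters(characters, rng):
    """One randomized replicate for MPR: shuffle each character's states among taxa."""
    replicate = []
    for ch in characters:
        states = list(ch)
        rng.shuffle(states)
        replicate.append("".join(states))
    return replicate


def homoplasy_excess_ratio(mpr, tree_length, minl):
    """Archie's HER = (MPR - L) / (MPR - MINL); MPR = mean MP length of permuted replicates."""
    return (mpr - tree_length) / (mpr - minl)


# Cranial characters of the example matrix, one string per character
# (taxon order: Out, A, B, C, D, E, F).
cranial = ["0000111", "0000111", "0111000", "0111000", "0110000",
           "0000011", "0000011", "0101000", "0101010"]
print(consistency_index(cranial, 11))  # 9/11, about 0.818
```

All nine cranial characters are binary and variable, so MINL = 9; at 11 steps (the most parsimonious length for these characters) CI is 9/11.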
Incongruence Length Difference
Combined (L = 25)
Out 000000000 000000000
A   001110011 000000011
B   001110000 000001100
C   001100011 000111111
D   110000000 001111100
E   110001101 111111101
F   110001100 111111100
Cranial (L = 11)
Out 000000000
A   001110011
B   001110000
C   001100011
D   110000000
E   110001101
F   110001100
Postcranial (L = 12)
Out 000000000
A   000000011
B   000001100
C   000111111
D   001111100
E   111111101
F   111111100
ILD = L(combined) - [L(cranial) + L(postcranial)] = 25 - (11 + 12) = 2
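For a matrix this small, the arithmetic above can be reproduced exactly. This is a minimal sketch, not the software used for the talk: it scores characters with Fitch parsimony and finds exact MP lengths by enumerating all 945 unrooted topologies for seven taxa (every tree is rooted on the outgroup, which does not change Fitch lengths). The names `fitch_length`, `rooted_trees`, `mp_length` and `ild` are hypothetical helpers; real datasets need heuristic search in a dedicated program.

```python
# Example matrix from the slides: 9 cranial characters then 9 postcranial.
MATRIX = {
    "Out": "000000000" + "000000000",
    "A":   "001110011" + "000000011",
    "B":   "001110000" + "000001100",
    "C":   "001100011" + "000111111",
    "D":   "110000000" + "001111100",
    "E":   "110001101" + "111111101",
    "F":   "110001100" + "111111100",
}
CRANIAL = list(range(0, 9))
POSTCRANIAL = list(range(9, 18))


def fitch_length(tree, states):
    """Fitch steps for one character; `tree` is nested 2-tuples, leaves are taxon names."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def insert_everywhere(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach above the current root
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def rooted_trees(taxa):
    """Yield all rooted binary topologies on `taxa` by stepwise addition."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for smaller in rooted_trees(taxa[:-1]):
        yield from insert_everywhere(smaller, taxa[-1])


def mp_length(matrix, cols, outgroup="Out"):
    """Exact MP length by exhaustive search (only feasible for a handful of taxa).
    Rooting each ingroup topology on the outgroup covers every unrooted tree once."""
    ingroup = [t for t in matrix if t != outgroup]
    columns = [{t: row[c] for t, row in matrix.items()} for c in cols]
    return min(
        sum(fitch_length((outgroup, sub), states) for states in columns)
        for sub in rooted_trees(ingroup)
    )


def ild(matrix, part1, part2, outgroup="Out"):
    """ILD = L(combined) - (L(part1) + L(part2))."""
    combined = mp_length(matrix, part1 + part2, outgroup)
    return combined - (mp_length(matrix, part1, outgroup) + mp_length(matrix, part2, outgroup))


print(ild(MATRIX, CRANIAL, POSTCRANIAL))
```

Because the combined length is a minimum over trees of the sum of both partitions' lengths, it can never be smaller than the sum of the two separate minima, so the ILD is always zero or positive.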
Determining the significance of ILD
Cranial | Postcranial
Out 000000000 000000000
A   001110011 000000011
B   001110000 000001100
C   001100011 000111111
D   110000000 001111100
E   110001101 111111101
F   110001100 111111100
Original partition (Cran | Post) vs. Randomized partition 1, Random. partition 2 … Random. partition 999
ILD values shown for the example partitions: 2, 1, 1 and 3
[Figure: frequency distribution of ILD values over the replicates; x axis: ILD (0 to 4); y axis: frequency (0 to 450)]
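The randomization step above can be sketched generically. This assumes the original partition is the first `first_size` characters and that each random replicate reassigns the characters to two partitions of the original sizes, as in the ILD test; `statistic` is any function of the two index lists (for example, one returning the ILD). The helper names are hypothetical, and a real analysis runs a tree search per replicate with the settings given for this study.

```python
import random


def random_partition(n_chars, first_size, rng):
    """Shuffle character indices and split them into two sets of the original sizes."""
    indices = list(range(n_chars))
    rng.shuffle(indices)
    return sorted(indices[:first_size]), sorted(indices[first_size:])


def permutation_p_value(observed, null_values):
    """One-sided p: share of replicates (original included) scoring >= the observed value."""
    at_least_as_extreme = sum(1 for value in null_values if value >= observed)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)


def partition_test(statistic, n_chars, first_size, reps=999, seed=1):
    """Score the original partition, then `reps` random partitions of the same sizes."""
    rng = random.Random(seed)
    observed = statistic(list(range(first_size)), list(range(first_size, n_chars)))
    null = [statistic(*random_partition(n_chars, first_size, rng)) for _ in range(reps)]
    return observed, permutation_p_value(observed, null)


# Toy check with the ILD values shown above: observed 2 against replicates 1, 1 and 3.
print(permutation_p_value(2, [1, 1, 3]))  # 0.5
```

Counting the original partition among the replicates keeps p above zero, so with 999 random replicates the smallest attainable p-value is 1/1000 = 0.001, the lower end of the p-value bins used in this study.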
The ILD of cranial and postcranial partitions
Source data used in this study (N = 63)
'Fish': 10
Amphibia: 6
Mammals: 16
Birds: 4
Dinosaurs: 14
Reptiles (other): 13
ILD significance p-values*
0.001-0.010: 16
0.011-0.100: 7
0.101-1.000: 40
* 999 random replicates; heuristic search with TBR branch swapping; maxtrees = 10000; hold = 1000; 10 random-addition-sequence (RAS) replicates
Topological ILD
Original matrix → MPT(s) → MRP matrix
Out 000000000 000000000
A   001110011 000000011
B   001110000 000001100
C   001100011 000111111
D   110000000 001111100
E   110001101 111111101
F   110001100 111111100
MRP matrix of the combined MPT (columns: Out A B C D E F)
AB     0110000
ABC    0111000
DE     0000110
DEF    0000111
ABCDEF 0111111
L = 4
Cranial
Out 000000000
A   001110011
B   001110000
C   001100011
D   110000000
E   110001101
F   110001100
Post-cranial
Out 000000000
A   000000011
B   000001100
C   000111111
D   001111100
E   111111101
F   111111100
MRP matrix of the partition MPTs (columns: Out A B C D E F)
AB     0110000
ABC    0111000
DE     0000110
DEF    0000000
ABCDEF 0111111
DE     0000110
DEF    0000111
BDEF   0010111
BCDEF  0011111
ABCDEF 0111111
L = 11
TILD = 11 - 4 = 7 (excluding uninformative characters; MSY Lee, 2001)
P-value = 0.113
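MRP coding of the trees above can be sketched as follows. Each clade becomes one binary column (1 = taxon inside the clade), and only parsimony-informative columns are kept, following the "excluding uninformative characters" convention above. The clade list is read from the combined MPT's MRP matrix on the slide; the helper names are hypothetical.

```python
from collections import Counter

TAXA = ["Out", "A", "B", "C", "D", "E", "F"]


def mrp_matrix(taxa, clades):
    """MRP coding: one binary column per clade, '1' if the taxon belongs to that clade."""
    return {t: "".join("1" if t in clade else "0" for clade in clades) for t in taxa}


def is_informative(states):
    """Parsimony-informative: at least two states each shared by two or more taxa."""
    return sum(1 for count in Counter(states).values() if count >= 2) >= 2


def informative_columns(matrix):
    """Indices of the columns that remain once uninformative ones are excluded."""
    taxa = list(matrix)
    n_cols = len(next(iter(matrix.values())))
    return [c for c in range(n_cols) if is_informative([matrix[t][c] for t in taxa])]


clades = [{"A", "B"}, {"A", "B", "C"}, {"D", "E"}, {"D", "E", "F"},
          {"A", "B", "C", "D", "E", "F"}]
mrp = mrp_matrix(TAXA, clades)
print(mrp["A"])                        # 11001
print(len(informative_columns(mrp)))   # 4
```

The ABCDEF column separates only the outgroup, so it is dropped. Every remaining column fits its own source tree in one step, which is why the combined MPT's informative MRP matrix scores L = 4 on that tree.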
TILD is not suitable for testing morphological data partitions
Regardless of exactly how one partitions a morphological dataset, partitioning means that the smaller partition contains at most 50% of the characters in the whole matrix
This tends to reduce the topological resolution (% of resolved nodes) of that partition's MP solution(s)
which in turn reduces the number of parsimony-informative characters in the MRP matrix built from those MP solution(s)
Indeed, for many of the datasets I tested, the strict consensus (SC) of the MP solutions for one or both individual partitions was completely unresolved or very poorly resolved.
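The loss of resolution described above can be quantified with a strict consensus. This is a minimal sketch, assuming each MP tree is written as a set of non-trivial ingroup clades on an outgroup-rooted tree (single taxa and the whole ingroup omitted), so a fully resolved tree of n taxa has n - 3 such clades. The two example trees and the helper names are hypothetical.

```python
def strict_consensus(clade_sets):
    """Strict consensus: keep only the clades present in every tree."""
    return set.intersection(*(set(clades) for clades in clade_sets))


def percent_resolution(clades, n_taxa):
    """Clades retained as a percentage of the n - 3 in a fully resolved tree."""
    return 100 * len(clades) / (n_taxa - 3)


# Two hypothetical equally parsimonious trees for one partition (7 taxa incl. Out).
tree1 = {frozenset("AB"), frozenset("ABC"), frozenset("EF"), frozenset("DEF")}
tree2 = {frozenset("AC"), frozenset("ABC"), frozenset("EF"), frozenset("DEF")}
sc = strict_consensus([tree1, tree2])
print(percent_resolution(sc, 7))  # 75.0
```

Adding MPTs can only shrink the strict consensus, so a small partition with many equally short trees quickly approaches 0% resolution, leaving few informative MRP columns.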
TILD (>50% maj-rule)
TILD significance p-values, Cranial | Postcranial (N = 63)
[Figure: pie chart of TILD p-values binned 0.001-0.010, 0.011-0.100, 0.101-0.300 and 0.301-1.000; 46 datasets fall in the 0.001-0.010 bin, labelled 'apparently' significantly incongruent]
These results CANNOT be interpreted as support for hypotheses of incongruence of phylogenetic signal between cranial and postcranial data. They are an artefact of incongruence (character conflict) within each data partition, NOT between the partitions as originally intended.
Incongruence Relationship Difference
Cranial | Postcranial
Out 000000000 000000000
A   001110011 000000011
B   001110000 000001100
C   001100011 000111111
D   110000000 001111100
E   110001101 111111101
F   110001100 111111100
Original partition (Cran | Post) vs. Randomized partition 1, Random. partition 2 … Random. partition 999
RF values shown for the example partitions: 2, 4, 2 and 2
[Figure: frequency distribution of RF values over the replicates; x axis: 0 to 4; y axis: frequency (0 to 450)]
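The Robinson-Foulds distance behind IRD can be sketched in a few lines. This assumes one tree per partition, written as nested 2-tuples and rooted on the outgroup, so each non-trivial clade corresponds to one split; how IRD combines several MPTs per partition is not stated here, and the helper names and example trees are hypothetical. Significance would then follow the same random-partition procedure as the ILD.

```python
def clades(tree, n_taxa):
    """Non-trivial clades of an outgroup-rooted tree (sizes 2 to n_taxa - 2)."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = visit(node[0]) | visit(node[1])
        if 2 <= len(members) <= n_taxa - 2:
            found.add(members)
        return members

    visit(tree)
    return found


def robinson_foulds(tree1, tree2, n_taxa):
    """RF distance: clades (splits) present in one tree but not the other."""
    return len(clades(tree1, n_taxa) ^ clades(tree2, n_taxa))


t1 = ("Out", ((("A", "B"), "C"), ("D", ("E", "F"))))
t2 = ("Out", ((("A", "C"), "B"), ("D", ("E", "F"))))
print(robinson_foulds(t1, t2, 7))  # 2
```

Moving a single taxon to a neighbouring position removes one clade and creates another, which is why RF values for fully resolved trees come in even numbers, as in the 2s and 4 shown above.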
IRD (Robinson-Foulds)
IRD(RF) significance p-values
0.001-0.010: 17
0.011-0.300: 2
0.301-1.000: 44
ILD significance p-values (for comparison)
0.001-0.010: 16
0.011-0.100: 7
0.101-1.000: 40
IRD(RF) leads to similar, but not identical, conclusions to ILD
Partition Distance Correlation
Transform each partition into a (phenetic) distance matrix
Compare the two distance matrices using:
– a) Mantel's permutation test (Mantel, 1967)
– b) Spearman's rank correlation coefficient
Determine significance (p-values) by comparing the observed scores with those of randomly partitioned replicate matrices (as in the ILD test)
Cranial
Out 000000000
A   001110011
B   001110000
C   001100011
D   110000000
E   110001101
F   110001100
Postcranial
Out 000000000
A   000000011
B   000001100
C   000111111
D   001111100
E   111111101
F   111111100
Distance matrix 1 (lower triangle)
-
0.66 -
0.44 0.44 -
0.22 0.44 0.44 -
0.55 0.55 0.55 0.55 -
0.44 0.44 0.66 0.44 0.33 -
0.55 0.33 0.11 0.55 0.44 0.55 -
Distance matrix 2 (lower triangle)
-
0.44 -
0.11 0.33 -
0.33 0.33 0.44 -
0.66 0.66 0.55 1.00 -
0.55 0.33 0.66 0.44 0.55 -
0.55 0.88 0.44 0.44 0.55 0.66 -
Mantel: 0.36
Spearman's: 0.41
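The partition-distance comparison can be sketched as follows. Several choices here are assumptions: the distance is taken as the share of characters that differ between two taxa (the slides do not name the distance, so the values need not match those shown); the Mantel p-value uses Mantel's own taxon-relabelling permutation, whereas the slide's significance step compares against randomly partitioned matrices; and Spearman's coefficient uses average ranks for ties. All helper names are hypothetical.

```python
import random
from itertools import combinations

# Example matrix from the slides: 9 cranial then 9 postcranial characters.
MATRIX = {
    "Out": "000000000" + "000000000",
    "A":   "001110011" + "000000011",
    "B":   "001110000" + "000001100",
    "C":   "001100011" + "000111111",
    "D":   "110000000" + "001111100",
    "E":   "110001101" + "111111101",
    "F":   "110001100" + "111111100",
}
CRANIAL, POSTCRANIAL = list(range(9)), list(range(9, 18))


def distances(matrix, cols, taxa):
    """Assumed distance: share of characters that differ, for every pair of taxa."""
    return {
        frozenset((a, b)): sum(matrix[a][c] != matrix[b][c] for c in cols) / len(cols)
        for a, b in combinations(taxa, 2)
    }


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def mantel(d1, d2, taxa, reps=999, seed=1):
    """Mantel r (Pearson r of paired distances) and a one-sided p from relabelling taxa in d2."""
    pairs = [frozenset(p) for p in combinations(taxa, 2)]
    x = [d1[p] for p in pairs]
    observed = pearson(x, [d2[p] for p in pairs])
    rng = random.Random(seed)
    hits = 1  # the observed labelling counts as one replicate
    for _ in range(reps):
        shuffled = list(taxa)
        rng.shuffle(shuffled)
        relabel = dict(zip(taxa, shuffled))
        y = [d2[frozenset(relabel[t] for t in p)] for p in pairs]
        if pearson(x, y) >= observed:
            hits += 1
    return observed, hits / (reps + 1)


taxa = list(MATRIX)
d_cran = distances(MATRIX, CRANIAL, taxa)
d_post = distances(MATRIX, POSTCRANIAL, taxa)
pairs = [frozenset(p) for p in combinations(taxa, 2)]
print("Mantel r, p:", mantel(d_cran, d_post, taxa))
print("Spearman rho:", spearman([d_cran[p] for p in pairs], [d_post[p] for p in pairs]))
```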
Which partition most resembles the whole tree (a fair test)?
Out 000000000 000000000
A   001110011 000000011
B   001110000 000001100
C   001100011 000111111
D   110000000 001111100
E   110001101 111111101
F   110001100 111111100
Use a non-parametric bootstrap to resample the larger partition down to the size of the smaller partition
Calculate MP solutions for each (now equally sized) partition
Find the tree-to-tree distance (partition <-> whole) for each
Significance: Mann-Whitney U test between the two pools (one per partition) of replicates
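The resampling and the final test in the steps above can be sketched as below. The tree searches and tree-to-tree distances are outside the sketch, so the distance lists are hypothetical placeholders, as are the partition sizes and helper names. The p-value uses the normal approximation without tie or continuity correction; SciPy's `scipy.stats.mannwhitneyu` is an established alternative.

```python
import math
import random


def bootstrap_down(columns, target_size, rng):
    """Non-parametric bootstrap: draw `target_size` characters with replacement."""
    return [rng.choice(columns) for _ in range(target_size)]


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs where an x value exceeds a y value (ties count 0.5)."""
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[: len(x)]) - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


def mann_whitney_p(x, y):
    """Two-sided p from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u_x, _ = mann_whitney_u(x, y)
    z = (u_x - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(1)
larger = list(range(12))   # hypothetical: indices of the larger partition's characters
replicate = bootstrap_down(larger, 6, rng)  # resampled to a hypothetical smaller size of 6
# Hypothetical partition-to-whole tree distances from replicate searches.
cranial_dists = [2, 4, 2, 6, 4]
post_dists = [6, 8, 6, 8, 10]
print(mann_whitney_u(cranial_dists, post_dists), mann_whitney_p(cranial_dists, post_dists))
```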
Recap & Conclusions
'Apparent' significant incongruence of phylogenetic signal between cranial and postcranial data partitions is not uncommon, however one chooses to measure it (ILD, IRD or “phenetic” ILD)
At present, there appears to be no obvious pattern in the distribution of significantly incongruent datasets (but since the overall sample size is small, this may change with more data)
Further work is needed to properly distinguish between 'global incongruence' and strong 'local incongruence' (Baker et al., 2001)
To this end, further development of character compatibility methods (sensu Meacham & Estabrook, 1985) would be ideal, so that one can obtain informative statistics per character rather than per partition
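A per-character statistic of the kind suggested above can be sketched with the pairwise compatibility test for binary characters: two characters are compatible unless all four state combinations occur. This assumes binary characters without missing data, written as strings in a shared taxon order; the per-character score (how many other characters each one is compatible with) is one illustrative choice, not a published method, and the helper names are hypothetical.

```python
from itertools import combinations


def compatible(char1, char2):
    """Four-gamete test: two binary characters (same taxon order) are compatible
    unless all four state pairs 00, 01, 10 and 11 occur."""
    return len(set(zip(char1, char2))) < 4


def compatibility_scores(characters):
    """Per-character score: how many other characters each one is compatible with."""
    scores = [0] * len(characters)
    for i, j in combinations(range(len(characters)), 2):
        if compatible(characters[i], characters[j]):
            scores[i] += 1
            scores[j] += 1
    return scores


# Cranial characters of the example matrix, one string per character
# (taxon order: Out, A, B, C, D, E, F).
cranial = ["0000111", "0000111", "0111000", "0111000", "0110000",
           "0000011", "0000011", "0101000", "0101010"]
print(compatibility_scores(cranial))
```

Scores of this kind could be compared between cranial and postcranial characters, or computed separately against each partition, to locate conflict character by character.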
Acknowledgements
Funding | Computational Resources
Help and Guidance
Matthew Wills, Sylvain Gerber
Biodiversity Lab 1.07, Ward Wheeler
All authors who kindly provided me with their data
(Marie Stopes Travel Grant)
...and now for something a bit different!
“Science is based on building on, reusing and openly criticising the published body of scientific knowledge”
http://pantonprinciples.org/
Mounce, Butler, Davis, Dunhill, Garwood, Lamsdell, Legg, Lloyd, Pittman, Warnock, Wolfe + ~150 signatories, including Benton, Ruta, Rannala, Wagner, Upchurch, Sutton, Farke, Dunlop...
The Next Step...
Publish your underlying data (set a good example!)
If you spot a paper in which the results cannot be replicated from the data given, because:
a) necessary data are missing, or
b) the data given do not produce the results the authors report
DO SOMETHING ABOUT IT!
(you may get rewarded with a Nature paper!)