Step three: statistical analyses to test biological hypotheses
General protocol continued
Biological hypotheses and statistical tests
Hypotheses driven by Biology
Statistics depend on data and hypotheses
NO NEW STATISTICAL TOOLS ARE NEEDED FOR MORPHOMETRICS!!
Exploratory hypotheses: relative position of specimens in data space; relationship among specimens in data space
Confirmatory hypotheses: compare groups, associate shape with other variables, etc.
Some hypotheses (shape related)
How do populations and species differ?
Does the observed variation generate a predictable pattern?
Are there additional factors (ecological, evolutionary) correlated with variation?
How does shared evolutionary history affect the observed patterns?
Hypotheses as statistical tests
Do populations differ? → MANOVA, CVA
Is there a predictable pattern? → PCA, UPGMA
Correlated factors? → Regression, 2B-PLS
Effect of phylogeny? → Comparative Method
Exploratory data analysis
Investigate data using only Y-matrix of shape variables (PWScores + U1,U2)
Specimens are points in high-dimensional data space
Look for patterns and distributions of points
Generate summary plot of data space (ordination)
Look for relationships of points (clustering)
Ordination and dimension reduction
Visualize high dimensional data space as succinctly as possible
Describe variation in original data with new set of variables (typically orthogonal vectors)
Order new variables by variation explained (most to least)
Plot first few dimensions to summarize data
Principal Components Analysis (PCA) is one approach (others include: PCoA, MDS, CA, etc.)
PCA: what does it do?
Rotates data so that main axis of variation (PC1) is horizontal
Subsequent PC axes are orthogonal to PC1, and are ordered to explain sequentially less variation
The goal is to explain more variation in fewer dimensions
PCA: interpretations
Eigenvectors are linear combinations of original variables (interpreted by PC loadings of each variable)
PCA PRESERVES EUCLIDEAN DISTANCES among objects
PCA does NOTHING to the data, except rotate it to axes expressing the most variation; it loses NO INFORMATION (if all PC vectors retained)
If the original variables are uncorrelated, PCA is not helpful in reducing dimensionality of data
PCA does not find a particular factor (e.g., group differences, allometry): it identifies the direction of most variation, which may be interpretable as a ‘factor’ (but may not)
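The rotation described above can be sketched with NumPy. This is a minimal illustration, not the software used for the slides: the function names `pca` and `pairwise` and the simulated Y-matrix are assumptions made for the example. It computes PC scores from the SVD of the centered data and checks that, with all PC axes retained, Euclidean distances among specimens are unchanged.

```python
import numpy as np

def pca(Y):
    """PCA of an n x p data matrix Y via SVD of the column-centered data."""
    Yc = Y - Y.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    scores = Yc @ Vt.T                           # specimens rotated onto the PC axes
    explained = s**2 / np.sum(s**2)              # proportion of variation per PC
    return scores, Vt.T, explained

def pairwise(X):
    """Matrix of Euclidean distances among the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

# Hypothetical data: 20 specimens, 4 correlated variables
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))

scores, loadings, explained = pca(Y)

# With all PC axes retained, distances among specimens are preserved
same_distances = np.allclose(pairwise(Y), pairwise(scores))
```

Plotting the first two columns of `scores` gives the usual ordination plot; `explained` gives the proportion of variation summarized by each axis.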
Example: leatherside chub
Clustering
Data are dots in a high-dimensional space (Y-matrix)
Can we connect the dots for groupings, where clusters represent groups of similar specimens?
Cluster methods generate ‘1-dimensional view’ of relationships, based on some criterion
Clustering requires distance (or similarity) between points
MANY different criteria
Clustering is algorithmic, not algebraic (i.e., it is a procedure, or set of rules for connecting data)
Clustering: UPGMA
[Figure: UPGMA (group average) dendrogram of samples; vertical axis: Distance (0 to 0.15); resemblance: D1 Euclidean distance]
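As a rough sketch of the group-average rule, the following NumPy code repeatedly joins the two clusters with the smallest mean between-cluster distance. The function name `upgma` and the four-point example are hypothetical, and the loop is written for clarity rather than speed.

```python
import numpy as np

def upgma(D):
    """UPGMA on a square distance matrix D.

    Returns the merge history as (members_a, members_b, height) tuples.
    """
    D = np.asarray(D, dtype=float)
    clusters = {i: [i] for i in range(len(D))}   # cluster id -> member specimens
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # group-average distance: mean over all between-cluster pairs
                d = np.mean([D[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters.pop(b)   # join b into a
    return merges

# Hypothetical example: four specimens on a line at 0, 1, 5 and 6
x = np.array([0.0, 1.0, 5.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
merges = upgma(D)
```

Here the two close pairs join first, and the final join happens at the mean of the four between-pair distances. Each merge height corresponds to a node height in the dendrogram.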
Conclusions: exploratory methods
Useful tools for summarizing shape variation
Help you understand your data through visualizing variation (both ordination plots and cluster diagrams)
Help describe relationships among specimens in terms of overall similarity
Confirmatory data analysis
Investigate data using shape variables (Y-matrix) and other (independent) variables (X-matrix)
Test for patterns of shape variation
Independent variables determine type of statistical test
Types of independent variables
Categorical: variables delineating groups of specimens (e.g., male/female, species, etc.)
Continuous: variables on a continuous scale (e.g., size, moisture, age, etc.)
Different statistical methods for each
Some statistical tests
Categorical: shape differences among groups → MANOVA
Continuous: relationship of variables and shape → Mult. Regression
Continuous: association of variables and shape → 2B-PLS (2-Block Partial Least Squares)
MANOVA and multivariate regression are both GLM statistics (General Linear Models)
Group differences: MANOVA
Is there a difference in shape between groups?
Multivariate generalization of ANOVA
Compares variation within groups to variation between groups
Significant MANOVA: Group means are different in shape
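The within-versus-between comparison can be illustrated with Wilks' Lambda, the ratio of the determinant of the within-group SSCP matrix to that of the total SSCP matrix. The sketch below covers a one-factor design only; the function name `wilks_lambda` and the simulated two-group data are assumptions for illustration. Values near 0 indicate strong group separation, values near 1 indicate little.

```python
import numpy as np

def wilks_lambda(Y, groups):
    """Wilks' Lambda = det(W) / det(T) for a one-factor design."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    dev_total = Y - Y.mean(axis=0)
    T = dev_total.T @ dev_total                  # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Yg = Y[groups == g]
        dev = Yg - Yg.mean(axis=0)
        W += dev.T @ dev                         # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(T)

# Hypothetical data: two groups of 15 specimens, 3 shape variables,
# with the second group shifted on every variable
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 15)
Y = rng.normal(size=(30, 3))
Y[groups == 1] += 5.0

lam = wilks_lambda(Y, groups)
```

Converting Lambda to an F statistic, or assessing it by permutation, is a separate step not shown here.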
RW1-RW30 Utah chub
MANOVA
Source: Sex, Loc, Sex X loc, IL/SL, Size
Statistic        Value        F      df1   df2      Prob
Wilks' Lambda    0.61907356   1.83   30    89       0.0159
Wilks' Lambda    0.75516916   0.96   30    89       0.5318
Wilks' Lambda    0.10138762   1.40   180   533.33   0.0020
Wilks' Lambda    0.00308619   3.26   240   706.35   <.0001
Wilks' Lambda    0.38888016   4.66   30    89       <.0001
MANOVA: post hoc tests
Pairwise comparisons using Generalized Mahalanobis Distance (D² or D)
Convert D² → T² → F to test
For experiment-wise error rate, adjust using Bonferroni:
α_exp = α / (number of comparisons)
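The conversion chain can be written out for a single pairwise comparison. The sketch assumes the standard two-sample Hotelling's T² relationship, with D² computed from the pooled within-group covariance matrix. The function names and the example numbers are hypothetical.

```python
def mahalanobis_to_f(d2, n1, n2, p):
    """Convert Mahalanobis D^2 between two groups to Hotelling's T^2 and F.

    Assumes D^2 uses the pooled within-group covariance of the two groups.
    n1, n2: group sample sizes; p: number of shape variables.
    """
    t2 = (n1 * n2) / (n1 + n2) * d2
    df1 = p
    df2 = n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f, df1, df2

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha that keeps the experiment-wise rate at alpha."""
    return alpha / n_comparisons

# Hypothetical numbers: D^2 = 4.0, two groups of 10 specimens, 3 variables
t2, f, df1, df2 = mahalanobis_to_f(4.0, 10, 10, 3)

# Five groups give 5 * 4 / 2 = 10 pairwise comparisons
alpha_exp = bonferroni_alpha(0.05, 10)
```

The resulting F is compared with an F distribution on `df1` and `df2` degrees of freedom, using `alpha_exp` as the threshold for each pairwise test.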
Discriminant analysis: CVA & DFA
‘Combination’ of MANOVA and PCA
Tests for group differences (MANOVA)
PCA of among-group variation relative to within-group variation
Suggests which groups differ on which variables
Can ‘classify’ specimens to groups
Special case: 2 groups = discriminant function analysis (DFA)
DFA/CVA: post-hoc tests
For DFA/CVA, compare difference among groups using Generalized Mahalanobis Distance (D²)
Mahalanobis D² is logical choice because CVA/DFA is MANOVA, and the PCA is relative to within-group variability (i.e., VCV ‘standardized’)
Convert D² → T² → F to perform statistical test
Experiment-wise error rate adjusted as before (i.e., adjusted α)
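One way to read "PCA relative to within-group variability" is as an eigenanalysis of the among-group SSCP matrix after standardizing by the pooled within-group covariance. The sketch below takes that reading, using a Cholesky factor of the within-group matrix for the standardization. The function name `cva` and the simulated three-group data are assumptions; real shape data may first need dimension reduction so that the within-group matrix is invertible.

```python
import numpy as np

def cva(Y, groups):
    """Canonical variates: among-group variation relative to within-group."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    n, p = Y.shape
    labels = np.unique(groups)
    grand_mean = Y.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in labels:
        Yg = Y[groups == g]
        dev = Yg - Yg.mean(axis=0)
        W += dev.T @ dev                         # within-group SSCP
        m = (Yg.mean(axis=0) - grand_mean)[:, None]
        B += len(Yg) * (m @ m.T)                 # among-group SSCP
    W /= n - len(labels)                         # pooled within-group covariance
    L = np.linalg.cholesky(W)                    # W = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ B @ L_inv.T                      # B in within-group-standardized space
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]              # largest among-group axis first
    evals, evecs = evals[order], evecs[:, order]
    axes = L_inv.T @ evecs                       # canonical vectors, original variables
    scores = (Y - grand_mean) @ axes
    return scores, axes, evals

# Hypothetical data: 3 groups of 20 specimens, 4 variables
rng = np.random.default_rng(4)
groups = np.repeat([0, 1, 2], 20)
Y = rng.normal(size=(60, 4))
Y[groups == 1, 0] += 3.0
Y[groups == 2, 1] += 3.0

scores, axes, evals = cva(Y, groups)
```

With three groups, only the first two canonical axes carry among-group variation. On these axes the pooled within-group variance of the scores is 1, so squared Euclidean distances between group means in canonical space correspond to Mahalanobis D².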
Continuous variation: regression
Is there a relationship between shape and some other variable?
Multivariate regression of shape on continuous variable
Significant regression implies shape changes as a function of other variable (e.g., size)
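A multivariate regression of shape on one continuous variable can be sketched with ordinary least squares, fitting every shape variable on the same predictor. The function name, the simulated size variable, and the slopes are hypothetical. The sketch covers fitting only, not the significance test.

```python
import numpy as np

def multivariate_regression(Y, x):
    """Least-squares regression of each column of Y on x (with intercept)."""
    Y = np.asarray(Y, dtype=float)
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # row 0: intercepts, row 1: slopes
    fitted = X @ coef
    resid = Y - fitted
    ss_total = ((Y - Y.mean(axis=0)) ** 2).sum()
    r2 = 1.0 - (resid ** 2).sum() / ss_total       # proportion of shape variation explained
    return coef, fitted, r2

# Hypothetical data: 40 specimens, 3 shape variables changing with size
rng = np.random.default_rng(2)
size = rng.normal(size=40)
true_slope = np.array([0.5, -0.2, 0.1])
Y = 1.0 + np.outer(size, true_slope) + rng.normal(scale=0.01, size=(40, 3))

coef, fitted, r2 = multivariate_regression(Y, size)
```

The slope row `coef[1]` is the vector of shape change per unit of the predictor, which is the quantity one would visualize as a deformation.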
Example of shape on size in mountain sucker
Multivariate tests of significance:
Statistic                 Value        Fs      df1  df2    Prob
Wilks' Lambda             0.34356565   22.822  36   430.0  3.580E-078
Pillai's trace            0.65643435   22.822  36   430.0  3.580E-078
Hotelling-Lawley trace    1.91065190   22.822  36   430.0  3.580E-078
Roy's maximum root        1.91065190   22.822  36   430.0  3.580E-078
Test that kth root and those that follow are zero:
k   U            Fs      df1  df2    Prob
1   0.34356565   22.822  36   430.0  3.580E-078
Continuous variation: association 2B-PLS
Is there an association between shape and some other set of variables (not causal)?
Find pairs of linear combinations for X & Y that maximize the covariation between data sets
Linear combinations are constrained to be orthogonal within each set (like PC axes) but NOT between data sets
Calculations less complicated for 2B-PLS (because fewer mathematical constraints)
Analogous to ‘multivariate correlation’
2B-PLS is called SINGULAR WARPS when shape is one or more of the data sets (Bookstein et al., 2003: J. of Hum. Evol.)
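The paired linear combinations can be obtained from a singular value decomposition of the cross-covariance matrix between the two blocks. The sketch below takes that route. The function name `two_block_pls` and the simulated blocks sharing one latent variable are assumptions for illustration.

```python
import numpy as np

def two_block_pls(X, Y):
    """2B-PLS via SVD of the cross-covariance matrix between X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (len(X) - 1)                 # cross-covariance between blocks
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    x_scores = Xc @ U                            # linear combinations of X
    y_scores = Yc @ Vt.T                         # linear combinations of Y
    return U, Vt.T, s, x_scores, y_scores

# Hypothetical data: both blocks driven by one shared latent variable
rng = np.random.default_rng(3)
latent = rng.normal(size=50)
X = np.outer(latent, [1.0, 0.5]) + rng.normal(scale=0.1, size=(50, 2))
Y = np.outer(latent, [0.3, -0.7, 0.2]) + rng.normal(scale=0.1, size=(50, 3))

U, V, s, x_scores, y_scores = two_block_pls(X, Y)

# Covariance between the first pair of scores equals the first singular value
cov_first_pair = np.cov(x_scores[:, 0], y_scores[:, 0])[0, 1]
```

The columns of `U` are orthonormal within the X block and the columns of `V` within the Y block. The singular values in `s` give the covariation accounted for by each pair of axes.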
Resampling methods
Methods that take many samples from original data set in some specified way and evaluate the significance of the original based on these samples
Resampling approaches are nonparametric, because they do not depend on theoretical distributions for significance testing (they generate a distribution from the data)
Are very flexible, and can allow for complicated designs
Very useful in morphometrics, and can be used for:
• Testing standard designs
• Testing non-standard designs
• Testing when sample sizes small relative to # of variables
Randomization (permutation)
Proposed by Fisher (1935) for assessing significance of 2-sample comparison (Fisher’s exact test)
Fisher’s exact test: a total enumeration of possible pairings of data
Randomization can be used to assess the significance of almost any test statistic
Protocol
• Calculate observed statistic (e.g., T-statistic): E_obs
• Reorder data set (i.e., randomly shuffle data) and recalculate statistic: E_rand
• Repeat many times to generate distribution of statistic
• Percentage of E_rand more extreme than E_obs is significance level
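The protocol above can be sketched for a two-sample comparison. The statistic here is the absolute difference in group means rather than a T-statistic, and the observed value is counted as one of the permutations. Both are choices made for this example, and the function name and data are hypothetical.

```python
import numpy as np

def permutation_test(a, b, n_perm=999, seed=0):
    """Two-sample randomization test on the absolute difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    e_obs = abs(a.mean() - b.mean())             # observed statistic
    count = 1                                    # observed value counts as one permutation
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)       # reorder the data set
        e_rand = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if e_rand >= e_obs:                      # as or more extreme than observed
            count += 1
    return e_obs, count / (n_perm + 1)

# Hypothetical samples: clearly different groups, then identical groups
group_a = np.arange(10, dtype=float)
group_b = group_a + 10.0
e_diff, p_diff = permutation_test(group_a, group_b)
e_same, p_same = permutation_test([1, 2, 3, 4], [1, 2, 3, 4])
```

Swapping in a different statistic only requires changing the two lines that compute `e_obs` and `e_rand`.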
Randomization: comments
Randomization EXTREMELY useful and flexible technique
How and what to resample depends upon data and hypothesis
• Regression and correlation: shuffle Y vs. X
• Group comparison (e.g., ANOVA): shuffle Y on groups
• Some tests (e.g., t-test) may depend on direction (1-tailed vs. 2-tailed)
Also useful when no theoretical distribution exists for statistic, or when design is ‘non-standard’
This is frequently the case in E&E studies
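For the regression case, "shuffle Y vs. X" can be sketched by permuting the rows of the shape matrix while the predictor stays fixed. The statistic used here, the sum of squares explained by the regression, is one reasonable choice and not the only one. The function names and simulated data are assumptions.

```python
import numpy as np

def regression_ss(Y, x):
    """Sum of squares of Y explained by a linear regression on x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ coef
    return ((fitted - Y.mean(axis=0)) ** 2).sum()

def permute_regression(Y, x, n_perm=499, seed=0):
    """Randomization test of shape on x: shuffle rows of Y relative to x."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    e_obs = regression_ss(Y, x)
    count = 1                                    # observed value counts as one permutation
    for _ in range(n_perm):
        rows = rng.permutation(len(Y))           # break the pairing of Y with x
        if regression_ss(Y[rows], x) >= e_obs:
            count += 1
    return e_obs, count / (n_perm + 1)

# Hypothetical data: 30 specimens, 3 shape variables that depend on x
rng = np.random.default_rng(5)
x = rng.normal(size=30)
Y = np.outer(x, [0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=(30, 3))

e_obs, p_value = permute_regression(Y, x)
```

Whole rows of Y are shuffled together, so the covariation among shape variables within each specimen stays intact. Only the association with x is randomized.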
Step four: Graphical depiction of results
Strength of landmark-based TPS approach
Can view deformation of TPS grid among groups or with continuous variable
Superimposition
[Figure: superimposed landmark configurations, landmarks 1 to 24]
Effect of relative intestinal length: measure of trophic level
[Figure: landmark configurations (landmarks 1 to 24); panels: Long IL/SL (3.0), Short IL/SL (0.72)]
Effect of gradient on shape in mountain sucker
[Figure: panels Low and High]
[Figure: plot of RW 1 (40%) against RW 2 (20%); legend: locations (loc) labeled pred or non; landmark configurations, landmarks 1 to 24]
[Figure: plot of RW1 (40%) against RW 2 (20%); legend: nonpred, pred; landmark configurations, landmarks 1 to 24]