Neuroinformatics 18: the bootstrap
Kenneth D. Harris, UCL, 5/8/15
Types of data analysis
• Exploratory analysis
  • Graphical
  • Interactive
  • Aimed at formulating hypotheses
  • No rules – whatever helps you find a hypothesis
• Confirmatory analysis
  • For testing hypotheses once they have been formulated
  • Several frameworks for testing hypotheses
  • Rules need to be followed
Confidence interval
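• Probability distribution $p(x; \theta)$ of the data $x$, characterized by parameter $\theta$
• Classical statistics:
  • The data $x$ is random, but $\theta$ is not. $\theta$ has a true value, which we don’t know.
  • We don’t want to make incorrect statements more than 5% of the time.
• Confidence interval: from data $x$, compute an interval $[\theta_{\text{lo}}(x), \theta_{\text{hi}}(x)]$ so that $\theta_{\text{lo}}(x) \le \theta \le \theta_{\text{hi}}(x)$ with 95% probability (whatever the actual value of $\theta$).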
How to compute a confidence interval
• Most often:
  • Assume that the data distribution $p(x; \theta)$ is a known distribution family (e.g. Gaussian, Poisson)
  • Look up the formula for the confidence interval in a textbook, or use standard software
• Assumptions:
  • Your assumed distribution is appropriate
  • (Often) the sample is sufficiently large
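As an illustration of the textbook route, here is a minimal sketch of the large-sample interval for a mean, assuming the data are roughly Gaussian and the sample is big enough for the normal approximation. The function name `normal_mean_ci`, the simulated data, and the seed are illustrative choices, not part of the original slides.

```python
import math
import random
import statistics

def normal_mean_ci(sample, level=0.95):
    """Large-sample confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical data: 200 draws from a Gaussian with true mean 10.
rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(200)]
low, high = normal_mean_ci(data)
print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")
```

For small samples a textbook would use the t-distribution rather than the normal critical value; this sketch deliberately relies on the "sufficiently large sample" assumption listed above.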
The bootstrap
• An alternative way to compute confidence intervals that does not require an assumption about the form of the data distribution $p(x; \theta)$.
• “… I found myself stunned, and in a hole nine fathoms under the grass, when I recovered, hardly knowing how to get out again. Looking down, I observed that I had on a pair of boots with exceptionally sturdy straps. Grasping them firmly, I pulled with all my might. Soon I had hoist myself to the top and stepped out on terra firma without further ado.” - Singular Travels, Campaigns and Adventures of Baron Munchausen, ed. J. Carswell, 1948
Use the bootstrap with caution
• It looks simple, but…
• There are many subtly different variants of the bootstrap
• Different variants work in different situations
• Often they give you false-positive errors (without warning)
• Like Baron Munchausen’s way of getting out of a hole, the bootstrap is not guaranteed to work in all circumstances.
Bootstrap resampling
• Original sample $x_1, \dots, x_N$.
• Resample with replacement: choose $N$ random integers $i_1, \dots, i_N$ between $1$ and $N$, create resampled data set $x_{i_1}, \dots, x_{i_N}$.
• For example
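Resampling with replacement could be sketched as follows. The function name `bootstrap_resample` and the five data values are made up for illustration; note that Python indices run from 0 to N-1 rather than 1 to N.

```python
import random

def bootstrap_resample(sample, rng):
    """Draw len(sample) items from sample, with replacement."""
    n = len(sample)
    # n random integers, each between 0 and n-1 (0-based indices).
    indices = [rng.randrange(n) for _ in range(n)]
    return [sample[i] for i in indices]

rng = random.Random(1)
original = [2.1, 3.4, 1.9, 5.0, 4.2]  # hypothetical data
resampled = bootstrap_resample(original, rng)
print(resampled)
```

Because indices are drawn independently, a resample typically repeats some original values and omits others.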
Simplest method
• “Percentile bootstrap”
• Given estimator $\hat\theta(x)$ of parameter $\theta$
  • E.g. sample mean, sample variance, etc.
• Make a large number of bootstrap resamples (at least several thousand).
• Compute the confidence interval as the 2.5th and 97.5th percentiles of the distribution of $\hat\theta$ computed from these resamples.
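A minimal sketch of the percentile bootstrap described above, assuming the estimator is any function that maps a list of numbers to a single number. The name `percentile_bootstrap_ci`, the default of 5000 resamples, and the simulated data are illustrative assumptions.

```python
import random
import statistics

def percentile_bootstrap_ci(sample, estimator, n_resamples=5000, seed=0):
    """95% percentile-bootstrap interval for estimator(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Apply the estimator to each resample drawn with replacement.
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    # n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5%;
    # the first and last are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(estimates, n=40)
    return cuts[0], cuts[-1]

# Hypothetical data: 100 draws from a Gaussian with true mean 5.
rng = random.Random(42)
data = [rng.gauss(5.0, 1.0) for _ in range(100)]
low, high = percentile_bootstrap_ci(data, statistics.fmean)
print(f"95% percentile-bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The sample mean is a well-behaved case; the same function applied to a biased statistic can give a badly wrong interval.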
An example
• … of why you have to be careful.
• We observe a set of angles $\theta_1, \dots, \theta_N$. Are they drawn from a uniform distribution?
• Naïve application of bootstrap to compute confidence interval for vector strength
• Gives incorrect result with 100% probability
Circular mean
• Treat angles as points on a circle
• The mean of these gives you
  • Circular mean $\bar\theta$
  • Vector strength $R$
• If all angles are the same:
  • $\bar\theta$ is this angle
  • $R$ is 1
• If angles are completely uniform
  • $R$ is 0
  • $\bar\theta$ is meaningless.
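The two quantities can be computed by averaging the unit-circle points as complex numbers. The following is a sketch with made-up angles (in radians); the function name `circular_stats` is an illustrative choice.

```python
import cmath
import math

def circular_stats(angles):
    """Return (circular mean, vector strength) for angles in radians."""
    # Average of the points exp(i*angle) on the unit circle.
    mean_vector = sum(cmath.exp(1j * a) for a in angles) / len(angles)
    # Its angle is the circular mean, its length the vector strength.
    return cmath.phase(mean_vector), abs(mean_vector)

# All angles identical: circular mean is that angle, vector strength is 1.
theta_bar, strength = circular_stats([0.7] * 10)

# Angles evenly spread around the circle: vector strength is 0
# (up to rounding), and the circular mean carries no information.
spread = [2 * math.pi * k / 8 for k in range(8)]
_, strength_spread = circular_stats(spread)
print(theta_bar, strength, strength_spread)
```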
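[Figure: each angle plotted as a point $z = e^{i\theta}$ on the unit circle; the mean of the points is $\bar z = R e^{i\bar\theta}$, with angle $\bar\theta$ (circular mean) and length $R$ (vector strength)]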
Bootstrap resamples of vector strength
[Figure: points $e^{i\theta}$ on the unit circle, showing the circular mean, the bootstrap resamples, and the 95% confidence interval]
• The actual vector strength was zero
• There is a 0% chance that this will fall within the bootstrap confidence interval
Why did it go wrong?
• Vector strength is a biased statistic
• The bias gets worse the smaller the sample size
• Bootstrapping makes the equivalent sample size even smaller
• There are variants of the bootstrap that make this kind of mistake less often, but you need to know exactly when to use which version.
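The failure can be reproduced with a short simulation. This is a sketch under assumed settings (50 uniform angles, 2000 resamples, a fixed seed), none of which come from the original slides. The true vector strength is 0, but every resampled vector strength is positive, so the percentile interval cannot contain 0.

```python
import cmath
import math
import random
import statistics

def vector_strength(angles):
    """Length of the mean of the unit-circle points exp(i*angle)."""
    return abs(sum(cmath.exp(1j * a) for a in angles) / len(angles))

rng = random.Random(0)
n = 50
# Uniformly distributed angles: the true vector strength is 0.
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]

observed = vector_strength(angles)  # positive, because the statistic is biased
boot = [vector_strength(rng.choices(angles, k=n)) for _ in range(2000)]
cuts = statistics.quantiles(boot, n=40)  # cut points at 2.5%, ..., 97.5%
low, high = cuts[0], cuts[-1]
covers_truth = low <= 0.0 <= high
print(f"observed={observed:.3f}, interval=[{low:.3f}, {high:.3f}], covers 0: {covers_truth}")
```

Since a length can never be negative and is almost never exactly 0 in a finite sample, the lower end of the interval sits strictly above the true value.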
Bootstrap vs. permutation test
• Permutation test: is the observed statistic in the null distribution?
• Bootstrap: is the null value in the bootstrap distribution?
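For contrast, here is a sketch of a permutation test for one common case, a difference in means between two groups. The choice of statistic, the function name `permutation_null_interval`, and the simulated groups are assumptions made for illustration. The null distribution is built by shuffling group labels, and the observed statistic is then compared with it.

```python
import random
import statistics

def permutation_null_interval(group_a, group_b, n_perm=5000, seed=0):
    """Observed difference in means and 95% interval of its null distribution."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        null.append(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
    cuts = statistics.quantiles(null, n=40)  # cut points at 2.5%, ..., 97.5%
    return observed, (cuts[0], cuts[-1])

# Hypothetical groups whose true means differ by 2.
rng = random.Random(3)
group_a = [rng.gauss(0.0, 1.0) for _ in range(30)]
group_b = [rng.gauss(2.0, 1.0) for _ in range(30)]
observed, (null_low, null_high) = permutation_null_interval(group_a, group_b)
significant = not (null_low <= observed <= null_high)
print(observed, (null_low, null_high), significant)
```

Here the question is whether the observed statistic falls outside the null interval; the bootstrap instead asks whether the null value falls outside the interval built around the observed statistic.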
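[Figure: permutation test, with the observed statistic compared against the 95% interval of the null distribution; bootstrap, with the null value compared against the 95% interval of the bootstrap distribution around the observed statistic]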
When to use the bootstrap
1. When you can’t use a traditional method (e.g. permutation test)
2. When you actually understand the conditions for a particular bootstrap variant to give valid results
3. When you can prove these conditions hold in your circumstance
When NOT to use the bootstrap
• When you tried a traditional test, but it gave you p>0.05