robust statistics and some non-parametric statistics - unige · robust statistics and some...
TRANSCRIPT
![Page 1: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/1.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimators
Robust statisticsand some non-parametric statistics
Stéphane Paltani
ISDC Data Center for AstrophysicsAstronomical Observatory of the University of Geneva
Statistics Course for Astrophysicists, 2010–2011
![Page 2: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/2.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimators
M-estimators
![Page 3: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/3.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimators
Caveat
The presentation aims at being user-focused and atpresenting usable recipes
Do not expect a fully mathematically rigorous description!
This has been prepared in the hope to be useful even inthe stand-alone version
Please provide me feedback with any misconception orerror you may find, and with any topic not addressed herebut which should be included
![Page 4: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/4.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?First exampleGeneral conceptsData representation
Removal of data
L-estimators
R-estimators
M-estimators
![Page 5: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/5.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Example: The average weight of opossums
I Find a group of N opossumsI Weight them: wi , i = 1, . . . ,NI Calculate the average weight as 1/N
∑i wi
I You should get about 3.5 kgI But you better not use this group!
![Page 6: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/6.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?First exampleGeneral conceptsData representation
Removal of data
L-estimators
R-estimators
M-estimators
![Page 7: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/7.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
General concepts
Robust statistics:I are not (less) affected by the presence of outliers or
deviations from model assumptions
I are related, but not identical to non-parametricstatistics, where we drop the hypothesis ofunderlying Gaussian distribution
![Page 8: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/8.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Main approaches
Four main approaches:I Discard data
I L-estimators: Use linear combinations of orderstatistics
I R-estimators: Use rank instead of values
I M-estimators: Estimators based onMaximum-likelihood argument
The breakdown point is the fraction of outliers abovewhich the robust statistics fails. It can reach 50 %
![Page 9: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/9.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?First exampleGeneral conceptsData representation
Removal of data
L-estimators
R-estimators
M-estimators
![Page 10: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/10.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Quantiles
I q-quantiles (q ∈ N) are the set of valuex[a], a = k/q, k = 1, . . . ,q − 1 from a probabilitydistribution for which P(x < x[a]) = a
I 4-quantiles, or quartiles are then x[0.25], x[0.5], x[0.75]
I q-quantiles from a sorted sample {xi}, i = 1, . . . ,N :x[a] = mini xi | i/N ≥ a
I Example: if N = 12, quartiles are: x3, x6, x9;if N = 14, quartiles are: x4, x7, x11
![Page 11: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/11.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Box plot
I A box plot is a synthetic way to look at the sampleproperties
I It represents in a single plot, the minimum, thequartiles (hence also the median) and the maximum
I that is, the quantiles x[0], x[0.25], x[0.5], x[0.75], x[1.0]
![Page 12: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/12.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?First example
General concepts
Data representation
Removal of data
L-estimators
R-estimators
M-estimators
Q-Q plot
I A Q-Q plot (quantile-quantile plot) is powerful way tocompare sample properties with an assumedunderlying distribution
I The QQ-plot is the plot of xi vs fi , where fi are thevalues for which P(x < fi) = i/(N + 1), when xi issorted in increasing order
I One usually also draw fi vs fi for comparison withexpected values
I Outliers appear quite clearly
![Page 13: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/13.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of dataCaveat emptorChauvenet’s criterionDixon’s Q testTrimmed estimatorsWinsorized estimators
L-estimators
R-estimators
M-estimators
![Page 14: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/14.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
What are these outliers?
I They can be due to errors in the measurementI They may be bona fide measurements, but the
distribution is heavy-tailed, so these points may bethe black swan and contain important information
I Outliers can be difficult to identify with confidence,and they can hide each other
I There is the risk of data manipulation
![Page 15: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/15.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of dataCaveat emptorChauvenet’s criterionDixon’s Q testTrimmed estimatorsWinsorized estimators
L-estimators
R-estimators
M-estimators
![Page 16: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/16.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Chauvenet’s criterion
I Assuming a data set {xi} ∼ N (µ, σ), i = 1, . . . ,NI Calculate for each i P(xi) ≡ P(|x | > |xi |)I Discard the point if N · P(xi) < 0.5, i.e. if the
probability to have such an extreme value is lessthan 50%, taking the number of trials into account
I Moderate outliers may mask more extreme outliersI Grubbs’s test uses absolute maximum deviation
G = maxi|xi−µ|σ
![Page 17: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/17.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of dataCaveat emptorChauvenet’s criterionDixon’s Q testTrimmed estimatorsWinsorized estimators
L-estimators
R-estimators
M-estimators
![Page 18: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/18.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Dixon’s Q test
I Dixon’s Q test: Find the largest gap in a sortedsample, and divide it by the total range
I In this case: Q = 25003500 ' 0.71
I Critical values:
![Page 19: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/19.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of dataCaveat emptorChauvenet’s criterionDixon’s Q testTrimmed estimatorsWinsorized estimators
L-estimators
R-estimators
M-estimators
![Page 20: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/20.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Trimmed estimators
I Trimming is a generic method to to make anestimator robust
I The n% trimmed estimator is obtained by calculatingthe estimator on the sample limited to the range[x[n%], x[1−n%]]
I This is not equivalent to removing outliers, as thetrimmed estimators have the same expectations ifthere are no outliers
I The trimmed mean (or truncated mean) is a robustalternative to the mean, which is more dependent onthe distribution than the median
![Page 21: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/21.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of dataCaveat emptorChauvenet’s criterionDixon’s Q testTrimmed estimatorsWinsorized estimators
L-estimators
R-estimators
M-estimators
![Page 22: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/22.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of dataCaveat emptor
Chauvenet’s criterion
Dixon’s Q test
Trimmed estimators
Winsorized estimators
L-estimators
R-estimators
M-estimators
Winsorized estimators
I Winsorizing is another generic method to to make anestimator robust which is very similar to trimming
I The n% winsorizing estimator is obtained byreplacing in the sample all values below x[n%] byx[n%] and all values above x[1−n%]] by x[1−n%]]
I I have no idea why you would want to do that...
![Page 23: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/23.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimatorsCentral tendency estimators
Statistical dispersionestimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of data
L-estimatorsCentral tendency estimatorsStatistical dispersion estimators
R-estimators
M-estimators
![Page 24: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/24.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimatorsCentral tendency estimators
Statistical dispersionestimators
R-estimators
M-estimators
Median
I The median x̃ is the 2-quantile, i.e. x[0.5], i.e. xN/2 ifN is even or xN+1/2 if N is odd
I If we have 10 “opossums”, the mean will be aboutone ton! It has a breakdown point of 0 %
I The median has a breakdown point of 50 %, i.e. youwould get a roughly correct weight estimations evenif there are 5 mammoths in your populations of 10“opossums”
![Page 25: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/25.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimatorsCentral tendency estimators
Statistical dispersionestimators
R-estimators
M-estimators
Outline
Why robust statistics?
Removal of data
L-estimatorsCentral tendency estimatorsStatistical dispersion estimators
R-estimators
M-estimators
![Page 26: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/26.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimatorsCentral tendency estimators
Statistical dispersionestimators
R-estimators
M-estimators
Median average deviation
I Assuming a sample {xi}, i = 1, . . . ,N, and itsmedian x̃ , the median average deviation isMAD({xi}) = median(|xi − x̃ |)
I Example: {xi} ≡ {1,1,2,2,4,6,9}x̃ = 2, {|xi − x̃ |} ≡ {0,0,1,1,2,4,7}, soMAD({xi}) = 1
I Note that 6 and 9 are completely ignored
I Relation to standard deviation isdistribution-dependent:
I For a Gaussian distribution, σx ' 1.4826 MAD({xi})I For a uniform distribution, σx =
√4/3 MAD({xi})
![Page 27: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/27.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimatorsCentral tendency estimators
Statistical dispersionestimators
R-estimators
M-estimators
Inter-quartile range
I Assuming a sample {xi}, i = 1, . . . ,NI The inter-quartile range is the difference between the
third and first quartiles: IQR({xi}) = x[0.75] − x[0.25]
I For a Gaussian distribution, σx ' IQR({xi})/1.349I This can be used as a (non-robust) normality test
![Page 28: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/28.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimatorsRanksCorrelation coefficientKolmogorov-Smirnov (and Kuiper) test
M-estimators
![Page 29: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/29.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Ranks
I R-estimators are based on rank
I Given a sample {xi}, i = 1, . . . ,N
I The rank of xi is i if xi is sorted in increasing order
I Example: {xi} ≡ {4,9,6,21,3,11,1}
I The ranks are: {3,5,4,7,2,6,1}, because the sorted{xi} is {1,3,4,6,9,11,21}
![Page 30: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/30.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimatorsRanksCorrelation coefficientKolmogorov-Smirnov (and Kuiper) test
M-estimators
![Page 31: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/31.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Pearson’s r : a non-robust estimator
I Pearson’s r correlation coefficient between tworandom variables {xi , yi}, i = 1, . . . ,N is: r = xy
σx σy
I But it is not robust. In this case, r = 0.85
I Of course, on can use L-estimators to compute xy ,σx , σy
I But we would have to figure out the critical values
![Page 32: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/32.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Spearman’s correlation coefficient
I Spearman’s coefficient: Replace {xi} and {yi} withtheir ranks, and calculate s as the the Pearson’scorrelation coefficient of the ranks
I Consistent with Pearson’s r in “good” conditionsI It is insensitive to the shape of y vs x and it is robust
I Significance: t = s√
N−21−s2 is approximately Student-t
distributed with N − 2 degrees of freedomI Ties should have the same (average) rank, i.e. if
x6 = x7, give them a rank of 6.5
![Page 33: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/33.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Kendall’s τ rank correlation
I Rank still contain quantitative values
I Kendall’s τ test removes all quantities
I Form N (N − 1)/2 pairs {{xi , yi}; {xj , yj}}, j > i
τ =
∑i,j sgn(xi − xj) sgn(yi − yj)−
∑i,j sgn(xi − xj) sgn(yj − yi)
N (N − 1)/2
I For relatively large sample, τ ∼ N(
0, 2(2N+5)9N(N−1)
)I No single recipe in case of ties
![Page 34: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/34.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimatorsRanksCorrelation coefficientKolmogorov-Smirnov (and Kuiper) test
M-estimators
![Page 35: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/35.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Do two distributions differ?
I Assuming a sample {xi}, i = 1, . . . ,N, is it aprobable outcome from a draw of N randomvariables from a given distribution (say U(a,b))?
I Similarly, assuming two samples {xi}, i = 1, . . . ,Nand {yj}, j = 1, . . . ,M, how probable is it that theyhave the same parent population?
![Page 36: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/36.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Cumulative comparison
I A (good) idea is to compare cumulative distributions
I F (x) =∫ b
a f (x) dx for a continuous distribution f (x)
I C{xi}(x) =1N∑
i H(x − xi) for a sample (H(x) is theHeaviside function, i.e., 1 if x ≥ 0, 0 if x < 0)
I The KS test is the simplest quantitative comparison:D = maxx |C{xi}(x)− F (x)| orD = maxx |C{xi}(x)− C{yi}(x)|
![Page 37: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/37.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Independence on underlying distribution
I D is preserved under any transformationx → y = ψ(x), where ψ(x) is an arbitrary strictlymonotonic function
I Thus KS test works with any underlying distributionI The null-hypothesis distribution of D is:
P(λ > D) = 2∑∞
k=1 (−1)k−1 e−2k2µ2, with
µ =(√
N + 0.12 + 0.11/√
N)· D
I When comparing two samples, use: Ne =N·MN+M
![Page 38: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/38.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Caveats
I In weird cases, KS-test might be extremelyinefficient. KS test makes hidden assumptions
I KS-test will not work if you derive parameters fromthe data (see “Monte-Carlo methods” course)
![Page 39: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/39.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimatorsRanks
Correlation coefficient
Kolmogorov-Smirnov (andKuiper) test
M-estimators
Kuiper test
I KS test is more sensitive in the tails than in the center
I Among several solutions, the simplest: Kuiper test!V = D+ + D−
I P(λ > V ) = 2∑∞
k=1 (4k2µ2 − 1) e−2k2µ2, with
µ =(√
N + 0.155 + 0.24/√
N)· V
I Kuiper test can be used for distributions on a circle(see Paltani 2004)
![Page 40: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/40.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConceptsMaximum-likelihood estimationSo, these M-estimators. . .
![Page 41: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/41.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Concepts
I M-estimators are a generalization ofmaximum-likelihood estimators
I So, maximum likelihood is an M-estimator
I It allows to give arbitrary weight to data points
I Weights can be chosen so that they decrease for toofar-off points, opposite to Gaussian weights in leastsquare-estimation
![Page 42: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/42.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConceptsMaximum-likelihood estimationSo, these M-estimators. . .
![Page 43: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/43.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
A case of known outlier distribution
I We search for a Gaussian peak in a very noisyenvironment
I In a fraction f of the cases the right peak is found. Ithas a Gaussian uncertainty
I In a fraction 1− f of the cases a spurious peak isfound. It can be anywhere in the searched range
I We know the real distribution of our measurements
![Page 44: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/44.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Maximum-likelihood estimation
I We obtain a sample {xi}, i = 1, . . . ,N
I The distribution of xi is f N (µ, σ) + (1− f )U(0,100)
I One can use a maximum likelihood to findparameters µ, σ and f !
L =∏
i
f√2πσ
exp(−(xi − µ)2/2σ2
)+
1− f100
![Page 45: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/45.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Outline
Why robust statistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConceptsMaximum-likelihood estimationSo, these M-estimators. . .
![Page 46: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/46.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Model fitting with M-estimators
I Let’s take sample {xi ; yi}, i = 1, . . . ,N, whose errorson yi we know are not normally distributed, and amodel y(x ;p), where p is the parameters of themodel. As above, we have:
L =∏
i
exp (−ρ(yi , y(xi ;p)))
ρ is the negative logarithm of the probabilityI We want then to minimize:∑
i
ρ (yi , y(xi ;p))
I Let’s assume that ρ is local, i.e. ρ(yi , y(xi ;p)) ≡ ρ(z),with z = (yi − y(xi ;p))/λi , i.e. ρ depends only on thedifference with the model scaled by a factor λi
![Page 47: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/47.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Parameter estimation (cont.)
I Let’s write ψ(z) ≡ dρ(z)dz
I The minimum of L is obtained when:
0 =∑ 1
λiψ
(yi − y(xi ;p)
λi
) (∂y(xi ;p)∂pk
)
I We can solve this equation, or we can minimize∑i ρ(
yi−y(xi ;p)λi
)I ψ(z) acts as a weight in the above equation
![Page 48: Robust statistics and some non-parametric statistics - UNIGE · Robust statistics and some non-parametric statistics Stéphane Paltani ISDC Data Center for Astrophysics Astronomical](https://reader031.vdocuments.us/reader031/viewer/2022020216/5c14cf7109d3f25e338c7c28/html5/thumbnails/48.jpg)
Robust statistics
Stéphane Paltani
Why robuststatistics?
Removal of data
L-estimators
R-estimators
M-estimatorsConcepts
Maximum-likelihoodestimation
So, these M-estimators. . .
Some weights one can think of
I In the Gaussian case, put λi = σi , and we get theleast-squares estimates
I But we have then: ρ(z) = z2/2 and ψ(z) = z
I Two-sided exponential, P(z) ∼ exp(−z):ρ(z) = |z| and ψ(z) = sgn(z)
I Lorentzian, P(z) ∼ 11+z2/2 :
ρ(z) = log(1 + z2/2) and ψ(z) = z1+z2/2