On clustering financial time series: A need for distances between dependent random variables

Upload: hellebore-capital-limited

Post on 16-Jan-2017

Page 1: On clustering financial time series - A need for distances between dependent random variables

Introduction

Dependence and Distribution

Toward an extension to the multivariate case

On clustering financial time series: A need for distances between dependent random variables

Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat

24 September 2015

Gautier Marti, Frank Nielsen On clustering financial time series

1 Introduction

2 Dependence and Distribution

3 Toward an extension to the multivariate case

Motivations: Why clustering?

Motivations:

Mathematical finance: use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk)

Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory, Noise Dressing of Financial Correlation Matrices, Laloux et al., 1999)

Figure: Marchenko-Pastur distribution vs. eigenvalues of the empirical correlation matrix

How to filter these variance-covariance matrices?
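The stylized fact can be reproduced on pure noise. The following is a minimal sketch, assuming NumPy; the sizes N = 100, T = 500 and the helper names are illustrative choices, not taken from the slides. For i.i.d. Gaussian returns the true correlation matrix is the identity, yet the eigenvalues of the empirical correlation matrix spread over the Marchenko-Pastur bulk [(1 − √q)², (1 + √q)²] with q = N/T.

```python
import numpy as np


def marchenko_pastur_edges(n_assets: int, n_obs: int) -> tuple[float, float]:
    """Edges of the Marchenko-Pastur bulk for a unit-variance correlation matrix, q = N/T <= 1."""
    q = n_assets / n_obs
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2


def noise_correlation_eigenvalues(n_assets: int, n_obs: int, seed: int = 0) -> np.ndarray:
    """Eigenvalues of the empirical correlation matrix of pure i.i.d. Gaussian noise."""
    rng = np.random.default_rng(seed)
    returns = rng.standard_normal((n_obs, n_assets))  # T rows (dates) x N columns (assets)
    corr = np.corrcoef(returns, rowvar=False)         # N x N; the true correlation is the identity
    return np.linalg.eigvalsh(corr)


N, T = 100, 500
lam_minus, lam_plus = marchenko_pastur_edges(N, T)
eigenvalues = noise_correlation_eigenvalues(N, T)
share_in_bulk = np.mean((eigenvalues >= lam_minus) & (eigenvalues <= lam_plus))
print(f"MP bulk: [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"eigenvalues: [{eigenvalues.min():.3f}, {eigenvalues.max():.3f}], share in bulk: {share_in_bulk:.2f}")
```

On real returns, eigenvalues well above the upper edge (for instance a market mode) are usually read as signal and those inside the bulk as noise; deciding what to keep is the filtering question raised above.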

Information filtering? Clustering!

Work of Mantegna (1999) and co-authors:

Limits: focus on $\rho_{ij}$ (Pearson correlation), which is not robust to outliers and heavy tails; this could lead to spurious clusters.
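Mantegna (1999) turns correlations into the distance d_ij = √(2(1 − ρ_ij)) before clustering. The sketch below, assuming NumPy and synthetic data, shows how a single joint extreme observation (a hypothetical "crash day") can make two independent series look strongly correlated under Pearson's ρ, while a rank-based measure (Spearman's ρ, used here only as an example of an alternative) barely moves.

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def ranks(v: np.ndarray) -> np.ndarray:
    """0-based ranks (continuous data, no ties assumed)."""
    return np.argsort(np.argsort(v))


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mantegna_distance(rho: float) -> float:
    """Correlation-based distance sqrt(2 (1 - rho))."""
    return float(np.sqrt(2.0 * (1.0 - rho)))


rng = np.random.default_rng(42)
T = 1000
x = rng.standard_normal(T)
y = rng.standard_normal(T)        # independent of x: true correlation is 0

x_out, y_out = x.copy(), y.copy()
x_out[0] = y_out[0] = 50.0        # one joint extreme observation (hypothetical crash day)

for label, (a, b) in {"clean": (x, y), "one outlier": (x_out, y_out)}.items():
    rp, rs = pearson(a, b), spearman(a, b)
    print(f"{label:12s} Pearson={rp:+.3f} (d={mantegna_distance(rp):.3f})  "
          f"Spearman={rs:+.3f} (d={mantegna_distance(rs):.3f})")
```

With one outlier out of 1000 points, Pearson's ρ jumps to roughly 0.7, which shrinks the Mantegna distance between two unrelated series; this is the kind of spurious proximity that can merge clusters.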

Modelling

Asset $i$ variations or returns follow random variable $X_i$

Asset variations or returns are “correlated”

i.i.d. observations:

$$
\begin{aligned}
X_1 &: X_1^1, X_1^2, \ldots, X_1^T \\
X_2 &: X_2^1, X_2^2, \ldots, X_2^T \\
&\;\;\vdots \\
X_N &: X_N^1, X_N^2, \ldots, X_N^T
\end{aligned}
$$

Which distances $d(X_i, X_j)$ between dependent random variables?

2 Dependence and Distribution

Pitfalls of a basic distance

Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ and correlation $\rho(X, Y) \in [-1, 1]$. Then

$$\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2\sigma_X\sigma_Y\,(1 - \rho(X, Y)).$$

Now, consider the following values for the correlation:

$\rho(X, Y) = 0$, so $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $\mathbb{E}[(X - Y)^2] \gg 1$ instead of the distance 0 expected from comparing two equal Gaussians.

$\rho(X, Y) = 1$, so $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.
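A quick numerical check of the closed form above against simulation, assuming NumPy; the function names, the choice σ = 10 and the sample size are illustrative.

```python
import numpy as np


def expected_sq_diff(mu_x, mu_y, s_x, s_y, rho):
    """Closed form of E[(X - Y)^2] for a bivariate Gaussian (X, Y)."""
    return (mu_x - mu_y) ** 2 + (s_x - s_y) ** 2 + 2.0 * s_x * s_y * (1.0 - rho)


def monte_carlo_sq_diff(mu_x, mu_y, s_x, s_y, rho, n=200_000, seed=0):
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = np.random.default_rng(seed)
    cov = [[s_x ** 2, rho * s_x * s_y], [rho * s_x * s_y, s_y ** 2]]
    xy = rng.multivariate_normal([mu_x, mu_y], cov, size=n)
    return float(np.mean((xy[:, 0] - xy[:, 1]) ** 2))


# Two identical N(0, 10^2) marginals:
print(expected_sq_diff(0, 0, 10, 10, rho=0.0))  # 200.0: large, although the distributions are equal
print(expected_sq_diff(0, 0, 10, 10, rho=1.0))  # 0.0: only perfect correlation gives distance 0
print(monte_carlo_sq_diff(0, 0, 10, 10, rho=0.0))
```

The expected squared difference mixes dependence and distribution in a single number, which is why it cannot tell "same distribution, independent" apart from "different distributions".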

Pitfalls of a basic distance

(Marti, Nielsen, Very, Donnat, ICMLA 2015)

The Financial Engineer Bias: Correlation

Correlation patterns are blatant

Mantegna et al. aim to filter information from the correlation matrix using clustering

O(N²) parameters (correlation) vs. O(N) parameters (distribution)

Information Geometry and its statistical distances

original poster: http://www.sonycsl.co.jp/person/nielsen/FrankNielsen-distances-figs.pdf

Sklar’s Theorem and the Copula Transform

Theorem (Sklar’s Theorem (1959))

For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as

$$P(X_1, \ldots, X_N) = C\big(P_1(X_1), \ldots, P_N(X_N)\big),$$

where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.
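Sklar's theorem can also be read backwards: choose a copula C and margins P_i, draw (U_1, U_2) ~ C and set X_i = P_i^{-1}(U_i). The following is a minimal sketch assuming NumPy and SciPy; the Gaussian copula and the Exp(1) and Student-t(3) margins are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats


def sample_from_copula_and_margins(rho: float, n: int, seed: int = 0):
    """Sample (X1, X2) with a Gaussian copula of correlation rho and illustrative
    margins P1 = Exp(1), P2 = Student-t(3).

    If (U1, U2) ~ C, then X_i = P_i^{-1}(U_i) has joint cdf C(P1(x1), P2(x2)).
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = stats.norm.cdf(z)                  # uniform margins; the dependence is the Gaussian copula
    x1 = stats.expon.ppf(u[:, 0])          # apply P1^{-1}
    x2 = stats.t.ppf(u[:, 1], df=3)        # apply P2^{-1}
    return u, np.column_stack([x1, x2])


u, X = sample_from_copula_and_margins(rho=0.7, n=2000)
print(X[:, 0].mean(), np.median(X[:, 1]))  # close to 1 (Exp(1) mean) and 0 (t median)
```

Because each margin is applied through an increasing map, the ranks of X coincide with the ranks of U: rank-based quantities only see the copula.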

Sklar’s Theorem and the Copula Transform

Definition (The Copula Transform)

Let $X = (X_1, \ldots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector

$$U = (U_1, \ldots, U_N) := P(X) = \big(P_1(X_1), \ldots, P_N(X_N)\big)$$

is known as the copula transform.

$U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have

$$x = P_i\big(P_i^{-1}(x)\big) = \Pr\big(X_i \le P_i^{-1}(x)\big) = \Pr\big(P_i(X_i) \le x\big),$$

thus $P_i(X_i) \sim \mathcal{U}[0, 1]$.
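In practice the $P_i$ are unknown and are commonly replaced by empirical cdfs, i.e. normalized ranks. A minimal sketch assuming NumPy and continuous data without ties; the rank/T normalization is one common convention (rank/(T + 1) is another), and the heavy-tailed sample is synthetic.

```python
import numpy as np


def empirical_copula_transform(X: np.ndarray) -> np.ndarray:
    """Empirical copula transform of a (T, N) sample: U[t, i] = rank of X[t, i] in column i, divided by T.

    Each unknown marginal cdf P_i is replaced by its empirical cdf (no ties assumed).
    """
    T = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1  # ranks 1..T within each column
    return ranks / T


rng = np.random.default_rng(1)
X = rng.standard_t(df=3, size=(1000, 4))       # heavy-tailed synthetic "returns"
U = empirical_copula_transform(X)
U_exp = empirical_copula_transform(np.exp(X))  # strictly increasing map applied to every margin
print(U.min(), U.max(), np.array_equal(U, U_exp))
```

Each column of U is exactly a permutation of {1/T, …, 1}, i.e. uniform on the grid, and applying any strictly increasing function to the data leaves U unchanged.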

Distance Design

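$$
d_\theta^2(X_i, X_j) = \theta\, 3\,\mathbb{E}\!\left[|P_i(X_i) - P_j(X_j)|^2\right] + (1 - \theta)\,\frac{1}{2}\int_{\mathbb{R}} \left(\sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}}\right)^2 d\lambda, \qquad \theta \in [0, 1].
$$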
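The first term only depends on the copula: with $U_k = P_k(X_k)$ uniform, $\mathbb{E}[U_k^2] = 1/3$ and Spearman's $\rho_S = 12\,\mathbb{E}[U_iU_j] - 3$, so $3\,\mathbb{E}[|U_i - U_j|^2] = (1 - \rho_S)/2 \in [0, 1]$. The second term is the squared Hellinger distance between the marginal densities. The sketch below estimates both from paired samples, assuming NumPy; the estimators (normalized ranks for the cdfs, 50-bin histograms on a shared grid for the densities) and the name `gnpr_distance_sq` are illustrative assumptions, not necessarily the estimators used in the paper.

```python
import numpy as np


def normalized_ranks(x: np.ndarray) -> np.ndarray:
    """Empirical cdf evaluated at the sample points: rank / T (no ties assumed)."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)


def dependence_term(x: np.ndarray, y: np.ndarray) -> float:
    """Estimate of 3 E[|P_i(X_i) - P_j(X_j)|^2] from paired observations."""
    return float(3.0 * np.mean((normalized_ranks(x) - normalized_ranks(y)) ** 2))


def distribution_term(x: np.ndarray, y: np.ndarray, bins: int = 50) -> float:
    """Squared Hellinger distance (1/2) * integral of (sqrt(p_i) - sqrt(p_j))^2,
    with histogram densities on a shared grid."""
    edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), bins + 1)
    p, _ = np.histogram(x, bins=edges, density=True)
    q, _ = np.histogram(y, bins=edges, density=True)
    return float(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2 * np.diff(edges)))


def gnpr_distance_sq(x, y, theta: float = 0.5, bins: int = 50) -> float:
    """Hypothetical estimator of d_theta^2: convex combination of the two terms, theta in [0, 1]."""
    return theta * dependence_term(x, y) + (1.0 - theta) * distribution_term(x, y, bins)


rng = np.random.default_rng(7)
x = rng.standard_normal(2000)
y = 3.0 * x + 5.0              # same (comonotone) copula as x, different margin
z = rng.standard_normal(2000)  # same margin as x, independent of x
for theta in (0.0, 0.5, 1.0):
    print(theta, round(gnpr_distance_sq(x, y, theta), 3), round(gnpr_distance_sq(x, z, theta), 3))
```

At θ = 1 the pair (x, y) is at distance 0 (identical dependence) while (x, z) is near 0.5; at θ = 0 the roles swap, which is the trade-off θ is meant to control.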

Results: Data from Hierarchical Block Model

Adjusted Rand Index

| Algo. | Distance | A | B | C |
|---|---|---|---|---|
| HC-AL | (1 − ρ)/2 | 0.00 ± 0.01 | 0.99 ± 0.01 | 0.56 ± 0.01 |
| HC-AL | E[(X − Y)²] | 0.00 ± 0.00 | 0.09 ± 0.12 | 0.55 ± 0.05 |
| HC-AL | GPR θ = 0 | 0.34 ± 0.01 | 0.01 ± 0.01 | 0.06 ± 0.02 |
| HC-AL | GPR θ = 1 | 0.00 ± 0.01 | 0.99 ± 0.01 | 0.56 ± 0.01 |
| HC-AL | GPR θ = 0.5 | 0.34 ± 0.01 | 0.59 ± 0.12 | 0.57 ± 0.01 |
| HC-AL | GNPR θ = 0 | 1 | 0.00 ± 0.00 | 0.17 ± 0.00 |
| HC-AL | GNPR θ = 1 | 0.00 ± 0.00 | 1 | 0.57 ± 0.00 |
| HC-AL | GNPR θ = 0.5 | 0.99 ± 0.01 | 0.25 ± 0.20 | 0.95 ± 0.08 |
| AP | (1 − ρ)/2 | 0.00 ± 0.00 | 0.99 ± 0.07 | 0.48 ± 0.02 |
| AP | E[(X − Y)²] | 0.14 ± 0.03 | 0.94 ± 0.02 | 0.59 ± 0.00 |
| AP | GPR θ = 0 | 0.25 ± 0.08 | 0.01 ± 0.01 | 0.05 ± 0.02 |
| AP | GPR θ = 1 | 0.00 ± 0.01 | 0.99 ± 0.01 | 0.48 ± 0.02 |
| AP | GPR θ = 0.5 | 0.06 ± 0.00 | 0.80 ± 0.10 | 0.52 ± 0.02 |
| AP | GNPR θ = 0 | 1 | 0.00 ± 0.00 | 0.18 ± 0.01 |
| AP | GNPR θ = 1 | 0.00 ± 0.01 | 1 | 0.59 ± 0.00 |
| AP | GNPR θ = 0.5 | 0.39 ± 0.02 | 0.39 ± 0.11 | 1 |
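The scores above are Adjusted Rand Indices, which compare a recovered partition with the ground-truth partition of the simulated model (1 means identical up to relabeling, values near 0 mean chance-level agreement). A from-scratch sketch of the standard Hubert-Arabie formula from the contingency table, assuming NumPy; in practice scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from math import comb

import numpy as np


def _pairs(counts) -> int:
    """Sum of C(c, 2) over counts: number of unordered pairs inside each group."""
    return sum(comb(int(c), 2) for c in np.ravel(counts))


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand Index (Hubert and Arabie) between two flat partitions."""
    _, t = np.unique(np.asarray(labels_true), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred), return_inverse=True)
    n = len(t)
    if n < 2:
        return 1.0
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)  # table[i, j] = points in true cluster i and predicted cluster j

    index = _pairs(table)
    sum_rows = _pairs(table.sum(axis=1))
    sum_cols = _pairs(table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under random labeling
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return float((index - expected) / (max_index - expected))


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```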

Results: Data from CDS market

(Marti, Nielsen, Very, Donnat, ICMLA 2015)

Limits and questions

Why a convex combination? There is no a priori support for it from geometry. In practice:

there is no real control on the weight given to correlation and on the weight given to distribution;

stability methods used to select parameters are still prone to overfitting;

θ actually depends on the convergence rate of the estimators: correlation measures converge faster than distribution estimates.

3 Toward an extension to the multivariate case

Overview

Multivariate dependence

What is the state of the art on multivariate dependence?

Multivariate mutual information: in information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.
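A classic toy case shows the confusion concretely. With X, Y independent fair bits and Z = X xor Y, every pair of variables is independent, yet two common three-variable generalizations disagree even in sign: the total correlation (Watanabe) is +1 bit while the co-information, written in inclusion-exclusion form, is −1 bit. A minimal sketch in plain Python; the helper names are illustrative.

```python
import itertools
import math


def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint: dict, idx: tuple) -> dict:
    """Marginal pmf of the coordinates listed in idx."""
    out: dict = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


# X, Y independent fair bits and Z = X xor Y
joint = {(x, y, x ^ y): 0.25 for x, y in itertools.product([0, 1], repeat=2)}


def H(*idx: int) -> float:
    """Joint entropy of the selected coordinates."""
    return entropy(marginal(joint, idx))


pairwise_mi = [H(a) + H(b) - H(a, b) for a, b in [(0, 1), (0, 2), (1, 2)]]
total_correlation = H(0) + H(1) + H(2) - H(0, 1, 2)
co_information = H(0) + H(1) + H(2) - H(0, 1) - H(0, 2) - H(1, 2) + H(0, 1, 2)
print(pairwise_mi, total_correlation, co_information)  # [0.0, 0.0, 0.0] 1.0 -1.0
```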

Optimal Copula Transport for intra-dependence

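$$
\begin{aligned}
D_{\text{intra}}(X_1, X_2) &:= \mathrm{EMD}(s_1, s_2), \\
\mathrm{EMD}(s_1, s_2) &:= \min_{f} \sum_{1 \le i, j \le n} \lVert p_i - q_j \rVert\, f_{ij}
\end{aligned}
$$

subject to

$$
\begin{aligned}
& f_{ij} \ge 0, && 1 \le i, j \le n, \\
& \sum_{j=1}^{n} f_{ij} \le w_{p_i}, && 1 \le i \le n, \\
& \sum_{i=1}^{n} f_{ij} \le w_{q_j}, && 1 \le j \le n, \\
& \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} = 1.
\end{aligned}
$$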
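The slide does not spell out $s_1, s_2$; a natural reading, assumed here, is that $s_k$ is the empirical copula of sample $k$: points $p_i$, $q_j$ in $[0,1]^d$ with weights $w_{p_i}$, $w_{q_j}$. With $n$ points on each side and uniform weights $1/n$, the constraints force every row and column of $f$ to sum to $1/n$, and an optimal flow can be taken to be a permutation (Birkhoff-von Neumann), so the linear program reduces to an assignment problem (cf. Munkres, 1957). A minimal sketch assuming NumPy and SciPy; `d_intra` and the other names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def empirical_copula(sample: np.ndarray) -> np.ndarray:
    """Rank-transform each column of an (n, d) sample to {1/n, ..., 1}: the empirical copula points."""
    n = sample.shape[0]
    return (np.argsort(np.argsort(sample, axis=0), axis=0) + 1) / n


def emd_uniform(p: np.ndarray, q: np.ndarray) -> float:
    """EMD between two equal-size point clouds with uniform weights 1/n and Euclidean ground distance.

    With uniform weights an optimal flow is a permutation matrix scaled by 1/n,
    so the LP reduces to an assignment problem.
    """
    if p.shape != q.shape:
        raise ValueError("this sketch assumes the same number of points and the same dimension")
    cost = cdist(p, q)                        # cost[i, j] = ||p_i - q_j||
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())     # each matched pair carries flow 1/n


def d_intra(sample1: np.ndarray, sample2: np.ndarray) -> float:
    """Hypothetical D_intra: EMD between the empirical copulas of two (n, d) samples."""
    return emd_uniform(empirical_copula(sample1), empirical_copula(sample2))


rng = np.random.default_rng(0)
x = rng.standard_normal(300)
comonotone = np.column_stack([x, x ** 3])
comonotone_exp = np.column_stack([x, np.exp(x)])  # different margins, same (comonotone) copula
countermonotone = np.column_stack([x, -x])
print(d_intra(comonotone, comonotone_exp))   # 0.0: the copulas coincide
print(d_intra(comonotone, countermonotone))  # clearly positive
```

The assignment solver costs roughly O(n³), which is one concrete face of the scaling issue listed among the limits of this approach.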

Optimal Copula Transport for inter-dependence

Limits and questions

Does not scale well, even with moderate dimensionality:

density estimation

computing cost

Full parametric approach?

How to connect with the (copula, margins) representation?

Information geometry? (Approximate) optimal transport? Kernel embedding of distributions?

contact: [email protected]

References

Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248, 2009.

Luigi Ambrosio and Nicola Gigli. A user’s guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013.

David Applegate, Tamraparni Dasu, Shankar Krishnan, and Simon Urbanek. Unsupervised clustering of multidimensional distributions using earth mover distance. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 636–644. ACM, 2011.

Shai Ben-David, Ulrike Von Luxburg, and David Pal. A sober look at clustering stability. In Learning theory, pages 5–19. Springer, 2006.

Petro Borysov, Jan Hannig, and JS Marron. Asymptotics of hierarchical clustering for growing dimension. Journal of Multivariate Analysis, 124:465–479, 2014.

Leo Breiman and Jerome H Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American statistical Association, 80(391):580–598, 1985.

Joel Bun, Romain Allez, Jean-Philippe Bouchaud, and Marc Potters. Rotational invariant estimator for general noisy matrices. arXiv preprint arXiv:1502.06736, 2015.

Gunnar Carlsson and Facundo Memoli. Characterization, stability and convergence of hierarchical clustering methods. The Journal of Machine Learning Research, 11:1425–1470, 2010.

Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/.

Tamraparni Dasu, Deborah F Swayne, and David Poole. Grouping multivariate time series: A case study. In Proceedings of the IEEE Workshop on Temporal Data Mining: Algorithms, Theory and Applications, in conjunction with the Conference on Data Mining, Houston, pages 25–32, 2005.

Paul Deheuvels. La fonction de dependance empirique et ses proprietes. Un test non parametrique d’independance. Acad. Roy. Belg. Bull. Cl. Sci. (5), 65(6):274–292, 1979.

Paul Deheuvels. An asymptotic decomposition for multivariate distribution-free tests of independence. Journal of Multivariate Analysis, 11(1):102–113, 1981.

T Di Matteo, T Aste, ST Hyde, and S Ramsden. Interest rates hierarchical structure. Physica A: Statistical Mechanics and its Applications, 355(1):21–33, 2005.

T Di Matteo, Francesca Pozzi, and Tomaso Aste. The use of dynamical networks to detect the hierarchical organization of financial market sectors. The European Physical Journal B-Condensed Matter and Complex Systems, 73(1):3–11, 2010.

Francis X Diebold and Canlin Li. Forecasting the term structure of government bond yields. Journal of econometrics, 130(2):337–364, 2006.

A Adam Ding and Yi Li. Copula correlation: An equitable dependence measure and extension of pearson’s correlation. arXiv preprint arXiv:1312.7214, 2013.

Bradley Efron. Bootstrap methods: another look at the jackknife. The annals of Statistics, pages 1–26, 1979.

Gal Elidan. Copulas in machine learning. In Copulae in Mathematical and Quantitative Finance, pages 39–60. Springer, 2013.

Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyre, and Jean-Francois Aujol. Regularized discrete optimal transport. Springer, 2013.

Hans Gebelein. Das statistische problem der korrelation als variations- und eigenwertproblem und sein zusammenhang mit der ausgleichsrechnung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift fur Angewandte Mathematik und Mechanik, 21(6):364–379, 1941.

Cyril Goutte, Peter Toft, Egill Rostrup, Finn A Nielsen, and Lars Kai Hansen. On clustering fMRI time series. NeuroImage, 9(3):298–310, 1999.

Clive WJ Granger and Paul Newbold. Spurious regressions in econometrics. Journal of econometrics, 2(2):111–120, 1974.

Isabelle Guyon, Ulrike Von Luxburg, and Robert C Williamson. Clustering: Science or art. In NIPS 2009 Workshop on Clustering Theory, 2009.

Jiang Hangjin and Ding Yiming. Equitability of dependence measure. stat, 1050:9, 2015.

Keith Henderson, Brian Gallagher, and Tina Eliassi-Rad. EP-MEANS: An efficient nonparametric clustering of empirical probability distributions. 2015.

Weiming Hu, Tieniu Tan, Liang Wang, and Steve Maybank. A survey on visual surveillance of object motion and behaviors. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 34(3):334–352, 2004.

John C Hull. Options, futures, and other derivatives. Pearson Education, 2006.

Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition letters, 31(8):651–666, 2010.

Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta. Distance measures for effective clustering of ARIMA time-series. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 273–280. IEEE, 2001.

M Kanevski, V Timonin, A Pozdnoukhov, and M Maignan. Evolution of interest rate curve: empirical analysis of patterns using nonlinear clustering tools. In European Symposium on Time Series Prediction, 2008.

Leonid Vitalievich Kantorovich. On the translocation of masses. In Dokl. Akad. Nauk SSSR, volume 37, pages 199–201, 1942.

Justin B Kinney and Gurinder S Atwal. Equitability, mutual information, and the maximal information coefficient. Proceedings of the National Academy of Sciences, 111(9):3354–3359, 2014.

Jon M. Kleinberg. An impossibility theorem for clustering. In S. Thrun and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 446–453. MIT Press, Cambridge, MA, 2002. URL http://books.nips.cc/papers/files/nips15/LT17.pdf.

Laurent Laloux, Pierre Cizeau, Marc Potters, and Jean-Philippe Bouchaud. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.

Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and Mark Flood. Clustering techniques and their effect on portfolio formation and risk analysis. In Proceedings of the International Workshop on Data Science for Macro-Modeling, pages 1–6. ACM, 2014.

Erel Levine and Eytan Domany. Resampling method for unsupervised estimation of cluster validity. Neural computation, 13(11):2573–2593, 2001.

T Warren Liao. Clustering of time series data—a survey. Pattern recognition, 38(11):1857–1874, 2005.

Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 2–11. ACM, 2003.

Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos. Iterative incremental clustering of time series. In Advances in Database Technology-EDBT 2004, pages 106–122. Springer, 2004.

Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery, 15(2):107–144, 2007.

David Lopez-Paz, Philipp Hennig, and Bernhard Scholkopf. The randomized dependence coefficient. arXiv preprint arXiv:1304.7717, 2013.

Rosario N Mantegna. Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems, 11(1):193–197, 1999.

Martin Martens and Ser-Huang Poon. Returns synchronization and daily correlation dynamics between international stock markets. Journal of Banking & Finance, 25(10):1805–1827, 2001.

Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe Very. HCMapper: An interactive visualization tool to compare partition-based flat clustering extracted from pairs of dendrograms. arXiv preprint arXiv:1507.08137, 2015a.

Gautier Marti, Philippe Very, and Philippe Donnat. Toward a generic representation of random variables for machine learning. arXiv preprint arXiv:1506.00976, 2015b.

Sergio Mayordomo, Juan Ignacio Pena, and Eduardo S Schwartz. Are all credit default swap databases equal? Technical report, National Bureau of Economic Research, 2010.

Sergio Mayordomo, Juan Ignacio Pena, and Eduardo S Schwartz. Are all credit default swap databases equal? European Financial Management, 20(4):677–713, 2014.

Gaspard Monge. Memoire sur la theorie des deblais et des remblais. De l’Imprimerie Royale, 1781.

James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.

Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Relation between financial market structure and the real economy: Comparison between clustering methods. Available at SSRN 2525291, 2014.

Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Relation between financial market structure and the real economy: comparison between clustering methods. 2015.

Roger B Nelsen. An introduction to copulas, volume 139. Springer Science & Business Media, 2013.

Dominic O’Kane. Modelling single-name and multi-name credit derivatives, volume 573. John Wiley & Sons, 2011.

Barnabas Poczos, Zoubin Ghahramani, and Jeff Schneider. Copula-based kernel dependency measures. arXiv preprint arXiv:1206.4682, 2012.

David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander, Michael Mitzenmacher, and Pardis C Sabeti. Detecting novel associations in large data sets. Science, 334(6062):1518–1524, 2011.

David N Reshef, Yakir A Reshef, Pardis C Sabeti, and Michael M Mitzenmacher. An empirical study of leading measures of dependence. arXiv preprint arXiv:1505.02214, 2015a.

Yakir A Reshef, David N Reshef, Hilary K Finucane, Pardis C Sabeti, and Michael M Mitzenmacher. Measuring dependence powerfully and equitably. arXiv preprint arXiv:1505.02213, 2015b.

Yakir A Reshef, David N Reshef, Pardis C Sabeti, and Michael M Mitzenmacher. Equitability, interval estimation, and statistical power. arXiv preprint arXiv:1505.02212, 2015c.

Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International journal of computer vision, 40(2):99–121, 2000.

Daniil Ryabko. Clustering processes. arXiv preprint arXiv:1004.5194, 2010.

Ohad Shamir and Naftali Tishby. Cluster stability for finite samples. In NIPS, 2007.

Robert H Shumway. Time-frequency clustering and discriminant analysis. Statistics & probability letters, 63(3):307–314, 2003.

Noah Simon and Robert Tibshirani. Comment on “Detecting novel associations in large data sets” by Reshef et al., Science Dec 16, 2011. arXiv preprint arXiv:1401.7645, 2014.

Ashish Singhal and Dale E Seborg. Clustering of multivariate time-series data. Journal of Chemometrics, 19:427–438, 2005.

A Sklar. Fonctions de repartition a n dimensions et leurs marges. Universite Paris 8, 1959.

Won-Min Song, T Di Matteo, and Tomaso Aste. Hierarchical information clustering by means of topologically embedded graphs. PLoS One, 7(3):e31929, 2012.

Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S Yu. Graphscope: parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 687–696. ACM, 2007.

Gabor J Szekely, Maria L Rizzo, Nail K Bakirov, et al. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, 2007.

Chayant Tantipathananandh and Tanya Y Berger-Wolf. Finding communities in dynamic social networks. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1236–1241. IEEE, 2011.

Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna. Cluster analysis for portfolio optimization. Journal of Economic Dynamics and Control, 32(1):235–258, 2008.

Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna. A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences of the United States of America, 102(30):10421–10426, 2005.

Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna. Correlation, hierarchies, and networks in financial markets. Journal of Economic Behavior & Organization, 75(1):40–58, 2010.

Cedric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

Kiyoung Yang and Cyrus Shahabi. A pca-based similarity measure for multivariate time series. In Proceedings of the 2nd ACM international workshop on Multimedia databases, pages 65–74. ACM, 2004.

Kiyoung Yang and Cyrus Shahabi. On the stationarity of multivariate time series for correlation-based data analysis. In Data Mining, Fifth IEEE International Conference on, pages 4–pp. IEEE, 2005.