TRANSCRIPT
A low variance consistent test of relative dependency
Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko
32nd International Conference on Machine Learning 2015
CVN – L2S Gatsby Unit Galen Team
A low variance consistent test of relative dependency | Introduction | Test of relative dependency | Experiments
Motivation questions
Tests of dependence: Spearman's ρ, Kendall's τ, kernel measures of covariance and correlation, distance covariance, ...
However, there may be multiple dependencies: is the dependency between English and Dutch stronger than the dependency between English and Spanish?
H0: Dep(English, Dutch) ≤ Dep(English, Spanish), p-value < $10^{-4}$
Detecting statistical dependence
- How do you detect dependence in structured data?
X1: Conscious of its spiritual and moral
heritage, the Union is founded on the
indivisible, universal values of human
dignity, freedom, equality and solidarity; it is
based on the principles of democracy and the
rule of law. It places the individual at the heart
of its activities, by establishing the citizenship
of the Union and by creating an area of
freedom, security and justice.
Y1: In dem Bewusstsein ihres geistig-
religiösen und sittlichen Erbes gründet sich
die Union auf die unteilbaren und
universellen Werte der Würde des Menschen,
der Freiheit, der Gleichheit und der
Solidarität. Sie beruht auf den Grundsätzen
der Demokratie und der Rechtsstaatlichkeit.
Sie stellt den Menschen in den Mittelpunkt
ihres Handelns, indem sie die
Unionsbürgerschaft und einen Raum der
Freiheit, der Sicherheit und des Rechts
begründet.
Z1: Consciente de su patrimonio espiritual y
moral, la Unión está fundada sobre los valores
indivisibles y universales de la dignidad
humana, la libertad, la igualdad y la
solidaridad, y se basa en los principios de la
democracia y el Estado de Derecho. Al
instituir la ciudadanía de la Unión y crear un
espacio de libertad, seguridad y justicia, sitúa
a la persona en el centro de su actuación.
X2: The Union contributes to the preservation
and to the development of these common
values while respecting the diversity of the
cultures and traditions of the peoples of
Europe as well as the national identities of the
Member States and the organization of their
public authorities at national, regional and
local levels; it seeks to promote balanced and
sustainable development and ensures free
movement of persons, services, goods and
capital, and the freedom of establishment.
Y2: Die Union trägt zur Erhaltung und zur
Entwicklung dieser gemeinsamen Werte
unter Achtung der Vielfalt der Kulturen und
Traditionen der Völker Europas sowie der
nationalen Identität der Mitgliedstaaten und
der Organisation ihrer staatlichen Gewalt auf
nationaler, regionaler und lokaler Ebene bei.
Sie ist bestrebt, eine ausgewogene und
nachhaltige Entwicklung zu fördern und stellt
den freien Personen-, Dienstleistungs-,
Waren- und Kapitalverkehr sowie die
Niederlassungsfreiheit sicher.
Z2: La Unión contribuye a defender y
fomentar estos valores comunes dentro del
respeto de la diversidad de culturas y
tradiciones de los pueblos de Europa, así
como de la identidad nacional de los Estados
miembros y de la organización de sus poderes
públicos a escala nacional, regional y local;
trata de fomentar un desarrollo equilibrado y
sostenible y garantiza la libre circulación de
personas, servicios, mercancías y capitales,
así como la libertad de establecimiento.
The Union contributes to the preservation
and to the development of these common
values while respecting the diversity of
the cultures and traditions of the peoples
of Europe as well as the national
identities of the Member States and the
organization of their public authorities at
national, regional and local levels; it
seeks to promote balanced and
sustainable development and ensures free
movement of persons, services, goods
and capital, and the freedom of
establishment.
→ K =
Die Union trägt zur Erhaltung und zur
Entwicklung dieser gemeinsamen Werte
unter Achtung der Vielfalt der Kulturen
und Traditionen der Völker Europas
sowie der nationalen Identität der
Mitgliedstaaten und der Organisation
ihrer staatlichen Gewalt auf nationaler,
regionaler und lokaler Ebene bei. Sie ist
bestrebt, eine ausgewogene und
nachhaltige Entwicklung zu fördern und
stellt den freien Personen-,
Dienstleistungs-, Waren- und
Kapitalverkehr sowie die
Niederlassungsfreiheit sicher.
→ L =
⇔
X1: Conscious of its spiritual and moral
heritage, the Union is founded on the
indivisible, universal values of human
dignity, freedom, equality and solidarity; it is
based on the principles of democracy and the
rule of law. It places the individual at the heart
of its activities, by establishing the citizenship
of the Union and by creating an area of
freedom, security and justice.
Y1: De Unie, die zich bewust is van haar
geestelijke en morele erfgoed, heeft haar
grondslag in de ondeelbare en universele
waarden van menselijke waardigheid en van
vrijheid, gelijkheid en solidariteit. Zij berust
op het beginsel van democratie en het
beginsel van de rechtsstaat. De Unie stelt de
mens centraal in haar optreden, door het
burgerschap van de Unie in te stellen en een
ruimte van vrijheid, veiligheid en recht tot
stand te brengen.
Z1: Consciente de su patrimonio espiritual y
moral, la Unión está fundada sobre los valores
indivisibles y universales de la dignidad
humana, la libertad, la igualdad y la
solidaridad, y se basa en los principios de la
democracia y el Estado de Derecho. Al
instituir la ciudadanía de la Unión y crear un
espacio de libertad, seguridad y justicia, sitúa
a la persona en el centro de su actuación.
X2: The Union contributes to the preservation
and to the development of these common
values while respecting the diversity of the
cultures and traditions of the peoples of
Europe as well as the national identities of the
Member States and the organization of their
public authorities at national, regional and
local levels; it seeks to promote balanced and
sustainable development and ensures free
movement of persons, services, goods and
capital, and the freedom of establishment.
Y2: De Unie draagt bij tot de instandhouding
en de ontwikkeling van deze
gemeenschappelijke waarden, met
inachtneming van de verscheidenheid van
cultuur en traditie van de volkeren van
Europa, alsmede van de nationale identiteit
van de lidstaten en van hun staatsinrichting op
nationaal, regionaal en lokaal niveau. Zij
streeft ernaar een evenwichtige en duurzame
ontwikkeling te bevorderen en bewerkstelligt
het vrije verkeer van personen, diensten,
goederen en kapitaal, alsook de vrijheid van
vestiging.
Z2: La Unión contribuye a defender y
fomentar estos valores comunes dentro del
respeto de la diversidad de culturas y
tradiciones de los pueblos de Europa, así
como de la identidad nacional de los Estados
miembros y de la organización de sus poderes
públicos a escala nacional, regional y local;
trata de fomentar un desarrollo equilibrado y
sostenible y garantiza la libre circulación de
personas, servicios, mercancías y capitales,
así como la libertad de establecimiento.
The Union contributes to the preservation
and to the development of these common
values while respecting the diversity of
the cultures and traditions of the peoples
of Europe as well as the national
identities of the Member States and the
organization of their public authorities at
national, regional and local levels; it
seeks to promote balanced and
sustainable development and ensures free
movement of persons, services, goods
and capital, and the freedom of
establishment.
→ K =
De Unie draagt bij tot de instandhouding
en de ontwikkeling van deze
gemeenschappelijke waarden, met
inachtneming van de verscheidenheid
van cultuur en traditie van de volkeren
van Europa, alsmede van de nationale
identiteit van de lidstaten en van hun
staatsinrichting op nationaal, regionaal
en lokaal niveau. Zij streeft ernaar een
evenwichtige en duurzame ontwikkeling
te bevorderen en bewerkstelligt het vrije
verkeer van personen, diensten, goederen
en kapitaal, alsook de vrijheid van
vestiging.
→ L =
X1: Conscious of its spiritual and moral heritage, the Union is founded on the
indivisible, universal values of human dignity, freedom, equality and solidarity; it is
based on the principles of democracy and the rule of law. It places the individual at the
heart of its activities, by establishing the citizenship of the Union and by creating an
area of freedom, security and justice.
X2: The Union contributes to the preservation and to the development of these
common values while respecting the diversity of the cultures and traditions of the
peoples of Europe as well as the national identities of the Member States and the
organization of their public authorities at national, regional and local levels; it seeks
to promote balanced and sustainable development and ensures free movement of
persons, services, goods and capital, and the freedom of establishment.
→ K =
Y1: De Unie, die zich bewust is van haar geestelijke en morele erfgoed, heeft haar
grondslag in de ondeelbare en universele waarden van menselijke waardigheid en van
vrijheid, gelijkheid en solidariteit. Zij berust op het beginsel van democratie en het
beginsel van de rechtsstaat. De Unie stelt de mens centraal in haar optreden, door het
burgerschap van de Unie in te stellen en een ruimte van vrijheid, veiligheid en recht
tot stand te brengen.
Y2: De Unie draagt bij tot de instandhouding en de ontwikkeling van deze
gemeenschappelijke waarden, met inachtneming van de verscheidenheid van cultuur
en traditie van de volkeren van Europa, alsmede van de nationale identiteit van de
lidstaten en van hun staatsinrichting op nationaal, regionaal en lokaal niveau. Zij streeft
ernaar een evenwichtige en duurzame ontwikkeling te bevorderen en bewerkstelligt
het vrije verkeer van personen, diensten, goederen en kapitaal, alsook de vrijheid van
vestiging.
→ L =
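Idea: measure the similarity between the centered kernel matrices, $\langle \tilde{K}, \tilde{L} \rangle = \mathrm{Tr}(\tilde{K}\tilde{L})$, with $\tilde{K} = HKH$, where $H = I - \frac{1}{m}\mathbf{1}\mathbf{1}^\top$ is the centering matrix.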
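The centering-and-trace idea can be sketched in a few lines of NumPy. This is only an illustration: the Gaussian kernel on 1-D toy data, the bandwidth, and the names `gaussian_gram` and `centered_similarity` are assumptions, since the slides do not say which kernel is used on the text documents.

```python
import numpy as np

def gaussian_gram(a, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample (stand-in for a text kernel)."""
    d2 = (a[:, None] - a[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def centered_similarity(K, L):
    """Tr(K~ L~) with K~ = H K H, L~ = H L H and H = I - (1/m) 1 1^T."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    K_c = H @ K @ H
    L_c = H @ L @ H
    return np.trace(K_c @ L_c)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = x + 0.1 * rng.normal(size=100)   # strongly dependent on x
z = rng.normal(size=100)             # independent of x

sim_xy = centered_similarity(gaussian_gram(x), gaussian_gram(y))
sim_xz = centered_similarity(gaussian_gram(x), gaussian_gram(z))
print("dependent pair scores higher:", sim_xy > sim_xz)
```

Centering removes the constant component of each Gram matrix, so the trace responds to joint variation between the two samples rather than to the overall scale of the kernels.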
Probability in feature space
Probability in feature space → Maximum Mean Discrepancy → Kernel Dependence Measure
Feature map
- Consider $x \mapsto k(\cdot, x) \in \mathcal{F}$ instead of $x \mapsto (\varphi_1(x), \dots, \varphi_s(x)) \in \mathbb{R}^s$.
- The inner product is easy to compute: $\langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{F}} = k(x, y)$.
Embedding of probability measures into a reproducing kernel Hilbert space
- In particular, we can look at the set of distributions and treat each distribution $P$ as a point that we embed through the mean embedding $\mu_P$:
$P \mapsto \mu_P = E_{X \sim P}\, k(\cdot, X) = \int_\Omega \phi(x)\, dP(x) \in \mathcal{F}$
- Each distribution can thus be uniquely represented in $\mathcal{F}$.
- The inner product is easy to compute: $\langle \mu_P, \mu_Q \rangle_{\mathcal{F}} = E_{X, Y}\, k(X, Y)$.
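As a rough illustration of the last identity, the inner product of two mean embeddings can be estimated by averaging kernel evaluations over all sample pairs. The Gaussian kernel, the toy distributions and the function names are assumptions made for this sketch.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Matrix of k(a_i, b_j) for two 1-D samples (assumed Gaussian kernel)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def embedding_inner_product(xs, ys, bandwidth=1.0):
    """Plug-in estimate of <mu_P, mu_Q>_F = E k(X, Y): the mean of k(x_i, y_j)."""
    return gaussian_kernel(xs, ys, bandwidth).mean()

rng = np.random.default_rng(0)
p_sample = rng.normal(0.0, 1.0, size=500)      # sample from P
q_near = rng.normal(0.0, 1.0, size=500)        # Q equal to P
q_far = rng.normal(3.0, 1.0, size=500)         # Q shifted away from P

ip_near = embedding_inner_product(p_sample, q_near)
ip_far = embedding_inner_product(p_sample, q_far)
print("overlapping distributions have the larger inner product:", ip_near > ip_far)
```

No explicit feature vector is ever built: only kernel evaluations between samples are needed.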
Maximum Mean Discrepancy
Maximum Mean Discrepancy (MMD): [Gretton et al, 2007]
$\mathrm{MMD}^2(P, Q) = \|\mu_P - \mu_Q\|^2_{\mathcal{F}} = \langle \mu_P, \mu_P \rangle + \langle \mu_Q, \mu_Q \rangle - 2\langle \mu_P, \mu_Q \rangle$
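The three inner products in this expansion can each be replaced by an average of kernel values, which gives a simple (biased) plug-in estimate of MMD². The sketch below assumes a Gaussian kernel on 1-D samples; `mmd2_biased` is a hypothetical helper name, not code from the talk.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Matrix of k(a_i, b_j) for two 1-D samples (assumed Gaussian kernel)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate: <mu_P, mu_P> + <mu_Q, mu_Q> - 2 <mu_P, mu_Q>."""
    k_pp = gaussian_kernel(xs, xs, bandwidth).mean()
    k_qq = gaussian_kernel(ys, ys, bandwidth).mean()
    k_pq = gaussian_kernel(xs, ys, bandwidth).mean()
    return k_pp + k_qq - 2.0 * k_pq

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=300)
ys_same = rng.normal(0.0, 1.0, size=300)   # same distribution as xs
ys_shift = rng.normal(2.0, 1.0, size=300)  # shifted distribution

mmd_same = mmd2_biased(xs, ys_same)
mmd_shift = mmd2_biased(xs, ys_shift)
print("shifted sample is further away:", mmd_shift > mmd_same)
```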
Kernel dependence measure
Dependence measure using the Hilbert-Schmidt Independence Criterion (HSIC): [Gretton et al, 2005, 2008]
$\mathrm{HSIC}^2(P_x, P_y) = \|\mu_{P_{xy}} - \mu_{P_x P_y}\|^2_{\mathcal{F}}$
$\mathrm{HSIC}^2(P_x, P_y) = 0 \iff P_{xy} = P_x P_y$ when the kernels K and L are characteristic on their respective marginal domains.
Empirical $\mathrm{HSIC}^2(P_x, P_y)$: $\mathrm{HSIC}^{XY}_m = \frac{1}{m^2}\mathrm{Tr}(\tilde{K}\tilde{L})$, $O(m^2)$ computation time
$\mathrm{HSIC}^{XY}_m$ can be rewritten in terms of a U-statistic, which produces minimum-variance unbiased estimators.
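The two estimators can be compared numerically. The sketch below implements the biased trace form and a commonly used closed form of the unbiased U-statistic estimator, computed from Gram matrices whose diagonals are set to zero. The slides do not spell that closed form out, so treat it, the Gaussian kernel and the toy data as assumptions.

```python
import numpy as np

def gaussian_gram(a, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample (assumed kernel)."""
    d2 = (a[:, None] - a[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_biased(K, L):
    """(1/m^2) Tr(K~ L~) with centered Gram matrices."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return np.sum((H @ K @ H) * (H @ L @ H)) / m ** 2

def hsic_unbiased(K, L):
    """U-statistic form; uses zero-diagonal copies of K and L (assumed form)."""
    m = K.shape[0]
    K0 = K - np.diag(np.diag(K))
    L0 = L - np.diag(np.diag(L))
    one = np.ones(m)
    total = (np.sum(K0 * L0)                                   # Tr(K0 L0)
             + (one @ K0 @ one) * (one @ L0 @ one) / ((m - 1) * (m - 2))
             - 2.0 / (m - 2) * (one @ K0 @ L0 @ one))
    return total / (m * (m - 3))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dep = x + 0.1 * rng.normal(size=200)   # dependent on x
y_ind = rng.normal(size=200)             # independent of x
K = gaussian_gram(x)

u_dep = hsic_unbiased(K, gaussian_gram(y_dep))
u_ind = hsic_unbiased(K, gaussian_gram(y_ind))
b_dep = hsic_biased(K, gaussian_gram(y_dep))
print("unbiased HSIC, dependent vs independent:", u_dep, u_ind)
```

Both estimators only use matrix-vector and elementwise products on m × m matrices, which matches the quadratic cost quoted on the slide (the explicit centering products in `hsic_biased` are written for clarity, not speed).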
The Problem of relative dependency
Is the dependency between English and Dutch stronger than the dependency between English and Spanish?
H0: HSIC(Px ,Py ) ≤ HSIC(Px ,Pz ) (null hypothesis)
H1: HSIC(Px ,Py ) > HSIC(Px ,Pz ) (alternative hypothesis)
Test statistic: $\mathrm{HSIC}^{XY}_m - \mathrm{HSIC}^{XZ}_m$
Observed samples $\{x_i\}_{i=1}^m \sim P_x$, $\{y_i\}_{i=1}^m \sim P_y$, $\{z_i\}_{i=1}^m \sim P_z$
Two strategies:
- Naively: compute the two independent statistics $\mathrm{HSIC}^{X'Y'}_{m/2}$ and $\mathrm{HSIC}^{X''Z''}_{m/2}$ on sample subsets;
- Efficiently: compute the two dependent statistics $\mathrm{HSIC}^{XY}_m$ and $\mathrm{HSIC}^{XZ}_m$
and if the empirical difference $\mathrm{HSIC}^{XY}_m - \mathrm{HSIC}^{XZ}_m$ is:
- significantly greater than 0 (small p-value): reject H0
- otherwise: do not reject H0
A simple consistent test via independent HSICs
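Construction of two independent statistics $\mathrm{HSIC}^{X'Y'}_{m/2}$ and $\mathrm{HSIC}^{X''Z''}_{m/2}$ by subsampling
[Figure: kernel matrices K, L, M]
Joint asymptotic distribution of independent HSIC: [Serfling, 2009]
$$\sqrt{m}\left(\begin{pmatrix} \mathrm{HSIC}^{X'Y'}_{m/2} \\ \mathrm{HSIC}^{X''Z''}_{m/2} \end{pmatrix} - \begin{pmatrix} \mathrm{HSIC}(P_x, P_y) \\ \mathrm{HSIC}(P_x, P_z) \end{pmatrix}\right) \xrightarrow{d} \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_{X'Y'} & 0 \\ 0 & \sigma^2_{X''Z''} \end{pmatrix}\right)$$
Relative dependency test with independent HSIC statistic: p-value
$$\sqrt{m}\left[\mathrm{HSIC}^{X'Y'}_{m/2} - \mathrm{HSIC}^{X''Z''}_{m/2}\right] \xrightarrow{d} \mathcal{N}\left(\frac{\sqrt{2}}{2}\big(\mathrm{HSIC}(P_x, P_y) - \mathrm{HSIC}(P_x, P_z)\big),\ \frac{1}{2}\big(\sigma^2_{X'Y'} + \sigma^2_{X''Z''}\big)\right)$$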
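The subsampling construction can be sketched as follows: the first half of the sample is used for the (X, Y) statistic and the second half for the (X, Z) statistic, so the two estimates share no observations. The Gaussian kernel, the biased HSIC estimator and the name `independent_hsic_pair` are assumptions of this sketch, and the variance estimates needed for the p-value are left out.

```python
import numpy as np

def gaussian_gram(a, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample (assumed kernel)."""
    d2 = (a[:, None] - a[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_biased(K, L):
    """(1/m^2) Tr(K~ L~) with centered Gram matrices."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return np.sum((H @ K @ H) * (H @ L @ H)) / m ** 2

def independent_hsic_pair(x, y, z):
    """HSIC of (X', Y') on the first half and of (X'', Z'') on the second half."""
    half = len(x) // 2
    first, second = slice(0, half), slice(half, 2 * half)
    hsic_xy = hsic_biased(gaussian_gram(x[first]), gaussian_gram(y[first]))
    hsic_xz = hsic_biased(gaussian_gram(x[second]), gaussian_gram(z[second]))
    return hsic_xy, hsic_xz

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.1 * rng.normal(size=200)   # strongly dependent on x
z = x + 2.0 * rng.normal(size=200)   # weakly dependent on x

hsic_xy_half, hsic_xz_half = independent_hsic_pair(x, y, z)
print("difference of the two independent statistics:", hsic_xy_half - hsic_xz_half)
```

Each statistic sees only m/2 observations, which is the source of the extra variance that the dependent approach avoids.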
Joint asymptotic distribution of two dependent HSIC
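Joint asymptotic distribution of HSIC and test statistic
$$\sqrt{m}\left(\begin{pmatrix} \mathrm{HSIC}^{XY}_m \\ \mathrm{HSIC}^{XZ}_m \end{pmatrix} - \begin{pmatrix} \mathrm{HSIC}(P_x, P_y) \\ \mathrm{HSIC}(P_x, P_z) \end{pmatrix}\right) \xrightarrow{d} \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_{XY} & \sigma_{XYXZ} \\ \sigma_{XYXZ} & \sigma^2_{XZ} \end{pmatrix}\right)$$
$$\sigma_{XYXZ} = \frac{16}{m}\left(\frac{1}{m}\sum_{i=1}^{m}\left(\frac{(m-1)!}{(m-4)!}\right)^{-2}\sum_{(j,q,r) \in \mathbf{i}^m_3 \setminus \{i\}} h_{ijqr}\, g_{ijqr} - \mathrm{HSIC}^{XY}_m\, \mathrm{HSIC}^{XZ}_m\right)$$
In matrix form, with $(m-1)_3 = (m-1)!/(m-4)!$:
$$\sigma_{XYXZ} = \frac{16}{m}\left((4m)^{-1}(m-1)_3^{-2}\, h_{XY}^\top h_{XZ} - \mathrm{HSIC}^{XY}_m\, \mathrm{HSIC}^{XZ}_m\right)$$
$$h_{XY} = (m-2)^2 (K \circ L)\mathbf{1} - m (K\mathbf{1}) \circ (L\mathbf{1}) + (m-2)\big(\mathrm{Tr}(KL)\mathbf{1} - K(L\mathbf{1}) - L(K\mathbf{1})\big) + (\mathbf{1}^\top L \mathbf{1}) K\mathbf{1} + (\mathbf{1}^\top K \mathbf{1}) L\mathbf{1} - \big((\mathbf{1}^\top K)(L\mathbf{1})\big)\mathbf{1}$$
where $\circ$ denotes the elementwise product.
All terms can be computed in $O(m^2)$ time.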
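Putting the matrix-form expressions together gives a compact sketch of the dependent test. Several choices here are assumptions rather than statements from the slides: the Gaussian kernel on 1-D data, Gram matrices with zeroed diagonals (as in the usual unbiased HSIC estimator), variances obtained from the same expression as the covariance with $h_{XY}$ paired with itself, and the $16/m$ quantities treated as variances of the statistics themselves. For the authors' implementation see the repository linked at the end of the talk.

```python
import numpy as np
from math import erfc, sqrt

def zero_diag_gram(a, bandwidth=1.0):
    """Gaussian Gram matrix of a 1-D sample with its diagonal set to zero."""
    d2 = (a[:, None] - a[None, :]) ** 2
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)
    return K

def hsic_unbiased(K, L):
    """Unbiased HSIC estimate from zero-diagonal Gram matrices."""
    m = K.shape[0]
    one = np.ones(m)
    total = (np.sum(K * L)
             + (one @ K @ one) * (one @ L @ one) / ((m - 1) * (m - 2))
             - 2.0 / (m - 2) * (one @ K @ L @ one))
    return total / (m * (m - 3))

def h_vector(K, L):
    """The vector h_XY from the slide, term by term."""
    m = K.shape[0]
    one = np.ones(m)
    K1, L1 = K @ one, L @ one
    return ((m - 2) ** 2 * (K * L) @ one
            - m * K1 * L1
            + (m - 2) * (np.sum(K * L) * one - K @ L1 - L @ K1)
            + (one @ L1) * K1 + (one @ K1) * L1
            - (K1 @ L1) * one)

def sigma(h_a, h_b, hsic_a, hsic_b, m):
    """16/m * ((4m)^-1 (m-1)_3^-2 h_a^T h_b - hsic_a * hsic_b)."""
    falling = (m - 1) * (m - 2) * (m - 3)   # (m-1)_3
    r = (h_a @ h_b) / (4.0 * m * falling ** 2)
    return 16.0 / m * (r - hsic_a * hsic_b)

def relative_dependency_test(x, y, z):
    """Return (HSIC_XY - HSIC_XZ, one-sided p-value for H0: HSIC_XY <= HSIC_XZ)."""
    m = len(x)
    K, L, M = zero_diag_gram(x), zero_diag_gram(y), zero_diag_gram(z)
    hsic_xy, hsic_xz = hsic_unbiased(K, L), hsic_unbiased(K, M)
    h_xy, h_xz = h_vector(K, L), h_vector(K, M)
    var_diff = (sigma(h_xy, h_xy, hsic_xy, hsic_xy, m)
                + sigma(h_xz, h_xz, hsic_xz, hsic_xz, m)
                - 2.0 * sigma(h_xy, h_xz, hsic_xy, hsic_xz, m))
    statistic = hsic_xy - hsic_xz
    z_score = statistic / sqrt(var_diff)
    p_value = 0.5 * erfc(z_score / sqrt(2.0))   # 1 - Phi(z_score)
    return statistic, p_value

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = x + 0.1 * rng.normal(size=300)   # strongly dependent on x
z = rng.normal(size=300)             # independent of x

stat, p = relative_dependency_test(x, y, z)
stat_swapped, p_swapped = relative_dependency_test(x, z, y)
print("reject H0 at the 5% level:", p < 0.05)
```

Swapping the roles of y and z flips the sign of the statistic, so the second call yields a p-value close to one and H0 is not rejected.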
Properties of the test of relative dependency
Relative dependency statistical test: p-value
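$$\sqrt{m}\left[\mathrm{HSIC}^{XY}_m - \mathrm{HSIC}^{XZ}_m\right] \xrightarrow{d} \mathcal{N}\left(\frac{\sqrt{2}}{2}\big(\mathrm{HSIC}(P_x, P_y) - \mathrm{HSIC}(P_x, P_z)\big),\ \frac{1}{2}\big(\sigma^2_{XY} + \sigma^2_{XZ} - 2\sigma_{XYXZ}\big)\right)$$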
The dependent test is more powerful than the independent test
Theorem
The asymptotic relative efficiency of the independent approach relative to the dependent approach is always greater than 1:
$$\frac{1}{2}\big(\sigma^2_{XY} + \sigma^2_{XZ} - 2\sigma_{XYXZ}\big) < \frac{1}{2}\big(2\sigma^2_{XY} + 2\sigma^2_{XZ}\big) \qquad (1)$$
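One way to see why inequality (1) holds is the short argument below. It is a sketch rather than the proof from the paper, and it reads the factor 2 on the right-hand side as the cost of estimating each independent statistic from only m/2 samples. It uses the Cauchy-Schwarz bound on the covariance and the arithmetic-geometric mean inequality.

```latex
\begin{align*}
-2\sigma_{XYXZ}
  &\le 2\,\sigma_{XY}\,\sigma_{XZ}
  && \text{(Cauchy-Schwarz: } |\sigma_{XYXZ}| \le \sigma_{XY}\sigma_{XZ}\text{)} \\
  &\le \sigma^2_{XY} + \sigma^2_{XZ}
  && \text{(since } (\sigma_{XY} - \sigma_{XZ})^2 \ge 0\text{)} \\[4pt]
\Rightarrow\quad
\tfrac{1}{2}\big(\sigma^2_{XY} + \sigma^2_{XZ} - 2\sigma_{XYXZ}\big)
  &\le \sigma^2_{XY} + \sigma^2_{XZ}
   = \tfrac{1}{2}\big(2\sigma^2_{XY} + 2\sigma^2_{XZ}\big).
\end{align*}
```

Equality would require both $\sigma_{XYXZ} = -\sigma_{XY}\sigma_{XZ}$ and $\sigma_{XY} = \sigma_{XZ}$, that is, two perfectly negatively correlated statistics with equal variance. Outside that degenerate case the inequality is strict.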
Experiments on Synthetic Data
We control the relative degree of functional dependency between variates.
Dependency(X, Y) > Dependency(X, Z)?
[Figure: scatter plots of the three synthetic variates. (X) $\sin(t) + \gamma_1 \mathcal{N}(0, 1)$ against $t + \gamma_1 \mathcal{N}(0, 1)$, $\gamma_1 = 0.3$. (Y) $t\sin(t) + \gamma_2 \mathcal{N}(0, 1)$ against $t\cos(t) + \gamma_2 \mathcal{N}(0, 1)$, $\gamma_2 = 0.3$. (Z) $t\sin(t) + \gamma_3 \mathcal{N}(0, 1)$ against $t\cos(t) + \gamma_3 \mathcal{N}(0, 1)$, $\gamma_3 = 0.6$.]
[Figure: power of the tests as a function of $\gamma_3$, for the dependent tests and the independent tests.]
[Figure: $\mathrm{HSIC}^{XZ}_m$ vs $\mathrm{HSIC}^{X''Z''}_{m/2}$ plotted against $\mathrm{HSIC}^{XY}_m$ vs $\mathrm{HSIC}^{X'Y'}_{m/2}$, for the independent tests and the dependent tests.]
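A possible way to generate variates of this shape is sketched below. The slides give only the axis labels and the noise levels, so the shared latent variable t, its range, the sample size and the function name `sample_variates` are all assumptions.

```python
import numpy as np

def sample_variates(m, gamma1=0.3, gamma2=0.3, gamma3=0.6, seed=0):
    """Draw X, Y, Z (each m x 2) from a shared latent t plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 2.0 * np.pi, size=m)        # assumed range of t
    spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
    X = np.column_stack([t, np.sin(t)]) + gamma1 * rng.normal(size=(m, 2))
    Y = spiral + gamma2 * rng.normal(size=(m, 2))    # less noisy curve
    Z = spiral + gamma3 * rng.normal(size=(m, 2))    # noisier curve
    return X, Y, Z

X, Y, Z = sample_variates(200)
print(X.shape, Y.shape, Z.shape)
```

Raising `gamma3` while holding `gamma2` fixed weakens the dependence between X and Z relative to that between X and Y, which is the quantity varied along the horizontal axis of the power plot.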
Experiments on Multilingual Data
Uralic: Finnish (fi). Romance: Italian (it), French (fr), Spanish (es), Portuguese (pt). Germanic: English (en), Dutch (nl), German (de), Danish (da), Swedish (sv).
H0 : Dep(Sc., Tg.1) ≤ Dep(Sc., Tg.2)
Source  Target 1  Target 2  p-value
fr      es        it        0.0157
fr      pt        it        0.1882
es      fr        it        0.2147
es      pt        it        < 10^-4
es      pt        fr        < 10^-4
pt      fr        it        0.7649
pt      es        it        0.0011
pt      es        fr        < 10^-8
Relative dependency tests between Romance languages.
Pediatric high-grade gliomas (pHGG)
Brain tumor localisation: pHGG have different genetic origins depending on the location of the tumor in the brain. The goal is to identify the mechanisms responsible for the tumor.
H0: Dependency(Loc, Gene Exp.) < Dependency(Loc, Chrom. Imbalance)
p-value < $10^{-5}$
Conclusion
A novel non-parametric statistical test that determines whether a source variable is more strongly dependent on one target variable than on another.
The test is low variance, consistent and unbiased.
Computation requirement is quadratic time.
Bibliography:
- Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., Smola, A. J. (2008). A kernel statistical test of independence. In Advances in Neural Information Processing Systems.
- Gretton, A., Herbrich, R., Smola, A., Bousquet, O., Schölkopf, B. (2005). Kernel methods for measuring independence. Journal of Machine Learning Research, 6, 2075-2129.
- Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 293-325.
- Serfling, R. J. (2009). Approximation theorems of mathematical statistics, 162. John Wiley & Sons.
Thanks for your attention!
Code: https://github.com/wbounliphone/reldep
Contact: [email protected]