Jérôme Tubiana, Rémi Monasson Laboratoire de Physique Théorique Ecole Normale Supérieure

Post on 30-Sep-2020


Page 1: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces

Jérôme Tubiana, Rémi Monasson
Laboratoire de Physique Théorique
Ecole Normale Supérieure

Page 2: Motivation

GoogLeNet deep neural network

Why does this network work? (And not others!)

Page 4: Restricted Boltzmann Machines

Ackley, Hinton & Sejnowski 1985; Smolensky 1986; Hinton 2002

[Figure: bipartite graph with a hidden layer ($h_1$, $h_2$) and a visible layer of binary random variables ($v_1$, $v_2$, $v_3$), coupled by weights $W_{\mu,i}$]

•  Graphical model constituted by two sets of random variables that are coupled together.

$$P(v, h) = \frac{1}{Z} e^{-E(v,h)}$$

$$E(v, h) = -\sum_{i=1}^{N} \psi_i^{v} v_i + \sum_{\mu=1}^{K} U_\mu(h_\mu) - \sum_{\mu,i} W_{\mu,i} v_i h_\mu$$

•  RBMs learn a probability distribution over the visible layer.

$$P(v) = \int \prod_{\mu=1}^{K} dh_\mu \, P(v, h) = \frac{1}{Z} e^{-H_{\mathrm{eff}}(v)}$$
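Because the energy is a sum of single-unit terms over the hidden layer, the integral over the hidden units factorizes, which gives a general expression for the effective Hamiltonian. The sketch below writes the visible fields as $\psi_i^{v}$ and indexes hidden units by $\mu$; both symbol choices are assumptions about the slide's notation, and $x_\mu(v)$ is a shorthand introduced here for the input received by hidden unit $\mu$.

```latex
\begin{aligned}
x_\mu(v) &= \sum_{i=1}^{N} W_{\mu,i}\, v_i, \\
\int \prod_{\mu=1}^{K} dh_\mu \, e^{-E(v,h)}
  &= e^{\sum_i \psi_i^{v} v_i} \prod_{\mu=1}^{K} \int dh_\mu \, e^{-U_\mu(h_\mu) + h_\mu x_\mu(v)}, \\
H_{\mathrm{eff}}(v) &= -\sum_{i=1}^{N} \psi_i^{v} v_i
  - \sum_{\mu=1}^{K} \log \int dh_\mu \, e^{-U_\mu(h_\mu) + h_\mu x_\mu(v)} .
\end{aligned}
```

Each hidden unit thus contributes one term that depends on $v$ only through its input $x_\mu(v)$; the shape of that term is set by the potential $U_\mu$.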

Page 6: Vanilla example

•  Suppose a vector of four binary random variables ($v_1$, $v_2$, $v_3$, $v_4$).
•  Observe strong correlation between all pairs of variables.

[Figure: units $v_1$, $v_2$, $v_3$, $v_4$]

The Ising model explains correlation by direct couplings.

Page 7: Vanilla example

•  Suppose a vector of four binary random variables ($v_1$, $v_2$, $v_3$, $v_4$).
•  Observe strong correlation between all pairs of variables.

[Figure: units $v_1$, $v_2$, $v_3$, $v_4$ and a hidden unit $h_1$]

The RBM explains correlation by common input.
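The common-input mechanism can be checked by brute force on a toy model. The sketch below is an illustration under stated assumptions, not the authors' setup: it uses a single binary hidden unit with potential `U(h) = threshold_h * h`, and the values of `weights_to_hidden`, `field_v`, and `threshold_h` are hypothetical. It enumerates all 16 visible configurations, sums out the hidden unit, and computes the connected correlations between visible units, which have no direct couplings.

```python
import itertools
import numpy as np

# Assumed toy parameters (not taken from the slides).
weights_to_hidden = np.full(4, 4.0)   # coupling of each visible unit to the single hidden unit
field_v = np.full(4, -2.0)            # field on each visible unit
threshold_h = 8.0                     # hidden potential U(h) = threshold_h * h, with h in {0, 1}

def unnormalized_prob(v):
    """Sum of exp(-E(v, h)) over h in {0, 1}: the hidden unit is marginalized out."""
    x = weights_to_hidden @ v                      # input received by the hidden unit
    return np.exp(field_v @ v) * (1.0 + np.exp(x - threshold_h))

# Enumerate the 2^4 visible configurations and normalize.
configs = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
unnorm = np.array([unnormalized_prob(v) for v in configs])
probs = unnorm / unnorm.sum()

mean = probs @ configs                                   # <v_i>
second_moment = (configs * probs[:, None]).T @ configs   # <v_i v_j>
covariance = second_moment - np.outer(mean, mean)        # connected correlations

print(np.round(covariance, 3))
```

All off-diagonal entries of `covariance` come out positive even though the visible units interact only through the shared hidden unit; conditioned on the hidden unit, they are independent.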

Page 8: Sampling from RBM

[Figure: alternating chain $v^{(0)} \to h^{(0)} \to v^{(1)} \to h^{(1)} \to v^{(2)}$. Going from the input layer (data) to the hidden layer (features) extracts features from data; going back reconstructs data from features.]

•  Compute the hidden layer inputs: $x_\alpha = \sum_i W_{\alpha,i} v_i$

•  Sample each hidden unit independently: $p(h_\alpha | x_\alpha) \propto e^{-U_\alpha(h_\alpha) + h_\alpha x_\alpha}$

•  Compute the visible layer inputs: $y_i = \sum_\alpha W_{\alpha,i} h_\alpha$

•  Sample each visible unit independently: $p(v_i | y_i) \propto e^{v_i (\psi_i^{v} + y_i)}$
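The four sampling steps can be sketched in NumPy as follows. This is a minimal illustration assuming Gaussian hidden units with $U(h) = h^2/2$, for which the conditional of each hidden unit is a unit-variance normal centred on its input, and binary visible units. The function names, the weight shape `(K, N)`, and the toy parameter values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hidden(v, W):
    """Gaussian hidden units: p(h | x) ~ exp(-h^2/2 + h x), a normal with mean x and variance 1."""
    x = W @ v                                   # hidden-layer inputs, one per hidden unit
    return x + rng.standard_normal(x.shape)

def sample_visible(h, W, field_v):
    """Binary visible units: p(v_i = 1 | y_i) = sigmoid(field_v[i] + y_i)."""
    y = W.T @ h                                 # visible-layer inputs, one per visible unit
    p_on = 1.0 / (1.0 + np.exp(-(field_v + y)))
    return (rng.random(p_on.shape) < p_on).astype(float)

def gibbs_chain(v0, W, field_v, n_steps):
    """Alternate hidden and visible sampling, starting from the visible configuration v0."""
    v = v0.copy()
    h = sample_hidden(v, W)                     # extract features from the data
    for _ in range(n_steps):
        v = sample_visible(h, W, field_v)       # reconstruct data from the features
        h = sample_hidden(v, W)
    return v, h

# Toy usage with hypothetical sizes: K = 3 hidden units, N = 5 visible units.
K, N = 3, 5
W = 0.5 * rng.standard_normal((K, N))
field_v = np.zeros(N)
v0 = rng.integers(0, 2, size=N).astype(float)
v, h = gibbs_chain(v0, W, field_v, n_steps=10)
```

Units within a layer are sampled independently because the graph is bipartite: conditioned on one layer, the other layer's units do not interact.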

Page 11: The hidden-unit potentials

Gaussian units:

$$h_\alpha \in \mathbb{R}, \quad U_\alpha(h_\alpha) = \frac{h_\alpha^2}{2}$$

$$H_{\mathrm{eff}}[v] = -\sum_i \psi_i^{v} v_i - \frac{1}{2} \sum_\alpha \Big( \sum_i W_{\alpha,i} v_i \Big)^2$$

Gaussian RBMs learn pairwise couplings (equivalent to the Hopfield model).

Non-Gaussian units:

In general, the effective Hamiltonian is not quadratic and has higher-order couplings. Examples:
•  Bernoulli
•  ReLU+

Different potentials correspond to different transfer functions.
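To see where the pairwise couplings come from, the single-unit integral can be evaluated for each potential. The following is a sketch with $x_\alpha = \sum_i W_{\alpha,i} v_i$; the symbol $J_{ij}$ and the Bernoulli convention ($h_\alpha \in \{0, 1\}$ with $U_\alpha = 0$) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\text{Gaussian:}\quad
& \int_{-\infty}^{\infty} dh \, e^{-h^2/2 + h x_\alpha} = \sqrt{2\pi}\, e^{x_\alpha^2/2}
\;\Rightarrow\;
-\frac{1}{2}\sum_\alpha x_\alpha^2 = -\frac{1}{2}\sum_{i,j} J_{ij}\, v_i v_j,
\quad J_{ij} = \sum_\alpha W_{\alpha,i} W_{\alpha,j}, \\
& \langle h_\alpha \mid x_\alpha \rangle = x_\alpha \quad \text{(linear transfer function)}; \\
\text{Bernoulli:}\quad
& \sum_{h \in \{0,1\}} e^{h x_\alpha} = 1 + e^{x_\alpha}
\;\Rightarrow\;
-\sum_\alpha \log\!\left(1 + e^{x_\alpha}\right), \\
& \langle h_\alpha \mid x_\alpha \rangle = \frac{1}{1 + e^{-x_\alpha}} \quad \text{(sigmoid transfer function)}.
\end{aligned}
```

The Gaussian term is exactly quadratic in $v$, while expanding $\log(1 + e^{x_\alpha})$ in powers of $x_\alpha$ produces terms beyond second order, that is, couplings among more than two visible units.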

Page 12: RBMs learn data representations

Given a configuration ($v_1$, …, $v_N$), the hidden-unit activations ($h_1$, …, $h_K$) define a representation of the data. Learning representations is a crucial task:

•  In neuroscience: sensory information processing (e.g., from sensors to cortical areas).

•  In machine learning: the success of learning algorithms depends on data representations. A good representation-learning algorithm extracts the features that have variability in many different neighboring regions of the input space.

The success of deep learning algorithms lies in their ability to learn abstract representations. The RBM is one of the simplest representation-learning algorithms that can be studied.

Page 13: Example: MNIST synthetic digits

ReLU+ RBM, K = 400
Log-likelihood: -63 nats

Gaussian RBM, K = 400
Log-likelihood: -83 nats

MNIST: 60,000 images of digits of size 28x28

Learning algorithm: PCD, PT (Tieleman 2008, Desjardins 2010)

Page 14: MNIST distributed representation

[Figure: a subset of the features learnt by a ReLU+ RBM with K = 400. Each image represents a weight vector $W_\mu$.]

Each generated handwritten digit image is composed by superposing about 20 elementary strokes.

Different combinations of strokes produce different variants of the same digit.

Page 18: Phenomenology of learning

Key metrics monitored during learning:
•  Log-likelihood: increases. $\mathcal{L} = \langle \log p(x|\theta) \rangle_{\mathrm{validation}}$

•  Weight amplitude: increases. Measured by $\langle \sum_i W_{\alpha,i}^2 \rangle_\alpha$

•  Weight sparsity: the weights get sparser. $p$ = fraction of non-zero couplings

•  Average number of active hidden units: increases (after a transient). $L$ = number of active hidden units

Learning algorithms: PCD & PT (Tieleman 2008, Desjardins 2010)
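Three of these metrics can be computed directly from a weight matrix and a batch of hidden activations. The helpers below are a sketch under stated assumptions: the function names are hypothetical, weights are stored with shape `(K, N)`, and the cutoffs `tol` and `threshold` that decide what counts as a "non-zero" coupling or an "active" unit are choices made for this example, since the slides do not specify them.

```python
import numpy as np

def weight_amplitude(W):
    """Average over hidden units of the squared weight norm sum_i W[a, i]**2."""
    return float(np.mean(np.sum(W ** 2, axis=1)))

def nonzero_fraction(W, tol=1e-2):
    """Fraction p of couplings whose magnitude exceeds tol (tol is an assumed cutoff)."""
    return float(np.mean(np.abs(W) > tol))

def mean_active_hidden(H, threshold=0.0):
    """Average number L of hidden units per sample whose activation exceeds threshold.

    H has shape (n_samples, K); threshold is an assumed cutoff.
    """
    return float(np.mean(np.sum(H > threshold, axis=1)))
```

Logging these values once per epoch, alongside the validation log-likelihood, is one way to reproduce the kind of curves the slide describes.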

Page 20: Questions

•  How can a bipartite network generate such data?
•  Why do some RBMs work and others don't?
•  What mechanism produces distributed representations?

Statistical physics approach: study the properties of a random RBM with prescribed control parameters, and the different phases (the outcome of the algorithm).

Page 21: Random Weight RBM model

Multitasking associative networks: Agliari et al., PRL 2012

Visible layer: $N \to \infty$ units, $v_i \in \{0, 1\}$, field $\psi^{v}$

Hidden layer: $\alpha N$ units, $h_\mu \in \mathbb{R}^{+}$ (ReLU+), threshold $\psi^{h}$

Sparse random weights:

$$W_{\mu,i} = \begin{cases} 0 & \text{with probability } 1 - p \\ +1 & \text{with probability } p/2 \\ -1 & \text{with probability } p/2 \end{cases}$$
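The sparse random weight ensemble is easy to draw numerically. The sketch below assumes the unscaled values 0, +1, and -1 with probabilities 1 - p, p/2, and p/2; the function name `sample_sparse_weights` and the toy sizes are hypothetical.

```python
import numpy as np

def sample_sparse_weights(n_visible, alpha, p, rng):
    """Draw an (alpha * n_visible, n_visible) matrix with i.i.d. entries:
    0 with probability 1 - p, +1 with probability p/2, -1 with probability p/2."""
    n_hidden = int(round(alpha * n_visible))
    values = np.array([0.0, 1.0, -1.0])
    return rng.choice(values, size=(n_hidden, n_visible), p=[1.0 - p, p / 2.0, p / 2.0])

rng = np.random.default_rng(0)
W = sample_sparse_weights(n_visible=500, alpha=0.4, p=0.1, rng=rng)

# Empirical fraction of non-zero couplings, which should be close to p.
empirical_p = float(np.mean(W != 0))
```

With such a matrix one can explore numerically how the ratio of hidden to visible units and the sparsity p affect the number of strongly activated hidden units at finite N.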

Page 22: The Phases

Glassy phase:
•  All the hidden units are weakly activated.
•  All states have weak probability.

Compositional phase:
•  Several hidden units are strongly activated, and the others are quiet.
•  Number of regions with high probability is polynomial in N.

Ferromagnetic phase:
•  One hidden unit is strongly activated and the others are weakly activated.
•  Number of regions with high probability is linear in N.

[Phase diagram arrows: $\psi^{h} \nearrow$, $p \searrow$; $\alpha \nearrow$]

Page 23: Stat Mech of Random RBM

•  A replica theory computation is performed to estimate the free energy in the zero-temperature limit:

$$F(\alpha, p, \psi^{v}, \psi^{h}, L)$$

$\alpha$ = number of hidden units / number of visible units
$p$ = weight sparsity
$\psi^{v}$ = fields on visible units
$\psi^{h}$ = hidden-unit threshold
$L$ = number of active hidden units

$$L^{\star} \propto \frac{1}{p}$$

Page 24: Validation on MNIST

Numerical experiment:

ReLU+ RBMs are trained with a range of L1-like regularizations that control the sparsity of the weight matrix.

As the weights get sparser, the number of active hidden units increases.
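The slides do not specify the exact form of the L1-like penalty. As a stand-in, the sketch below uses a plain L1 penalty applied by soft-thresholding after each gradient step, which is a standard way to drive small weights exactly to zero. The function name `l1_proximal_step` and its parameters are hypothetical, and `grad_loglik` stands for whatever log-likelihood gradient estimate the training algorithm provides.

```python
import numpy as np

def l1_proximal_step(W, grad_loglik, lr, l1_strength):
    """One gradient-ascent step on the log-likelihood, then soft-thresholding.

    Soft-thresholding shrinks every weight toward zero by lr * l1_strength and
    sets weights smaller than that amount exactly to zero.
    """
    W_new = W + lr * grad_loglik
    shrink = lr * l1_strength
    return np.sign(W_new) * np.maximum(np.abs(W_new) - shrink, 0.0)
```

Sweeping `l1_strength` over a range of values yields weight matrices of different sparsity, which can then be compared on the number of active hidden units.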

Page 25: Conclusions

•  Distributed encoding relies on two key properties:
– Non-quadratic potentials (i.e., non-linear transfer functions). They denoise the hidden layer inputs, allowing for better stability.
– Weight sparsity allows for activation of many hidden units that detect complementary features. The combinatorics creates a rich output distribution.

•  Future:
– Dynamics of learning
– Deep models

Page 26: Acknowledgements

•  Funding:
– Ecole Normale Supérieure
– CNRS: Inphyniti Challenge

•  Discussions: A. Dubreuil, L. Posani