Design of Stable Algorithms for Privacy and Learning. CompSci 590.03. Instructor: Ashwin Machanavajjhala. Lecture 1: 590.03 Fall 16


Post on 15-Jul-2020


Page 1:

Design of Stable Algorithms for Privacy and Learning

CompSci 590.03 Instructor: Ashwin Machanavajjhala

Lecture 1: 590.03 Fall 16

Page 2:

Bayes Theorem
Differential Privacy
Histograms
Linear Regression
Range Queries
Laplace Mechanism
Odds Ratio
PAC Learning
Anonymization
Secure Multiparty Computation

Page 3:

Machine Learning is here to stay

Page 4:

Machine Learning & advertising

+250% clicks vs. editorial one size fits all

+79% clicks vs. randomly selected

+43% clicks vs. editor selected

Recommended links
Personalized News Interests
Top Searches

Page 5:

Machine Learning & marketing

Page 6:

Machine learning & health

© Gibson & Muse, A Primer of Genome Science

Page 7:

Machine learning & traffic prediction

Page 8:

Machine Learning & politics!

Fivethirtyeight.com

Page 9:

Learning
•  Given a set of data points (x, y)
   –  x is a vector of predictors
   –  y is the value to be predicted
•  Learn a function based on a training dataset s.t. the function accurately predicts y for an unseen x

Page 10:

Quality of a learning algorithm
•  Generalization error: Expected prediction error on unseen data
•  Empirical Risk: Average prediction error on unseen test dataset
•  Note that test and training data are disjoint to avoid overfitting
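
A minimal sketch of these quantities, assuming a one-dimensional least-squares model and synthetic data. The data generator and the names `fit_line` and `mean_squared_error` are illustrative assumptions, not from the lecture. The average error on the held-out test set is used as an estimate of the error on unseen data, and the training and test sets are kept disjoint.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def mean_squared_error(model, points):
    """Average squared prediction error of the model on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in xs]

# Disjoint split: the model never sees the test points during fitting.
train, test = data[:150], data[150:]
model = fit_line(train)

train_error = mean_squared_error(model, train)
test_error = mean_squared_error(model, test)
print(f"train error: {train_error:.3f}, test error: {test_error:.3f}")
```

A large gap between the two printed errors would be a sign of overfitting; with a two-parameter model and 150 training points the gap is expected to be small.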

Page 11:

Where is the data coming from?

Page 12:

Where is the data coming from?

•  Census surveys
•  IRS Records

•  Medical records
•  Insurance records

•  Search logs
•  Browse logs
•  Shopping histories

•  Photos
•  Videos
•  Smart phone Sensors
•  Mobility trajectories

•  …

Very sensitive information …

Page 13:

How is this data collected?

http://graphicsweb.wsj.com/documents/divSlider/media/ecosystem100730.png

Page 14:

What websites track your behavior?

http://blogs.wsj.com/wtk/

Page 15:

[Diagram: Individual 1 (r1), Individual 2 (r2), Individual 3 (r3), …, Individual N (rN); Server; DB]

Release the data “anonymously” or release a model

Page 16:

Releasing data is bad

What if we ensure our names and other identifiers are never released?

Page 17:

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data:
•  Name
•  SSN
•  Visit Date
•  Diagnosis
•  Procedure
•  Medication
•  Total Charge
•  Zip
•  Birth date
•  Sex

Page 18:

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data only:
•  Name
•  SSN
•  Visit Date
•  Diagnosis
•  Procedure
•  Medication
•  Total Charge

Voter List only:
•  Name
•  Address
•  Date Registered
•  Party affiliation
•  Date last voted

In both Medical Data and Voter List:
•  Zip
•  Birth date
•  Sex

Page 19:

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data only:
•  Name
•  SSN
•  Visit Date
•  Diagnosis
•  Procedure
•  Medication
•  Total Charge

Voter List only:
•  Name
•  Address
•  Date Registered
•  Party affiliation
•  Date last voted

In both Medical Data and Voter List:
•  Zip
•  Birth date
•  Sex

•  Governor of MA uniquely identified using ZipCode, Birth Date, and Sex. Name linked to Diagnosis.

Page 20:

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data only:
•  Name
•  SSN
•  Visit Date
•  Diagnosis
•  Procedure
•  Medication
•  Total Charge

Voter List only:
•  Name
•  Address
•  Date Registered
•  Party affiliation
•  Date last voted

In both Medical Data and Voter List:
•  Zip
•  Birth date
•  Sex

•  Governor of MA uniquely identified using ZipCode, Birth Date, and Sex.

Quasi-identifier: the combination (Zip, Birth date, Sex) is unique for 87% of the US population.
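
The linkage attack on the quasi-identifier (Zip, Birth date, Sex) can be sketched as a join between the two tables. This is an illustration under stated assumptions: every record, name, and value below is hypothetical, and `link_records` is an illustrative helper, not code from the lecture or from Sweeney's study.

```python
from collections import defaultdict

QUASI_IDENTIFIER = ("zip", "birth_date", "sex")

# Hypothetical "anonymized" medical records: names and SSNs already removed.
medical_data = [
    {"zip": "02139", "birth_date": "1950-01-02", "sex": "M", "diagnosis": "diagnosis A"},
    {"zip": "02139", "birth_date": "1962-03-04", "sex": "F", "diagnosis": "diagnosis B"},
    {"zip": "02141", "birth_date": "1971-05-06", "sex": "F", "diagnosis": "diagnosis C"},
]

# Hypothetical public voter list that still carries names.
voter_list = [
    {"name": "Voter 1", "zip": "02139", "birth_date": "1950-01-02", "sex": "M"},
    {"name": "Voter 2", "zip": "02139", "birth_date": "1962-03-04", "sex": "F"},
    {"name": "Voter 3", "zip": "02139", "birth_date": "1962-03-04", "sex": "F"},
    {"name": "Voter 4", "zip": "02142", "birth_date": "1980-07-08", "sex": "M"},
]

def link_records(medical, voters, quasi_identifier=QUASI_IDENTIFIER):
    """Return (name, diagnosis) pairs for medical records whose
    quasi-identifier matches exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in voters:
        key = tuple(voter[attr] for attr in quasi_identifier)
        names_by_key[key].append(voter["name"])

    linked = []
    for record in medical:
        key = tuple(record[attr] for attr in quasi_identifier)
        names = names_by_key.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the record
            linked.append((names[0], record["diagnosis"]))
    return linked

print(link_records(medical_data, voter_list))
```

In this toy data only the first medical record is re-identified: the second matches two voters and the third matches none. The more often the quasi-identifier is unique in the population, the more records such a join links.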

Page 21:

AOL data publishing fiasco

Page 22:

AOL data publishing fiasco …

User ID       Query
Ashwin222     Uefa cup
Ashwin222     Uefa champions league
Ashwin222     Champions league final
Ashwin222     Champions league final 2013
Jun156        exchangeability
Jun156        Proof of de Finetti’s theorem
Brett12345    Zombie games
Brett12345    Warcraft
Brett12345    Beatles anthology
Brett12345    Ubuntu breeze
Austin222     Python in thought
Austin222     Enthought Canopy

Page 23:

User IDs replaced with random numbers

User ID       Query
865712345     Uefa cup
865712345     Uefa champions league
865712345     Champions league final
865712345     Champions league final 2013
236712909     exchangeability
236712909     Proof of de Finetti’s theorem
112765410     Zombie games
112765410     Warcraft
112765410     Beatles anthology
112765410     Ubuntu breeze
865712345     Python in thought
865712345     Enthought Canopy
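
A minimal sketch of why replacing user IDs with random numbers is weak protection: each user's queries remain grouped under a single number, so the full search history can still be read as one profile. The log rows and the helpers `pseudonymize` and `queries_by_id` are illustrative assumptions, not AOL's actual procedure.

```python
import random
from collections import defaultdict

# A few rows in the style of the slide's toy log: (user ID, query).
search_log = [
    ("Ashwin222", "Uefa cup"),
    ("Ashwin222", "Uefa champions league"),
    ("Jun156", "exchangeability"),
    ("Jun156", "Proof of de Finetti's theorem"),
    ("Brett12345", "Zombie games"),
]

def pseudonymize(log, rng):
    """Replace each user ID with a random 9-digit number, one number per user."""
    pseudonyms = {}
    used = set()
    released = []
    for user, query in log:
        if user not in pseudonyms:
            number = rng.randrange(10**8, 10**9)
            while number in used:  # avoid giving two users the same number
                number = rng.randrange(10**8, 10**9)
            used.add(number)
            pseudonyms[user] = number
        released.append((pseudonyms[user], query))
    return released

def queries_by_id(log):
    """Group queries by whatever identifier appears in the log."""
    grouped = defaultdict(list)
    for user_id, query in log:
        grouped[user_id].append(query)
    return dict(grouped)

released = pseudonymize(search_log, random.Random(0))
for user_id, queries in queries_by_id(released).items():
    print(user_id, queries)
```

The grouping of queries in the released log is identical to the grouping in the original log; only the labels have changed.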

Page 24:

Privacy Breach

[NYTimes 2006]

Page 25:

Can we release a model alone?

[Diagram: Individual 1 (r1), Individual 2 (r2), Individual 3 (r3), …, Individual N (rN); Server; DB]

Release the data “anonymously” or release a model

Page 26:

Releasing a model can also be bad

[Korolova JPC 2011]

[Figure: ad targeting on Facebook Profile + Online Data. Number of Impressions: “+ Who are interested in Men”: 25; “+ Who are interested in Women”: 0]

Facebook’s learning algorithm uses private information to predict match to ad

Page 27:

Genome wide association studies

Did Bob participate in the study?

[Figure: Results of a GWAS study; High density SNP profile of Bob]

[Homer et al PLOS Genetics 08]

Page 28:

Model Inversion

•  An attacker, given the model and some demographic information about a patient, can predict the patient’s genetic markers.

[Fredrikson et al USENIX Security 2014]

Page 29:

The red side of learning

•  Redlining: the practice of denying, or charging more for, services such as banking, insurance, access to health care, or even supermarkets, or denying jobs to residents in particular, often racially determined, areas.

Page 30:

Privacy versus Learning
•  Does learning always violate your privacy?

•  Researcher collects data from people for a study
•  Researcher learns smoking causes cancer
•  Now researcher knows Bob (who was NOT in the study) has high risk for cancer since he smokes.

•  Is this a privacy violation?

Page 31:

[Diagram: Individual 1 (r1), Individual 2 (r2), Individual 3 (r3), …, Individual N (rN); Server; DB]

Learn a model over the data, but do not reveal individual records

Page 32:

Privacy vs Learning

Privacy
•  Be able to learn general trends from the data (e.g., smoking causes cancer)
•  Should not disclose information about individuals in the data

Learning
•  (Same!) Recover the distribution from which the records are drawn so that we can accurately predict on new data
•  (Similar!) Should not overfit to any single training point

Page 33:

Stable Algorithms
•  Algorithms whose outputs are insensitive to the addition or removal of a single record in the database
•  Overcomes the overfitting problem.
•  Great for privacy (since one can’t tell whether or not an individual record was used in the computation).
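
A minimal sketch of what "insensitive to the removal of a single record" means, comparing two statistics on hypothetical data. The helper `max_change_on_removal` and the records are illustrative assumptions. For n records in [0, 1], removing record x_i changes the mean by |x_i - mean| / (n - 1), which is at most 1 / (n - 1); the maximum, by contrast, can change by the full width of the range.

```python
def mean(values):
    return sum(values) / len(values)

def max_change_on_removal(statistic, values):
    """Largest change in the statistic when any single record is removed."""
    full = statistic(values)
    return max(
        abs(full - statistic(values[:i] + values[i + 1:]))
        for i in range(len(values))
    )

# Hypothetical records, each in [0, 1]; the last one is an extreme value.
records = [0.2, 0.4, 0.5, 0.3, 0.6, 0.4, 0.5, 0.3, 0.4, 1.0]

mean_change = max_change_on_removal(mean, records)
max_change = max_change_on_removal(max, records)
print(f"mean changes by at most {mean_change:.3f}")
print(f"max changes by at most {max_change:.3f}")
```

The mean becomes more stable as the dataset grows, while the maximum depends entirely on one record no matter how large the dataset is.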

Page 34:

Differential Privacy

For every pair of inputs D1, D2 that differ in one row, and for every output O, the adversary should not be able to distinguish between any D1 and D2 based on any O:

\[ \log \frac{\Pr[A(D_1) = O]}{\Pr[A(D_2) = O]} < \varepsilon \qquad (\varepsilon > 0) \]

[Dwork ICALP 2006]
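
A minimal sketch of one algorithm A that meets this condition: a counting query answered with Laplace noise (the Laplace Mechanism named on the topics slide). The database contents, function names, and the choice ε = 0.5 are assumptions for illustration. Because the output is continuous, the check compares output densities rather than probabilities, and because the Laplace mechanism attains the bound exactly, it checks "at most ε" rather than the strict inequality.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(rows, predicate, epsilon, rng):
    """Noisy count. Adding or removing one row changes the true count
    by at most 1, so the noise scale is 1 / epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def log_density(output, true_count, epsilon):
    """Log of the Laplace density of the noisy count at `output`."""
    scale = 1.0 / epsilon
    return -abs(output - true_count) / scale - math.log(2.0 * scale)

epsilon = 0.5
d1 = ["smoker", "non-smoker", "smoker", "smoker"]  # hypothetical database
d2 = d1[:-1]                                       # differs from d1 in one row
count1 = sum(row == "smoker" for row in d1)
count2 = sum(row == "smoker" for row in d2)

# Over a grid of outputs O, the log-ratio of densities stays within epsilon.
worst = max(
    abs(log_density(o / 10, count1, epsilon) - log_density(o / 10, count2, epsilon))
    for o in range(-100, 200)
)
print(worst <= epsilon + 1e-12)
print(private_count(d1, lambda row: row == "smoker", epsilon, random.Random(0)))
```

A smaller ε forces the two output distributions closer together, which means a larger noise scale and a less accurate count.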

Page 35:

Differential Privacy for Privacy

Released synthetic data about where people live and work under differential privacy

[Machanavajjhala et al, ICDE 2008] [Haney et al, EuroStat 2015]

Page 36:

Differential Privacy for Privacy

Collect perturbed data from users for analysis.

[Erlingsson et al, CCS 2014]

[Apple WWDC 2016]
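
A minimal sketch of collecting perturbed data, using classical randomized response on a single yes/no attribute. This is a swapped-in textbook technique for illustration, not the algorithm of the cited systems, which are more elaborate. The population, function names, and ε = 1 are assumptions. Each user reports the true bit with probability e^ε / (1 + e^ε), so the ratio of report probabilities for the two possible true bits is e^ε.

```python
import math
import random

def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else the flipped bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else 1 - true_bit

def estimate_fraction(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit is 1.
    Uses E[report] = (1 - p) + f * (2p - 1), solved for f."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = random.Random(0)
epsilon = 1.0
true_bits = [1] * 6000 + [0] * 14000  # hypothetical population, 30% ones
reports = [randomized_response(bit, epsilon, rng) for bit in true_bits]
estimate = estimate_fraction(reports, epsilon)
print(f"estimated fraction: {estimate:.3f}")
```

The analyst never sees any user's true bit, yet the aggregate estimate is close to the true fraction when the number of users is large.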

Page 37:

Differential Privacy for Privacy

Released a differentially private frequency histogram dataset of 7 million passwords for research

[Blocki et al, NDSS 2016]

Page 38:

Differential Privacy and Learning
•  All differentially private algorithms satisfy uniform stability
•  Uniform stability => generalization

The generalization error of a learning algorithm is the expected error in prediction on a sample randomly drawn from the population.

Empirical error is the average prediction error observed on a sample of the population (called the test set).

For sufficiently large training datasets, a uniformly stable algorithm’s empirical error tends to the generalization error.

Page 39:

Differential Privacy for learning

Can lead to false discovery and overfitting

Differential privacy overcomes false discovery!

Page 40:

Outline of the course
•  Module 1: Introduction (5)
   –  On Thursday: Differential Privacy
•  Module 2: Differentially Private Algorithms (6)
•  Module 4: DP and privacy in the real world (6)
•  Module 5: DP and Learning (6)

Page 41:

Administrivia
http://www.cs.duke.edu/courses/fall16/compsci590.3/

•  Tu/Thu 1:25 – 2:40 PM
•  “Reading Course + Project”
   –  No exams!
   –  Every class based on 1 (or 2) assigned papers that students must read.
•  Projects: (60% of grade)
   –  Individual or groups of size 2
•  Class Participation (other 40%)
   –  1 short (20 min) presentation
•  Office hours: by appointment

Page 42:

Administrivia
•  Projects: (60% of grade)
   –  Design new or rigorously evaluate or break existing privacy algorithms
   –  Design new or rigorously evaluate stable algorithms for learning
   –  Establish new connections between learning, stability and privacy
•  Goals:
   –  Literature review
   –  Some original research/implementation
•  Timeline (details will be posted on the website soon)
   –  Sep 27: Choose Project (ideas will be posted … new ideas welcome)
   –  Oct 13: Project proposal (1-4 pages describing the project)
   –  Nov 8: Mid-project review (2-3 page report on progress)
   –  Dec 1: Final presentations and submission (6-10 page conference style paper + 10-15 minute talk)

Page 43:

Assignment
•  No class next week (Sep 6, Sep 8)
•  Instead there will be an assignment
•  Grade for attempting the assignment
   –  It’s OK if you are not able to solve it.