Post on 15-Jul-2020
Design of Stable Algorithms for Privacy and Learning
CompSci 590.03 Instructor: Ashwin Machanavajjhala
Lecture 1: 590.03 Fall 16
Bayes Theorem
Differential Privacy
Histograms
Linear Regression
Range Queries
Laplace Mechanism
Odds Ratio
PAC Learning
Anonymization
Secure Multiparty Computation
Machine Learning is here to stay
Machine Learning & advertising
+250% clicks vs. editorial one size fits all
+79% clicks vs. randomly selected
+43% clicks vs. editor selected
Recommended links
Personalized News Interests
Top Searches
Machine Learning & marketing
Machine learning & health
© Gibson & Muse, A Primer of Genome Science
Machine learning & traffic prediction
Machine Learning & politics!
Fivethirtyeight.com
Learning
• Given a set of data points (x, y)
  – x is a vector of predictors
  – y is the prediction
• Learn a function based on a training dataset s.t. the function accurately predicts y for an unseen x
Quality of a learning algorithm
• Generalization error: Expected prediction error on unseen data
• Empirical Risk: Average prediction error on unseen test dataset
• Note that test and training data are disjoint to avoid overfitting
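The two quantities above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synthetic population in `sample`, the one-dimensional threshold classifier, and the sample sizes are not from the lecture. The model is fit on a training sample, and its average error is then measured on a separately drawn test sample.

```python
import random

def sample(n, rng):
    """Hypothetical population: y is True when x plus a little noise exceeds 0.5."""
    points = []
    for _ in range(n):
        x = rng.random()
        y = x + rng.gauss(0, 0.1) > 0.5
        points.append((x, y))
    return points

def average_error(threshold, data):
    """Fraction of points where the rule 'predict True iff x >= threshold' is wrong."""
    return sum((x >= threshold) != y for x, y in data) / len(data)

def fit_threshold(train):
    """Choose the training x value that minimizes the average error on the training set."""
    candidates = sorted(x for x, _ in train)
    return min(candidates, key=lambda t: average_error(t, train))

rng = random.Random(0)
train = sample(200, rng)
test = sample(200, rng)          # drawn separately, so disjoint from the training data

threshold = fit_threshold(train)
train_error = average_error(threshold, train)
test_error = average_error(threshold, test)   # estimate of the error on unseen data
print(threshold, train_error, test_error)
```

The error on the held-out sample is the quantity that estimates performance on unseen data; the training error is typically somewhat lower because the threshold was tuned on those same points.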
Where is the data coming from?
• Census surveys
• IRS Records
• Medical records
• Insurance records
• Search logs
• Browse logs
• Shopping histories
• Photos
• Videos
• Smart phone Sensors
• Mobility trajectories
• …
Very sensitive information …
How is this data collected?
http://graphicsweb.wsj.com/documents/divSlider/media/ecosystem100730.png
What websites track your behavior?
http://blogs.wsj.com/wtk/
Individual 1: r1
Individual 2: r2
Individual 3: r3
Individual N: rN
Server
DB
Release the data “anonymously” or release a model
Releasing data is bad
What if we ensure our names and other identifiers are never released?
The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]
Medical Data
• Name
• SSN
• Visit Date
• Diagnosis
• Procedure
• Medication
• Total Charge
Voter List
• Name
• Address
• Date Registered
• Party affiliation
• Date last voted
In both Medical Data and Voter List (Quasi Identifier)
• Zip
• Birth date
• Sex
• Governor of MA uniquely identified using ZipCode, Birth Date, and Sex. Name linked to Diagnosis.
• 87 % of US population uniquely identified using the same Quasi Identifier
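The linkage described on this slide can be sketched as a join on the quasi identifier. This is a minimal illustration, not the procedure from the cited paper: the records, names, and field names below are all made up, and the medical table is assumed to have had Name and SSN removed before release.

```python
# Hypothetical toy records; every value below is invented for illustration.
medical = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "00002", "birth_date": "1962-03-15", "sex": "F", "diagnosis": "condition B"},
    {"zip": "00002", "birth_date": "1962-03-15", "sex": "F", "diagnosis": "condition C"},
]
voters = [
    {"name": "Voter One", "zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "00002", "birth_date": "1962-03-15", "sex": "F"},
]

QUASI_IDENTIFIER = ("zip", "birth_date", "sex")

def key(record):
    """Project a record onto the quasi-identifier attributes."""
    return tuple(record[attribute] for attribute in QUASI_IDENTIFIER)

def link(medical_records, voter_records):
    """Return {name: diagnosis} for voters matching exactly one medical record."""
    by_key = {}
    for record in medical_records:
        by_key.setdefault(key(record), []).append(record)
    linked = {}
    for voter in voter_records:
        matches = by_key.get(key(voter), [])
        if len(matches) == 1:            # unique match: the name is linked to a diagnosis
            linked[voter["name"]] = matches[0]["diagnosis"]
    return linked

reidentified = link(medical, voters)
print(reidentified)
```

In this toy data only the first voter is re-identified; the second voter shares a quasi-identifier value with two medical records, so the link is ambiguous.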
AOL data publishing fiasco …
User ID       Query
Ashwin222     Uefa cup
Ashwin222     Uefa champions league
Ashwin222     Champions league final
Ashwin222     Champions league final 2013
Jun156        exchangeability
Jun156        Proof of de Finetti’s theorem
Brett12345    Zombie games
Brett12345    Warcraft
Brett12345    Beatles anthology
Brett12345    Ubuntu breeze
Austin222     Python in thought
Austin222     Enthought Canopy
User IDs replaced with random numbers
User ID       Query
865712345     Uefa cup
865712345     Uefa champions league
865712345     Champions league final
865712345     Champions league final 2013
236712909     exchangeability
236712909     Proof of de Finetti’s theorem
112765410     Zombie games
112765410     Warcraft
112765410     Beatles anthology
112765410     Ubuntu breeze
865712345     Python in thought
865712345     Enthought Canopy
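Why replacing user IDs with random numbers is weak protection can be seen in a short sketch. The function names and the toy log are assumptions for illustration and do not describe how the actual release was produced. Each user's queries stay grouped under one pseudonym, so the full search history of a person remains intact.

```python
import random
from collections import defaultdict

# A few (user, query) pairs in the spirit of the slide; treat them as toy data.
search_log = [
    ("Ashwin222", "Uefa cup"),
    ("Ashwin222", "Champions league final"),
    ("Jun156", "exchangeability"),
    ("Jun156", "Proof of de Finetti's theorem"),
    ("Brett12345", "Zombie games"),
]

def pseudonymize(log, rng):
    """Replace each user ID with a distinct random 9-digit number."""
    mapping = {}
    used = set()
    released = []
    for user, query in log:
        if user not in mapping:
            pseudonym = rng.randrange(10**8, 10**9)
            while pseudonym in used:                 # keep pseudonyms distinct
                pseudonym = rng.randrange(10**8, 10**9)
            used.add(pseudonym)
            mapping[user] = pseudonym
        released.append((mapping[user], query))
    return released

def histories(log):
    """Group queries by whatever identifier the log carries."""
    grouped = defaultdict(list)
    for identifier, query in log:
        grouped[identifier].append(query)
    return sorted(grouped.values())

released = pseudonymize(search_log, random.Random(0))
same_histories = histories(released) == histories(search_log)
print(same_histories)
```

The per-user query histories are identical before and after pseudonymization, and the content of those histories is what can give a person away.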
Privacy Breach
[NYTimes 2006]
Can we release a model alone?
Individual 1: r1
Individual 2: r2
Individual 3: r3
Individual N: rN
Server
DB
Release the data “anonymously” or release a model
Releasing a model can also be bad
[Korolova JPC 2011]
Facebook Profile + Online Data
Number of Impressions
  + Who are interested in Men: 25
  + Who are interested in Women: 0
Facebook’s learning algorithm uses private information to predict match to ad
Genome wide association studies
Did Bob participate in the study?
Results of a GWAS study
High density SNP profile of Bob
[Homer et al PLOS Genetics 08]
Model Inversion
• An attacker, given the model and some demographic information about a patient, can predict the patient’s genetic markers.
[Fredrikson et al USENIX Security 2014]
The red side of learning
• Redlining: the practice of denying, or charging more for, services such as banking, insurance, access to health care, or even supermarkets, or denying jobs to residents in particular, often racially determined, areas.
Privacy versus Learning
• Does learning always violate your privacy?
• Researcher collects data from people for a study
• Researcher learns smoking causes cancer
• Now researcher knows Bob (who was NOT in the study) has high risk for cancer since he smokes.
• Is this a privacy violation?
Individual 1: r1
Individual 2: r2
Individual 3: r3
Individual N: rN
Server
DB
Learn a model over the data, but do not reveal individual records
Privacy vs Learning
Privacy
• Be able to learn general trends from the data (e.g., smoking causes cancer)
• Should not disclose information about individuals in the data
Learning
• (Same!) Recover the distribution from which the records are drawn so that we can accurately predict on new data
• (Similar!) Should not overfit to any single training point
Stable Algorithms
• Algorithms whose outputs are insensitive to the addition or removal of a single record in the database
• Overcomes the overfitting problem.
• Great for privacy (since one can’t tell whether or not an individual record was used in the computation).
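Insensitivity to a single record can be made concrete by comparing two statistics. This sketch rests on stated assumptions: the helper `worst_change_on_removal` and the toy database of 100 values in [0, 1] are hypothetical and only illustrate the idea. The helper measures how far an output can move when one record is removed.

```python
def mean(values):
    return sum(values) / len(values)

def worst_change_on_removal(statistic, data):
    """Largest change in the statistic caused by removing any single record."""
    full = statistic(data)
    return max(
        abs(full - statistic(data[:i] + data[i + 1:]))
        for i in range(len(data))
    )

# Toy database: 99 records equal to 0.0 and one record equal to 1.0.
data = [0.0] * 99 + [1.0]

mean_change = worst_change_on_removal(mean, data)   # small: about 1/n
max_change = worst_change_on_removal(max, data)     # large: the whole output flips
print(mean_change, max_change)
```

For values in [0, 1] the mean of n records moves by at most 1/n when one record is removed, while the maximum can move by the full range. In the sense of the slide, the mean is the stable computation here and the maximum is not.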
Differential Privacy
[Dwork ICALP 2006]
For every pair of inputs D1, D2 that differ in one row
For every output O
Adversary should not be able to distinguish between any D1 and D2 based on any O
$$\log \frac{\Pr[A(D_1) = O]}{\Pr[A(D_2) = O]} < \varepsilon \quad (\varepsilon > 0)$$
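One way to see the definition at work is a noisy counting query. The sketch below uses the Laplace mechanism with scale 1/ε. It is an illustration under assumptions, not code from the course: the toy rows, the `smoker` field, and the function names are hypothetical. Because Laplace noise is continuous, the check compares output densities rather than probabilities, evaluated for two databases that differ in one row.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(rows, predicate, epsilon, rng):
    """Counting query plus Laplace noise; one row changes the count by at most 1."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def log_density(output, true_count, epsilon):
    """Log of the Laplace density of seeing `output` when the true count is `true_count`."""
    scale = 1.0 / epsilon
    return -abs(output - true_count) / scale - math.log(2.0 * scale)

epsilon = 0.5
d1 = [{"smoker": True}, {"smoker": False}, {"smoker": True}]
d2 = d1[:-1]                                   # neighbor of d1: one row removed
count1 = sum(1 for row in d1 if row["smoker"])
count2 = sum(1 for row in d2 if row["smoker"])

# Worst log-ratio of densities over a grid of possible outputs.
worst = max(
    abs(log_density(o / 10.0, count1, epsilon) - log_density(o / 10.0, count2, epsilon))
    for o in range(-100, 101)
)
release = noisy_count(d1, lambda row: row["smoker"], epsilon, random.Random(0))
print(worst, release)
```

On this grid the worst log-ratio does not exceed ε (up to floating-point rounding), which is the sense in which an observer of the released count cannot tell d1 from d2.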
Differential Privacy for Privacy
Released synthetic data about where people live and work under differential privacy
[Machanavajjhala et al, ICDE 2008] [Haney et al, EuroStat 2015]
Differential Privacy for Privacy
Collect perturbed data from users for analysis.
[Erlingsson et al, CCS 2014]
[Apple WWDC 2016]
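Collecting perturbed data can be illustrated with classic randomized response on a single bit. This is a generic textbook sketch and is not the mechanism of either cited system; the parameter choices, population size, and function names are assumptions. Each user reports the true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the analyst then removes the bias from the aggregate.

```python
import math
import random

def truth_probability(epsilon):
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else the flipped bit."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit

def estimate_fraction(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit is 1."""
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[observed] = p * f + (1 - p) * (1 - f), solved for f:
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(1)
epsilon = 1.0
true_bits = [1] * 6000 + [0] * 14000           # hypothetical population, 30% ones
reports = [randomized_response(b, epsilon, rng) for b in true_bits]
estimate = estimate_fraction(reports, epsilon)
print(estimate)
```

No individual report reveals much about its sender, since either answer is plausible for either true bit, yet the estimate over many users lands close to the true fraction.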
Differential Privacy for Privacy
Released a differentially private frequency histogram dataset of 7 million passwords for research
[Blocki et al, NDSS 2016]
Differential Privacy and Learning
• All differentially private algorithms satisfy uniform stability
• Uniform stability => generalization
The generalization error of a learning algorithm is the expected error in prediction on a sample randomly drawn from the population.
Empirical error is the average prediction error observed on a sample of the population (called the test set).
For sufficiently large training datasets, a uniformly stable algorithm’s empirical error tends to the generalization error.
Differential Privacy for learning
Can lead to false discovery and overfitting
Differential privacy overcomes False discovery!
Outline of the course
• Module 1: Introduction (5)
  – On Thursday: Differential Privacy
• Module 2: Differentially Private Algorithms (6)
• Module 4: DP and privacy in the real world (6)
• Module 5: DP and Learning (6)
Administrivia
http://www.cs.duke.edu/courses/fall16/compsci590.3/
• Tu/Thu 1:25 – 2:40 PM
• “Reading Course + Project”
  – No exams!
  – Every class based on 1 (or 2) assigned papers that students must read.
• Projects: (60% of grade)
  – Individual or groups of size 2
• Class Participation (other 40%)
  – 1 short (20 min) presentation
• Office hours: by appointment
Administrivia
• Projects: (60% of grade)
  – Design new or rigorously evaluate or break existing privacy algorithms
  – Design new or rigorously evaluate stable algorithms for learning
  – Establish new connections between learning, stability and privacy
• Goals:
  – Literature review
  – Some original research/implementation
• Timeline (details will be posted on the website soon)
  – Sep 27: Choose Project (ideas will be posted … new ideas welcome)
  – Oct 13: Project proposal (1-4 pages describing the project)
  – Nov 8: Mid-project review (2-3 page report on progress)
  – Dec 1: Final presentations and submission (6-10 page conference style paper + 10-15 minute talk)
Assignment
• No class next week (Sep 6, Sep 8)
• Instead there will be an assignment
• Grade for attempting the assignment
  – It’s ok if you are not able to solve it.