recognizing and imitating programmer style: adversaries in … · 2019-08-16 · recognizing and...
Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution
Lucy Simko, Luke Zettlemoyer, Tadayoshi Kohno
[email protected], homes.cs.washington.edu/~simkol
Source Code Attribution

    int main() {
        int i, j, k, l, m, n, st;
        char in[10000];
        int fg[5000], chk[128];
        int size, count = 0, res;
        scanf ("%d%d%d", &len, &n, &size);
        rep (i, n) scanf ("%s", dic[i]);
        while (size--) {
            scanf ("%s", in);
            st = 0;
            rep (k, n) fg[k] = 1;
            ...

[Figure: this code, labeled "?", alongside candidate authors A, B, C, D, E, and F]
State of the Art: Source Code Attribution
Caliskan-Islam et al. “De-anonymizing programmers via code stylometry.” 24th USENIX Security Symposium (USENIX Security), Washington, DC. 2015.
● 98% accuracy over 250 programmers
● Extract syntactic, lexical, and layout features from C/C++ code
● Random Forest classifier
● Data set: Google Code Jam
  ○ Programming competition
  ○ Lots of examples of people solving the same problem in different ways
● Open source
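To make “lexical and layout features” concrete, the minimal sketch below counts a few of them in a source string: whole-word occurrences of selected keywords (lexical) and the number of tab-indented lines (layout). The StyleFeatures struct, the function names, and the chosen features are hypothetical illustrations, not the feature set of Caliskan-Islam et al., whose syntactic features come from abstract syntax trees and are not modeled here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature vector: a few lexical and layout measurements. */
typedef struct {
    int n_for;          /* whole-word occurrences of "for"      */
    int n_while;        /* whole-word occurrences of "while"    */
    int n_define;       /* occurrences of "#define"             */
    int n_lines;        /* number of lines                      */
    int n_tab_indented; /* lines whose first character is a tab */
} StyleFeatures;

/* Count occurrences of word in src that are not part of a longer
   identifier (neighbors must not be letters, digits, or '_'). */
static int count_word(const char *src, const char *word) {
    size_t len = strlen(word);
    int count = 0;
    for (const char *p = strstr(src, word); p != NULL; p = strstr(p + 1, word)) {
        int left_ok = (p == src) || !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        int right_ok = !(isalnum((unsigned char)p[len]) || p[len] == '_');
        if (left_ok && right_ok) count++;
    }
    return count;
}

StyleFeatures extract_features(const char *src) {
    StyleFeatures f = {0};
    f.n_for = count_word(src, "for");
    f.n_while = count_word(src, "while");
    f.n_define = count_word(src, "#define");

    /* Walk the text line by line for the layout features. */
    const char *line = src;
    while (*line != '\0') {
        f.n_lines++;
        if (*line == '\t') f.n_tab_indented++;
        const char *nl = strchr(line, '\n');
        if (nl == NULL) break;
        line = nl + 1;
    }
    return f;
}
```

A real pipeline would typically normalize such counts (for example, per line) before passing the vectors to a classifier such as a random forest.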
[Figure builds: the code and candidate authors above, annotated “98% accuracy!”; the final build adds two “CENSORED” labels]
Research Question
Can we fool source code attribution classifiers?
Yes!
Methodology: Lab study* with C programmers
*Approved by University of Washington’s Human Subjects Division (IRB)
Outline
● Motivation and Research Question
● Source Code Attribution: Overview and Background
● Evading Source Code Attribution: Definitions and Goals
● Methodology
● Results: Conservative Estimate of Adversarial Success
● Results: How to Create Forgeries
Source Code Attribution
A classifier is trained on code from a set of candidate authors, {A, B, C, D, E}. Given a program P_C written by author C (such as the code above), it outputs C ✓: who the classifier thinks wrote this code.
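The attribution step can be sketched as a function from a program's feature vector to one of the training authors. The sketch below swaps in a nearest-centroid rule for the random forest purely for brevity; AuthorProfile, attribute, and the three-feature vectors are hypothetical. Like any closed-world classifier, it always answers with one of the known authors, even for code none of them wrote.

```c
#include <stddef.h>

#define N_FEATURES 3

/* One training author: a label and the mean feature vector of their programs. */
typedef struct {
    char label;                  /* e.g. 'A' ... 'E' */
    double centroid[N_FEATURES];
} AuthorProfile;

/* Squared Euclidean distance between two feature vectors. */
static double sq_dist(const double *x, const double *y) {
    double d = 0.0;
    for (int k = 0; k < N_FEATURES; k++) {
        double diff = x[k] - y[k];
        d += diff * diff;
    }
    return d;
}

/* Return the label of the author whose centroid is nearest to the
   program's features. Requires n_authors >= 1. */
char attribute(const double *features, const AuthorProfile *authors, size_t n_authors) {
    size_t best = 0;
    double best_d = sq_dist(features, authors[0].centroid);
    for (size_t a = 1; a < n_authors; a++) {
        double d = sq_dist(features, authors[a].centroid);
        if (d < best_d) {
            best_d = d;
            best = a;
        }
    }
    return authors[best].label;
}
```

Because the output is always some training author, hiding the original author necessarily means the classifier names a different candidate.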
Evading Source Code Attribution
1. Train: Given code from the original and target authors, learn their styles.
2. Modify the original code to imitate the target author (forgery).
   ● Or just hide the original author’s style (masking).
P_C → adversarial manipulation → P_C′: code originally by C, but modified by an adversary.
Forgery: the classifier, trained on {A, B, C, D, E}, attributes P_C′ to the target author A instead of C.
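The forgery and masking goals translate directly into success tests on the classifier’s output. In this hedged sketch (the Attempt struct and all function names are hypothetical), a forgery succeeds only when the output is the chosen target, while masking succeeds on any output other than the original author; success_rate turns a list of attempts into a percentage.

```c
#include <stdbool.h>

/* Outcome of one evasion attempt against a closed-world classifier. */
typedef struct {
    char original;   /* true author of the unmodified code, e.g. 'C' */
    char target;     /* author the adversary tries to imitate, e.g. 'A' */
    char predicted;  /* classifier output on the modified code P_C' */
} Attempt;

/* Forgery succeeds only if the classifier names the chosen target. */
bool forgery_succeeded(const Attempt *t) {
    return t->predicted == t->target;
}

/* Masking succeeds if the classifier names anyone but the original author. */
bool masking_succeeded(const Attempt *t) {
    return t->predicted != t->original;
}

/* Percentage of attempts for which pred holds; n must be > 0. */
double success_rate(const Attempt *ts, int n, bool (*pred)(const Attempt *)) {
    int ok = 0;
    for (int i = 0; i < n; i++)
        if (pred(&ts[i])) ok++;
    return 100.0 * ok / n;
}
```

Every successful forgery whose target differs from the original is also a successful masking attack, so over the same attempts the masking rate can never be lower than the forgery rate.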
Lab Study: Dataset
● C code
● We used a code formatter (Artistic Style)¹ to eliminate many typographic style differences
● ~4000 authors: avg. 2.2 files each
● 5 authors with the most files: avg. ~42.8 files
  ○ Authors: A, B, C, D, E
¹ http://astyle.sourceforge.net/
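The effect of normalizing typography can be illustrated with a much smaller stand-in. The hypothetical normalize_layout function below expands leading tabs to four spaces and strips trailing spaces and tabs, so two files that differ only in these habits become byte-identical. Artistic Style does far more (brace placement, operator padding, and so on); this sketch only shows why layout features lose discriminating power after such a pass.

```c
#include <stddef.h>

/* Hypothetical stand-in for a formatter: copy src into dst (capacity cap,
   including the terminating '\0'), expanding each leading tab to four
   spaces and dropping trailing spaces/tabs on every line.
   Returns 0 on success, -1 if dst is too small. */
int normalize_layout(const char *src, char *dst, size_t cap) {
    size_t out = 0;    /* bytes written so far */
    size_t keep = 0;   /* length to keep if the rest of the line is blank */
    int in_indent = 1; /* still inside the line's leading whitespace? */

    if (cap == 0) return -1;
    for (const char *p = src; ; p++) {
        char c = *p;
        if (c == '\n' || c == '\0') {
            out = keep;                    /* discard trailing blanks */
            if (c == '\0') break;
            if (out + 1 >= cap) return -1;
            dst[out++] = '\n';
            keep = out;
            in_indent = 1;
        } else if (in_indent && c == '\t') {
            if (out + 4 >= cap) return -1;
            for (int k = 0; k < 4; k++) dst[out++] = ' ';
        } else {
            if (out + 1 >= cap) return -1;
            dst[out++] = c;
            if (c != ' ' && c != '\t') {   /* real content ends the indent */
                keep = out;
                in_indent = 0;
            }
        }
    }
    dst[out] = '\0';
    return 0;
}
```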
Lab Study: Create Forgeries
Oracle classifiers, evaluated with 10-fold cross-validation:

| Classifier | Training authors | Precision | Recall |
|---|---|---|---|
| C5 | {A, B, C, D, E} | 100% | 100% |
| C20 | {A, B, C, D, E} + 15 others | 87.6% | 88.2% |
| C50 | {A, B, C, D, E} + 45 others | 82.3% | 84.5% |
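Precision and recall here are per-author measures. The hedged sketch below computes them from a confusion matrix pooled over the cross-validation folds and macro-averages them across authors; the averaging scheme, the pooling, and the function name macro_precision_recall are assumptions, since the slides do not say how the figures were aggregated.

```c
#define MAX_AUTHORS 50  /* C50 is the largest classifier in the study */

/* conf[t][p] = number of test programs by true author t that the classifier
   attributed to author p, pooled over all cross-validation folds.
   Writes macro-averaged precision and recall, in percent, over n authors
   (1 <= n <= MAX_AUTHORS). An author never predicted (or never tested)
   contributes 0 to the corresponding average. */
void macro_precision_recall(int conf[][MAX_AUTHORS], int n,
                            double *precision, double *recall) {
    double p_sum = 0.0, r_sum = 0.0;
    for (int a = 0; a < n; a++) {
        int tp = conf[a][a];
        int predicted_a = 0;  /* column sum: programs labeled a */
        int actual_a = 0;     /* row sum: programs truly by a */
        for (int b = 0; b < n; b++) {
            predicted_a += conf[b][a];
            actual_a += conf[a][b];
        }
        p_sum += predicted_a > 0 ? (double)tp / predicted_a : 0.0;
        r_sum += actual_a > 0 ? (double)tp / actual_a : 0.0;
    }
    *precision = 100.0 * p_sum / n;
    *recall = 100.0 * r_sum / n;
}
```

In 10-fold cross-validation, the data are split into ten parts and each part is classified by a model trained on the other nine; pooling those predictions fills the matrix.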
Lab Study: Create Forgeries
28 C programmers (participants):
1. Train: Given code from the original and target authors, learn their styles.
2. Modify the original code to imitate the target author’s style (forgery).
3. Check forgery success against the oracle classifiers.
A participant modifies P_X to produce P_X′, where X, Y ∈ {A, B, C, D, E}; the forgery succeeds if the classifier outputs Y.
[Figure: P_X′ is checked against C5, C20, and C50; the example outputs shown are X and Y]
Results: Estimate of Adversarial Success
Percent of final forgery attempts that were successful attacks:

|  | C5 | C20 | C50 |
|---|---|---|---|
| Forgery | 66.6% | 70.0% | 73.0% |
| Masking | 76.6% | 76.6% | 86.6% |

● C5, C20, C50: versions of the state-of-the-art machine classifier. The number in each name is the number of authors in the training set.
● Forgery: the adversary is pretending to be a specific target author. A successful forgery attack means the classifier output the target author instead of the original author of the code; for example, 66.6% of forgery attacks against C5 were successful.
● Masking: the adversary is obscuring the original author. A successful masking attack means the forgery produced a misclassification, i.e. the classifier output any author other than the original. C50 therefore attributed forgeries correctly only 13.4% of the time (100% - 86.6%).
Lesson: Non-experts can successfully attack this state-of-the-art classifier, suggesting that other authorship classifiers may be vulnerable to the same type of attack.
Results: Methods of Forgery Creation
Lesson: Forgers did not know which features the classifier used for attribution. This suggests that forgeries in the wild might contain the same types of modifications.
Example: Two Programs by Author C

Program 1:

    // libraries imported
    #define REP(i,a,b) for(i=a;i<b;i++)
    #define rep(i,n) REP(i,0,n)
    // variables defined
    int main() {
        int i, j, k, l, m, n, st;
        char in[10000];
        int fg[5000], chk[128];
        int size, count = 0, res;
        scanf ("%d%d%d", &len, &n, &size);
        rep (i, n) scanf ("%s", dic[i]);
        while (size--) {
            scanf ("%s", in);
            st = 0;
            rep (k, n) fg[k] = 1;
            ...

Program 2:

    // libraries imported
    #define REP(i,a,b) for(i=a;i<b;i++)
    #define rep(i,n) REP(i,0,n)
    // variables defined
    int main() {
        int i, j, k, l, m, n, t, ok;
        int a, b, c;
        int size, count = 0;
        scanf ("%d", &size);
        while (size--) {
            scanf ("%d%d", &n, &m);
            rep (i, m) {
                scanf ("%d", s + i);
                ...

Both programs share the same macros, the int size, count = 0; declaration, scanf-based input, and a while (size--) loop over test cases.
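Author C’s REP and rep macros are ordinary preprocessor shorthands for counting loops. The sketch below (the two function names are hypothetical) shows that rep(i, n) expands to for(i=0;i<n;i++), so code written with the macros behaves exactly like code written with plain for loops; only the surface style differs, and that surface style is what a stylometry classifier picks up on.

```c
/* Author C's loop macros, as shown on the slides. */
#define REP(i,a,b) for(i=a;i<b;i++)
#define rep(i,n) REP(i,0,n)

/* Sum of v[0..n-1], written in author C's macro style. */
int sum_with_rep(const int *v, int n) {
    int i, total = 0;
    rep (i, n) total += v[i];
    return total;
}

/* The same loop after preprocessing: rep(i, n) becomes REP(i,0,n),
   which becomes for(i=0;i<n;i++). */
int sum_with_for(const int *v, int n) {
    int i, total = 0;
    for (i = 0; i < n; i++) total += v[i];
    return total;
}
```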
Example: Forgery of Author C
Information Structure
● Variable name
● Syntax
● Macros
● API calls
Control Flow
● Loop type
Example: Creating a Forgery of Author C
Original code (classifier output: A):

    int main() {
        int i,j,k;
        int cc,ca;
        cin >> ca;
        for(cc=1;cc<=ca;cc++) {
            cin >> D >> I >> M >> N;
            for(i=0; i<N; i++) cin >> original[i];
            ...

The forger edits a copy of this code one change at a time; the classifier output on the forgery is shown as ?? until the final step:
● Macros: add author C’s REP and rep macro definitions.
● Variable names: rename ca and cc to size and count, declared as int size, count = 0;
● API calls: replace cin >> with scanf, e.g. scanf("%d", &size); and scanf("%d%d%d%d", &D, &I, &M, &N);
● Loop type: replace for(count=1;count<=size;count++) with while (size--), and for(i=0; i<N; i++) with rep (i,N).

Final forgery (classifier output: C):

    #define REP(i,a,b) for(i=a;i<b;i++)
    #define rep(i,n) REP(i,0,n)
    int main() {
        int i,j,k;
        int size, count = 0;
        scanf("%d", &size);
        while (size--) {
            scanf("%d%d%d%d", &D, &I, &M, &N);
            rep (i,N) scanf("%d", original+i);
            ...
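The loop-type edit replaces for(count=1;count<=size;count++) with while (size--) without changing how many times the body runs. A hedged sketch with hypothetical function names that counts iterations both ways, assuming a non-negative size:

```c
/* Original style: 1-based counter running from 1 to size. */
int iterations_for(int size) {
    int count, runs = 0;
    for (count = 1; count <= size; count++) runs++;
    return runs;
}

/* Forged style (author C): decrement size until it reaches zero.
   size ends at -1 and no running counter remains, so this form only
   fits loop bodies that do not need either afterward. Assumes size >= 0. */
int iterations_while(int size) {
    int runs = 0;
    while (size--) runs++;
    return runs;
}
```

If the loop body used the counter (for example, to number its output), the while version would need a separate counter to stay equivalent.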
Results: Methods of Forgery Creation
Information Structure
● Local: variable name, syntax, macros, API calls, libraries imported, variable declaration location
● Algorithmic: variable type, data structures, static and dynamic memory usage
Control Flow
● Local: loop type, if-statements, assignments per line, control flow keywords, loop logic
● Algorithmic: functions refactored, inlined API calls, major addition or removal of control structures
Local modifications: only need to understand a line or two of code.
Algorithmic modifications: need a more comprehensive understanding of the code.
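Unlike a local edit, an algorithmic modification such as switching between static and dynamic memory usage requires understanding how a buffer is sized and used. In this hedged sketch (all names hypothetical), both functions copy n input values into a buffer, as a program reading them with scanf would, and count how many exceed the average; one uses a fixed-size global array in the spirit of int fg[5000], the other allocates exactly n elements.

```c
#include <stdlib.h>

#define MAX_N 5000  /* fixed capacity, in the spirit of int fg[5000] */

/* Static-memory version: a fixed global buffer; n must not exceed MAX_N. */
static int buf_static[MAX_N];

int above_average_static(const int *input, int n) {
    long long sum = 0;
    int i, above = 0;
    if (n <= 0 || n > MAX_N) return -1;
    for (i = 0; i < n; i++) { buf_static[i] = input[i]; sum += input[i]; }
    /* v > sum / n, compared without division */
    for (i = 0; i < n; i++) if ((long long)buf_static[i] * n > sum) above++;
    return above;
}

/* Dynamic-memory version: allocate exactly n elements, free when done. */
int above_average_dynamic(const int *input, int n) {
    long long sum = 0;
    int i, above = 0;
    int *buf;
    if (n <= 0) return -1;
    buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) return -1;
    for (i = 0; i < n; i++) { buf[i] = input[i]; sum += input[i]; }
    for (i = 0; i < n; i++) if ((long long)buf[i] * n > sum) above++;
    free(buf);
    return above;
}
```

The two versions compute the same result, but converting one into the other means reasoning about the maximum input size and the buffer's lifetime, not just a single line.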
Lessons from methods of forgery creation:
● Local modifications are common.
● Some forgers copied code directly from the target author’s training set.
Summary
● Programmers desiring privacy or with malicious intent may seek to evade source code attribution classifiers.
● In a lab study, C programmers produced forgeries, showing that unsophisticated adversaries can fool a state-of-the-art classifier.
● Forgeries were successful with local changes that do not require a high-level understanding of the programming style.
● More recommendations in the paper!
My coauthors: Luke Zettlemoyer, Tadayoshi Kohno
Contact me: Lucy Simko, [email protected], https://homes.cs.washington.edu/~simkol/