TRANSCRIPT
Automatically Generating Search Heuristics for Concolic Testing
Sooyoung Cha
Korea University
ICSE'18 @Gothenburg, Sweden
(joint work with Seongjoon Hong, Junhee Lee, Hakjoo Oh)
Concolic Testing
● Concolic testing (Concrete and Symbolic executions)
– An effective software testing method.
– SAGE: found 30% of all Windows 7 WEX security bugs.
● Key Challenge: Path Explosion
– # of execution paths: 2^(# of branches)
● ex) grep-2.2 (3,836 branches): 2^3,836 paths in the worst case.
● Exploring all paths is impossible.
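The scale of the worst case can be sanity-checked with quick arithmetic; the sketch below (plain Python, not part of the talk) counts the decimal digits of 2^3,836.

```python
# Worst case for grep-2.2: each of the 3,836 branches can go two ways,
# so the number of execution paths is bounded by 2**3836.
paths = 2 ** 3836
digits = len(str(paths))
print(digits)  # → 1155 (a number with over a thousand decimal digits)
```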
Search Heuristic
● Numerous search heuristics have been proposed.
– DFS, BFS, Random, Generational, CFDS, CGS, etc.
– Select branches that are likely to maximize code coverage.
[Diagram: execution tree with branches b1–b3 along path1; DFS(path1) → b3.]
[Diagram: the same execution tree; BFS(path1) → b1.]
[Diagram: the same execution tree; S.Heuristic(path1) → b2, reaching new branches b5, b6, b7.]
Motivation
[Plot: branches covered vs. iterations on vim-5.7; CFDS, CGS, DFS, Gen, Random.]
● No existing heuristic consistently achieves high coverage.
● Designing a new heuristic is highly nontrivial.
– Search heuristics keep being published (ICSE, FSE, ASE, NDSS, ...).
[Plot: branches covered vs. iterations on expat-2.1.0; CFDS, CGS, DFS, Gen, Random.]
Goal
● Automatically Generating Search Heuristics
● Key ideas
– Parameterized search heuristic.
– Effective parameter search algorithm.
– Input: a C program (e.g., vim.c, expat.c)
– Output: a search heuristic tailored to it (e.g., Svim, Sexpat)
[Diagram: vim.c → Our Tool → Svim]
Effectiveness
● Considerable increase in branch coverage.
● Found real-world performance bugs.
– gawk-3.0.3: ./gawk ‘V0\\n^000000000000070000000' file
– grep-2.2: ./grep ‘\(\)\1\+**’ file
● The grep input still triggers the error in grep-3.1 (the latest version).
[Plots: branches covered vs. iterations on vim-5.7 and expat-2.1.0; OURS added to CFDS, CGS, DFS, Gen, Random.]
Parameterized Search Heuristic
● Search Heuristicθ : Path → Branch
– Generating a “good search heuristic” → finding a “good parameter θ”
scoreθ(B1) = 0.1
scoreθ(B2) = 0.7
scoreθ(B3) = -0.5
(1). Represent branches as feature vectors
– A feature: a boolean predicate on branches.
● ex1) Is the branch in the main function?
● ex2) Is it the true branch of a case statement?
B1 = ⟨1, 0, 1, 1, 0⟩
B2 = ⟨0, 1, 1, 1, 0⟩
B3 = ⟨1, 0, 0, 0, 1⟩
● We designed 40 features.
– 12 static features
● extracted without executing the program
(e.g., true branch of a loop)
– 28 dynamic features
● extracted during program execution
(e.g., branch newly covered in the previous execution)
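As a rough sketch (the predicate names below are illustrative stand-ins, not the paper's exact 40 features), a branch can be mapped to a bit vector by evaluating each boolean feature:

```python
# Sketch: encode a branch as a boolean feature vector.
# The five predicate names are hypothetical examples.
def feature_vector(branch):
    features = [
        branch["in_main"],           # static: branch in the main function?
        branch["true_of_loop"],      # static: true branch of a loop?
        branch["true_of_case"],      # static: true branch of a case statement?
        branch["newly_covered"],     # dynamic: newly covered in the previous run?
        branch["negated_recently"],  # dynamic: negated in a recent execution?
    ]
    return [1 if f else 0 for f in features]

b1 = {"in_main": True, "true_of_loop": False, "true_of_case": True,
      "newly_covered": True, "negated_recently": False}
print(feature_vector(b1))  # → [1, 0, 1, 1, 0]
```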
(2). Scoring
– The parameter θ: a k-dimensional vector.
● θ = ⟨-0.5, 0.1, 0.4, 0.2, 0⟩
– The score of a branch: the linear combination (dot product) of its feature vector and the parameter.
● Scoreθ(B1) = ⟨1, 0, 1, 1, 0⟩ · ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ = 0.1
● Scoreθ(B2) = ⟨0, 1, 1, 1, 0⟩ · ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ = 0.7
● Scoreθ(B3) = ⟨1, 0, 0, 0, 1⟩ · ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ = -0.5
(3). Choose the branch with the highest score: B2.
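Steps (1)–(3) fit in a few lines; the sketch below uses the feature vectors and θ from the slides, with dot-product scoring as described above.

```python
# Parameterized scoring: score = feature_vector · θ, then pick the
# branch with the highest score. Numbers taken from the slides.
theta = [-0.5, 0.1, 0.4, 0.2, 0.0]

branches = {
    "B1": [1, 0, 1, 1, 0],
    "B2": [0, 1, 1, 1, 0],
    "B3": [1, 0, 0, 0, 1],
}

def score(fv, theta):
    # dot product of feature vector and parameter vector
    return sum(f * t for f, t in zip(fv, theta))

scores = {b: score(fv, theta) for b, fv in branches.items()}
best = max(scores, key=scores.get)
# scores ≈ {'B1': 0.1, 'B2': 0.7, 'B3': -0.5}
print(best)  # → B2
```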
Parameter Search Algorithm
● Finding good parameters is crucial.
● Naive algorithm: random sampling.
θ1 = ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ → Coverage(519)
θ2 = ⟨-0.9, 0.5, 0.9, -0.2, 1.0⟩ → Coverage(423)
…
θn = ⟨0.7, -0.2, -0.9, -0.9, 0.3⟩ → Coverage(782)
– Failed to find good parameters.
● The search space is intractably large.
● Concolic testing has high run-to-run performance variation.
θ22 (best): Timeout
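The naive baseline amounts to plain random search over [-1, 1]^40. In the sketch below, `run_concolic` is a hypothetical stand-in for an actual concolic-testing run, which is where both the cost and the run-to-run variance come from.

```python
# Naive baseline: sample θ uniformly at random, keep the best by coverage.
import random

K = 40  # number of features, hence dimensions of θ

def sample_theta():
    return [random.uniform(-1.0, 1.0) for _ in range(K)]

def random_search(run_concolic, n_samples=1000):
    best_theta, best_cov = None, float("-inf")
    for _ in range(n_samples):
        theta = sample_theta()
        cov = run_concolic(theta)  # branch coverage achieved with this θ
        if cov > best_cov:
            best_theta, best_cov = theta, cov
    return best_theta, best_cov

# Toy stand-in objective for demonstration only (NOT a real concolic run):
random.seed(0)
theta, cov = random_search(lambda t: sum(t), n_samples=100)
print(len(theta), cov > 0)  # → 40 True
```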
● Our Algorithm
– Iteratively refine the sample search space via feedback.
– Repeat three steps: Find, Check, Refine.
0. Initialize each of the 40 sample spaces to [-1, 1].
● e.g., θ = ⟨-0.5, 0.1, 0.4, 0.2, ..., 0⟩
1. Find
● Quickly find good candidate parameters: randomly sample parameters from the spaces and run each on grep-2.2.
● θ1(1,230), θ2(1,100), θ3(1,321), …, θ1,000(872)
● Keep the top 10 θ's.
2. Check
● Rule out unreliable parameters: re-run the top 10 θ's on grep-2.2 and average the coverage.
● Avg θ′1(1,310), Avg θ′2(1,457), Avg θ′3(1,436), …, Avg θ′10(1,500)
● Keep the top 2 (θt1, θt2).
3. Refine
● θt1 = ⟨+0.3, −0.6, +0.6, ..., +0.8⟩
● θt2 = ⟨+0.7, −0.2, −0.7, ..., +0.8⟩
● 1st sample space: [-1, 1] → [min(0.3, 0.7), 1] = [0.3, 1]
● 2nd sample space: [-1, 1] → [-1, max(-0.6, -0.2)] = [-1, -0.2]
● 3rd sample space: [-1, 1] → [-1, 1] (the signs disagree, so it is unchanged)
● The 40 sample spaces are now refined:
[0.3, 1] × [-1, -0.2] × [-1, 1] × [-1, 1] × ...
● e.g., θ = ⟨0.5, -0.4, 0.4, 0.2, ..., 0⟩
– Then the 'Find' stage runs again, randomly sampling parameters from the refined sample spaces.
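The per-dimension refinement in the three examples above can be sketched as follows (a reconstruction from the slide examples, not necessarily the paper's exact rule): if the top parameters agree on a dimension's sign, shrink that dimension's interval toward their values; otherwise leave it at [-1, 1].

```python
# Refine step, reconstructed from the slide examples.
def refine(spaces, top_params):
    new_spaces = []
    for i, (lo, hi) in enumerate(spaces):
        vals = [theta[i] for theta in top_params]
        if all(v > 0 for v in vals):    # e.g. 0.3 and 0.7  → [0.3, 1]
            new_spaces.append((min(vals), hi))
        elif all(v < 0 for v in vals):  # e.g. -0.6 and -0.2 → [-1, -0.2]
            new_spaces.append((lo, max(vals)))
        else:                           # mixed signs → unchanged
            new_spaces.append((lo, hi))
    return new_spaces

t1 = [0.3, -0.6, 0.6]   # first three components of θt1
t2 = [0.7, -0.2, -0.7]  # first three components of θt2
spaces = [(-1, 1)] * 3
print(refine(spaces, [t1, t2]))  # → [(0.3, 1), (-1, -0.2), (-1, 1)]
```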
Experiments
● Implemented in CREST.
● Compared with five existing heuristics:
– CGS, CFDS, Random, Generational, DFS
● Used 10 open-source C programs:

Program       # Total branches   LOC
vim-5.7       35,464             165K
gawk-3.0.3    8,038              30K
expat-2.1.0   8,500              49K
grep-2.2      3,836              15K
sed-1.17      2,656              9K
tree-1.6.0    1,438              4K
cdaudio       358                3K
floppy        268                2K
kbfiltr       204                1K
replace       196                0.5K
Evaluation Setting
● The same initial inputs.
● The same testing budget (4,000 executions).
● Average branch coverage over 100 trials (50 for vim).
– 1 trial = 4,000 executions.
Effectiveness
● Average branch coverage (six benchmarks shown).
[Plots: branches covered vs. iterations on grep-2.2, tree-1.6.0, vim-5.7, and gawk-3.0.3; OURS vs. CFDS, CGS, DFS, Gen, Random.]
[Plots: branches covered vs. iterations on expat-2.1.0 and sed-1.17; OURS vs. CFDS, CGS, DFS, Gen, Random.]
[Plots: average coverage by program version; sed 1.17 (1993.05), 1.18 (1993.06), 2.05 (1994.05), and gawk 3.0.3 (1997.05) through 3.1.0 (2001.06); OURS vs. CGS, CFDS, Random, DFS, Gen.]
Effectiveness
● The generated heuristics are reusable over multiple subsequent versions of a program.
● Time for obtaining the heuristics (with 20 cores):
– vim-5.7 (24h), expat-2.1.0 (10h), grep-2.2 (5h), tree-1.6.0 (3h)
[Annotations: the gawk versions span 4 years; the sed versions span a year.]
Important Features
● No winning feature: no single feature always belongs to the top 10.
● Depending on the program, the role of a feature changes (e.g., feature 10).
[Tables: top 10 positive features and top 10 negative features for each program.]
→ Search heuristics should be adaptively tuned for each program.