HyDiff: Hybrid Differential Software Analysis

Yannic Noller, Corina S. Păsăreanu, Marcel Böhme, Youcheng Sun, Hoang Lam Nguyen, Lars Grunske

yannic.noller@hu-berlin.de
International Conference on Software Engineering (ICSE) 2020

Differential Analysis

[Figure: two settings of differential analysis.
(1) One input is executed on two program versions, P and P', and behavior1 is compared with behavior2: Regression Analysis.
(2) Two inputs, input1 and input2, are executed on the same program P, and behavior1 is compared with behavior2: Side-Channel Analysis, Robustness Analysis of Neural Networks.]

[Figure: HyDiff overview]

Input: program versions, seed input files, change-annotated program

Fuzzing: ICFG, instrumentation, mutate inputs, assessment, import, fuzzer output queue
  good at finding shallow bugs, but bad at finding deep program paths

Symbolic Execution: import, trie extension, assessment, exploration, constraint solving, input generation, symbc output queue
  input reasoning ability, but path explosion and constraint solving

Output: set of divergence revealing test inputs, listed per input (id0001, id0002, id0003, ...) with the columns +odiff, +ddiff, +crash, +cdiff, +patch-dist, +cov

Input: program versions, seed input files, change-annotated program

Table 1: Change-Annotations by Shadow Symbolic Execution [141]

Change Type             Example
Update assignment       x = x + change(E1, E2)
Update condition        if(change(E1, E2))
Add extra assignment    x = change(x, E)
Remove assignment       x = change(E, x)
Add conditional         if(change(false, C))
Remove conditional      if(change(C, false))
Remove code             if(change(true, false))
Add code                if(change(false, true))

[...] existing assignment x = x + change(E1, E2), the variable x holds two expressions: x + E2 for the new version and x + E1 for the old version.

SSE performs dynamic symbolic execution on such a unified program version, which is implemented in two phases: (1) the concolic phase and (2) the bounded symbolic execution (BSE) phase.
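The change(old, new) annotations from Table 1 can be illustrated with a minimal Python sketch. It is not the SSE or HyDiff implementation: the `version` argument, the `Version` enum, the helper functions, and the concrete values standing in for E1 and E2 are assumptions introduced here to show how one unified program yields the old or the new behavior.

```python
from enum import Enum


class Version(Enum):
    OLD = "old"
    NEW = "new"


def change(old_value, new_value, version):
    """Return the operand that belongs to the requested program version.

    The explicit `version` argument is an assumption of this sketch; in the
    annotated program, change() takes only the old and the new expression.
    """
    return old_value if version is Version.OLD else new_value


def update_assignment(x, version):
    # Models `x = x + change(E1, E2)` with the hypothetical values E1 = 1, E2 = 2.
    return x + change(1, 2, version)


def add_code(x, version):
    # Models `if(change(false, true)) { ... }`: the guarded block exists only
    # in the new version.
    if change(False, True, version):
        x = x * 10
    return x
```

Calling `update_assignment(5, Version.OLD)` evaluates x + E1, while `update_assignment(5, Version.NEW)` evaluates x + E2, which mirrors the two expressions the variable x holds in the unified program.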

In the first phase, SSE simply follows the concrete execution of test inputs from an existing test suite while it checks for divergences along the control flow of the two versions. This exploration is driven by the idea of four-way forking. In traditional symbolic execution, every branching condition introduces two forks to explore the true and false branches. Shadow symbolic execution instead introduces four forks to investigate all four combinations of true and false branches for both program versions. As long as there is no concrete divergence, SSE follows the so-called sameTrue and sameFalse branches, which denote that both concrete executions take the same branch. Additionally, SSE checks the satisfiability of the path constraints for the other two branching options, where the two versions take different branches. These branches are called the diffTrue and diffFalse paths. For every feasible diff path, SSE generates a concrete input and stores the divergence point for later exploration by the second phase. As long as there is no concrete divergence, SSE continues until the end of the program.
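Four-way forking can be sketched in Python as follows. This is a toy illustration, not the SSE implementation: brute-force enumeration over a small input domain stands in for the constraint solver, the function name `four_way_fork` is hypothetical, and naming the diff forks after the branch taken by the new version is an assumption, since the text above does not define it.

```python
def four_way_fork(old_cond, new_cond, path_constraint, domain):
    """Find one witness input for each feasible fork at a branching point.

    old_cond / new_cond: branch condition in the old / new version.
    path_constraint: predicate describing the path that reaches the branch.
    domain: iterable of candidate inputs (stand-in for a constraint solver).
    Returns a dict from fork name to a witness input, or None if infeasible
    within the given domain.
    """
    forks = {"sameTrue": None, "sameFalse": None,
             "diffTrue": None, "diffFalse": None}
    for x in domain:
        if not path_constraint(x):
            continue
        old_taken, new_taken = old_cond(x), new_cond(x)
        if old_taken and new_taken:
            name = "sameTrue"
        elif not old_taken and not new_taken:
            name = "sameFalse"
        elif new_taken:
            name = "diffTrue"   # assumption: named after the new version's branch
        else:
            name = "diffFalse"
        if forks[name] is None:
            forks[name] = x
    return forks


# Hypothetical change: the condition `x > 5` was updated to `x >= 5`.
forks = four_way_fork(lambda x: x > 5, lambda x: x >= 5,
                      lambda x: x >= 0, range(10))
```

In this example only the input 5 makes the two versions take different branches, so one diff fork has a witness and the other is infeasible within the domain.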

When SSE hits the mentioned addition or removal of straight-line code blocks, it immediately stores a divergence point. This conservative handling leads to an over-approximation of the diff paths because the added or deleted code may not necessarily lead to an actual divergence.

The second phase performs bounded symbolic execution (BSE) only on the new version, starting from the stored divergence points, to further investigate the divergences.

At the end, Palikareva et al. [141] perform some post-processing of the generated inputs to determine whether they expose observable differences, e.g., by comparing the outputs and the exit codes. Palikareva et al. [141] implemented their approach on top of the KLEE symbolic execution engine [33].

Limitations of Shadow Symbolic Execution. Shadow symbolic execution, as introduced by Palikareva et al. [141], is driven by concrete inputs from an existing test suite. While this exploration strategy tries to focus on constraining the search space, it might miss important divergences, as it strongly depends on the quality of these initial test inputs. In particular, SSE might miss deeper divergences in the BSE phase because of limiting prefixes in the path constraints. Since BSE is started from the identified divergence points, it inherits the path constraint prefix from the concrete input that has been followed to find this divergence. In general, when there are several paths from the beginning of the program to this divergence [...]

change-annotations by Palikareva et al. [2]

input = change(input1, input2)

HyDiff's Input
seed inputs
program under test
two different change types: (1) inside the program code, (2) in the input

Differential Greybox Fuzzing (DF)
built upon AFL [1] (genetic algorithm)

mutant selection driven by differential heuristics: output difference, decision history difference, cost difference, patch distance

additionally guided by branch coverage
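The mutant selection listed above could look roughly like the following Python sketch. It is an illustration under simplifying assumptions and not HyDiff's or AFL's actual logic: the `Observation` record, the `MutantSelector` class, and the rule "keep a mutant if it improves on any tracked metric" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    output_diff: bool      # the two executions produced different outputs
    decision_diff: bool    # the two decision histories differ
    cost_diff: int         # absolute difference in execution cost
    patch_distance: int    # distance to the changed code; smaller is closer
    new_coverage: bool     # a previously unseen branch was covered


class MutantSelector:
    """Keeps a mutant if it is interesting under any differential heuristic."""

    def __init__(self):
        self.max_cost_diff = 0
        self.min_patch_distance = None

    def keep(self, obs: Observation) -> bool:
        closer = (self.min_patch_distance is None
                  or obs.patch_distance < self.min_patch_distance)
        interesting = (obs.new_coverage
                       or obs.output_diff
                       or obs.decision_diff
                       or obs.cost_diff > self.max_cost_diff
                       or closer)
        # Update the best values seen so far, whether or not the mutant is kept.
        self.max_cost_diff = max(self.max_cost_diff, obs.cost_diff)
        if closer:
            self.min_patch_distance = obs.patch_distance
        return interesting
```

A mutant that repeats already observed values on every metric is discarded, while one that raises the cost difference or gets closer to the patch is retained for further mutation.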

Differential Symbolic Execution (DSE)
built upon Symbolic PathFinder (SPF) [3]

central data structure: trie

node selection driven by differential heuristics: decision history difference, cost difference, patch distance

additionally guided by branch coverage
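The role of the trie can be sketched in Python as follows. This is a simplified illustration, not the SPF-based implementation: the `TrieNode` and `Trie` classes are hypothetical, and the single numeric `score` is an assumption that stands in for the combination of differential heuristics named above.

```python
class TrieNode:
    def __init__(self, decision=None, parent=None):
        self.decision = decision   # branch decision that leads to this node
        self.parent = parent
        self.children = {}
        self.score = 0             # hypothetical heuristic score


class Trie:
    """Stores explored decision sequences and picks a node to explore next."""

    def __init__(self):
        self.root = TrieNode()

    def extend(self, decisions, score):
        """Insert one execution's decision sequence and score its last node."""
        node = self.root
        for decision in decisions:
            if decision not in node.children:
                node.children[decision] = TrieNode(decision, node)
            node = node.children[decision]
        node.score = max(node.score, score)
        return node

    def select(self):
        """Return the decision path to the highest-scoring node."""
        best, stack = self.root, [self.root]
        while stack:
            node = stack.pop()
            if node.score > best.score:
                best = node
            stack.extend(node.children.values())
        path = []
        while best.parent is not None:
            path.append(best.decision)
            best = best.parent
        return list(reversed(path))
```

Each imported or generated input extends the trie with its decision sequence, and the selected path identifies where exploration would continue.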

HyDiff's Output
set of generated inputs (set of divergence revealing test inputs)

classified by divergence: output difference (+odiff), control-flow (+ddiff), crashing behavior (+crash), execution cost (+cdiff)

additionally: patch distance (+patch-dist), branch coverage (+cov)

[Table: one row per generated input (id0001, id0002, id0003, ...) with the columns +odiff, +ddiff, +crash, +cdiff, +patch-dist, +cov; an X marks each property that applies to the input.]
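The divergence labels can be illustrated with a short Python sketch that compares two observed executions. The `Run` record and the `classify` function are hypothetical, and treating +crash as "at least one execution crashed" is an assumption of this sketch rather than HyDiff's documented definition.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Run:
    output: Optional[str]          # None if no output was produced
    decisions: Tuple[bool, ...]    # sequence of branch decisions taken
    cost: int                      # execution cost, e.g. executed instructions
    crashed: bool = False


def classify(run_a: Run, run_b: Run) -> set:
    """Return the divergence labels that apply to a pair of executions."""
    labels = set()
    if run_a.output != run_b.output:
        labels.add("+odiff")
    if run_a.decisions != run_b.decisions:
        labels.add("+ddiff")
    if run_a.crashed or run_b.crashed:   # assumption, see the note above
        labels.add("+crash")
    if run_a.cost != run_b.cost:
        labels.add("+cdiff")
    return labels
```

For regression analysis the two runs would come from the old and the new version on the same input; for side-channel or robustness analysis they would come from the same program on two inputs.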

Experiments

Regression Analysis
  HyDiff classifies all subjects correctly
  significantly more output and decision differences

Side-Channel Analysis
  HyDiff shows good trade-off between DSE and DF
  no significant amplification of the exploration

Robustness Analysis of Neural Networks
  stress test for HyDiff
  HyDiff: significantly more output differences

HyDiff: Hybrid Differential Software Analysis

yannicnoller/hydiff

References

[1] American Fuzzy Lop (AFL). Website. http://lcamtuf.coredump.cx/afl

[2] Hristina Palikareva, Tomasz Kuchta, and Cristian Cadar. 2016. Shadow of a Doubt: Testing for Divergences between Software Versions. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). 1181–1192. https://doi.org/10.1145/2884781.2884845

[3] Corina S. Păsăreanu, Willem Visser, David Bushnell, Jaco Geldenhuys, Peter Mehlitz, and Neha Rungta. 2013. Symbolic PathFinder: integrating symbolic execution with model checking for Java bytecode analysis. Automated Software Engineering 20, 3 (2013), 391–425. https://doi.org/10.1007/s10515-013-0122-2

International Conference on Software Engineering (ICSE) 2020

Page 2: HyDiff: Hybrid Differential Software Analysis · shallow bugs, but bad in finding deep program paths input reasoning ability, but path explosion and constraint solving Problem Solution

Differential Analysis

yannicnollerhu-berlinde 2International Conference on Software Engineering (ICSE) 2020

input1

program P

input2

program P

2

=behavior1

behavior2

x

yinput

program P

program Prsquo

1

=behavior1

behavior2

x

Regression AnalysisSide-Channel Analysis

Robustness Analysis of Neural Networks

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 3International Conference on Software Engineering (ICSE) 2020

Inputprogram versions

seed input files

change-annotated program

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

Output

symbc outputqueue

input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

hellip hellip hellip hellip hellip hellip hellip

set of divergence revealing test inputs

good in finding shallow bugs but bad in finding deep program

paths

input reasoning ability but path explosion and

constraint solving

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 4International Conference on Software Engineering (ICSE) 2020

Inputprogram versions

seed input files

change-annotated program

24 differential program analysis 21

Table 1 Change-Annotations by Shadow Symbolic Execution [141]

Change Type Example

Update assignment x = x + change(E1 E2)

Update condition if(change(E1 E2))

Add extra assignment x = change(x E)

Remove assignment x = change(E x)

Add conditional if(change(false C))

Remove conditional if(change(C false))

Remove code if(change(true false))

Add code if(change(false true))

existing assignment x = x + change(E1 E2) the variable x holds two expressions x +

E2 for the new version and x + E1 for the old versionSSE performs dynamic symbolic-execution on such a unified program version which is

implemented in two phases (1) the concolic phase and (2) the bounded symbolic execution(BSE) phase

In the first phase SSE simply follows the concrete execution of test inputs from an ex-isting test suite while it checks for divergences along the control-flow of the two versionsThis exploration is driven by the idea of four-way forking In traditional symbolic executionevery branching condition introduces two forks to explore the true and false branchesShadow symbolic execution instead introduces four forks to investigate all four combina-tions of true and false branches for both program versions As long as there is no concretedivergence SSE follows the so-called sameTrue and sameFalse branch which denotes thatboth concrete executions take the same branches Additionally SSE checks the satisfiabilityof the path constraints for the other two branching options where both versions take dif-ference branches These branches are called diffTrue and diffFalse paths For every feasiblediff path SSE generates a concrete input and stores the divergence point for later explo-ration by the second phase As long there is no concrete divergence SSE continues untilthe end of the program

When SSE hits the mentioned addition or a removal of straightline code blocks itimmediately stores a divergence point This conservative handling leads to an over-approximation of the diff paths because the added deleted code may not necessarilylead to an actual divergence

The second phase performs bounded symbolic execution (BSE) only on the new versionfrom the stored divergence points to further investigate the divergences

At the end Palikareva et al [141] perform some post-processing of the generated inputsto determine whether they expose some observable differences eg by comparing theoutputs and the exit codes Palikareva et al [141] implemented their approach on top ofthe KLEE symbolic execution engine [33]

Limitations of Shadow Symbolic Execution Shadow symbolic execution as introducedby Palikareva et al [141] is driven by concrete inputs from an existing test suite While thisexploration strategy tries to focus on constraining the search space it might miss importantdivergences as it strongly depends on the quality of these initial test input In particular SSEmight miss deeper divergences in the BSE phase because of limiting prefixes in the pathconstraints Since BSE is started from the identified divergence points it inherits the pathconstraint prefix from the concrete input that has been followed to find this divergence Ingeneral when there are several paths from the beginning of the program to this divergence

[ March 29 2020 at 1153 ndash classicthesis version 01 ]

change-annotations by Palikareva et al [2]

input = change(input1 input2)

HyDiffrsquos Inputseed inputs

program under test

two different change types(1) inside the program code (2) in the input

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 5International Conference on Software Engineering (ICSE) 2020

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Greybox Fuzzing (DF)built upon AFL [1] (genetic algorithm)

mutant selection driven by differential heuristics output difference decision history difference cost difference patch distance

additionally guided by branch coverage

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 6International Conference on Software Engineering (ICSE) 2020

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Symbolic Execution (DSE)built upon Symbolic PathFinder (SPF) [3]

central data structure trie

node selection driven by differential heuristics decision history difference cost difference patch distance

additionally guided by branch coverage

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 7International Conference on Software Engineering (ICSE) 2020

Output input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

hellip hellip hellip hellip hellip hellip hellip

set of divergence revealing test inputs

HyDiffrsquos Outputset of generated inputs

classified by divergence output difference (+odiff) control-flow (+ddiff) crashing behavior (+crash) execution cost (+cdiff)

additionally patch distance (+patch-dist) branch coverage (+cov)

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 8International Conference on Software Engineering (ICSE) 2020

ExperimentsRegression Analysis

Side-Channel Analysis

Robustness Analysis of Neural Networks

HyDiff classifies all subjects correctly

significantly more output and decision differences

HyDiff shows good trade-off between DSE and DF

no significant amplification of the exploration

stress test for HyDiff

HyDiff significantly more output differences

Problem Solution SummaryEvaluation

9yannicnollerhu-berlinde

HyDiff Hybrid Differential Software Analysis

yannicnollerhydiff

International Conference on Software Engineering (ICSE) 2020

Problem Solution SummaryEvaluation

References

10yannicnollerhu-berlinde

[1] Website American Fuzzy Lop (AFL) httplcamtufcoredumpcxafl

[2] Hristina Palikareva Tomasz Kuchta and Cristian Cadar 2016 Shadow of a Doubt Testing for Divergences between Software Versions In 2016 IEEEACM 38th International Conference on Software Engineering (ICSE) 1181ndash1192 https doiorg10114528847812884845

[3] Corina S Păsăreanu Willem Visser David Bushnell Jaco Geldenhuys Peter Mehlitz and Neha Rungta 2013 Symbolic PathFinder integrating symbolic execution with model checking for Java bytecode analysis Automated Software Engineering 20 3 (2013) 391ndash425 httpsdoiorg101007s10515-013-0122-2

International Conference on Software Engineering (ICSE) 2020

Page 3: HyDiff: Hybrid Differential Software Analysis · shallow bugs, but bad in finding deep program paths input reasoning ability, but path explosion and constraint solving Problem Solution

yannicnollerhu-berlinde 3International Conference on Software Engineering (ICSE) 2020

Inputprogram versions

seed input files

change-annotated program

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

Output

symbc outputqueue

input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

hellip hellip hellip hellip hellip hellip hellip

set of divergence revealing test inputs

good in finding shallow bugs but bad in finding deep program

paths

input reasoning ability but path explosion and

constraint solving

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 4International Conference on Software Engineering (ICSE) 2020

Inputprogram versions

seed input files

change-annotated program

24 differential program analysis 21

Table 1 Change-Annotations by Shadow Symbolic Execution [141]

Change Type Example

Update assignment x = x + change(E1 E2)

Update condition if(change(E1 E2))

Add extra assignment x = change(x E)

Remove assignment x = change(E x)

Add conditional if(change(false C))

Remove conditional if(change(C false))

Remove code if(change(true false))

Add code if(change(false true))

existing assignment x = x + change(E1 E2) the variable x holds two expressions x +

E2 for the new version and x + E1 for the old versionSSE performs dynamic symbolic-execution on such a unified program version which is

implemented in two phases (1) the concolic phase and (2) the bounded symbolic execution(BSE) phase

In the first phase SSE simply follows the concrete execution of test inputs from an ex-isting test suite while it checks for divergences along the control-flow of the two versionsThis exploration is driven by the idea of four-way forking In traditional symbolic executionevery branching condition introduces two forks to explore the true and false branchesShadow symbolic execution instead introduces four forks to investigate all four combina-tions of true and false branches for both program versions As long as there is no concretedivergence SSE follows the so-called sameTrue and sameFalse branch which denotes thatboth concrete executions take the same branches Additionally SSE checks the satisfiabilityof the path constraints for the other two branching options where both versions take dif-ference branches These branches are called diffTrue and diffFalse paths For every feasiblediff path SSE generates a concrete input and stores the divergence point for later explo-ration by the second phase As long there is no concrete divergence SSE continues untilthe end of the program

When SSE hits the mentioned addition or a removal of straightline code blocks itimmediately stores a divergence point This conservative handling leads to an over-approximation of the diff paths because the added deleted code may not necessarilylead to an actual divergence

The second phase performs bounded symbolic execution (BSE) only on the new versionfrom the stored divergence points to further investigate the divergences

At the end Palikareva et al [141] perform some post-processing of the generated inputsto determine whether they expose some observable differences eg by comparing theoutputs and the exit codes Palikareva et al [141] implemented their approach on top ofthe KLEE symbolic execution engine [33]

Limitations of Shadow Symbolic Execution Shadow symbolic execution as introducedby Palikareva et al [141] is driven by concrete inputs from an existing test suite While thisexploration strategy tries to focus on constraining the search space it might miss importantdivergences as it strongly depends on the quality of these initial test input In particular SSEmight miss deeper divergences in the BSE phase because of limiting prefixes in the pathconstraints Since BSE is started from the identified divergence points it inherits the pathconstraint prefix from the concrete input that has been followed to find this divergence Ingeneral when there are several paths from the beginning of the program to this divergence

[ March 29 2020 at 1153 ndash classicthesis version 01 ]

change-annotations by Palikareva et al [2]

input = change(input1 input2)

HyDiffrsquos Inputseed inputs

program under test

two different change types(1) inside the program code (2) in the input

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 5International Conference on Software Engineering (ICSE) 2020

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Greybox Fuzzing (DF)built upon AFL [1] (genetic algorithm)

mutant selection driven by differential heuristics output difference decision history difference cost difference patch distance

additionally guided by branch coverage

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 6International Conference on Software Engineering (ICSE) 2020

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Symbolic Execution (DSE)built upon Symbolic PathFinder (SPF) [3]

central data structure trie

node selection driven by differential heuristics decision history difference cost difference patch distance

additionally guided by branch coverage

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 7International Conference on Software Engineering (ICSE) 2020

Output input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

hellip hellip hellip hellip hellip hellip hellip

set of divergence revealing test inputs

HyDiffrsquos Outputset of generated inputs

classified by divergence output difference (+odiff) control-flow (+ddiff) crashing behavior (+crash) execution cost (+cdiff)

additionally patch distance (+patch-dist) branch coverage (+cov)

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 8International Conference on Software Engineering (ICSE) 2020

ExperimentsRegression Analysis

Side-Channel Analysis

Robustness Analysis of Neural Networks

HyDiff classifies all subjects correctly

significantly more output and decision differences

HyDiff shows good trade-off between DSE and DF

no significant amplification of the exploration

stress test for HyDiff

HyDiff significantly more output differences

Problem Solution SummaryEvaluation

9yannicnollerhu-berlinde

HyDiff Hybrid Differential Software Analysis

yannicnollerhydiff

International Conference on Software Engineering (ICSE) 2020

Problem Solution SummaryEvaluation

References

10yannicnollerhu-berlinde

[1] Website American Fuzzy Lop (AFL) httplcamtufcoredumpcxafl

[2] Hristina Palikareva Tomasz Kuchta and Cristian Cadar 2016 Shadow of a Doubt Testing for Divergences between Software Versions In 2016 IEEEACM 38th International Conference on Software Engineering (ICSE) 1181ndash1192 https doiorg10114528847812884845

[3] Corina S Păsăreanu Willem Visser David Bushnell Jaco Geldenhuys Peter Mehlitz and Neha Rungta 2013 Symbolic PathFinder integrating symbolic execution with model checking for Java bytecode analysis Automated Software Engineering 20 3 (2013) 391ndash425 httpsdoiorg101007s10515-013-0122-2

International Conference on Software Engineering (ICSE) 2020

Page 4: HyDiff: Hybrid Differential Software Analysis · shallow bugs, but bad in finding deep program paths input reasoning ability, but path explosion and constraint solving Problem Solution

yannicnollerhu-berlinde 4International Conference on Software Engineering (ICSE) 2020

Inputprogram versions

seed input files

change-annotated program

24 differential program analysis 21

Table 1 Change-Annotations by Shadow Symbolic Execution [141]

Change Type Example

Update assignment x = x + change(E1 E2)

Update condition if(change(E1 E2))

Add extra assignment x = change(x E)

Remove assignment x = change(E x)

Add conditional if(change(false C))

Remove conditional if(change(C false))

Remove code if(change(true false))

Add code if(change(false true))

existing assignment x = x + change(E1 E2) the variable x holds two expressions x +

E2 for the new version and x + E1 for the old versionSSE performs dynamic symbolic-execution on such a unified program version which is

implemented in two phases (1) the concolic phase and (2) the bounded symbolic execution(BSE) phase

In the first phase SSE simply follows the concrete execution of test inputs from an ex-isting test suite while it checks for divergences along the control-flow of the two versionsThis exploration is driven by the idea of four-way forking In traditional symbolic executionevery branching condition introduces two forks to explore the true and false branchesShadow symbolic execution instead introduces four forks to investigate all four combina-tions of true and false branches for both program versions As long as there is no concretedivergence SSE follows the so-called sameTrue and sameFalse branch which denotes thatboth concrete executions take the same branches Additionally SSE checks the satisfiabilityof the path constraints for the other two branching options where both versions take dif-ference branches These branches are called diffTrue and diffFalse paths For every feasiblediff path SSE generates a concrete input and stores the divergence point for later explo-ration by the second phase As long there is no concrete divergence SSE continues untilthe end of the program

When SSE hits the mentioned addition or a removal of straightline code blocks itimmediately stores a divergence point This conservative handling leads to an over-approximation of the diff paths because the added deleted code may not necessarilylead to an actual divergence

The second phase performs bounded symbolic execution (BSE) only on the new versionfrom the stored divergence points to further investigate the divergences

At the end Palikareva et al [141] perform some post-processing of the generated inputsto determine whether they expose some observable differences eg by comparing theoutputs and the exit codes Palikareva et al [141] implemented their approach on top ofthe KLEE symbolic execution engine [33]

Limitations of Shadow Symbolic Execution Shadow symbolic execution as introducedby Palikareva et al [141] is driven by concrete inputs from an existing test suite While thisexploration strategy tries to focus on constraining the search space it might miss importantdivergences as it strongly depends on the quality of these initial test input In particular SSEmight miss deeper divergences in the BSE phase because of limiting prefixes in the pathconstraints Since BSE is started from the identified divergence points it inherits the pathconstraint prefix from the concrete input that has been followed to find this divergence Ingeneral when there are several paths from the beginning of the program to this divergence

[ March 29 2020 at 1153 ndash classicthesis version 01 ]

change-annotations by Palikareva et al [2]

input = change(input1 input2)

HyDiffrsquos Inputseed inputs

program under test

two different change types(1) inside the program code (2) in the input

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 5International Conference on Software Engineering (ICSE) 2020

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Greybox Fuzzing (DF)built upon AFL [1] (genetic algorithm)

mutant selection driven by differential heuristics output difference decision history difference cost difference patch distance

additionally guided by branch coverage

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 6International Conference on Software Engineering (ICSE) 2020

Fuzzing Symbolic Execution

import

HyD

iff

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Symbolic Execution (DSE)built upon Symbolic PathFinder (SPF) [3]

central data structure trie

node selection driven by differential heuristics decision history difference cost difference patch distance

additionally guided by branch coverage

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 7International Conference on Software Engineering (ICSE) 2020

Output input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

hellip hellip hellip hellip hellip hellip hellip

set of divergence revealing test inputs

HyDiffrsquos Outputset of generated inputs

classified by divergence output difference (+odiff) control-flow (+ddiff) crashing behavior (+crash) execution cost (+cdiff)

additionally patch distance (+patch-dist) branch coverage (+cov)

Problem Solution SummaryEvaluation

yannicnollerhu-berlinde 8International Conference on Software Engineering (ICSE) 2020

ExperimentsRegression Analysis

Side-Channel Analysis

Robustness Analysis of Neural Networks

HyDiff classifies all subjects correctly

significantly more output and decision differences

HyDiff shows good trade-off between DSE and DF

no significant amplification of the exploration

stress test for HyDiff

HyDiff significantly more output differences

Problem Solution SummaryEvaluation

9yannicnollerhu-berlinde

HyDiff Hybrid Differential Software Analysis

yannicnollerhydiff

International Conference on Software Engineering (ICSE) 2020

Problem Solution SummaryEvaluation

References

10yannicnollerhu-berlinde

[1] Website American Fuzzy Lop (AFL) httplcamtufcoredumpcxafl

[2] Hristina Palikareva Tomasz Kuchta and Cristian Cadar 2016 Shadow of a Doubt Testing for Divergences between Software Versions In 2016 IEEEACM 38th International Conference on Software Engineering (ICSE) 1181ndash1192 https doiorg10114528847812884845

[3] Corina S Păsăreanu Willem Visser David Bushnell Jaco Geldenhuys Peter Mehlitz and Neha Rungta 2013 Symbolic PathFinder integrating symbolic execution with model checking for Java bytecode analysis Automated Software Engineering 20 3 (2013) 391ndash425 httpsdoiorg101007s10515-013-0122-2

International Conference on Software Engineering (ICSE) 2020

Page 5: HyDiff: Hybrid Differential Software Analysis · shallow bugs, but bad in finding deep program paths input reasoning ability, but path explosion and constraint solving Problem Solution

yannicnollerhu-berlinde 5International Conference on Software Engineering (ICSE) 2020

Figure: HyDiff workflow with Fuzzing and Symbolic Execution components (ICFG, instrumentation, mutate inputs, assessment, trie extension, exploration, constraint solving, input generation, import, fuzzer and symbc output queues)

Differential Greybox Fuzzing (DF)

built upon AFL [1] (genetic algorithm)

mutant selection driven by differential heuristics: output difference, decision history difference, cost difference, patch distance

additionally guided by branch coverage
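The selection step above can be sketched as a check that keeps a mutant whenever it improves on at least one heuristic. This is an illustrative sketch under stated assumptions, not HyDiff's implementation: `Assessment`, `SelectionState`, and their fields are hypothetical names, and differences are assumed to be encoded as string signatures so that "new" can be decided by set membership.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Assessment:
    """Hypothetical measurements from running one mutant on both versions."""
    output_diff: Optional[str]    # signature of the output difference, None if outputs agree
    decision_diff: Optional[str]  # signature of the diverging decisions, None if equal
    cost_diff: int                # absolute difference in execution cost
    patch_distance: int           # distance to the changed code, 0 means reached
    branches: FrozenSet[int]      # ids of the branches covered


@dataclass
class SelectionState:
    """What the fuzzer has observed so far."""
    output_diffs: Set[str] = field(default_factory=set)
    decision_diffs: Set[str] = field(default_factory=set)
    max_cost_diff: int = 0
    min_patch_distance: float = float("inf")
    covered: Set[int] = field(default_factory=set)

    def keep(self, a: Assessment) -> List[str]:
        """Return the reasons to keep a mutant; an empty list means discard."""
        reasons = []
        if a.output_diff is not None and a.output_diff not in self.output_diffs:
            self.output_diffs.add(a.output_diff)
            reasons.append("odiff")
        if a.decision_diff is not None and a.decision_diff not in self.decision_diffs:
            self.decision_diffs.add(a.decision_diff)
            reasons.append("ddiff")
        if a.cost_diff > self.max_cost_diff:
            self.max_cost_diff = a.cost_diff
            reasons.append("cdiff")
        if a.patch_distance < self.min_patch_distance:
            self.min_patch_distance = a.patch_distance
            reasons.append("patch-dist")
        if not a.branches <= self.covered:
            self.covered |= a.branches
            reasons.append("cov")
        return reasons


state = SelectionState()
seed = Assessment(None, None, 0, 5, frozenset({1}))
mutant = Assessment("x", None, 3, 5, frozenset({1}))
print(state.keep(seed), state.keep(seed), state.keep(mutant))
```

Returning the list of reasons rather than a boolean makes it easy to record why an input entered the queue, which matches the labeled output table described in these slides.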


Differential Symbolic Execution (DSE)

built upon Symbolic PathFinder (SPF) [3]

central data structure: trie

node selection driven by differential heuristics: decision history difference, cost difference, patch distance

additionally guided by branch coverage
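A rough sketch of a trie over branch decisions with heuristic node selection. The names `TrieNode`, `insert`, `rank`, and `select` are hypothetical and not taken from HyDiff or SPF, and the priority order used in `rank` is an assumption made for illustration; the slides name the heuristics but not how they are weighted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    """One node per branch choice along a recorded path (hypothetical layout)."""
    decision_diff: bool = False     # the versions decide differently here
    cost_diff: int = 0              # observed cost difference
    patch_distance: int = 10**9     # distance to the changed code
    new_coverage: bool = False      # node covers a previously unseen branch
    explored: bool = False          # already handed to symbolic exploration
    children: Dict[int, "TrieNode"] = field(default_factory=dict)


def insert(root: TrieNode, path: List[int], **scores) -> TrieNode:
    """Extend the trie with a path of branch choices and score its last node."""
    node = root
    for choice in path:
        node = node.children.setdefault(choice, TrieNode())
    for name, value in scores.items():
        setattr(node, name, value)
    return node


def rank(node: TrieNode) -> Tuple:
    # Assumed priority: decision difference, then larger cost difference,
    # then smaller patch distance, then new branch coverage.
    return (node.decision_diff, node.cost_diff, -node.patch_distance, node.new_coverage)


def select(root: TrieNode) -> Optional[TrieNode]:
    """Return the highest-ranked unexplored node, or None if there is none."""
    best = None
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children.values())
        if node is root or node.explored:
            continue
        if best is None or rank(node) > rank(best):
            best = node
    return best


root = TrieNode()
deep = insert(root, [0, 1], cost_diff=4, patch_distance=2)
diverging = insert(root, [1], decision_diff=True, patch_distance=7)
print(select(root) is diverging)
```

In this sketch the selected node would then be marked as explored and passed on to exploration and constraint solving, so that the next call to `select` yields the next most promising node.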





