Bounded Model Checking for C Programs in an Enterprise Environment Michael Tautschnig Amazon Web Services & Queen Mary University of London

Upload: adacore

Post on 20-Jan-2017


Bounded Model Checking for C Programs in an Enterprise Environment

Michael Tautschnig, Amazon Web Services & Queen Mary University of London

Bounded Model Checking for C Programs in an Enterprise Environment | Michael Tautschnig

Customer: I would like to get a guarantee that there are no security bugs in this software.

“Software”

“Software” ecosystem of […] can’t be published, but …

Ample Open-Source Software “out there”

• Debian (http://sources.debian.net/stats/, 21st October 2016)
  • 26,900 source packages
  • 13,736,903 individual source files
  • 1,276,743,654 lines of source code (any programming language)
  • 45.5% (approx 500M) C code, 22.2% C++, 5.6% shell, 4.7% Java

• SourceForge, github, CodePlex, ...: how to automate any kind of analysis?

• Distributions (RedHat, Ubuntu/Debian, SuSE, … - but also industrial set ups)!
  • Software organised in source packages
  • Uniform interface to access/download packages
  • Uniform build interface, dependency management

How?

Building one Source Package: Compiler Tool-chain

• For now: C source code only

• goto-cc (part of CBMC distribution)
  • Uses compiler’s (here: GCC’s) preprocessor
  • Own C parser/front end (no Cil, LLVM, EDG, ...)
  • Supports GCC, Visual Studio, CodeWarrior, ARM-CC dialects and command line options
  • Builds intermediate representation understood by CBMC/CProver tools
  • Linking of compiled files/archives/libraries

Supporting arbitrary Build Systems

• Builds are performed in chroot environments
• /usr/bin/gcc and /usr/bin/ld replaced by scripts invoking goto-cc (+ more work)
• Key procedure:
  1. Run real compiler/linker (gcc/ld)
  2. Compile/link using goto-cc
  3. Add result as additional ELF section
• Resulting file remains executable
• Stable under file renaming, archiving, etc.
• Linking stage extracts intermediate representation from extra ELF section

[Figure: x86 binary with CProver IR]
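The key procedure above amounts to three commands per compilation. The following is a minimal sketch in C, not the wrapper scripts used in the talk: it only formats the three command lines so the sequence is visible. The helper name `format_wrapper_steps`, the `.goto` suffix for the side file, the path of the real compiler, the section name `goto-cc`, and the choice of `objcopy --add-section` for step 3 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { CMD_LEN = 256 };

/* Hypothetical helper: writes the three command lines for one
   compilation unit into out[0..2]. Returns 0 on success and -1 if a
   command line does not fit into CMD_LEN bytes. */
int format_wrapper_steps(char out[3][CMD_LEN], const char *real_cc,
                         const char *src, const char *obj)
{
  int n;

  /* 1. Run the real compiler to produce the native object file. */
  n = snprintf(out[0], CMD_LEN, "%s -c %s -o %s", real_cc, src, obj);
  if (n < 0 || n >= CMD_LEN)
    return -1;

  /* 2. Compile the same source with goto-cc into a side file
        (assumed suffix: .goto). */
  n = snprintf(out[1], CMD_LEN, "goto-cc -c %s -o %s.goto", src, obj);
  if (n < 0 || n >= CMD_LEN)
    return -1;

  /* 3. Attach the side file to the native object as an extra ELF
        section (assumed section name: goto-cc). */
  n = snprintf(out[2], CMD_LEN,
               "objcopy --add-section goto-cc=%s.goto %s", obj, obj);
  if (n < 0 || n >= CMD_LEN)
    return -1;

  return 0;
}
```

Because the IR travels inside the object file itself, renaming or archiving the file carries the IR along, which is the stability property the slide points out.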

Building Thousands of Packages

Infrastructure: (Ab-)using Jenkins

Scripts, notes, configuration: https://github.com/tautschnig/cprover-debian

• Jenkins master: 4 cores, 64 GB
• 5 slave nodes: each 64 cores, 256 GB memory (SSH)
• Ultimate Debian Database: package versions, bugs (SQL)
• Debian mirror: source archives (FTP)

Current per-package Work Flow

• Compile, link
• Store archive of all object files/executables
• dump-c: create human-readable C code from IR
• Add generic assertions (pointer checks, arithmetic overflow, no-NaN, ...)
• Run CBMC w/unwinding bound 1, Z3/Minisat (DAC’03, TACAS’04, CAV’13)
• Loop acceleration (CAV’13)
• Re-compile using goto-cc
• Static weak memory cycles (TOPLAS/PLDI’14)
• Re-compile using gcc (errors not fatal)
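To make the "add generic assertions" step concrete, the sketch below writes out by hand the kind of conditions such checks stand for: a pointer check, an array bounds check, and a signed-overflow check. It is an illustration only. The function names are hypothetical, and CBMC adds its checks to the IR automatically rather than requiring them in the source. In CBMC, comparable checks are enabled with options such as `--pointer-check`, `--bounds-check`, and `--signed-overflow-check`, and the unwinding bound is set with `--unwind`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if a + b would overflow a signed int, 0 otherwise.
   The test itself is written so that it never overflows. */
int add_overflows(int a, int b)
{
  if (b > 0)
    return a > INT_MAX - b;
  if (b < 0)
    return a < INT_MIN - b;
  return 0;
}

/* Sums the first n elements of buf, which holds len elements.
   Each assert spells out one generic property. */
int sum_prefix(const int *buf, size_t len, size_t n)
{
  int sum = 0;

  assert(buf != NULL);                    /* pointer check */
  for (size_t i = 0; i < n; i++) {
    assert(i < len);                      /* array bounds check */
    assert(!add_overflows(sum, buf[i]));  /* arithmetic overflow check */
    sum += buf[i];
  }
  return sum;
}
```

A call such as `sum_prefix(data, 3, 4)` on a three-element array violates the bounds assertion; a bounded model checker reports such a violation as a counterexample trace when it is reachable within the unwinding bound.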

Results?

Exercising Language Front Ends

• Many bug fixes and improvements to the parser, type checker
• Re-engineering of parts of the linker
• Bug fixes in IR construction
• Compilation (without further analysis steps) of entire archive: ~2 days
• > 250 GB of compressed archives of IR object files/executables
• 10314 archives available: http://theory.eecs.qmul.ac.uk/debian+mole/pkgs/

Results relevant to Practitioners: Bug Reports

• Key feature: type checking at link time
• 844 bugs reported, 530 already fixed by developers
• Hundreds still to be reported

• http://bugs.debian.org/cgi-bin/[email protected]&tag=goto-cc&archive=both
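Link-time type checking compares the declarations of one symbol across translation units, something a conventional linker does not do. The sketch below models that comparison on a deliberately simplified, hypothetical record (`struct decl`). It is not goto-cc's implementation; the type, field, and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified summary of one function declaration as
   seen in one translation unit. */
struct decl {
  const char *name;    /* symbol name */
  size_t return_size;  /* size of the return type in bytes */
  size_t param_count;  /* number of parameters */
};

enum link_check {
  LINK_OK,           /* declarations agree */
  LINK_PARAM_COUNT,  /* parameter counts differ */
  LINK_RETURN_SIZE,  /* return types differ in byte size */
  LINK_UNRELATED     /* different symbols, nothing to compare */
};

/* Compares two declarations that the linker is about to merge. */
enum link_check check_decls(const struct decl *a, const struct decl *b)
{
  if (strcmp(a->name, b->name) != 0)
    return LINK_UNRELATED;
  if (a->param_count != b->param_count)
    return LINK_PARAM_COUNT;
  if (a->return_size != b->return_size)
    return LINK_RETURN_SIZE;
  return LINK_OK;
}
```

A real checker compares full types rather than sizes and counts; the point here is only that the mismatch becomes visible once both declarations are available in the same IR.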

Reporting bugs

Automated Testing using SMID | Michael Tautschnig

Where are the cats?

• CAV’14: J. Alglave, D. Kroening, V. Nimal, D. Poetzl: Don't sit on the fence: A static analysis approach to automatic fence insertion

• PLDI’14/TOPLAS: J. Alglave, L. Maranget, M. Tautschnig: Herding Cats - Modelling, simulation, testing, and data-mining for weak memory (cited in Linux Weekly News and C/C++ WG21/N4036)

Focus on improving/developing Methods

[Figure: per-package work flow diagram, repeated from “Current per-package Work Flow”]

TOPLAS/PLDI’14: analysing 200 million LOC for potential weak memory susceptibility

Automated Information Leak Detection

Analysing the Patched Version

Overall Analysis Status (preliminary!)

• In addition to 314 bugs reported and not yet fixed: 4915 packages with error reports - top causes:
  • 1789: CBMC counterexamples (including several using loop acceleration)
  • 1711: Loop acceleration bugs
  • 200: Floating point support in Z3 back end
  • 198: Type-inconsistent access to heap with symbolic offset
  • 129: CBMC Out-of-memory
  • 54: Parameter counts differ
  • 48: Conflicting array sizes
  • 46: Conflicting types
  • 42: Conflicting struct types
  • 32: Conflicting return types (byte size)

Questions

Software? Yes.

Guarantees? Sometimes.