
Post on 12-Jan-2016


Language-based information flow

Bruno Pontes Soares Rocha
Security Group, TU Eindhoven
Department of Mathematics and Computer Science

Language-based information flow 2

Motivation
Why information flow? Why language-based?

Confidential data

• Networked information systems:
  – PCs store passwords, email, finances, …
  – Businesses rely on computing infrastructure
  – Military & government communications
• Security of data and infrastructure is critical [Trust in Cyberspace, Schneider et al. ’99]
• How to protect confidential data?

[Some slides borrowed from Steve Zdancewic’s talk at FOSAD’08 in Bertinoro, Italy]

Technical Challenges

• Software is large and complex
  – Famous bugs: e.g. MS HotMail
  – Buffer overflows
• Security policies become complex
  – Mutually distrusting parties
• Requires tools & automation
• Look at traditional security concerns to set the context…
  – Confidentiality
  – Integrity
  – Availability

Quality 1: Confidentiality

• Keep data or actions secret
• Related to: privacy, anonymity, secrecy
• Examples:
  – Coca-Cola secret formula
  – Medical information
  – Personal records (e.g. credit card information)
  – Military secrets

Quality 2: Integrity

• Protect the reliability of data against unauthorized tampering
• Related to: corruption, forgery, consistency
• Examples:
  – Bank statement agrees with ATM transactions
  – The mail you send is what arrives

Quality 3: Availability

• Resources usable in a timely fashion by authorized principals
• Related to: reliability, fault tolerance, denial of service
• Examples:
  – You want the web server to reply to your requests
  – The military communication devices must work

Information-flow Policy

• Downloadable financial planner:

• Access control insufficient
• Encryption necessary, but not end-to-end

Access Control

• There is a program in your computer, right now, which is allowed to:
  – Read all your personal folders
  – Connect to an unknown network host and send this information
  – Erase important data
  – Modify the permissions on your data, allowing others to see it
• Which program is this?


Access Control

• Access control
  – e.g. file permissions
  – Access control lists or capabilities
  – Modern variants: stack inspection
• Drawback:
  – Does not regulate propagation of information after permission has been granted

Cryptography

• Essential for:
  – Protecting confidentiality & integrity of data transmitted via untrusted media
  – Authentication protocols
• Drawbacks:
  – Impractical to compute with encrypted data!
    • There are secret sharing techniques
  – Doesn’t prevent information propagation once decrypted

Requirements

• Need a way to distinguish confidential information from public information
  – Some simple policy language
• Need a way to track the effects of computation with respect to secrets
  – When is a secret leaked?
• Need a way to securely and efficiently enforce the policy

A practical example

• A mobile phone application
• It uses your contact list to synchronize with your friends
• Therefore, it reads your contact list information, and also accesses the network
• However, you don’t want it to send your contact information over the network
• How do you specify/enforce that?

One approach: capabilities
Ok, we know what information flows are…

A sealer/unsealer mechanism

• Sensitive data is sealed by a protection mechanism
• Only certified and/or isolated code can access (unseal) it
• It can only flow to output devices to which it is allowed to go

Looks nice…

• It has been done: Google Caja
• One little problem: what if the destination output channel depends on the content of the data itself?
  – e.g. the app should send an email to a contact if it has an email address attached
  – otherwise, it should send an SMS
• Caja: allow external modules to return a boolean → a covert channel attack is possible
• What if we prohibit external modules altogether? → Impractical for real applications
• It just feels a little awkward… what if we bring the concept inside the language?

Language-based information flow
Ta-da!

One idea: dynamic tags

• Add a “tag” to each piece of data
  – Tags: hi (secret) or low (public)
• Modify every operation of the program to propagate tags
  – e.g.: (1:hi) + (2:low) → (3:hi)
• Assign a “policy” to communication channels
  – e.g.: all data sent over the network must have a low tag
  – Check at run time whether the policy is met

Example of dynamic tagging

Variable c will have tag hi, and the output check will fail. Great!

int a = input_int(hi);
int b = input_int(low);
int c = (a + b) / 2;
output_int(low, c);
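The tagging scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Tagged` class, the `join` helper and the `PolicyViolation` exception are hypothetical names, and the two constants stand in for the slide's `input_int` calls.

```python
from dataclasses import dataclass

HI, LOW = "hi", "low"

def join(t1, t2):
    """Least upper bound of two tags: hi wins."""
    return HI if HI in (t1, t2) else LOW

@dataclass(frozen=True)
class Tagged:
    value: int
    tag: str

    def __add__(self, other):
        # Every operation propagates the join of its operands' tags.
        return Tagged(self.value + other.value, join(self.tag, other.tag))

    def __floordiv__(self, n):
        # Dividing by an untagged constant keeps the tag unchanged.
        return Tagged(self.value // n, self.tag)

class PolicyViolation(Exception):
    pass

def output_int(channel_tag, x):
    """Run-time check: hi data may not leave on a low channel."""
    if x.tag == HI and channel_tag == LOW:
        raise PolicyViolation("hi data sent to low channel")
    return x.value

a = Tagged(40, HI)   # stands in for input_int(hi)
b = Tagged(2, LOW)   # stands in for input_int(low)
c = (a + b) // 2     # c carries the tag hi
```

Calling `output_int(LOW, c)` raises `PolicyViolation`, which mirrors the failing output check on the slide.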


Problem with dynamic tags

What is variable c’s tag at the output?

int a = input_int(hi);
int c = 1;
if (a > 17) {
    c = 0;
}
output_int(low, c);
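A small sketch of why this program is a problem for naive tagging, assuming tags are propagated only through assignments and arithmetic. Values are modelled as hypothetical `(value, tag)` pairs and `run` is an illustrative name.

```python
def run(a_value):
    """Naive tagging: only assignments and arithmetic propagate tags."""
    a = (a_value, "hi")   # stands in for input_int(hi)
    c = (1, "low")        # a constant, so it is tagged low
    if a[0] > 17:
        c = (0, "low")    # also a constant: the branch condition is ignored
    return c

leaky = run(20)
quiet = run(5)
```

In both runs `c` is tagged low, so the output check passes, yet the value of `c` reveals whether `a > 17`. This is the implicit flow the slide asks about.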


It gets worse

What if function f itself does output?

int a = input_int(hi);
int c = 1;
if (a > 17) {
    c = f(); // function f may affect state
}
output_int(low, c);

Sound dynamic enforcement

• To soundly enforce information flow with dynamic tags:
  – Must track all memory locations that could have been affected in either branch of a conditional expression
  – Update the tags of those memory locations on every branch
• Extremely expensive
  – Worse: “efficient” implementations are conservative: tag propagation makes too many locations hi

Static Analysis

• Uses static analysis (e.g. type systems) rather than dynamic enforcement
• Benefits:
  – No run-time cost
  – Access to the program’s control-flow graph, so the analysis can approximate all runs of the program
  – Determines whether the program is secure before running it
• Drawbacks:
  – No run-time information means approximation

End-to-end solution

• Rely on access control & encryption
  – Essential (authentication, untrusted networks, etc.)
• But… also use language techniques:
  – Verify programs to validate the information flows that they contain

Benefits

• Explicit, fine-grained policies
  – Level of single variable if necessary
  – Bytecode or assembly level
• Program abstractions
  – Programmers can design custom policies
• Regulate end-to-end behavior
  – Information flow vs. access control
• Tools: increase confidence in security

Lattice model for policies

• Proposed by Denning ’76
• Use a lattice of security labels
  – Higher in the lattice is more “confidential” or “secret”
  – Use ⊑ for the order relation
  – Use ⊔ for join (least upper bound)
  – Use ⊓ for meet (greatest lower bound)
• Prototypical example: low ⊑ hi
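As a minimal sketch of the order, join and meet operations on the prototypical two-point lattice, the following assumes the labels form a simple chain. `LEVELS`, `leq`, `join` and `meet` are illustrative names, not part of any library.

```python
LEVELS = ["low", "hi"]  # listed from bottom to top of the chain

def leq(a, b):
    """Order relation: a may flow to b."""
    return LEVELS.index(a) <= LEVELS.index(b)

def join(a, b):
    """Least upper bound: the more secret of the two labels."""
    return a if leq(b, a) else b

def meet(a, b):
    """Greatest lower bound: the more public of the two labels."""
    return a if leq(a, b) else b
```

A chain is the simplest case; a general lattice (for example, sets of principals) would need a partial order rather than list positions.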


Noninterference

• Private data does not interfere with network communication

• Baseline confidentiality policy

[Reynolds ’78, Goguen&Meseguer ’82,’84]
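Noninterference can be illustrated (not proved) by brute force on a finite input domain: for every low input, the low-observable output must be the same regardless of the hi input. The two example programs and the `noninterferent` helper below are hypothetical.

```python
def secure(hi, low):
    # The output depends on the low input only.
    return low * 2

def leaky(hi, low):
    # The output reveals whether hi > 17.
    return low + (1 if hi > 17 else 0)

def noninterferent(program, hi_values, low_values):
    """True if, for each low input, the output ignores the hi input."""
    return all(
        len({program(h, l) for h in hi_values}) == 1
        for l in low_values
    )
```

Testing over a finite domain can only refute noninterference; establishing it for all inputs needs the proof techniques the deck lists next.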


Noninterference

• Proved by:
  – Logical relations
  – Simulation techniques
  – Self-composition techniques

External observation

• External behavior
  – Observations seen by someone “outside” the system
  – Outputs (e.g. strings printed to terminal)
  – Running time
  – Power or memory consumption
  – Comments
  – Variable names
• Very hard to regulate!
  – There is always some attack below the level of abstraction you choose
  – But… attacks against external behavior tend to be difficult to carry out and/or have low bandwidth

Internal observation

• Internal behavior
  – At the programming language level of abstraction
  – Note that many “external observations” can be internalized by enriching the language (e.g. add a clock)
• Observational equivalence (roughly):

$$e_1 \approx e_2 \quad\text{iff}\quad \text{for all } C[\,]:\; C[e_1] \to^{*} v \ \text{ iff }\ C[e_2] \to^{*} v$$

Observations

• Final output of the program
  – Pure, functional language
• Aliasing for pointers
  – Lambda calculus with state
• Thread scheduling decisions
  – Multithreaded languages with state/message passing
• Timing behavior

Low Equivalence

• Captures what a “low-security” observer can “see”

• Example: Suppose program states consist of pairs (hi, low)
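For states that are (hi, low) pairs, one natural reading is that two states are low-equivalent exactly when their low components agree. The helper name below is hypothetical and the definition is an assumption, since the slide's own formula is not in the transcript.

```python
def low_equivalent(s1, s2):
    """States are (hi, low) pairs; a low observer sees only the second part."""
    return s1[1] == s2[1]
```

Under this reading, `(1, 5)` and `(99, 5)` look the same to a low observer, while `(1, 5)` and `(1, 6)` do not.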


Attack Models

• Low equivalence captures the power of the attacker
  – e.g. if the attacker can see all intermediate states of the computation, then the observational model must distinguish programs that generate different traces
• It’s convenient to take the attacker to be a program context
  – The attacker operates at the same level as the program
  – Any abstraction violation may lead to security holes…

Security type systems
Using security types to specify and enforce information flow policies

Type Systems

• Logical rules that determine the valid operations of a programming language with respect to data types
• Example:
  – In Java, String and int are “incompatible” types, i.e., variables of these types cannot be assigned to each other.
  – However, short is a subtype of int. A short value can be assigned to an int variable. The opposite requires an explicit cast and may lose precision.
  – Classes are user-defined types. Inheritance represents subtyping.
• Types form a lattice, pretty much like the security label lattice.

Types for Information Flow

• Basic idea: assign types that include security labels
• Use the type system to track the flow of information
• Static semantics
  – Label lattice lifted to a subtyping relation
  – “Standard” information-flow type system
• Each variable has a security type, i.e. a security label that works as a type

Examples of security types

• Examples:
  – string[] {hi} exam_questions;
  – double[] {low} marks;
  – int {Alice} x;

int {hi} H;
int {low} L;
int {Alice} A;
H := L;   // OK
L := H;   // Not allowed!
L := A;   // Only allowed if Alice ⊑ L
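The assignment rule behind these three cases (an assignment is allowed when the source label flows to the target label) can be sketched as follows. The ordering low ⊑ Alice ⊑ hi is an assumption made only for this illustration, and `ORDER`, `flows_to` and `check_assign` are hypothetical names.

```python
# Assumed ordering for illustration: low ⊑ Alice ⊑ hi (reflexive pairs included).
ORDER = {
    ("low", "low"), ("Alice", "Alice"), ("hi", "hi"),
    ("low", "Alice"), ("low", "hi"), ("Alice", "hi"),
}

def flows_to(src, dst):
    """True if data labelled src may flow to a location labelled dst."""
    return (src, dst) in ORDER

# Labels of the three variables declared above.
LABELS = {"H": "hi", "L": "low", "A": "Alice"}

def check_assign(target, source):
    """target := source is allowed when label(source) ⊑ label(target)."""
    return flows_to(LABELS[source], LABELS[target])
```

With this assumed ordering, `H := L` is accepted, `L := H` is rejected, and `L := A` is rejected because Alice does not flow to low.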


Simple state & implicit flows

• The previous example is extremely limited: no loops
• Also, it does not treat implicit information flows
• For that, we need to keep track of the security label of the control state
• That is, create a PC (program counter) security label
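A minimal sketch of how a PC label can catch implicit flows in a static check, assuming a tiny statement language encoded as tuples. The encoding and the `check` function are hypothetical and are not the type system from the original slides.

```python
def join(a, b):
    return "hi" if "hi" in (a, b) else "low"

def leq(a, b):
    return a == "low" or b == "hi"

def check(stmt, env, pc="low"):
    """Return True if stmt is acceptable under variable labels env and label pc."""
    kind = stmt[0]
    if kind == "assign":      # ("assign", target, [variables read])
        _, target, reads = stmt
        label = pc            # the pc label taints every assignment
        for v in reads:
            label = join(label, env[v])
        return leq(label, env[target])
    if kind == "if":          # ("if", guard variable, then-block, else-block)
        _, guard, then_b, else_b = stmt
        inner_pc = join(pc, env[guard])
        return all(check(s, env, inner_pc) for s in then_b + else_b)
    if kind == "while":       # ("while", guard variable, body)
        _, guard, body = stmt
        inner_pc = join(pc, env[guard])
        return all(check(s, env, inner_pc) for s in body)
    raise ValueError(f"unknown statement: {kind}")

env = {"a": "hi", "c": "low", "h2": "hi"}
implicit_leak = ("if", "a", [("assign", "c", [])], [])   # if (a) { c = const }
harmless = ("if", "a", [("assign", "h2", [])], [])       # if (a) { h2 = const }
```

The check rejects `implicit_leak` because the assignment to the low variable `c` happens under a hi guard, and it accepts `harmless`.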

Function Calls

Polymorphism & Inference

• Add quantification over security labels
  – Reuse code at multiple security levels
• Inference of security labels
  – Type system generates a set of lattice inequalities of a fixed form
  – Constraints of this form can be solved efficiently
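One way such inequalities can be solved is by least-fixpoint iteration. The sketch below assumes, purely for illustration, that every constraint has the form "join of some labels ⊑ a label variable"; the original formula is not in the transcript, and `solve` is a hypothetical helper.

```python
def join(a, b):
    return "hi" if "hi" in (a, b) else "low"

def solve(constraints, variables):
    """Least solution of constraints (sources, target): join(sources) ⊑ target."""
    sol = {v: "low" for v in variables}   # start every variable at the bottom
    changed = True
    while changed:
        changed = False
        for sources, target in constraints:
            lhs = "low"
            for s in sources:
                # s is either a label variable or a constant label.
                lhs = join(lhs, sol.get(s, s))
            new = join(sol[target], lhs)
            if new != sol[target]:
                sol[target] = new
                changed = True
    return sol

# hi ⊑ x,  x ⊔ y ⊑ z,  low ⊑ y
constraints = [(["hi"], "x"), (["x", "y"], "z"), (["low"], "y")]
solution = solve(constraints, ["x", "y", "z"])
```

Each variable can only move up the lattice, so the loop terminates on a finite lattice. Here `x` and `z` are forced to hi while `y` stays low.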


Polymorphism in Flow Caml

• Lists in Flow Caml
• Base types are parameterized by security level: bool{low} is written low bool
• The type of lists is also parameterized:

x1 : hi int
[1;2;3;4] : ('L int, 'M) list
[x1;x1] : (hi int, 'L) list

[Simonet & Pottier ’02,’03]

Example: List Length

• Length does not depend on the contents of the list:

let rec length l =
  match l with
  | [] -> 0
  | _ :: tl -> 1 + length tl

length : ('a, 'M) list -> 'M int

Example: has0

• Lookup depends on both the contents and the structure of the list:

let rec has0 l =
  match l with
  | [] -> false
  | hd :: tl -> hd = 0 || has0 tl

has0 : ('L int, 'L) list -> 'L bool

Sums & Datatypes

• In general: destructors reveal information
• Accuracy of information flow analysis is important
• Suppose x:bool{L1}, y:bool{L2}, z:bool{L3}
• What is the label of i?

type t = A | B | C
let v = if x then (if y then A else B) else (if z then A else C)
let i = match v with | A | B -> 1 | C -> 0

[Simonet CSFW’02]
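To see which inputs `i` actually depends on, the example can be re-implemented and checked by brute force over all boolean inputs. This Python transcription and the `depends_on` helper are illustrative only.

```python
from itertools import product

def compute_i(x, y, z):
    # Direct transcription of the Flow Caml example.
    if x:
        v = "A" if y else "B"
    else:
        v = "A" if z else "C"
    return 1 if v in ("A", "B") else 0

def depends_on(index):
    """True if flipping input number `index` can change i."""
    for args in product([False, True], repeat=3):
        flipped = list(args)
        flipped[index] = not flipped[index]
        if compute_i(*args) != compute_i(*flipped):
            return True
    return False

dependencies = [name for idx, name in enumerate("xyz") if depends_on(idx)]
```

The result is `["x", "z"]`: `i` never depends on `y`. This suggests a precise analysis could assign `i` the label L1 ⊔ L3, whereas a coarser one that joins every guard would give L1 ⊔ L2 ⊔ L3.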


More Effects

• Exceptions
  – Very important to track accurately
  – Related to sums
  – Uncaught exceptions can create implicit flows
• Termination & Timing
  – Is termination observable?
  – For practicality, we sometimes want to allow termination channels
  – Timing behavior can be regulated by padding (but is expensive!) [Agat ’00]

Covert Channels

• Very hard to treat: huge variety, different abstraction levels
• Channels related to I/O are more tractable via information flow policies
  – File locks
• However, not all are related to standard I/O:
  – Termination channels
  – Timing channels
  – Probabilistic channels
• Depends a lot on your computational model

Concurrency

• Threads can “observe” each other’s behavior
• Information can leak through the scheduler and through synchronization mechanisms
• Application of bisimulation & observational equivalence
• Application of information flow technology to distributed systems
• Some approaches:
  – Extend noninterference for nondeterminism
  – Mechanisms to enforce secure thread scheduling
  – Information flow in distributed systems that incorporate mutual distrust and arbitrary component failure

Concurrency

(thread1)      (thread2)
h := 0;        h := h’;
l := h;        …

Who runs first?

Another example:
(if h = 1 then Clong else skip); l := 1;   ||   l := 0;
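The first example can be explored by enumerating every interleaving of the two threads and recording the final value of `l`. In this sketch the instructions are modelled as Python lambdas over a state dictionary, the elided rest of thread 2 is ignored, and all names are hypothetical.

```python
def interleavings(t1, t2):
    """All merges of two instruction lists that preserve each thread's order."""
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def final_l(schedule, secret):
    state = {"h": None, "l": None, "h_prime": secret}
    for instr in schedule:
        instr(state)
    return state["l"]

thread1 = [lambda s: s.update(h=0), lambda s: s.update(l=s["h"])]   # h := 0; l := h
thread2 = [lambda s: s.update(h=s["h_prime"])]                      # h := h'

outcomes = {final_l(sched, secret=42) for sched in interleavings(thread1, thread2)}
```

With the secret set to 42 the possible final values of `l` are 0 and 42, so whether the secret reaches `l` depends entirely on the scheduler.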


Language-based security in practice
An example over Java

Jif: Java + Information Flow

• Java, with some restrictions:
  – Static fields initialized only with constants
  – No nested classes
  – No initializer blocks
  – No threads
• Policy language:
  – Principals, labels, authority
  – Principal hierarchy (delegation)
  – Confidentiality & integrity constraints
  – Robust declassification & endorsement
  – Language features (e.g. polymorphism)
• http://www.cs.cornell.edu/jif

[Myers, Nystrom, Zdancewic, Zheng]

Parameterized Classes

• Jif allows classes to be parameterized by labels and principals
  – Code reuse
  – e.g. containers parameterized by labels
•
  class MyClass[label L] {
      int{L} x;
  }

Decentralized Labels

• Real-world applications need security labels beyond just high and low
• Security labels form a lattice of security levels
• The example of Jif:
  – Simple component {owner: readers}
    • {Alice: Bob, Eve}
      “Alice owns this data and she permits Bob & Eve to read it”
  – Compound labels
    • {Alice: Charles; Bob: Charles}
      “Alice & Bob own this data but only Charles can read it”
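A sketch of how the readers allowed by a compound label might be computed: every owner's policy must agree, so the reader sets are intersected. Treating each owner as implicitly allowed to read their own policy is an assumption about the decentralized label model, and `effective_readers` is a hypothetical helper.

```python
def effective_readers(label):
    """label maps each owner to the set of readers that owner permits.

    Assumes at least one policy, and that an owner may always read
    under their own policy.
    """
    reader_sets = [readers | {owner} for owner, readers in label.items()]
    return set.intersection(*reader_sets)

simple = {"Alice": {"Bob", "Eve"}}                      # {Alice: Bob, Eve}
compound = {"Alice": {"Charles"}, "Bob": {"Charles"}}   # {Alice: Charles; Bob: Charles}
```

For the compound label the intersection is just Charles, matching the reading given on the slide.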


Decentralized Label Lattice


Integrity Constraints

• Specify who can write a piece of data
  – {Alice? Bob}
    “Alice owns this data and she permits Bob to change it”
• Both kinds of constraints
  – {Alice: Bob; Alice?}
    “Alice owns this data and she permits Bob to read it, but not change it”

Integrity/Confidentiality Duality

• Confidentiality policies constrain where data can flow to
• Integrity policies constrain where data can flow from
• Confidentiality: Public ⊑ Secret
• Integrity: Untainted ⊑ Tainted

Weak Integrity

• Integrity, if treated dually to confidentiality, is weak
  – Guarantee about the source of data
  – No guarantee about the quality of data
• In practice, probably want stronger policies on data:
  – Data satisfies an invariant
  – Data only modified in appropriate ways by permitted principals
  – More on that later…

Dynamic Policies

• Dynamic Principals
  – Identity of principals may change at run time
  – Policy may depend on identity
  – Requires authentication
  – Add a new primitive type principal
• Dynamic Labels
  – Policies for dynamic principals
  – May need to examine label dynamically
  – Add a new primitive type label

Interface to Outside World

• Should reflect OS file permissions into security types
  – Requires dynamic test of access control
• Legacy code is a problem
  – Interfaces need to be annotated with labels that soundly approximate information flow

Unix cat in Jif

public static void main{}(String{}[]{} args) {
    String filename = args[0];
    final principal p = Runtime.user();
    final label lb;
    lb = new label{p:};
    Runtime[p] runtime = Runtime.getRuntime(p);
    FileInputStream{*lb} fis = runtime.openFileRead(filename, lb);
    InputStreamReader{*lb} reader =
        new InputStreamReader{*lb}(fis);
    BufferedReader{*lb} br =
        new BufferedReader{*lb}(reader);
    PrintStream{*lb} out = runtime.out();
    String line = br.readLine();
    while (line != null) {
        out.println(line);
        line = br.readLine();
    }
}

Jif Applications

• Many small examples
• Jif/split – distributed system extraction
  – Myers, Zdancewic, Zheng, Chong
• Jif Web Servlet – web applications in Jif
  – Myers, Chong
• Civitas – voting software
  – Myers, Clarkson
• Distributed Poker
  – Sabelfeld et al.
• JPMail
  – McDaniel, Hicks et al.
• More in progress… There is a Jif users mailing list

Declassification
It gets worse

Remember Weak Integrity?

• Integrity, if treated dually to confidentiality, is weak
  – Guarantee about the source of data
  – No guarantee about the quality of data
• In practice, probably want stronger policies on data:
  – Data satisfies an invariant
  – Data only modified in appropriate ways by permitted principals

Richer Security Policies

• More complex policies:
  “Alice will release her data to Bob, but only after he has paid $10”
• Noninterference is too restrictive
  – In practice programs do leak some information
  – Rate of information leakage too slow to matter
  – Justification lies outside the model (e.g. cryptography)

Need to Declassify

• Usually:
  – a : secret; b : secret
    (c := a + b) : secret
• But what if:
  – p : secret
    k : secret
    (e := enc_k(p)) : ?
  – passwd : secret
    check(passwd) : ?
  – emp_i.salary : secret
    (avg := (Σ emp_i.salary)/num_emps) : ?

Declassification in Jif


Declassification Problem

• Declassification is necessary & useful
• …but it breaks the noninterference theorem
  – Like a downcast mechanism
• So, must constrain its use. How?
  – Arbitrary specifications too hard to check
  – Decentralized label model: Authority
  – Robust declassification
  – Subject of many, many research papers

Robust Declassification


Does it help?

• Intuitively appealing for programmers
  – But programmers are still trusted
  – Easy to implement
• Declassification doesn’t change the integrity level of a piece of data
  – Noninterference for integrity sublattice still holds
  – Weaker guarantee than needed?
• Could further refine auth(L’,L)
  – Restrict declassification to data with particular integrity labels

Endorsement

• The integrity dual of declassification is called endorsement
  – Increases the integrity level of a value
  – Also an unsafe “downcast”
• Jif syntax: endorse(x,{Alice?})
• Decentralized Label Model:
  – Endorsing requires authority of the owner

Dimensions and Principles of Declassification

• How to prove that a declassification mechanism is good?
  – What is good?
• Dimensions of declassification: what, who, where, when
• Principles for declassification:
  – Semantic consistency: “The (in)security of a program is invariant under semantics-preserving transformations of declassification-free subprograms”
  – Conservativity: “Security for programs with no declassification is equivalent to noninterference”
  – Monotonicity of release: “Adding further declassifications to a secure program cannot render it insecure”
  – Non-occlusion: “The presence of a declassification operation cannot mask other covert information leaks”
  – Trailing attacks: “Appending an insecure program with fresh variables to a secure terminating program should not result in a secure program”

[Sabelfeld et al. CSFW’05]

Current state

• Information-flow security is a promising application domain for language technology
• There are a lot of good results:
  – Basic theory
  – Polymorphism & Inference
  – State & Effects
  – Implementations
• But more are needed
• There is an excellent survey paper by Sabelfeld and Myers:
  – Language-based Information-flow Security
  – JSAC 21(1) 2003
  – 147 references to other work

Roadmap


Low-level information flow analysis

• More difficult: control-flow structures of the program are not easily recognizable
