Behind an Application Firewall, Are We Safe from SQL Injection Attacks?


Post on 21-Jul-2015


TRANSCRIPT

Page 1: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

uni.lu | Software Verification & Validation (SVV)

Behind an Application Firewall, Are We Safe from SQL Injection Attacks?

Dennis Appelt, Cu D. Nguyen, Lionel Briand

Page 2: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Onion Defense Paradigm

• A Web Application Firewall (WAF) is the first layer of defense

• Stops attacks before they reach (vulnerable) applications

Page 3: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Problem Statement

• Ensuring that a WAF can reliably identify attacks is critical for protecting IT infrastructures

• Configuring and maintaining a WAF is difficult and error-prone

  • False positives: By default, WAF rule sets are strict, so legitimate requests are classified as attacks.

  • False negatives: Tailoring a WAF rule set to a specific application and its attack types often relaxes the rule set too much, so attacks slip through.

Page 4: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Problem Statement


(?i:(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|users?|ascii))|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.(db|user))|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)|\b(?:(?:s(?:ys(?:(?:(?:process|tabl)e|filegroup|object)s|c(?:o(?:nstraint|lumn)s|at)|dba|ibm)|ubstr(?:ing)?)|user_(?:(?:(?:constrain|objec)t|tab(?:_column|le)|ind_column|user)s|password|group)|a(?:tt(?:rel|typ)id|ll_objects)|object_(?:(?:nam|typ)e|id)|pg_(?:attribute|class)|column_(?:name|id)|xtype\W+\bchar|mb_users|rownum)\b|t(?:able_name\b|extpos\W+\()))!
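A rule like the one above is a regular expression matched against request data. As a minimal sketch, assume a hypothetical one-line rule (`RULE`, far simpler than any real rule set) that flags `union` followed by whitespace and `select`. The helper name `is_blocked` and both payloads are illustrative; the point is only that a small syntactic variation can slip past a pattern.

```python
import re

# Hypothetical, heavily simplified rule: "union", whitespace, "select".
# It is NOT the rule shown above, only a stand-in to illustrate the idea.
RULE = re.compile(r"\bunion\s+select\b", re.IGNORECASE)


def is_blocked(payload: str) -> bool:
    """Return True if the toy rule matches the payload."""
    return RULE.search(payload) is not None


plain = '" Union Select 1 From all_tables'
obfuscated = '" Union/**/Select 1 From all_tables'

print(is_blocked(plain))       # True: the space is matched by \s+
print(is_blocked(obfuscated))  # False: the SQL comment /**/ is not whitespace
```

A real rule set contains many such patterns, so whether a given variation bypasses it can only be established by testing.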

Page 5: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?


Bypass Testing of WAFs

Page 6: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Approach


Page 7: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Space of SQL Injection attacks

• Large input space

• Exhaustive search infeasible

• Random search ineffective

• Difficult to guide the search

Page 8: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

How to guide the test generation towards bypassing the WAF?

Page 9: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Generate new test cases by learning from previously executed test cases!


Page 10: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Learning from test cases

Attack String                           Blocked?

" Union Select 1 From all_tables        Yes

" AND false #                           Yes

1 OR/**/"a"="a" OR                      No

" Union/**/Select 1 From all_tables     ?

" AND false OR                          ?

Page 11: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

SQL Injection Grammar

• Attacks are generated from a context-free grammar

• Each attack has a derivation tree

Fig. 1. The syntax diagram of the proposed grammar. (Diagram not reproduced: it shows a numeric context, a single-quote context, and a double-quote context, each combining <wsp>, <booleanAttack>, <unionAttack>, <piggyAttack>, and <cmt>.)

protected by a WAF and are labelled as “P” or “B” depending on whether they bypass or are blocked by the WAF, respectively. We encode and make use of these test results as initial training data to learn a model predicting the likelihood (f) with which tests can bypass the WAF. Using this measure, we can rank, select, and mutate tests with high predicted f values to produce new tests, hopefully with even higher bypassing probabilities. These new tests are then executed, and their results (“P” or “B”) are in turn fed to a machine learning algorithm to improve the prediction model, which in turn helps generate more tests that bypass the WAF.

Our approach is inspired by genetic programming and search-based test generation [5], [12], [1]. We face the problem of efficiently choosing, from a large set of SQLi attacks, the ones that are more likely to reveal holes in the WAF. The problem is challenging because there is little information available to calculate how close a test comes to bypassing the WAF. When a test is executed, only one of two events can be observed: bypassing or blocked. This leaves the search with no guidance to assess how close a blocked attack is to bypassing the WAF. To tackle the problem, we use machine learning to model how the elements of the tests (features of attacks) are associated with high likelihoods of bypassing the WAF. In the search process, tests that are predicted to have such a high likelihood are considered to have a high fitness and are likely candidates for mutation.

In what follows, we discuss in detail how tests are decomposed and encoded for machine learning, and the mutation process that we use to generate new SQLi attacks. Finally, we describe our overall ML-driven test generation approach, which aims at iteratively finding new and effective attacks.

1) Test Decomposition: From the defined grammar, we can derive tests by recursively applying production rules, starting with the <ATTACK> rule. A derivation tree (also called a parse tree) of a test is a graphical representation of the derivation steps involved in producing the test. In a derivation tree, an intermediate node represents a non-terminal symbol, a leaf node represents a terminal symbol, and edges are derivations. Figure 2 depicts the derivation tree of the BOOLEAN attack test: ' OR"a"="a"--. In the course of generating this test, we first apply the <ATTACK> rule:

<ATTACK> ::= <numericContext> | <sQuoteContext> | <dQuoteContext> ;

and derive <sQuoteContext>. We then apply the third rule of the grammar to derive <squote>, <wsp>, <sqliAttack>, and <cmt>. This procedure is repeated until all the leaf nodes are terminal symbols.

<START>
  <sQuoteContext>
    <squote>: '
    <wsp>: ␣
    <sqliAttack>
      <booleanAttack>
        <orAttack>
          <opOr>: OR
          <booleanTrueExpr>
            <binaryTrue>
              <dquote> <char> <dquote> <opEqual> <dquote> <char> <dquote>: " a " = " a "
    <cmt>: --

Fig. 2. The derivation tree of the “boolean” SQLi attack: ' OR"a"="a"--.
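The derivation procedure can be sketched in a few lines. The grammar below is a toy fragment that covers only the symbols visible in Fig. 2; the alternatives for `<char>` and `<cmt>`, the tuple-based tree layout, and the function names `derive` and `to_string` are assumptions made for illustration, not the authors' grammar or implementation.

```python
import random

# Toy grammar fragment: each non-terminal maps to a list of alternatives,
# and each alternative is a list of symbols. Assumed for illustration only.
GRAMMAR = {
    "<START>": [["<sQuoteContext>"]],
    "<sQuoteContext>": [["<squote>", "<wsp>", "<sqliAttack>", "<cmt>"]],
    "<sqliAttack>": [["<booleanAttack>"]],
    "<booleanAttack>": [["<orAttack>"]],
    "<orAttack>": [["<opOr>", "<booleanTrueExpr>"]],
    "<booleanTrueExpr>": [["<binaryTrue>"]],
    "<binaryTrue>": [["<dquote>", "<char>", "<dquote>", "<opEqual>",
                      "<dquote>", "<char>", "<dquote>"]],
    "<squote>": [["'"]],
    "<dquote>": [['"']],
    "<wsp>": [[" "]],
    "<opOr>": [["OR"]],
    "<opEqual>": [["="]],
    "<char>": [["a"]],
    "<cmt>": [["--"], ["#"]],
}


def derive(symbol, rng):
    """Expand `symbol` into a derivation tree: (symbol, list_of_child_trees)."""
    if symbol not in GRAMMAR:  # terminal symbol: a leaf with no children
        return (symbol, [])
    alternative = rng.choice(GRAMMAR[symbol])
    return (symbol, [derive(child, rng) for child in alternative])


def to_string(tree):
    """Concatenate the terminal leaves from left to right."""
    symbol, children = tree
    if not children:
        return symbol
    return "".join(to_string(child) for child in children)


tree = derive("<START>", random.Random(0))
print(to_string(tree))  # one of the two attacks this fragment can produce
```

With this fragment, the only choice point is the comment terminator, so the derived attack ends in either `--` or `#`.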

We make use of derivation trees to identify which parts of a SQLi attack are likely to be responsible for the attack being blocked or passing. Specifically, for each test, we decompose its derivation tree into slices, which are defined as follows:

Definition 1 (Slice). A slice s of a derivation tree T is a sub-tree of T such that the root of s is a non-terminal node of T, except those that represent <ATTACK>, <numericContext>, <sQuoteContext>, and <dQuoteContext>.

We skip the start symbol and its children because sub-trees extracted from them would be (nearly) equivalent to the original derivation tree. Such decompositions provide little or no information as to why a test is blocked or bypassing.
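Definition 1 can be illustrated with a short sketch. It assumes a derivation tree stored as nested `(symbol, children)` tuples and reports each slice as a pair of root symbol and generated text; this representation, the helper names `nt`, `t`, `text_of`, and `slices`, and the inclusion of `<START>` in the excluded set (to match the root label in Fig. 2) are all assumptions.

```python
# Roots that Definition 1 excludes; <START> is added to match Fig. 2.
EXCLUDED = {"<START>", "<ATTACK>", "<numericContext>",
            "<sQuoteContext>", "<dQuoteContext>"}


def nt(symbol, *children):
    """Build a non-terminal node."""
    return (symbol, list(children))


def t(text):
    """Build a terminal leaf."""
    return (text, [])


def text_of(node):
    """Return the string generated by the sub-tree rooted at `node`."""
    symbol, children = node
    return symbol if not children else "".join(text_of(c) for c in children)


def slices(node):
    """Collect (root symbol, text) for every sub-tree that counts as a slice."""
    symbol, children = node
    if not children:  # terminals are never slice roots
        return []
    found = [] if symbol in EXCLUDED else [(symbol, text_of(node))]
    for child in children:
        found.extend(slices(child))
    return found


# Derivation tree of the attack ' OR"a"="a"-- from Fig. 2.
binary_true = nt("<binaryTrue>",
                 nt("<dquote>", t('"')), nt("<char>", t("a")), nt("<dquote>", t('"')),
                 nt("<opEqual>", t("=")),
                 nt("<dquote>", t('"')), nt("<char>", t("a")), nt("<dquote>", t('"')))
tree = nt("<START>", nt("<sQuoteContext>",
          nt("<squote>", t("'")),
          nt("<wsp>", t(" ")),
          nt("<sqliAttack>", nt("<booleanAttack>", nt("<orAttack>",
              nt("<opOr>", t("OR")),
              nt("<booleanTrueExpr>", binary_true)))),
          nt("<cmt>", t("--"))))

for root, text in slices(tree):
    print(root, repr(text))
```

Identical slices that occur more than once (for example the four `<dquote>` sub-trees) are listed once per occurrence here; a feature encoding would typically deduplicate them.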

Page 12: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Slicing

[Figure: the derivation tree of the attack ' OR"a"="a"# with four sub-trees marked as slices s1, s2, s3, and s4.]

Attack: ' OR"a"="a"#

S = {', ␣, OR"a"="a", #}

Page 13: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Learning Attack Patterns

      S1   S2   …   Sn   Class

A1    1    0    …   1    Blocked

A2    1    1    …   0    Passing

…     …    …    …   …    …

Am    0    1    …   0    Blocked

• Each attack becomes one observation in the training data

• An observation indicates which slices an attack consists of

• A decision tree is built from the training data

• The decision tree predicts how close a test case comes to bypassing the WAF
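The encoding behind such a table can be sketched as follows. The three attacks, their slice names, and their labels are made-up illustration data, and `encode` is a hypothetical helper name; any decision-tree learner could then be trained on the resulting rows.

```python
def encode(attacks):
    """Turn (set_of_slices, label) pairs into binary rows.

    Returns the sorted column names and one (row, label) pair per attack,
    where row[i] is 1 if the attack contains the slice in column i.
    """
    columns = sorted({s for slice_set, _ in attacks for s in slice_set})
    rows = [([1 if c in slice_set else 0 for c in columns], label)
            for slice_set, label in attacks]
    return columns, rows


# Illustrative data only, not results from the slides.
attacks = [
    ({"S1", "S3"}, "Blocked"),
    ({"S1", "S2"}, "Passing"),
    ({"S2"}, "Blocked"),
]

columns, rows = encode(attacks)
print(columns)
for row, label in rows:
    print(row, label)
```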

Page 14: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Decision Tree

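• Decision tree groups attacks based on the presence/absence of slices

• Benefits of using a decision tree

  • Interpretable

  • Performance

[Figure: example decision tree. Internal nodes test slices S1 to S5, edges are labelled 0 (slice absent) or 1 (slice present), and leaves are labelled Blocked or Passed.]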

Page 15: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Guiding Test Generation

• Among the training data, select the attacks that are most likely to pass for mutation.

  • Breadth-first: Select many attacks and mutate each attack only a few times.

  • Depth-first: Select few attacks and mutate each attack many times.

• The structure of the decision tree is exploited to generate new attacks.
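The two selection strategies differ only in how a fixed mutation budget is spread over the best-ranked attacks. A minimal sketch, assuming each attack already has a predicted likelihood of passing; the scores, the budget of six mutations, and the helper name `plan_mutations` are hypothetical.

```python
def plan_mutations(scored_attacks, n_attacks, mutations_each):
    """Pick the n_attacks with the highest predicted pass likelihood and
    assign each of them the same number of mutation attempts."""
    ranked = sorted(scored_attacks, key=lambda pair: pair[1], reverse=True)
    return [(attack, mutations_each) for attack, _ in ranked[:n_attacks]]


# (attack name, predicted likelihood of passing): illustrative values only.
scored = [("A1", 0.2), ("A2", 0.9), ("A3", 0.6), ("A4", 0.4)]

# Same budget of six mutations, distributed differently.
breadth_first = plan_mutations(scored, n_attacks=3, mutations_each=2)
depth_first = plan_mutations(scored, n_attacks=1, mutations_each=6)

print(breadth_first)  # [('A2', 2), ('A3', 2), ('A4', 2)]
print(depth_first)    # [('A2', 6)]
```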


Page 16: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Exploiting the Decision Tree

• Attack A = <S1, S3, S6>

•  Path Condition P = S3 ∧ ¬S5 ∧ S1

• Mutate A so that the mutants satisfy P

[Figure: the decision tree from the previous slide; P is the path condition of the leaf that attack A reaches.]
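Extracting a path condition amounts to walking the tree with the attack's slices and recording each test taken. The tree below is a hypothetical layout chosen so that the attack A = <S1, S3, S6> yields P = S3 ∧ ¬S5 ∧ S1; the exact shape of the tree on the slide is not recoverable from the transcript, and `path_condition` is an illustrative helper name.

```python
# Each internal node is (slice, subtree_if_absent, subtree_if_present);
# each leaf is the string "Blocked" or "Passed". Assumed layout.
TREE = ("S3",
        ("S2", "Blocked", ("S4", "Blocked", "Passed")),  # S3 absent
        ("S5", ("S1", "Blocked", "Passed"), "Blocked"))  # S3 present


def path_condition(tree, attack):
    """Follow the tree for `attack` (a set of slice names).

    Returns the list of (slice, must_be_present) tests on the path
    and the label of the leaf that is reached.
    """
    condition = []
    node = tree
    while not isinstance(node, str):
        feature, if_absent, if_present = node
        present = feature in attack
        condition.append((feature, present))
        node = if_present if present else if_absent
    return condition, node


condition, leaf = path_condition(TREE, {"S1", "S3", "S6"})
print(condition)  # [('S3', True), ('S5', False), ('S1', True)]
print(leaf)       # Passed
```

S6 does not appear in the condition, which is what leaves it free to be mutated.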

Page 17: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Mutation

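A = <S1, S3, S6>

P = S3 ∧ ¬S5 ∧ S1

Production rule for S6: S7 → … | S6 | S11 | S9 | S5 | …

M1 = <S1, S3, S11>

The mutant M1 keeps S1 and S3, as P requires, and replaces S6 with S11, another alternative of the same production rule. S5 is not a valid replacement because P requires ¬S5.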
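A mutation step that respects a path condition can be sketched as below. It assumes attacks are lists of slice names and that the alternatives of each production rule are available in a dictionary; only the alternatives that are spelled out on the slide are used (the elided "…" entries are left out), and the names `satisfies` and `mutate` are hypothetical.

```python
import random


def satisfies(attack, condition):
    """Check every (slice, must_be_present) test of the path condition."""
    return all((feature in attack) == present for feature, present in condition)


def mutate(attack, condition, alternatives, rng):
    """Swap one unconstrained slice for another alternative of its production
    rule and return the first mutant that still satisfies the condition."""
    constrained = {feature for feature, _ in condition}
    free = [s for s in attack if s not in constrained and s in alternatives]
    rng.shuffle(free)
    for old in free:
        options = [alt for alt in alternatives[old] if alt != old]
        rng.shuffle(options)
        for new in options:
            mutant = [new if s == old else s for s in attack]
            if satisfies(mutant, condition):
                return mutant
    return None  # no admissible mutant found


attack = ["S1", "S3", "S6"]
condition = [("S3", True), ("S5", False), ("S1", True)]
alternatives = {"S6": ["S6", "S11", "S9", "S5"]}

print(mutate(attack, condition, alternatives, random.Random(0)))
```

Depending on the random choice, the mutant replaces S6 with S11 or S9; S5 is always rejected because the path condition forbids it.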

Page 18: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Iterative Learning

[Figure: iterative cycle with the steps Prepare Training Data, Build Classifier, Mutate best attacks, and Slice attacks.]
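The cycle can be sketched end to end with stand-ins for the parts that are not available here. In this toy version, `fake_waf_blocks` replaces the real WAF, and a per-slice pass rate replaces the decision tree as the learned model; both are deliberate simplifications, and every name and value is hypothetical.

```python
import random


def fake_waf_blocks(attack):
    """Stand-in for the WAF: blocks attacks containing S5 or lacking S1."""
    return "S5" in attack or "S1" not in attack


def learn(results):
    """Score each slice by the fraction of attacks containing it that passed.
    This simple scoring is a stand-in for building the decision tree."""
    seen, passed = {}, {}
    for attack, blocked in results:
        for s in attack:
            seen[s] = seen.get(s, 0) + 1
            passed[s] = passed.get(s, 0) + (0 if blocked else 1)
    return {s: passed[s] / seen[s] for s in seen}


def fitness(attack, scores):
    """Average slice score; unseen slices get a neutral 0.5."""
    return sum(scores.get(s, 0.5) for s in attack) / len(attack)


def mutate(attack, pool, rng):
    """Replace one randomly chosen slice with a random slice from the pool."""
    mutant = list(attack)
    mutant[rng.randrange(len(mutant))] = rng.choice(pool)
    return tuple(mutant)


def run(iterations=5, seed=0):
    rng = random.Random(seed)
    pool = ["S1", "S2", "S3", "S4", "S5"]
    initial = [("S1", "S5"), ("S2", "S3"), ("S1", "S2")]
    results = [(a, fake_waf_blocks(a)) for a in initial]  # prepare training data
    for _ in range(iterations):
        scores = learn(results)                           # build the model
        best = max((a for a, _ in results), key=lambda a: fitness(a, scores))
        new = mutate(best, pool, rng)                     # mutate the best attack
        results.append((new, fake_waf_blocks(new)))       # execute and record
    return results


results = run()
print(sum(not blocked for _, blocked in results), "bypassing attacks recorded")
```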

Page 19: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Evaluation


Page 20: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Test subject


Page 21: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Research Question 1

How does the decision tree improve over training iterations in terms of F-measure?


Page 22: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Improvement of the Classifier

•  F-Measure improves constantly

•  In later iterations the improvement decreases
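For reference, the F-measure combines precision and recall into one number. The sketch below assumes the balanced variant (F1) with "Passing" as the positive class; the slides do not state which variant is used, and the counts in the example are illustrative.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure: harmonic mean of precision and recall.

    tp: attacks predicted as passing that did pass
    fp: attacks predicted as passing that were blocked
    fn: attacks predicted as blocked that did pass
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: precision and recall are both 0.8 here.
score = f_measure(tp=8, fp=2, fn=2)
print(round(score, 3))
```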


Page 23: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Research Question 2

Among ML-driven breadth-first (ML-B), ML-driven depth-first (ML-D), and random test generation (RAN), which one yields better performance in terms of the number of bypassing tests (Dt) and efficiency?

Page 24: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Number of Bypassing Tests

• ML-B and ML-D outperform RAN

• ML-D is better in the beginning (< 75 min)

• ML-B is better in the later stage (> 75 min)

Page 25: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Efficiency

•  Over time it becomes harder to find more bypassing tests.

•  Efficiency for RAN steadily decreases.

•  Efficiency for ML-D and ML-B increases in the first hour; then decreases.


Page 26: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Summary


Page 27: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?


Page 28: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Research Question 1

What is the best choice of the parameter K for the RandomTree algorithm to generate a good classifier in terms of F-measure and Msize?

Page 29: Behind an Application Firewall, Are we Safe from SQL Injection Attacks?

Evaluation of parameter K

Selecting K:

• Trade-off: computation time ⇔ F-measure/Msize

• K ≈ 40% is a reasonable compromise