use of ai algorithms in design of web application security testing framework

Use of AI algorithms in design of Web Application Security Testing Framework

HITCON

Taipei 2006

Or a “non-monkey” approach to hacking web applications

By fyodor and meder

fygrave@o0o.nu meder@o0o.nu

“No. we are not writing another web scanner!!”

Agenda

Why hacking web applications What scanners do. Why they are useless (or not) What else could be done, but isn’t (yet) Introduction to YAWATT

User-session based approach Distributed Intelligent (or not?) Modular More than “application security scanner” coverage

This work background

STIF, STIF2 automation – agent-based cooperative automated hacking environment

http://o0o.nu/sec/STIF

So, why going for the web

They learnt to configure their firewalls They learnt to disable services they don’t

want They finally know how to use nmap (and

even nessus!!)

…. But they still want web And they can’t learn to code

So why web applications

Applications get complex Multilayered frameworks make it even more

fun Amount of web application based services

grow Number of web application programmers

increase (home brewed web applications)

but …

Web application remains a larger hole into one’s network Web application programmers skills aren’t

usually the best Firewalls are there – just to let you in application firewalls can stop limited number

of web application attacks, but are useless when it comes to detection of logical vulnerabilities

IDS systems aren’t smart enough to pick up on Application attacks

Scanners.. use of..

checking for enumeration ... YES checking for exectution ... YES checking if we can drop table YES checking if we can drop database .. YES .. CANNOT CONNECT TO APPLICATION

Scanners - summary

Nessus et all – don’t see web applications beyond the underlying software configuration

Libwhisker/nikto – signature based. Relatively primitive. Efficient for default bugs

Wikto/e-Or – session aware, coding flaws scanner

Kavado/Appscan/Webinspect/N-Stalker/Watchfire Appscan – intelligent scanners. Session aware. Closed “blackbox” (some allow scripted plugins)

Why scanners ain’t enough

Single-host based Commercial scanners are black-box (not

extendable, non-correctable) Little or no control on “hacking” process Not easily extendable on the fly with new

‘automation” modules Often primitive, signature based logic

What would we like to have

Maximum automation of web hacking process

Minimum of code writing. Autonomous functionality Knowledge transfer Ability to add ‘hacks’ on the fly Deal with uncertainty in “intelligent way” Learn from valid user session data

Other good things to have

Be able to test new class of bugs (i.e. session hijacking)

Be able to attack web application from multiple-locations (bypass IP restrictions, improve brute-forcing process)

Be able to automate testing of application logic bugs

Be able to make intelligent guesses

Introducing YAWATTmethod

User sessions

User sessions – collections of user request/response pairs (url, name/value pairs, session information and selective HTTP protocol data)

Classified user session data include semantic classification of URL, parameters, responses and HTTP protocol data (server type, backend system(s) if visible, “unusual” HTTP headers content)

User sessions

User session data can be obtained from: Proxy servers (burp, paros, ..) Web server logs Browser automation scripts (i.e. WATIR

framework) Spiders (burp)

Less code, more automation

Application content is learnt from user sessions (data feeders) Additional application information could be gathered by agent’s

plugins (i.e. directory splitting tests) User session data is classified by:

Semantic and functional classification of URL HTTP protocol classificators (server type, cookies ..) Session classificators Input data classification – type, semantics Output classification (application error detection, redirects,

“bogus’ responses etc) Test-case suites and executed in groups

Stateless tests Stateful tests Mixed

Classification process as new data arrives into the system

Go Intelligent

Main components: Web application components (URL) classification Semantic classification for web application input

data LSI based mapping and comparison of web content

In response analysers. Use of external search engines Limited “binary analysis” of downloaded files

(decoding pdf, doc, rtf (other formats later)

Knowledge Transfer to machine Possibility to create new classification rules

on the fly (and let the system re-learn from it) Possibility to ‘reclassify’ application

responses Possibility to add new ‘testing’ plugins on the

How is URL classification used

Vulnerability scenario testing – uses ‘classificators’ subscribtion mechanism.

For example: login page tester will need ‘login’, ‘executable’ and ‘session’

How does input data semantics identification happen

How the classified user session data is used

Additional research directions

Other ideas to work on: Detection of “hidden” parameters Identification of “hidden” urls Identification of “negative” and ‘positive”

responses Detection of application failures, redirects Evaluation and priority based execution for

plugins

A note on distributed architecture Cooperative Agents Infrastructure

Design cooperative agent system Multi-platform Portable

Distributed architecture

Distributed architecture (another look)

What distributed approach gives us: DDoS – EASY!!! Distributed brute-forcing. Bypassing IP based

restrictions, bandwidth limitations IDS – more tricks Bypass packet filtering restrictions

an agent behind the firewall!

Communication framework

Modified version of spread Robust Reliable message delivery Portable (windows/unix) Available in C/C++ and Java flavours. Bindings

exist for Python, Ruby!

In progress

Agents communicate with message Task distribution algorithms – in progress

use of ai algorithms in design of web application security testing framework

funamount of web application

web scanner

web applicationsapplications

application attacksscanners

session aware

home brewed web applications

session information

session hijackingbe

Documents

algorithms and simulation framework for residential demand...

ai maturity framework for enterprise applications

artificial intelligence framework€¦ · vodafone group...

a comparative study of record matching algorithms shirley...

modern game ai algorithms - uni-muenster.de · modern game...

comparison of searching algorithms in ai against human

infra - intelligence framework inc. ai. artificial

scaling learning algorithms towards ai - yann...

more common lisp algorithms for ai and nlp language...

ai applications in genetic algorithms

ai algorithms

6 metropolis{hastings algorithms - .net framework

a comparative study of record matching algorithms shirley...

ai - based framework for private cloud computing

communication-efﬁcient edge ai: algorithms and systems

adaz ai case study collection - .net framework

04 problem solving in ai - search algorithms

use of ai algorithms in design of web application security...

behavior selection algorithms - game ai pro selection...

ai-lab reference architecture framework