Effective multimodel anomaly detection using cooperative negotiation
Alberto Volpatto, Federico Maggi, Stefano Zanero
DEI, Politecnico di Milano
Anomaly detection
Example: detecting malicious HTTP messages
[Diagram: a bad guy sends a malicious HTTP request (GET /login/id/<script>..</script>) to a dynamic web page; the resulting malicious HTTP response (<script>iInfectPCs();</script>) redirects an unlucky client to www.iSpreadMalware.org, the bad guy's page, whose malicious HTTP response attacks a 3rd-party plugin]
Anomaly detection
Modeling non-malicious messages to find malicious ones

[Diagram: clients exchange millions of good HTTP messages with the webserver]
Learning phase

[Diagram: from the traffic between client and webserver, models of good messages M1, M2, M3, ..., Mn are built]
Example of models
• parameter string length
• numeric range
• probabilistic grammar of strings
• string character distribution
GET /page?uid=u44&p=14&do=delete
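As an illustration of one such partial model, the sketch below learns the typical length of a parameter string from clean samples and scores deviations. The class and method names are hypothetical, not from the original system; it is a minimal example of the "parameter string length" model family listed above.

```python
# Hypothetical sketch of one partial model: a parameter string-length model.
# Names (StringLengthModel, train, anomaly_score) are illustrative only.

class StringLengthModel:
    """Learns mean/stddev of parameter lengths via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def train(self, value: str) -> None:
        """Update the running statistics with one clean training sample."""
        self.n += 1
        x = len(value)
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def anomaly_score(self, value: str) -> float:
        """Partial anomaly value in [0, 1): grows with distance from the mean."""
        if self.n < 2:
            return 0.0
        std = (self.m2 / (self.n - 1)) ** 0.5 or 1.0  # avoid division by zero
        z = abs(len(value) - self.mean) / std
        return z / (1.0 + z)  # squash the z-score into [0, 1)
```

Trained on short identifiers such as `u44`, the model assigns a low partial anomaly value to `u40` and a high one to an injected `<script>` payload.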
Learning phase

[Diagram: models of good sessions M1, M2, ..., Mn capture sequences of client states C1, C2, C3, ..., learned from requests such as GET /page?uid=u43&p=10&do=add and GET /page?uid=s10&do=add]
Detection phase
Detection of bad messages

[Diagram: models M1, M2, M3, ..., Mn inspect each request between client and webserver]
GET /page?do=<script>MaliciousCode();
Issue
Anomaly value aggregation
• Not always well formalized (exceptions exist)
• Each model gives a partial anomaly value
• Issue: combining partial anomaly values is not trivial
• simple average (too simple)
• weighted average (how to set weights?)
• Bayesian networks (how to tune them easily?)
• etc. (ample literature on the subject)
Proposed approach
• Models treated as autonomous agents
• Overall anomaly value negotiated iteratively through a mediator
New detection phase

[Diagram: partial models (i.e., agents) M1, M2, ..., Mn inspect the HTTP requests q between client and webserver; at iteration t, each agent i sends its partial anomaly value p_i^t to the mediator, the mediator replies with the anomaly value a^t, and the agents produce updated partial anomaly values p_i^{t+1}; the exchange then repeats]
Until agreement
All partial anomaly values are equal, and the overall anomaly value is selected:

p_1^{t*} = p_2^{t*} = ··· = p_n^{t*},  a = a^{t*}
Negotiation function

p_i^{t+1} = F_i(p_i^t, a^t) = p_i^t + α_i (a^t − p_i^t)

where p_i^{t+1} is the partial anomaly value of agent i at the next iteration, a^t is the anomaly value at the current iteration, and α_i is the agreement coefficient of agent i.
Agreement coefficient

α_i = f_α(w_i) = 1 / (1 + e^{h(w_i − k)})

where the weights w_i are the trust levels of each agent (i.e., partial model), and h, k are tuning parameters.
Agreement coefficient
• values close to one: change the i-th offer
• values close to zero: preserve the i-th offer
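This coefficient is a decreasing sigmoid of the trust level, directly from the formula above. A minimal sketch (the default values h = 10.0 and k = 0.5 are assumed for illustration, not prescribed by the slides):

```python
import math

# Agreement coefficient from the slides: alpha_i = 1 / (1 + e^{h (w_i - k)}).
# h and k are tuning parameters; w_i is the trust level of agent i.
# The defaults h=10.0, k=0.5 are illustrative assumptions.

def agreement_coefficient(w: float, h: float = 10.0, k: float = 0.5) -> float:
    """High trust -> alpha near 0 (preserve offer); low trust -> near 1 (adapt)."""
    return 1.0 / (1.0 + math.exp(h * (w - k)))
```

For example, a highly trusted agent (w = 0.9) gets α ≈ 0.02 and barely changes its offer, while a poorly trusted one (w = 0.1) gets α ≈ 0.98 and essentially adopts the mediator's value.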
Cooperative negotiation
• The parameter α_i is called the agreement coefficient and expresses the willingness of agent i to adapt to the mediator's counter-offer:
  • close to 1: the agent is inclined to reach an agreement
  • close to 0: the agent does not intend to change its offer
• It is computed from the trust level:
  • trust close to 1: the agent's evaluation is highly reliable and its offer should not be changed
  • trust close to 0: the evaluation is unreliable and the offer can be given less weight
• Possible problems: dictatorial behavior, convergence

α_i = 1 / (1 + e^{h(w_i − k)})
Agreement function

a^t = f(p_i^t, w_i) = (Σ_i p_i^t w_i) / (Σ_i w_i)

where a^t is the anomaly value at the current iteration, the p_i^t are the agents' partial anomaly values, and the weights w_i are the trust levels of each agent (i.e., partial model): at each iteration, the agreement is, e.g., the weighted average of the anomaly values p across all the agents.
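Putting the pieces together, the negotiation loop can be sketched end to end: each agent holds a partial anomaly value p_i and a trust level w_i, the mediator proposes the weighted average a^t, and each agent moves toward it by α_i(a^t − p_i). Function and parameter names, defaults, and the stopping tolerance are illustrative assumptions, not the authors' implementation.

```python
import math

# Sketch of the cooperative negotiation from the slides.
# p: initial partial anomaly values p_i^0; w: trust levels w_i.
# h, k, eps, max_iter are illustrative assumptions.

def negotiate(p, w, h=10.0, k=0.5, eps=1e-6, max_iter=1000):
    """Iterate mediator offers until agreement; return (anomaly value, iterations)."""
    # Agreement coefficients: low trust -> alpha near 1 (adapt to mediator).
    alpha = [1.0 / (1.0 + math.exp(h * (wi - k))) for wi in w]
    total_w = sum(w)
    a = 0.0
    for t in range(max_iter):
        # Mediator's counter-offer: trust-weighted average of current offers.
        a = sum(pi * wi for pi, wi in zip(p, w)) / total_w
        if max(p) - min(p) < eps:  # agreement: all partial values (nearly) equal
            return a, t
        # Each agent applies the negotiation function F_i.
        p = [pi + ai * (a - pi) for pi, ai in zip(p, alpha)]
    return a, max_iter
```

With equal trust levels the agents converge to the plain average; a very trusted agent (α_i ≈ 0) barely moves and pulls the agreement toward its own offer, which is the "dictatorial behavior" risk noted earlier.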
Modification of the learning phase
• A trust level for each model is needed
• Typically, modern tools compute it during learning [Criscione et al., EC2ND 2010]
• Examples: standard deviation of string length model, kurtosis measure of integer models
Trust levels constant over time

w_i^t = w_i^{t+1} = ··· = w_i

• Two-way communication between agents and mediator is not needed
• The mediator would just receive the initial offers (i.e., partial anomaly values) and run the iterative algorithm
Optimized learning
• Monitor the trust level of each model using a simple sliding window W
• When the j-th learning sample comes in, compute the spread of the trust level over the window:

δ_W(j) := max_{j∈W} w_i^j − min_{j∈W} w_i^j

• Stop learning when trust is "stable", i.e., when the spread falls below a threshold:

δ_W(j*) < ε
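The stopping criterion can be sketched with a fixed-size sliding window over the trust values. Class and parameter names (and the default window size and threshold) are illustrative assumptions:

```python
from collections import deque

# Sketch of the learning-termination check: keep the last |W| trust values of
# a model and stop learning once their spread delta_W falls below eps.
# Names and defaults are illustrative, not from the original system.

class TrustStabilityMonitor:
    def __init__(self, window: int = 50, eps: float = 0.01):
        self.window = deque(maxlen=window)  # sliding window W of trust values
        self.eps = eps

    def update(self, trust: float) -> bool:
        """Record the trust after the j-th sample; True when learning can stop."""
        self.window.append(trust)
        if len(self.window) < self.window.maxlen:
            return False  # not enough observations to judge stability yet
        return max(self.window) - min(self.window) < self.eps
```

A model whose trust keeps drifting never triggers the stop, while one whose trust has settled does so as soon as the window fills with near-constant values.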
Auto-calibration: terminating the learning phase
• The duration of the learning phase determines detection effectiveness:
  • if too long, it can cause overfitting
  • if too short, it may not allow building an adequate model of normality
• An automatic mechanism for terminating the learning phase is introduced:
  • a model m is considered stable if its trust level T() shows no more significant variations over the last w observations
  • an Anomaly Engine is stable for an event class if every model it implements is stable
Evaluation
• Background traffic: clean UCSB iCTF 2008 data (22,051 HTTP request-response pairs)
  • 14,961 attack-free training samples
  • 7,090 attack-free testing samples
• Attack data: instances of the most popular real-world attacks (SQL injections, JavaScript injections, command injections), inserted with random mutations
Impact of modifications

[ROC plot: DR (0.1-1.0) vs. FPR (0-0.4), comparing weighted average, cooperative negotiation, and cooperative negotiation with optimized learning]
Limitations (1), i.e., future work
1. strict convergence not formally proven
• but parameters influence detection quality only minimally, and predictably
• Mitigation: choose h and k to guarantee convergence
Convergence in practice

[Plot: iterations to convergence (0-1000) vs. k (0-0.8), for h = 2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0, 22.5, 25.0]
h, k versus FPR

[Plot: FPR (0-0.005) vs. k (0.1-0.9), for h = 2.5, 5.0, 7.5, 10.0]
h, k versus DR

[Plot: DR (0.94-0.98) vs. k (0.1-0.9), for h = 2.5, 5.0, 7.5, 10.0]
Limitations (2), i.e., future work
2. trust level independent of input
• possible concept drift not taken into account
• Mitigation: already addressed by other approaches [Maggi et al., RAID 2009]
Limitations (3), i.e., future work
3. relax the cooperative assumption
• important in case of distributed detection
• agents may cheat because:
  • outdated training base (mitigation: [Robertson et al., NDSS 2010])
  • intruders took over a detector
Conclusions
• very simple to implement
• distributed detection is “embedded” in the model
• meaning of weights is now defined
• easy to generalize to more complex schemes
• within the scope of our preliminary evaluation, it works against real-world attacks
Questions?
Alberto Volpatto, Federico Maggi, Stefano Zanero
[email protected]
http://home.dei.polimi.it/fmaggi/
http://www.vplab.elet.polimi.it