Strata San Jose 2016 - Reduce False Positives in Security
TRANSCRIPT
Powerball Predictor
Photo Credit: Sean McGrath
A crystal ball tells me with 99% accuracy whether a Powerball ticket is a winner.
Powerball Predictor
● ~300 million samples
● ~3 million false positives
● 1 true positive
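The slide's arithmetic can be checked directly. A minimal sketch, assuming the 99% figure is both the true-positive and true-negative rate:

```python
# Base-rate arithmetic for the 99%-accurate Powerball crystal ball.
total = 300_000_000   # tickets scored
winners = 1           # actual winners: the base rate is ~1 in 300 million
accuracy = 0.99       # assumed true-positive and true-negative rate

false_positives = (total - winners) * (1 - accuracy)  # ~3 million
true_positives = winners * accuracy

# Precision: when the ball says "winner", how often is it right?
precision = true_positives / (true_positives + false_positives)
print(f"{false_positives:,.0f} false positives")
print(f"precision: {precision:.2e}")
```

Even at 99% accuracy, a positive prediction is almost certainly wrong because the base rate is so low; the same arithmetic applies to intrusion alerts.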
Powerball Predictor
The overwhelming majority of tickets are not winners.
Failing to recognize this is falling victim to the base rate fallacy.
Security Crystal Ball
The overwhelming majority of log entries and data points do not represent fraud and intrusions.
Failing to recognize this is falling victim to the base rate fallacy.
[Figure: Fraud / Intrusion Detection System]
Source: MXLabs
Base Rate Fallacy
Why False Positives?
Case Study: Outlier Detection
Using an outlier detection system to identify fraudsters within the environment.
For a set of generating mechanisms, find the unusual ones.
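The talk does not name a specific algorithm; a minimal sketch of the idea, flagging points far from an account's usual behavior with a z-score cutoff (the data and threshold are illustrative):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Daily transaction counts for one account; the spike is the outlier.
counts = [12, 11, 13, 12, 10, 11, 12, 13, 11, 12, 95]
print(zscore_outliers(counts))  # [95]
```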
Example Time Series
Photo credit SuperCar-RoadTrip.fr under Creative Commons Attribution 2.0
The data changes over time in unforeseen ways.
Concept Drift
Solution: Feedback Loop
Explicit Feedback Loop
Photo credit Alan Levine under Creative Commons Attribution 2.0
Implicit Feedback Loop
Fraud: Takeaways
- Concept drift is a shift in behavior.
- Feedback combats concept drift.
- Implicit feedback > explicit feedback.
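One way to capture implicit feedback, sketched with hypothetical names: treat an analyst escalating an alert as a positive label and closing it without action as a negative one, so fresh labels keep arriving as behavior drifts:

```python
from collections import defaultdict

class FeedbackStore:
    """Collect implicit labels from analyst actions, per alert type."""

    def __init__(self):
        self.labels = defaultdict(list)

    def record(self, alert_type, escalated):
        # Implicit signal: escalation = useful alert, no action = noise.
        self.labels[alert_type].append(1 if escalated else 0)

    def useful_rate(self, alert_type):
        seen = self.labels[alert_type]
        return sum(seen) / len(seen) if seen else None

store = FeedbackStore()
for escalated in (False, False, True, False):
    store.record("rare_login_country", escalated)
print(store.useful_rate("rare_login_country"))  # 0.25
```

The point of the takeaway above is that these labels arrive for free from the analyst's normal workflow, unlike explicit feedback that asks for extra work.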
IDS: Anatomy of Successful Detection
Context: Security Analyst
Red team Kill Chain
Blue team Kill Chain
False positives: Lose Ability to Triage
Fact: You cannot salvage a false positive with Contextual Info or Visualization
What is a Successful detection?
Properties + Frameworks
A successful detection captures adversary TTPs from sensor data while ignoring expected activity.
Source: @MSwannMSFT
Properties of a Successful Detection
Adaptable
Credible
Interpretable
Actionable
[Chart: usefulness of alerts (less useful → more useful) vs. sophistication of algorithms (basic → advanced), with security domain knowledge as a third dimension]
Framework for a Successful Detection
[Chart: same axes; "Outlier" plotted as basic and less useful]
[Chart: same axes; increasing algorithmic complexity moves from "Outlier" to "Anomaly"]
[Chart: same axes; increasing domain knowledge moves from "Anomaly" to "Security Interesting Alerts"]
Successful detections incorporate domain knowledge.
How to encode Domain Knowledge: Embrace Rules
• Business heuristics to filter out the "security-interesting anomalies"
• Rules can take many forms:
  • TI feeds
  • IOCs, IOAs
  • TTPs
• Rules are awesome:
  • Credible, interpretable, adaptable (to some extent), actionable!
  • Highest precision
  • Highest recall
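As one concrete form a rule can take, a TTP-style check expressed directly in code; the field names and the `svc_` account-naming convention are illustrative, not from the talk:

```python
def is_suspicious_logon(event):
    """TTP-style rule: interactive logons by service accounts are suspicious."""
    return (
        event["logon_type"] == "interactive"
        and event["account"].startswith("svc_")  # hypothetical naming convention
    )

print(is_suspicious_logon({"account": "svc_backup", "logon_type": "interactive"}))  # True
print(is_suspicious_logon({"account": "alice", "logon_type": "interactive"}))       # False
```

A rule like this is credible and interpretable in exactly the sense the slide lists: an analyst can read it and see why an alert fired.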
Three ways to combine ML and Rules
1. Above machine learning systems
   a. Business heuristics to filter alerts
      i. "For account _foo_, only raise sev 2 alerts until March 28th, 2016"
Work by Dan Mace et al., Microsoft
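A heuristic layer above the ML system might look like the following sketch, mirroring the slide's example; the account name, the severity scale (1 = most severe), and the suppression table are illustrative assumptions:

```python
from datetime import date

# Hypothetical per-account suppression rules applied *above* ML alerts.
SUPPRESSIONS = {"foo": {"until": date(2016, 3, 28), "max_sev": 2}}

def passes_heuristics(alert, today):
    """Return True if the alert should be raised to the analyst."""
    rule = SUPPRESSIONS.get(alert["account"])
    if rule and today <= rule["until"]:
        return alert["severity"] <= rule["max_sev"]  # only raise sev 1-2
    return True

print(passes_heuristics({"account": "foo", "severity": 4}, date(2016, 3, 20)))  # False
print(passes_heuristics({"account": "foo", "severity": 4}, date(2016, 4, 1)))   # True
```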
2. Below machine learning systems
   a. Featurization: "If IP address present in list of malicious IPs, flag 1"
   b. Utilizes threat intel feeds (Cymru, VirusTotal, FireEye)
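The featurization in (a) might look like this sketch; the feed contents and field names are illustrative (the IP is an RFC 5737 documentation address):

```python
# TI-feed membership becomes a binary feature the model consumes.
MALICIOUS_IPS = {"203.0.113.7", "198.51.100.23"}  # illustrative feed contents

def featurize(event):
    """Turn a raw event into model features, including the TI-feed flag."""
    return {
        "in_malicious_ip_list": 1 if event["src_ip"] in MALICIOUS_IPS else 0,
        "bytes_out": event["bytes_out"],
    }

print(featurize({"src_ip": "203.0.113.7", "bytes_out": 4096}))
# {'in_malicious_ip_list': 1, 'bytes_out': 4096}
```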
3. Combining rules and machine learning together using Markov Logic Networks
Initial Ideas given by Vinod Nair, MSR
Intuition
• Rules alone place a set of hard constraints on the set of possible worlds
• Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
• Give each formula a weight (higher weight ⇒ stronger constraint)
Source: Lectures by Pedro Domingos
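The soft-constraint intuition can be made concrete: in a Markov logic network, a world's unnormalized probability is exp(Σᵢ wᵢnᵢ), where nᵢ counts the true groundings of formula i. A sketch with illustrative weights and counts:

```python
import math

def world_weight(weights_and_counts):
    """Unnormalized MLN weight of a world: exp(sum of w_i * n_i)."""
    return math.exp(sum(w * n for w, n in weights_and_counts))

# World 1 satisfies both groundings of a weight-1.5 formula;
# world 2 violates one of them, so it is less probable, not impossible.
w1 = world_weight([(1.5, 2)])
w2 = world_weight([(1.5, 1)])
print(w1 / w2)  # e^1.5 ≈ 4.48
```

Each violated grounding divides the world's weight by e^w, which is exactly the "higher weight ⇒ stronger constraint" bullet above.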
Interactive logons from service accounts indicate an attack.
Similar service accounts tend to have similar logon behavior
Example: Service Accounts
Domain Knowledge
Example: Service Accounts
Encode as First Order Logic
Example: Service Accounts
1.5  InteractiveLogon(x) ⇒ Attack(x)
1.1  Similar(x,y) ⇒ (InteractiveLogon(x) ⇔ InteractiveLogon(y))
Example: Service Accounts
Associate each rule with the learned weight
Example: Service Accounts
[Figure: ground network over Attack(A), InteractiveLogon(A), InteractiveLogon(B), and Attack(B), with rule weights 1.5 and 1.1]
Example: Service Accounts
Consider two service accounts: A and B.
Example: Service Accounts
[Figure: ground network adding Similar(A,B), Similar(B,A), Similar(A,A), and Similar(B,B) to Attack(A), Attack(B), InteractiveLogon(A), and InteractiveLogon(B), with rule weights 1.5 and 1.1]
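Grounding the example's predicates for the two constants reproduces the atoms in the slide's network diagram; a quick enumeration:

```python
from itertools import product

# Ground atoms for the two service accounts A and B.
constants = ["A", "B"]
atoms = (
    [f"Attack({c})" for c in constants]
    + [f"InteractiveLogon({c})" for c in constants]
    + [f"Similar({x},{y})" for x, y in product(constants, repeat=2)]
)
print(atoms)
# ['Attack(A)', 'Attack(B)', 'InteractiveLogon(A)', 'InteractiveLogon(B)',
#  'Similar(A,A)', 'Similar(A,B)', 'Similar(B,A)', 'Similar(B,B)']
```

With 8 ground atoms there are 2⁸ = 256 possible worlds; the weighted formulas define a probability distribution over them.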
• How to learn the structure?
  • Begin with hand-coded rules
  • Use Inductive Logic Programming, but need to infer arbitrary clauses
• How to learn the weights?
  • For generative learning, depend on pseudo-likelihood
• Check out Alchemy -- http://alchemy.cs.washington.edu/
Call for Action - After the Conference
• One week
  • Review @CodyRioux's IPython Notebook and @Ram_ssk's follow-up material
  • Think comprehensively about rules
• One month
  • Ask your data scientists to review the literature section
  • Implement the rules on top of ML systems
• One quarter
  • Implement a feedback system to capture training data
  • Implement all TI feeds within an ML system
  • Play with Alchemy
Literature
● Axelsson, S. (1999). "The Base-Rate Fallacy and its Implications for the Difficulty of Intrusion Detection."
● Didona, D., et al. (2015). "Enhancing Performance Prediction Robustness by Combining Analytical Modeling and Machine Learning."
● Richardson, M., and Domingos, P. (2006). "Markov Logic Networks." Machine Learning 62(1-2): 107-136.