(Kind of) Big Data in Information Security
Ben Finke
May 2015
Big Data in InfoSec - Ben Finke - @benfinke
Who is this guy?
So glad you asked!
Security Architect
Director of Security Operations
Defender
Bona fides
• Security team at Enterprise Integration
• Bachelor’s Degree in Computer Science
• US Air Force – Communications Officer
• Multiple Industry Certifications
• 11 years of experience in the information security industry (yikes!)
• Supporting 26 customers – well over 40K users and 75K systems
• Do a lot of what we are going to discuss every day!
• Would probably do this even if I wasn’t paid….
Ben Finke
@benfinke
https://www.linkedin.com/pub/ben-finke/3/95a/8a1
blog.eiblackops.com
blog.benfinke.com
Ben Finke - Securing the Cause - @benfinke
A brief primer – Information Security
• Security is unique to every person and every business
• Confidentiality
• Integrity
• Availability
• Compliance versus Defense
• What is a security event?
• Prevention and Detection
• What is the real target for attackers?
So are attackers the big concern?
Yes and No
Any downtime or loss of data can be a security event. However, the vast majority of the problems we will be focusing on today will involve a third party who wants something we have.
Who are these attackers anyways?
[Chart: attacker types plotted by Capabilities and Resources against Difficulty to Defend]
Script Kiddie
Militia/Terrorist Groups
Hacktivists
Corporate Espionage
Organized Crime
Nation States
Botnets
• Organized crime groups run large and complex ecosystems; one key part is the creation and maintenance of huge botnets, networks of compromised computers that can be leveraged as part of attacks or scams.
Source: http://www.spamhaus.org/news/article/720/spamhaus-botnet-summary-2014
Waledac Botnet - http://blogs.microsoft.com/blog/2010/02/24/cracking-down-on-botnets/
Stuxnet and its legacy
Stuxnet targeted very specific Programmable Logic Controllers (PLCs).
It was one of the most complex pieces of malware ever written.
There have been similar follow-on variants, but Stuxnet was the first to be viewed by most researchers as a weapon created by a nation state.
Commercial cybercrime operators are undoubtedly studying this incredible engineering to improve their own products….
http://spectrum.ieee.org/telecom/security/the-real-story-of-stuxnet
We haven’t been gaining ground
Source: Verizon 2015 Data Breach Investigations Report - http://www.verizonenterprise.com/DBIR/2015/
But how could this be?
• How many billions of dollars are spent on security every year?
Source: http://www.wsj.com/articles/financial-firms-bolster-cybersecurity-budgets-1416182536
So it’s not the money... What’s happening?
• We’ve lost the defender’s advantage.
  - Organizations don’t know the terrain
  - Maintaining operations in a secure state takes work
  - We’ve been betrayed by our own information systems….
The ugly truth about modern IT systems
“Unfortunately, modern computing and communications technologies, for all their benefits, are also notoriously vulnerable to attack by criminals and hostile state actors.”…..
“It is a regrettable (and yet time-tested) paradox that our digital systems have largely become more vulnerable over time, even as almost every other aspect of the technology has (often wildly) improved”…..
“Modern digital systems are so vulnerable for a simple reason: computer science does not yet know how to build complex, large-scale software that has reliably correct behavior. This problem has been known, and has been a central focus of computing research, since the dawn of programmable computing. As new technology allows us to build larger and more complex systems (and to connect them together over the Internet), the problem of software correctness becomes exponentially more difficult.”
Matt Blaze
April 2015
http://www.crypto.com/papers/governmentreform-blaze2015.pdf
Anyone recognize these?
We’ve started seeing the designer vulnerability – with great marketing and all!
These vulnerabilities take advantage of underlying software that so many other systems are built on, causing panic and confusion about what is actually vulnerable. Worse, many “appliances” leverage vulnerable versions, and patches and upgrades can take months….
Heartbleed, Shellshock, POODLE, GHOST, VENOM
But wait, there’s more!
I don’t really know how to put this, so I’m just going to put it….
Verizon 2015 Data Breach Investigations Report
http://www.verizonenterprise.com/DBIR/2015/
Bruce Schneier
• Pioneer in information security…
“I am regularly asked what the average Internet user can do to ensure his security. My first answer is usually ‘Nothing; you’re screwed.’”
Bruce’s quote notwithstanding…
We have a situation where we know the building blocks of our IT systems will continue to go through the research-vuln-patch-fix cycle.
What this really means is that pure prevention is impossible. We simply cannot guarantee that our IT systems will never be compromised.
We need to focus on developing resilient systems where detection and correction are enhanced.
And so we began collecting logs….
A lot of logs…
Big Data in Security
• Logs from all kinds of different systems
  - Authentication Logs
  - Firewall Activity
  - Network Devices
  - Windows Event Logs
  - Linux Logs
  - Web Server Logs
  - Anti-Malware Logs
  - Web Proxy Logs
  - Email Security Logs
  - Web Application Firewall Logs
  - Database Activity
  - Cloud Services*
• Other data we can apply….
*Notoriously difficult and not usually timely
[Diagram: security data sources: Logs, Netflow, Network Security Monitoring, Derived Data, Security Testing Data, Context]
Context
Information about the environment that a human being would know or infer.
• Critical Systems List
• Admin-level accounts
• Location (of person and device)
• Hardware Inventory
• Software Inventory
• Internet Facing?
• Open Tickets
• Change Controls
• System history
• Network Location
What can context do?
• Firewall log event:
May 22 14:02:51 172.21.250.1 %ASA-6-302013: Built inbound TCP connection 237062557 for outside:58.71.107.127/44975 (58.71.107.127/44975) to dmz:192.168.250.130/443 (74.129.196.130/443)
• Context:
Source IP Country – China
Reputation Score – Reported Spam and Web Login Brute Forcing
DShield Listing – Active Attacker list
Associated IOCs - ….
Previous Communications …
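A minimal Python sketch of this kind of enrichment, assuming a hypothetical in-memory lookup table (`IP_CONTEXT`) standing in for real geolocation, reputation, and DShield feeds. The regex only covers the `%ASA-6-302013` line shape shown above; real ASA output has more variants, so treat it as illustrative rather than a complete parser.

```python
import re
from typing import Optional

# Matches only the %ASA-6-302013 "Built ... connection" shape shown above.
ASA_302013 = re.compile(
    r"%ASA-(?P<severity>\d)-(?P<msg_id>302013): Built (?P<direction>inbound|outbound) "
    r"(?P<proto>\w+) connection (?P<conn_id>\d+) "
    r"for (?P<src_if>\w+):(?P<src_ip>[\d.]+)/(?P<src_port>\d+) \([\d.]+/\d+\) "
    r"to (?P<dst_if>\w+):(?P<dst_ip>[\d.]+)/(?P<dst_port>\d+) "
    r"\((?P<dst_mapped_ip>[\d.]+)/(?P<dst_mapped_port>\d+)\)"
)

# Hypothetical context store; values mirror the context listed on the slide.
IP_CONTEXT = {
    "58.71.107.127": {
        "country": "China",
        "reputation": "Reported Spam and Web Login Brute Forcing",
        "dshield": "Active Attacker list",
    },
}

SAMPLE = (
    "May 22 14:02:51 172.21.250.1 %ASA-6-302013: Built inbound TCP connection "
    "237062557 for outside:58.71.107.127/44975 (58.71.107.127/44975) "
    "to dmz:192.168.250.130/443 (74.129.196.130/443)"
)


def enrich(line: str) -> Optional[dict]:
    """Parse one firewall line and attach whatever context we hold for the source IP."""
    match = ASA_302013.search(line)
    if match is None:
        return None  # not a 302013 line this sketch understands
    event = match.groupdict()
    event["context"] = IP_CONTEXT.get(event["src_ip"], {})
    return event
```

`enrich(SAMPLE)` yields the parsed fields (source IP, interfaces, ports, mapped destination) plus a `context` entry; the analyst sees the raw event and why it matters in one record.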
Security Testing Data
Incorporate the information we generate during security testing efforts.
• Vulnerability scans
• Web Application security assessments
  - What kinds of requests would indicate attack activity?
• Bridges between network segments?
  - Attackers look to pivot to gain access
• Identified services and ports
  - Did something new show up?
Security Testing Data
• Firewall log event:
May 22 14:02:51 172.21.250.1 %ASA-6-302013: Built inbound TCP connection 237062557 for outside:58.71.107.127/44975 (58.71.107.127/44975) to dmz:192.168.250.130/443 (74.129.196.130/443)
• Vuln Testing Details:
Target server is Windows Server 2012 R2 running IIS
Critical Apps – Yes (SharePoint)
Vulnerability Status – 0 critical, 0 high, 1 medium, 2 low
Behind Web Application Firewall – Yes
Part of HA – Yes
Mapped DB instances - …….
Derived Data
Information about a system that needs to be generated by a script or action.
• Netstat
  - What is this system talking to?
• Running processes
  - Everything we expect to see and nothing more?
• Logged On Users
  - Active window and idle time
• State of defensive components
  - Status of AV and HIPS?
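The "everything we expect to see and nothing more" check reduces to set differences against a per-host baseline. A Python sketch, assuming the process list and netstat peers have already been collected by some script; the baseline values and the `review_host` helper are hypothetical.

```python
# Hypothetical per-host baseline for a web server; names are lower-cased
# because Windows process names are case-insensitive.
EXPECTED_PROCESSES = {"system", "lsass.exe", "svchost.exe", "w3wp.exe", "msmpeng.exe"}
REQUIRED_DEFENSES = {"msmpeng.exe"}     # e.g. the AV engine must be running
EXPECTED_PEERS = {("10.0.0.5", 1433)}   # e.g. its database server


def review_host(processes, peers):
    """Compare derived data (process list, netstat peers) with the baseline."""
    running = {p.lower() for p in processes}
    return {
        # Anything running that we did not expect to see
        "unexpected_processes": sorted(running - EXPECTED_PROCESSES),
        # Defensive components that should be running but are not
        "missing_defenses": sorted(REQUIRED_DEFENSES - running),
        # Remote (address, port) pairs this host should not be talking to
        "unexpected_peers": sorted(set(peers) - EXPECTED_PEERS),
    }
```

An empty result for every key means the host looks exactly like its baseline; any non-empty entry is a candidate event for the analyst.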
Network Security Monitoring
Using purpose-built platforms to analyze everything passing by on the network.
• Passive Endpoint Detection
• SNMP and Syslog activity
• Encrypted traffic analysis
• PKI Certificates in use
• Traffic matching IDS signatures
• Packet Captures
Netflow
Summarizing all network communications between systems.
• Allows lengthy retention
• Easy baselining of network activity
• Capture utilization statistics
• Identify new traffic patterns
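Baselining flow data and spotting new traffic patterns can be sketched in a few lines of Python. This assumes flows have already been exported and reduced to a simplified record; the `Flow` fields are hypothetical and do not follow any particular NetFlow version's layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A simplified flow record (hypothetical fields, not a NetFlow v5/v9 layout)."""
    src: str
    dst: str
    dst_port: int
    byte_count: int


def conversation(f: Flow):
    """Key used for baselining: who talked to whom, on which port."""
    return (f.src, f.dst, f.dst_port)


def build_baseline(history):
    return {conversation(f) for f in history}


def new_patterns(baseline, flows):
    """Conversations never seen during the baseline period."""
    return sorted({conversation(f) for f in flows} - baseline)


def bytes_by_source(flows):
    """Utilization statistics: total bytes sent per source address."""
    totals = defaultdict(int)
    for f in flows:
        totals[f.src] += f.byte_count
    return dict(totals)
```

Because only the conversation keys need to be kept for the baseline, this stays cheap even with the lengthy retention that flow data allows.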
A good start...
Hooray for logs! We certainly have a lot of data (at least we thought we did).
Average size network (~1000 users) = 100 GB/day **
It quickly became evident that a new practice was necessary for information security teams – what are we going to do with all of this data?
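For a sense of scale, a back-of-the-envelope calculation from the ~100 GB/day figure, assuming decimal units (1 GB = 1000 MB) and raw volume before any compression or indexing overhead:

```latex
\begin{aligned}
\frac{100\ \text{GB/day}}{1000\ \text{users}} &= 0.1\ \text{GB} = 100\ \text{MB per user per day} \\
100\ \text{GB/day} \times 365\ \text{days} &= 36{,}500\ \text{GB} \approx 36.5\ \text{TB per year}
\end{aligned}
```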
Big Data Platforms in InfoSec
• ELK – Elasticsearch, Logstash, Kibana
• ELSA
• Graylog2
• Splunk
• Commercial SIEMs
• Lots of custom Hadoop lashups
• Only as good as the analysts who take care of them
• Lack of good tools to build predictive models
• Lack of good tools to build useful visualizations
• Lack of good integration into the overall defensive systems
Capability Example - Splunk
Collects structured or unstructured data
Field Extractions
Statistical Tools built into search interface
Visualization engine
Software that scales nicely on commodity hardware
Writing “apps” for Splunk
Connect to DBs and Hadoop Clusters (and some MongoDB goodness too!)
Splunk Architecture
So what can we achieve with these?
• Correlation
• Context
• Log Shipping – the art of collecting logs from critical systems and delivering them to the log management collectors, as close to real time as technically feasible.
• Real-time logging = real-time alerting, forensics, and statistics
• Batch logging = forensics and statistics, with relative alerting
• Oh yeah, and you may hear the phrase “Kill Chain”….
The Kill Chain
Source: SecureState - http://blog.securestate.com/open-source-threat-intelligence-sony-breach/
Example
User logs into the VPN
• IP Address = IP X
• User identity = UID
• User location
• Create a session for UID
Business Application reports login for “admin” from IP X
• IP X is tied to VPN session
• We map true person to account
• Verify against user traffic profile
Intranet Site reports traffic from IP X
• IP X is tied to VPN session
• Unauth activity is attributed
• Transaction is added to user session
NSM reports activity by IP
• Active Device Fingerprinting
• Application identification
• Malicious activity detection
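The session correlation in this example can be sketched as a table keyed by IP address. This is a minimal Python illustration, assuming an in-memory dict stands in for the SIEM's session or lookup table; the `SessionTracker` class and its method names are hypothetical, and the "verify against user traffic profile" step is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Session:
    uid: str
    ip: str
    location: str
    # (reporting source, account name used, action) tuples
    events: List[Tuple[str, str, str]] = field(default_factory=list)


class SessionTracker:
    """Ties activity seen from an IP back to the person who holds that IP right now."""

    def __init__(self) -> None:
        self.by_ip: Dict[str, Session] = {}

    def vpn_login(self, ip: str, uid: str, location: str) -> None:
        self.by_ip[ip] = Session(uid, ip, location)

    def vpn_logout(self, ip: str) -> None:
        # Close the session so a reused IP is not blamed on the previous user.
        self.by_ip.pop(ip, None)

    def attribute(self, ip: str, source: str, account: str, action: str) -> Optional[str]:
        """Return the real UID behind an event, or None if no session holds the IP."""
        session = self.by_ip.get(ip)
        if session is None:
            return None
        session.events.append((source, account, action))
        return session.uid
```

The logout handler matters as much as the login: VPN address pools recycle IPs quickly, and a stale mapping would attribute one person's activity to another.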
Remote Access and External Services
A quick word problem: If a user logs in from Jacksonville, FL at 9 AM and from Chicago, IL at 10:30 AM, is it possible this is the same actual person?
What about Jacksonville at 8 AM and Amsterdam at 11 AM (UTC)?
Using a haversine function, we can calculate the great-circle distance between two geolocations.
We can then use the distance and time difference to determine if a given action is likely to actually belong to the correct person.
Chart these results both by the biggest deltas in distance and by the required speed.
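A Python sketch of the haversine "impossible travel" check, assuming geolocated logins arrive as latitude/longitude plus a timezone-aware timestamp. The 900 km/h threshold (roughly airliner cruising speed) and the `Login` type are assumptions for illustration.

```python
import math
from datetime import datetime
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
MAX_PLAUSIBLE_KMH = 900.0  # assumption: roughly airliner cruising speed


class Login(NamedTuple):
    lat: float
    lon: float
    when: datetime  # must be timezone-aware so the time delta is correct


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def required_speed_kmh(a: Login, b: Login) -> float:
    """Speed a single person would need to travel to produce both logins."""
    distance = haversine_km(a.lat, a.lon, b.lat, b.lon)
    hours = abs((b.when - a.when).total_seconds()) / 3600
    if hours == 0:
        return math.inf if distance > 0 else 0.0
    return distance / hours


def is_impossible_travel(a: Login, b: Login) -> bool:
    return required_speed_kmh(a, b) > MAX_PLAUSIBLE_KMH
```

Time zones are the trap in the word problem. Reading the times as local (9 AM EDT in Jacksonville, 10:30 AM CDT in Chicago) gives a 2.5-hour gap over roughly 1,390 km, about 560 km/h, which a flight can cover. Taking both Amsterdam-question times as UTC gives roughly 7,200 km in 3 hours, far beyond the threshold.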
Example 2
User logs into local network
• IP Address = IP X
• User identity = UID
• Device in use
• Create a session for UID
Business Application reports login for “admin” from IP X
• IP X is tied to local session
• We map true person to account
• Verify against user traffic profile
Intranet Site reports traffic from IP X
• IP X is tied to local session
• Unauth activity is attributed
• Transaction is added to user session
Wireless network activity
• Auth network by UID
• Tie multiple devices to UID
So how well is that working?
SANS 2012 Survey - http://www.sans.org/reading-room/whitepapers/analyst/eighth-annual-2012-log-event-management-survey-results-sorting-noise-35230
The birth of Threat Intel
These tools now enable us to find and start blocking attack activity.
Attack → Logs sent to SIEM → Analyst sees attack, enables block via defenses
This happens all day, every day
Blue teams generate tons of data on their own about attackers:
IP Addresses
Domain Names
Email Subject Lines
Malware Behavior
The natural question: How can we get access to all of this data that others are collecting, and share what we see?
Attack Indicators
Threat Intelligence!
Because of course we want to be smart about it!
Various formats and protocols emerge to share this info
Not to mention commercial offerings
Surprisingly, it’s not perfect!
“Why did this domain get listed as malicious again?”
“This list has over 2 million IP addresses in it!”
“So that breach we just had…. None of those IPs or domains were in our threat lists…”
“We added that block list to the firewall, but now the config file is bigger than the device can handle…”
Problems
• The only place to put all this stuff is in the SIEM
• Almost everything is entirely reactive (AV Signatures)
• Threat “Intel” can create lots of noise for the humans
• Threat Intel sources are (almost always) very expensive, even for large companies
• Loss of context for why a thing is bad
• False Positives and Botnets (your Mom’s PC, probably)
• Threat Intel sources suffer from a numbers problem…
What can we do?
MLSec Project
• Machine Learning Security Project
• Provides research and tools to help organizations understand how effective this threat intel is, and how they can leverage machine learning and predictive models into their information security operations.
• http://www2.mlsecproject.org/
MLSec Projects
• Combine
  - Python program to harvest intel feeds from various sources
• SecRepo
  - Repo of data samples to assist during development and testing of security integrations with machine learning and predictive models….
• TIQ Test
  - Statistical comparison of threat intel data – provides visual output!
• Thanks to Alex for all the help!
• TIQ Test was featured in the 2015 Verizon DBIR. How did they do?
Lots of overlap, right?
• Nope. Hardly at all.
• 97% of intel was unique
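This is not the TIQ Test code; it is a Python sketch of the kind of overlap measure behind a "97% unique" finding, assuming each feed has already been normalized to a set of indicator strings (IPs, domains). Feed names in any usage are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def unique_fraction(feeds: Dict[str, Set[str]]) -> float:
    """Share of distinct indicators that appear in exactly one feed."""
    counts = Counter(ind for indicators in feeds.values() for ind in indicators)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def pairwise_overlap(feeds: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of shared indicators for every pair of feeds."""
    return {(a, b): len(feeds[a] & feeds[b]) for a, b in combinations(sorted(feeds), 2)}
```

A value near 1.0 from `unique_fraction` means the feeds barely agree with each other, which is exactly the surprise the slide describes.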
Blue Team Nirvana
• Human analysts training an army of machine learning robots
• Scale is met by the blue team robots
• Humans do the creative stuff
• Real-time sharing of threat indicators for later use (context)
• Automating reactions to detected threats
• Distributed early warning systems
  - Honeypots
  - Sandboxes
  - Network Security Monitoring
The end goal is to have machines handle all events and research, and only present data to humans to have a decision made.
Over time the machines learn and act just like the human analysts.
Free the humans to do what humans do best!
Machine Learning/Predictive Models
• Behavior Anomalies
  - Has this user ever logged into this application before?
• Network Traffic Anomaly Detection
  - “Did this ever talk to that before?” and “Does traffic volume from each system look right?”
• Incident Response Automation
  - Can this machine be reliably cleaned by our tools and techniques?
• Obvious Attack Blocking
  - That HTTP request looks like an RFI attack against PHP, but we run .NET – block it
• Reviewing possible security events
  - Float the really interesting stuff up to the humans
• But that’s sort of obvious stuff that lots of folks are trying (which will be awesome!!)
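For "Does traffic volume from each system look right?", the simplest starting point is a per-system statistical baseline rather than a trained model. A Python sketch using a z-score against each system's own history; the 3-standard-deviation threshold is an assumption, not a recommendation from the source.

```python
import math
import statistics

Z_THRESHOLD = 3.0  # assumption: flag anything more than 3 standard deviations out


def volume_zscore(history, today):
    """How unusual today's byte count is compared with this system's own history.

    `history` needs at least two data points (a statistics.stdev requirement).
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Perfectly flat history: any change at all is infinitely unusual.
        return 0.0 if today == mean else math.inf
    return (today - mean) / stdev


def is_volume_anomaly(history, today):
    return abs(volume_zscore(history, today)) > Z_THRESHOLD
```

A learned model would replace the fixed mean and threshold with something that accounts for time of day and seasonality, but the shape of the question stays the same.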
Is that website likely to be hacked?
https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-soska.pdf
Is that software vulnerable?
VDiscover – Improving binary software vulnerability detection through ML
http://www.vdiscover.org/
Identify Users at Risk
• We’ve been developing a scoring system that ranks the most at-risk users.
• Considers dozens of metrics, including:
• Email Activity (inbound # of domains, outbound, etc.)
• Web Activity
• Authentication Activity
• Incident Tickets
• Phishing Exercise performance
• Endpoint systems used
• Access to critical or sensitive systems
• Wireless networks configured
• Findability (how much information is available online)
• Position within the organization
• And more!
Derived from historical review and modeling.
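The general shape of such a score can be sketched as a weighted average of normalized metrics. This Python sketch is not the scoring system described above: the metric names and weights are hypothetical placeholders, whereas the real model considers dozens of metrics with weights derived from historical review.

```python
from typing import Dict, List

# Hypothetical metrics and weights, for illustration only.
WEIGHTS: Dict[str, float] = {
    "phishing_exercise_failures": 3.0,
    "critical_system_access": 2.5,
    "incident_tickets": 1.5,
    "inbound_email_domains": 1.0,
    "findability": 1.0,
}


def risk_score(metrics: Dict[str, float]) -> float:
    """Weighted average of metrics already normalized to 0..1 (missing = 0)."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(w * min(max(metrics.get(name, 0.0), 0.0), 1.0) for name, w in WEIGHTS.items())
    return weighted / total_weight


def rank_users(users: Dict[str, Dict[str, float]]) -> List[str]:
    """Most at-risk users first."""
    return sorted(users, key=lambda uid: risk_score(users[uid]), reverse=True)
```

Clamping each metric to 0..1 keeps one extreme value (say, a user with hundreds of incident tickets) from swamping every other signal.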
Predictive Models for Pentesters
Security tests are really useful for simulating a specific problem, especially an attacker attempting to gain access to critical systems.
Usually these tests function under severe time and resource constraints.
Let’s use machine learning and predictive modeling to make our assessments more effective!
Considers factors like
• Pivot Capabilities
• Vulnerability Likelihood
• System role
• Used by admins
• Users most vulnerable to SE attacks
• Discovering relationships between systems and components
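The "discovering relationships between systems" factor can be sketched as a graph search from a foothold to critical systems. This Python sketch is plain breadth-first search (fewest hops), not a predictive model; it assumes reachability edges have already been derived from scan or netstat data, and any host names used with it are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shortest_pivot_path(edges: Iterable[Tuple[str, str]], start: str,
                        targets: Set[str]) -> Optional[List[str]]:
    """Fewest-hop path from a foothold to any critical system.

    Each edge (a, b) means "from host a, an attacker can reach host b".
    """
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        host = path[-1]
        if host in targets:
            return path
        for nxt in sorted(graph.get(host, ())):  # sorted for deterministic output
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no known route to a critical system
```

A predictive model would weight each edge by factors like vulnerability likelihood or admin usage instead of treating every hop as equal, but the relationship graph is the input either way.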
Novel approaches to applying ML/PM
• “Just in Time” Context for events (Team Cymru, Internet research, etc.)
• Improving Security Testing outcomes (PM for Pentesters!!)
• Building a “Phish” score for customers
• Using customer metadata as a signature
• Using Machine Learning to score the security of a “gold image”
• Building predictive models from early warning systems (honeypots)
• Using Predictive Models to block external sources based on ticket data
• Machine Learning to “shadow” an analyst and emulate them (to scale!)
• Using Predictive Models to score system vulnerability levels
And lots more on the way…
Organizations that position themselves to utilize their existing log management tools will be able to take advantage of the coming wave of machine learning and predictive models. This will enable rapid sharing and implementation of threat intelligence as well.
Without a good foundation, these tools will simply provide more noise and work. While every organization is anxious to leverage these, you need to answer these questions first:
• Do we have a complete inventory of all the devices on our networks?
• Do we know the security posture of those systems?
• Do we have a Single Point of Truth that we trust?
• Do we have the appropriate information from our critical systems collected by our log management system?
• Do we have baseline profiles for our users and our critical applications?
• Do we have defined incident response procedures?
Thank you! Any questions?