kind of big data in info sec

(Kind of) Big Data in Information Security

Ben Finke

May 2015

Big Data in InfoSec - Ben Finke - @benfinke

Who is this guy?

So glad you asked!



Security Architect

Director of Security Operations

Defender

Bonafides

• Security team at Enterprise Integration

• Bachelor’s Degree in Computer Science

• US Air Force – Communications Officer

• Multiple Industry Certifications

• 11 years of experience in the information security industry (yikes!)

• Supporting 26 customers – well over 40K users and 75K systems

• Do a lot of what we are going to discuss every day!

• Would probably do this even if I wasn’t paid….


Ben Finke

@benfinke

[email protected] [email protected]

https://www.linkedin.com/pub/ben-finke/3/95a/8a1

blog.eiblackops.com

blog.benfinke.com

A brief primer – Information Security

• Security is unique to every person and every business

• Confidentiality

• Integrity

• Availability

• Compliance versus Defense

• What is a security event?

• Prevention and Detection

• What is the real target for attackers?

So are attackers the big concern? Yes and no.

Any downtime or loss of data can be a security event. However, the vast majority of the problems we will be focusing on today will involve a third party who wants something we have.


Who are these attackers anyways?


[Chart: attacker types by Capabilities and Resources versus Difficulty to Defend] Script Kiddie, Militia/Terrorist Groups, Hacktivists, Corporate Espionage, Organized Crime, Nation States

Botnets

• Organized crime groups have large and complex ecosystems. One part of that ecosystem is the creation and maintenance of huge botnets: compromised computers that can be leveraged as part of attacks or scams.


http://www.spamhaus.org/news/article/720/spamhaus-botnet-summary-2014

Waledac Botnet - http://blogs.microsoft.com/blog/2010/02/24/cracking-down-on-botnets/

Stuxnet and its legacy

Stuxnet targeted very specific Programmable Logic Controllers (PLCs).

It was one of the most complex pieces of malware ever written.

There have been similar follow-on variants, but Stuxnet was the first malware that most researchers viewed as a weapon created by a Nation State.

Commercial cybercrime operators are undoubtedly studying this incredible engineering to improve their own products….


http://spectrum.ieee.org/telecom/security/the-real-story-of-stuxnet

We haven’t been gaining ground


Source: Verizon Data Breach Investigations Report – 2015 - http://www.verizonenterprise.com/DBIR/2015/

But how could this be?

• How many billions of dollars are spent on security every year?


Source: http://www.wsj.com/articles/financial-firms-bolster-cybersecurity-budgets-1416182536

So it’s not $… What’s happening?

• We’ve lost the defender’s advantage:
  - Organizations don’t know the terrain
  - Maintaining operations in a secure state takes work
  - We’ve been betrayed by our own information systems….


The ugly truth about modern IT systems

“Unfortunately, modern computing and communications technologies, for all their benefits, are also notoriously vulnerable to attack by criminals and hostile state actors.”…..

“It is a regrettable (and yet time-tested) paradox that our digital systems have largely become more vulnerable over time, even as almost every other aspect of the technology has (often wildly) improved”…..

“Modern digital systems are so vulnerable for a simple reason: computer science does not yet know how to build complex, large-scale software that has reliably correct behavior. This problem has been known, and has been a central focus of computing research, since the dawn of programmable computing. As new technology allows us to build larger and more complex systems (and to connect them together over the Internet), the problem of software correctness becomes exponentially more difficult.”

Matt Blaze

April 2015

http://www.crypto.com/papers/governmentreform-blaze2015.pdf


Anyone recognize these?

We’ve started seeing the designer vulnerability – with great marketing and all!

These vulnerabilities take advantage of underlying software that so many other systems are built on, causing panic and confusion about what is actually vulnerable. Worse, many “appliances” leverage vulnerable versions, and patches and upgrades can take months….


Heartbleed, Shellshock, POODLE, GHOST, VENOM

But wait, there’s more!

I don’t really know how to put this, so I’m just going to put it….

Verizon 2015 Data Breach Investigations Report

http://www.verizonenterprise.com/DBIR/2015/


Bruce Schneier

• Pioneer in information security…


“I am regularly asked what the average Internet user can do to ensure his security. My first answer is usually ‘Nothing, you’re screwed.’”

Bruce’s quote notwithstanding… we know the building blocks of our IT systems will continue to go through the research-vuln-patch-fix cycle.

What this really means is that pure prevention is impossible. We simply cannot guarantee that our IT systems will never be compromised.

We need to focus on developing resilient systems where detection and correction are enhanced.

And so we began collecting logs….


A lot of logs…


Big Data in Security

• Logs from all kinds of different systems
  - Authentication Logs
  - Firewall Activity
  - Network Devices
  - Windows Event Logs
  - Linux Logs
  - Web Server Logs
  - Anti-Malware Logs
  - Web Proxy Logs
  - Email Security Logs
  - Web Application Firewall Logs
  - Database activity
  - Cloud Services*

• Other data we can apply….

*Notoriously difficult and not usually timely


[Diagram: Logs, Netflow, Network Security Monitoring, Derived Data, Security Testing Data, Context]

Context

Information about the environment that a human being would know or infer.

• Critical Systems List

• Admin-level accounts

• Location (Of person and device)

• Hardware Inventory

• Software Inventory

• Internet Facing?

• Open Tickets

• Change Controls

• System history

• Network Location


What can context do?

• Firewall log event:

May 22 14:02:51 172.21.250.1 %ASA-6-302013: Built inbound TCP connection 237062557 for outside:58.71.107.127/44975 (58.71.107.127/44975) to dmz:192.168.250.130/443 (74.129.196.130/443)

• Context:

Source IP Country – China

Reputation Score – Reported Spam and Web Login Brute Forcing

DShield Listing – Active Attacker list

Associated IOCs - ….

Previous Communications …
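To make that enrichment concrete, here is a minimal Python sketch that parses the ASA-6-302013 line above and attaches context about the source IP. The regular expression is written only against this one message format, and the lookup tables (`GEO_COUNTRY`, `REPUTATION`, `ACTIVE_ATTACKERS`) are hypothetical stand-ins, stubbed with the values from the slide, for real GeoIP, reputation, and DShield lookups.

```python
import re

# Pattern for Cisco ASA message 302013 ("Built inbound/outbound TCP connection"),
# written against the sample line on this slide; other message IDs need their own patterns.
ASA_302013 = re.compile(
    r"%ASA-\d-302013: Built (?P<direction>inbound|outbound) TCP connection (?P<conn_id>\d+) "
    r"for (?P<src_zone>[\w-]+):(?P<src_ip>[\d.]+)/(?P<src_port>\d+) \([\d.]+/\d+\) "
    r"to (?P<dst_zone>[\w-]+):(?P<dst_ip>[\d.]+)/(?P<dst_port>\d+) \((?P<dst_mapped_ip>[\d.]+)/\d+\)"
)

# Hypothetical context sources, stubbed with the slide's values.
GEO_COUNTRY = {"58.71.107.127": "China"}
REPUTATION = {"58.71.107.127": ["Reported Spam", "Web Login Brute Forcing"]}
ACTIVE_ATTACKERS = {"58.71.107.127"}


def enrich(line):
    """Parse an ASA-302013 line and attach context about the source IP, or return None."""
    match = ASA_302013.search(line)
    if match is None:
        return None
    event = match.groupdict()
    src = event["src_ip"]
    event["src_country"] = GEO_COUNTRY.get(src, "unknown")
    event["reputation"] = REPUTATION.get(src, [])
    event["on_active_attacker_list"] = src in ACTIVE_ATTACKERS
    return event


LINE = ("May 22 14:02:51 172.21.250.1 %ASA-6-302013: Built inbound TCP connection 237062557 "
        "for outside:58.71.107.127/44975 (58.71.107.127/44975) to dmz:192.168.250.130/443 "
        "(74.129.196.130/443)")
print(enrich(LINE))
```

The parsed fields (zones, ports, the NAT-mapped destination) are what later context, such as the vulnerability-testing details for the destination server, can be joined on.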


Security Testing Data

Incorporate the information we generate during security testing efforts.

• Vulnerability scans

• Web application security assessments
  - What kinds of requests would indicate attack activity?

• Bridges between network segments?
  - Attackers look to pivot to gain access

• Identified services and ports
  - Did something new show up?


Security Testing Data

• Firewall log event:

May 22 14:02:51 172.21.250.1 %ASA-6-302013: Built inbound TCP connection 237062557 for outside:58.71.107.127/44975 (58.71.107.127/44975) to dmz:192.168.250.130/443 (74.129.196.130/443)

• Vuln Testing Details:

Target server is Windows Server 2012 R2 running IIS

Critical Apps – Yes (SharePoint)

Vulnerability Status – 0 critical, 0 high, 1 medium, 2 low

Behind Web Application Firewall – Yes

Part of HA – Yes

Mapped DB instances - …….

Derived Data

Information about a system that needs to be generated by a script or action.

• Netstat
  - What is this system talking to?

• Running processes
  - Everything we expect to see and nothing more?

• Logged-on users
  - Active window and idle time

• State of defensive components
  - Status of AV and HIPS?
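The running-process check can be sketched as two set differences: what is running that isn't expected, and what should be running (such as the AV agent) but isn't. This assumes the process list has already been collected from the host by some script or agent; the host data and process names below are hypothetical.

```python
def _normalize(names):
    # Windows process names are case-insensitive, so compare them in lower case.
    return {name.lower() for name in names}


def unexpected_processes(running, expected):
    """Processes running on the host that are not in the expected baseline ("nothing more?")."""
    return sorted(_normalize(running) - _normalize(expected))


def missing_processes(running, required):
    """Required processes, e.g. AV/HIPS agents, that are not running ("everything we expect?")."""
    return sorted(_normalize(required) - _normalize(running))


# Hypothetical data for one host.
running = ["svchost.exe", "lsass.exe", "explorer.exe", "nc.exe"]
baseline = ["svchost.exe", "lsass.exe", "explorer.exe", "MsMpEng.exe"]
defenses = ["MsMpEng.exe"]

print(unexpected_processes(running, baseline))  # ['nc.exe']
print(missing_processes(running, defenses))     # ['msmpeng.exe']
```

The same pattern applies to netstat output (expected versus observed remote endpoints) and to logged-on users.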


Network Security Monitoring

Using purpose-built platforms to analyze everything passing by on the network.

• Passive Endpoint Detection

• SNMP and Syslog activity

• Encrypted traffic analysis

• PKI Certificates in use

• Traffic matching IDS signatures

• Packet Captures


Netflow

Summarizing all network communications between systems.

• Allows lengthy retention

• Easy baselining of network activity

• Capture utilization statistics

• Identify new traffic patterns
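A minimal sketch of Netflow baselining, assuming flow records have already been exported and reduced to the hypothetical `Flow` tuple below (real NetFlow/IPFIX records carry many more fields). A conversation counts as new if its (source, destination, port) triple never appeared during the baseline window, and per-source byte totals stand in for utilization statistics.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Simplified flow record (hypothetical; not the NetFlow wire format)."""
    src: str
    dst: str
    dst_port: int
    octets: int


def build_baseline(flows):
    """Set of (source, destination, port) conversations seen during the baseline window."""
    return {(f.src, f.dst, f.dst_port) for f in flows}


def new_conversations(flows, baseline):
    """Flows whose conversation never appeared in the baseline."""
    return [f for f in flows if (f.src, f.dst, f.dst_port) not in baseline]


def octets_by_source(flows):
    """Total bytes sent per source host: a simple utilization statistic."""
    totals = defaultdict(int)
    for f in flows:
        totals[f.src] += f.octets
    return dict(totals)


history = [Flow("10.0.0.5", "10.0.0.10", 445, 1_000), Flow("10.0.0.5", "10.0.0.20", 80, 500)]
today = [Flow("10.0.0.5", "10.0.0.10", 445, 1_200), Flow("10.0.0.5", "203.0.113.9", 8443, 90_000)]

baseline = build_baseline(history)
print(new_conversations(today, baseline))
print(octets_by_source(today))  # {'10.0.0.5': 91200}
```

Because flow summaries are small compared to full packet captures, the baseline window can span weeks or months, which is what makes the lengthy retention above useful.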


A good start...

Hooray for logs! We certainly have a lot of data (or at least we thought we did).

Average-size network (~1,000 users) = 100 GB/day **

It quickly became evident that a new practice was necessary for information security teams – what are we going to do with all of this data?


Big Data Platforms in InfoSec

• ELK – Elasticsearch, Logstash, Kibana

• ELSA

• Graylog2

• Splunk

• Commercial SIEMs

• Lots of custom Hadoop lashups

• Only as good as the analysts who take care of them

• Lack of good tools to build predictive models

• Lack of good tools to build useful visualizations

• Lack of good integration into the overall defensive systems

Capability Example - Splunk

Collects structured or unstructured data

Field Extractions

Statistical Tools built into search interface

Visualization engine

Software that scales nicely on commodity hardware

Writing “apps” for Splunk

Connect to DBs and Hadoop Clusters (and some MongoDB goodness too!)


Splunk Architecture


So what can we achieve with these?

• Correlation

• Context

• Log Shipping – the art of collecting logs from critical systems and delivering them to the log management collectors, as close to real time as technically feasible.

• Real-time logging = Real-time alerting, forensics, and statistics

• Batch Logging = Forensics and statistics, with relative alerting

• Oh yeah, and you may hear the phrase “Kill Chain”….


The Kill Chain

Source: SecureState - http://blog.securestate.com/open-source-threat-intelligence-sony-breach/


Example


User logs into the VPN
• IP Address = IP X
• User identity = UID
• User location
• Create a session for UID

Business Application reports login for “admin” from IP X
• IP X is tied to VPN session
• We map the true person to the account
• Verify against user traffic profile

Intranet Site reports traffic from IP X
• IP X is tied to VPN session
• Unauthenticated activity is attributed
• Transaction is added to user session

NSM reports activity by IP
• Active Device Fingerprinting
• Application identification
• Malicious activity detection
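The correlation in this example boils down to a table keyed by IP address. The sketch below is a hypothetical in-memory version: a login binds IP X to a UID, and later events from other systems that arrive from IP X are attributed to that person. It ignores session timeouts, DHCP reassignment, and NAT, all of which a real implementation in a SIEM would have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    uid: str
    ip: str
    location: str
    events: list = field(default_factory=list)  # (source system, account used)


class SessionTracker:
    """Hypothetical IP -> person mapping built from VPN or local logins."""

    def __init__(self):
        self._by_ip = {}

    def login(self, ip, uid, location):
        """Record IP X -> UID and open a session for that person."""
        session = Session(uid=uid, ip=ip, location=location)
        self._by_ip[ip] = session
        return session

    def logout(self, ip):
        self._by_ip.pop(ip, None)

    def attribute(self, ip, source, account):
        """Attach an event from another system to the true person behind the IP.

        Returns that person's UID, or None if no open session owns the IP.
        """
        session = self._by_ip.get(ip)
        if session is None:
            return None
        session.events.append((source, account))
        return session.uid


tracker = SessionTracker()
tracker.login("10.8.0.14", "jsmith", "Jacksonville, FL")
print(tracker.attribute("10.8.0.14", "business_app", "admin"))  # jsmith
print(tracker.attribute("10.8.0.99", "intranet", None))         # None
```

With the session in hand, the login as the shared “admin” account is tied to a named person, and unauthenticated intranet traffic from the same IP lands in the same session history.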

Remote Access and External Services

A quick word problem: If a user logs in from Jacksonville, FL at 9 AM and from Chicago, IL at 10:30 AM, is it possible that this is the same person?

What about Jacksonville at 8 AM and Amsterdam at 11 AM (UTC)?

Using the haversine formula, we can calculate the great-circle distance between two geolocations.

We can then use the distance and the time difference to determine whether a given action is likely to belong to the correct person.

Chart the results by both the largest distance deltas and the highest required speeds.
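The impossible-travel check can be sketched in Python with the standard haversine formula. Assumptions: the city coordinates are approximate, the 900 km/h ceiling is a hypothetical threshold (roughly airliner cruising speed), and every timestamp carries its UTC offset. The offsets matter: read as local times in May, 9 AM in Jacksonville (EDT) and 10:30 AM in Chicago (CDT) are 2.5 hours apart, not 1.5.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
MAX_PLAUSIBLE_KMH = 900.0  # hypothetical threshold, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


def required_speed_kmh(first, second):
    """Speed needed between two logins, each (lat, lon, timezone-aware datetime)."""
    lat1, lon1, t1 = first
    lat2, lon2, t2 = second
    km = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs((t2 - t1).total_seconds()) / 3600
    if hours == 0:
        return math.inf if km > 0 else 0.0
    return km / hours


def impossible_travel(first, second, max_kmh=MAX_PLAUSIBLE_KMH):
    return required_speed_kmh(first, second) > max_kmh


# Approximate coordinates; local times carry their UTC offsets (May = daylight time).
EDT, CDT = timezone(timedelta(hours=-4)), timezone(timedelta(hours=-5))
jax_9am = (30.33, -81.66, datetime(2015, 5, 22, 9, 0, tzinfo=EDT))
chi_1030 = (41.88, -87.63, datetime(2015, 5, 22, 10, 30, tzinfo=CDT))
jax_8am_utc = (30.33, -81.66, datetime(2015, 5, 22, 8, 0, tzinfo=timezone.utc))
ams_11am_utc = (52.37, 4.90, datetime(2015, 5, 22, 11, 0, tzinfo=timezone.utc))

print(round(required_speed_kmh(jax_9am, chi_1030)))  # roughly 560 km/h, plausible by air
print(impossible_travel(jax_8am_utc, ams_11am_utc))  # True: about 7,160 km in 3 hours
```

Ranking every pair of logins for a user by `haversine_km` and by `required_speed_kmh` gives the two charts described above.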


Example 2


User logs into local network
• IP Address = IP X
• User identity = UID
• Device in use
• Create a session for UID

Business Application reports login for “admin” from IP X
• IP X is tied to local session
• We map the true person to the account
• Verify against user traffic profile

Intranet Site reports traffic from IP X
• IP X is tied to local session
• Unauthenticated activity is attributed
• Transaction is added to user session

Wireless network activity
• Network authentication by UID
• Tie multiple devices to UID


So how well is that working?

SANS 2012 Survey - http://www.sans.org/reading-room/whitepapers/analyst/eighth-annual-2012-log-event-management-survey-results-sorting-noise-35230


The birth of Threat Intel

These tools now enable us to find and start blocking attack activity.


Attack → Logs Sent to SIEM → Analyst sees attack, enables block via defenses

This happens all day, every day

Blue teams generate tons of data on their own about attackers:

IP Addresses

Domain Names

Email Subject Lines

Malware Behavior

The natural question: How can we get access to all of this data that others are collecting, and share what we see?


Attack Indicators


Threat Intelligence!

Because of course we want to be smart about it!

Various formats and protocols emerge to share this info


Not to mention commercial offerings


Surprisingly, it’s not perfect!

“Why did this domain get listed as malicious again?”

“This list has over 2 million IP addresses in it!”

“So that breach we just had…. None of those IPs or domains were in our threat lists…”

“We added that block list to the firewall, but now the config file is bigger than the device can handle…”


Problems

• The only place to put all this stuff is in the SIEM

• Almost everything is entirely reactive (AV Signatures)

• Threat “Intel” can create lots of noise for the humans

• Threat Intel sources are (almost always) very expensive, even for large companies

• Loss of context for why a thing is bad

• False Positives and Botnets (your Mom’s PC, probably)

• Threat Intel sources suffer from a numbers problem…


What can we do?


MLSec Project

• Machine Learning Security Project

• Provides research and tools to help organizations understand how effective this threat intel is, and how they can leverage machine learning and predictive models into their information security operations.

• http://www2.mlsecproject.org/


MLSec Projects

• Combine
  - Python program to harvest intel feeds from various sources

• SecRepo
  - Repo of data samples to assist during development and testing of security integrations with machine learning and predictive models….

• TIQ Test
  - Statistical comparison of threat intel data – provides visual output!

• Thanks to Alex for all the help!

• TIQ Test was featured in the 2015 Verizon DBIR. How did they do?


Lots of overlap, right?

• Nope. Hardly at all.

• 97% of intel was unique
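One way to measure that kind of overlap is to count, for every distinct indicator, how many feeds contain it. The sketch below is a simplified illustration of the idea, not TIQ Test's actual implementation; the feed names and indicators are made up, using documentation IP ranges.

```python
from collections import Counter
from itertools import combinations


def unique_fraction(feeds):
    """Share of distinct indicators that appear in exactly one feed."""
    counts = Counter()
    for indicators in feeds.values():
        counts.update(set(indicators))  # count each indicator at most once per feed
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def pairwise_overlap(feeds):
    """Number of indicators shared by each pair of feeds."""
    return {
        (a, b): len(set(feeds[a]) & set(feeds[b]))
        for a, b in combinations(sorted(feeds), 2)
    }


# Hypothetical feeds.
feeds = {
    "feed_a": ["198.51.100.1", "198.51.100.2", "bad.example"],
    "feed_b": ["198.51.100.2", "203.0.113.7"],
    "feed_c": ["192.0.2.44"],
}
print(unique_fraction(feeds))   # 0.8
print(pairwise_overlap(feeds))  # {('feed_a', 'feed_b'): 1, ('feed_a', 'feed_c'): 0, ('feed_b', 'feed_c'): 0}
```

A unique fraction close to 1.0, like the 97% above, means the feeds barely agree with each other.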


Blue Team Nirvana

• Human analysts training an army of machine learning robots

• Scale is met by the blue team robots

• Humans do the creative stuff

• Real-time sharing of threat indicators for later use (context)

• Automating reactions to detected threats

• Distributed early warning systems
  - Honeypots
  - Sandboxes
  - Network Security Monitoring

The end goal is to have machines handle all events and research, and to present data to humans only when a decision needs to be made.

Over time the machines learn and act just like the human analysts.

Free the humans to do what humans do best!


Machine Learning/Predictive Models

• Behavior Anomalies
  - Has this user ever logged into this application before?

• Network Traffic Anomaly Detection
  - “Did this ever talk to that before?” and “Does traffic volume from each system look right?”

• Incident Response Automation
  - Can this machine be reliably cleaned by our tools and techniques?

• Obvious Attack Blocking
  - That HTTP request looks like an RFI attack against PHP, but we run .NET – block it

• Reviewing possible security events
  - Float the really interesting stuff up to the humans

• But that’s sort of obvious stuff that lots of folks are trying (which will be awesome!!)
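The “Does traffic volume from each system look right?” question can be approximated with a per-host z-score against that host's own history. This is a deliberately simple statistical baseline rather than a trained model, and the 3.0 threshold and sample numbers are hypothetical.

```python
import math
import statistics


def volume_anomalies(history, today, threshold=3.0):
    """Flag hosts whose traffic today is far from their own historical mean.

    history: host -> list of past daily byte counts
    today:   host -> today's byte count
    Returns host -> z-score (math.inf if the host never varied before) for flagged hosts.
    """
    flagged = {}
    for host, value in today.items():
        past = history.get(host, [])
        if len(past) < 2:
            continue  # too little history to judge; a "new host" check should cover these
        mean = statistics.mean(past)
        stdev = statistics.stdev(past)
        if stdev == 0:
            if value != mean:
                flagged[host] = math.inf
            continue
        z = (value - mean) / stdev
        if abs(z) >= threshold:
            flagged[host] = z
    return flagged


history = {"web01": [100, 110, 90, 105, 95], "db01": [500, 520, 480, 510, 490]}
today = {"web01": 400, "db01": 505}
print(volume_anomalies(history, today))  # only web01 is flagged
```

Per-host baselines matter here: 400 would be unremarkable for `db01` but is far outside what `web01` normally sends.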

Is that website likely to be hacked?

https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-soska.pdf

Is that software vulnerable?

VDiscover – Improving binary software vulnerability detection through ML


http://www.vdiscover.org/

Identify Users at Risk

• We’ve been developing a scoring system that ranks the most at-risk users.

• Considers dozens of metrics, including:

• Email Activity (inbound # of domains, outbound, etc.)

• Web Activity

• Authentication Activity

• Incident Tickets

• Phishing Exercise performance

• Endpoint systems used

• Access to critical or sensitive systems

• Wireless networks configured

• Findability (how much information is available online)

• Position within the organization

• And more!


Derived from historical review and modeling.
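The actual metrics and weights behind this score aren't given here, so the sketch below only shows the general shape of such a ranking: normalize each metric to [0, 1], combine with weights, and sort. Every metric name and weight is a hypothetical placeholder; per the slide, the real weighting comes from historical review and modeling.

```python
# Hypothetical metric names and placeholder weights, not the talk's actual model.
WEIGHTS = {
    "phishing_exercise_failures": 3.0,
    "critical_system_access": 2.5,
    "incident_tickets": 2.0,
    "findability": 1.5,
    "inbound_email_domains": 1.0,
}


def risk_score(metrics):
    """Weighted sum of per-user metrics, each assumed pre-normalized to [0, 1]."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank_users(users):
    """User IDs ordered from most to least at risk."""
    return sorted(users, key=lambda uid: risk_score(users[uid]), reverse=True)


users = {
    "alice": {"phishing_exercise_failures": 1.0, "critical_system_access": 1.0},
    "bob": {"findability": 0.2},
    "carol": {"incident_tickets": 0.5, "phishing_exercise_failures": 0.5},
}
print(rank_users(users))  # ['alice', 'carol', 'bob']
```

Learning the weights from past incidents (for example with a logistic regression over the same features) would replace the hand-picked numbers.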

Predictive Models for Pentesters


Security tests are really useful for simulating a specific problem, especially an attacker attempting to gain access to critical systems.

Usually these tests operate under severe time and resource constraints.

Let’s use machine learning and predictive modeling to make our assessments more effective!

Considers factors like

• Pivot Capabilities

• Vulnerability Likelihood

• System role

• Used by admins

• Users most vulnerable to SE attacks

• Discovering relationships between systems and components

Novel approaches to applying ML/PM

• “Just in Time” Context for events (Team Cymru, Internet research, etc.)

• Improving Security Testing outcomes (PM for Pentesters!!)

• Building a “Phish” score for customers

• Using customer metadata as a signature

• Using Machine Learning to score the security of a “gold image”

• Building predictive models from early warning systems (honeypots)

• Using Predictive Models to block external sources based on ticket data

• Machine Learning to “shadow” an analyst and emulate them (to scale!)

• Using Predictive Models to score system vulnerability levels

And lots more on the way…

Organizations that position themselves to utilize their existing log management tools will be able to take advantage of the coming wave of machine learning and predictive models. This will enable rapid sharing and implementation of threat intelligence as well.

Without a good foundation, these tools will simply provide more noise and work. While every organization is anxious to leverage these, you need to answer these questions first:

• Do we have a complete inventory of all the devices on our networks?

• Do we know the security posture of those systems?

• Do we have a Single Point of Truth that we trust?

• Do we have the appropriate information from our critical systems collected by our log management system?

• Do we have baseline profiles for our users and our critical applications?

• Do we have defined incident response procedures?


Thank you! Any questions?
