apache metron meetup presentation at capital one

Apache Metron Meetup & Code Lab George Vetticaden Principal Architect @ Hortonworks Apache Metron Committer James Sirota Engineering Lead & Chief Data Scientist @ Hortonworks Apache Metron Committer

Upload: gvetticaden

Post on 16-Jan-2017


TRANSCRIPT

Page 1: Apache metron meetup presentation at capital one

Apache Metron Meetup & Code Lab

George Vetticaden
Principal Architect @ Hortonworks
Apache Metron Committer

James Sirota
Engineering Lead & Chief Data Scientist @ Hortonworks
Apache Metron Committer

Page 2: Apache metron meetup presentation at capital one

Agenda

Part 1 – Overview of Apache Metron

• Challenges with Today’s Security Tools to Combat Cyber Attacks
• Introduction to Apache Metron
• Metron Architecture
• Personas and Core Themes
• Why Apache Metron?

Part 2 – Code Lab: Adding a Net-New Telemetry Data Source into Metron

• Setting up the Use Case for the Code Lab: Tracing a Squid Telemetry Event through the Platform
• Get your Metron vagrant VM started
• Use Case 1: Adding a net-new telemetry data source to Metron
• Use Case 2: Enriching Telemetry Data
• Use Case 3: Adding/Enriching/Validating with Threat Intel Feeds
• Use Case 4: Setting up your IDE and writing Tests

Page 3: Apache metron meetup presentation at capital one

Metron

Page 4: Apache metron meetup presentation at capital one

The Good Guys

Security Practitioner

• I have too many tools I need to learn
• I don’t have a centralized view of my data
• My tools are too expensive
• I can’t find enough talent
• I can’t keep relying on static rules
• I need to discover bad stuff quicker
• Most of my alerts are false positives
• I have too many manual tasks

Metron will make it easier and faster to find the real issues I need to act on

SOC Manager

• Threat landscape too dynamic
• More assets/users to manage
• Attack surface increases
• Legacy techniques don’t work anymore

Metron is a more cost-effective way for my team to deal with the fast-moving threat landscape

Page 5: Apache metron meetup presentation at capital one

The Bad Guys

Script Kiddie

• My techniques are predictable and known
• My attack vectors are also known
• You are not the only person I’ve attacked
• I brag about what I did or will do
• I set off a large number of alerts
• I fumble around a lot

Metron can take everything that is known about me and check for it in real time

Advanced Persistent Threat

• I am unique in the way I do things
• I live on your network for about 300 days
• I know what I am after and I look for it, slowly
• Your rules will not detect me, I am too smart
• I impersonate a legitimate user, but I don’t act like one

Metron can model historical behavior of whoever I am impersonating and flag me as I try to deviate

Page 6: Apache metron meetup presentation at capital one

Problems With Existing Tools

Security Information Management System

I am prohibitively expensive

I have vendor lock-in

I can’t deal with big data

I am not open

I am not extensible enough

Legacy Point Tools

I was built for 1995

I am super specialized

I don’t scale horizontally

I have a proprietary format

You need a PhD to operate me

Behavioral Analytics Tools

I am mostly vaporware

I was built by a small startup

I was modeled after a data set from 1999

I spam you with false positives

Page 7: Apache metron meetup presentation at capital one

Apache Metron Vision

“Apache Metron is a Security Data Analytics Platform (SDAP). As a next generation security analytics framework, it is designed to consume and monitor network traffic and machine data within an enterprise. Apache Metron is extensible and is designed to work at a massive scale. It is not a SIEM but rather the next evolution of a SIEM.”

Apache Metron provides the following capabilities:

• Extensible spouts and parsers for attaching Apache Metron to monitor any telemetry source
• Extensible enrichment framework for any telemetry stream
• Hadoop-backed storage for telemetry stream with a customizable retention time
• Automated real-time index for telemetry streams enabling real-time search
• Telemetry correlation and SQL query capability for data stored in Hadoop backed by Hive
• ODBC/JDBC compatibility and integration with existing analytics tools

Page 8: Apache metron meetup presentation at capital one

Challenges that Apache Metron Solves

60%: Percent of breaches that happened in minutes

8 months: Average time an advanced security breach goes unnoticed

$400 million in estimated financial loss in 2015

70%-90%: Percentage of malware in breach unique to organization

2015 Verizon Data Breach Investigations Report

• Too expensive to keep data for enough time to understand history
• Not enough of the right data to provide context
• Too expensive to collect all the desired data to understand context
• Not sure if we can detect a targeted event
• Too many events to review in a timely manner
• Not enough staff to review events in a timely manner

Page 10: Apache metron meetup presentation at capital one

Apache Metron Logical Architecture

[Diagram: a network tap and telemetry sources (PCAP, NetFlow, DPI, IDS, AV, email, firewall, host logs) feed the Real-time Processing Engine. Stage labels and their contents: PROCESS (parse, normalize, tag, validate); ENRICH (user, asset, geo, WHOIS, conn); LABEL (STIX, flat files, aggregators, model as a service, cloud services); ALERT/PERSIST (PCAP store, alert, Security Data Vault). Data & Integration Services (real-time search, interactive dashboards, data modelling, integration layer, PCAP replay, security layer) sit alongside custom Metron UI/portals.]

Page 11: Apache metron meetup presentation at capital one

Physical Architecture

[Diagram: Sensors A, B ... N each publish events in their native format to their own topic (Topic A, Topic B ... Topic N) in Apache Kafka, and a PCAP probe publishes PCAP. In Apache Storm, one normalizing topology per topic (A, B ... N) converts events to the normalized Metron format and passes them to the Enrichment/Threat Intel topology, which sends its output to the index and HDFS. A separate PCAP topology writes PCAP to HDFS for the Metron PCAP service.]

Page 12: Apache metron meetup presentation at capital one

Parsing/Normalization Topology

[Diagram: Sensor A publishes events in its native format to Topic A in Apache Kafka. In Apache Storm, Normalizing Topology A reads the topic through a Kafka Spout, and the Parser Kafka Bolt emits Metron JSON. Remaining diagram label: Enriched.]

Key Points:
• Each new telemetry data source will have its own parser topology
• Two types of parsers are available: Grok and Java

Page 13: Apache metron meetup presentation at capital one

2 Types of Parsers

Parser Type: Grok
Description:
• A grok is a collection of named regular expressions.
• Provides a declarative way to write new parsers without any code
• A parser takes an input, which is usually a byte array coming from the Kafka Spout, and turns it into a Metron JSON Object.
• The Grok parser does this by utilizing the Grok library inside of the Parser Kafka Bolt Adapter.
Telemetry Type:
• Use this parser when telemetry is simple to parse or low in volume

Parser Type: Java
Description:
• Java-based approach to writing custom parsers
Telemetry Type:
• Use this parser when telemetry is complex to parse or high in volume
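To make the "collection of named regular expressions" idea concrete, here is a minimal Python sketch of how a grok-style expression can be expanded into a regex with named groups and applied to a byte array. It is an illustration only, not Metron's Grok parser: the three-entry pattern library, the `compile_grok` and `parse` helpers, the sample input and the output field names are all assumptions made for this example.

```python
import json
import re

# Hypothetical mini pattern library; a real Grok library ships many more patterns.
PATTERNS = {
    "NUMBER": r"\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
}


def compile_grok(expression: str) -> re.Pattern:
    """Expand each %{NAME:field} reference into a named regex group."""
    def expand(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


def parse(raw: bytes, pattern: re.Pattern) -> str:
    """Turn a raw byte array into a JSON string of the named captures."""
    match = pattern.match(raw.decode("utf-8"))
    if match is None:
        raise ValueError("line does not match the grok expression")
    return json.dumps(match.groupdict(), sort_keys=True)


# Illustrative expression and field names, not Metron's shipped patterns.
DEMO = compile_grok(r"%{NUMBER:timestamp} %{IP:ip_src_addr} %{WORD:action}")
print(parse(b"1461576382.642 98.220.218.158 TCP_MISS", DEMO))
```

The declarative part is the one-line expression: adding a field means editing the expression, not the parsing code, which is why the slide recommends Grok for simple or low-volume telemetry.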

Page 14: Apache metron meetup presentation at capital one

Metron JSON Object

• Numerous sensors log in different formats. The parser should normalize at least the following subset of fields to the following Metron JSON naming conventions:

[Field table not captured in the transcript]

Page 15: Apache metron meetup presentation at capital one

Enrichment Topology

[Diagram: the Enrichment Topology runs in Apache Storm between Apache Kafka and the Enriched Writer Bolt. Labels, in flow order: Message Splitter: Enrichment; Enrichment Bolt (a) ... Enrichment Bolt (n); Enrichment Joiner; Message Splitter: Threat Intel; Threat Intel Bolt (n) and Model Bolt (n); Threat Intel Joiner. The bolts are drawn with Fast Caches in front of Data Stores, which are fed by the Metron Enrichment Loader Framework and the Metron Threat Loader Framework. Legend: Message Stream, Enrichment Stream.]
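The splitter and joiner labels in this diagram describe a fan-out/fan-in pattern: a message is split into one task per configured enrichment, the tasks run independently, and a joiner merges the results back into the message. The Python sketch below shows that pattern in miniature under stated assumptions: the `geo_lookup` and `host_lookup` functions, their return values, the config shape and the `enrichments.<name>.<field>.<key>` key layout are all hypothetical, and a thread pool stands in for Storm bolts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Enricher = Callable[[str], Dict[str, str]]


def geo_lookup(ip: str) -> Dict[str, str]:
    """Hypothetical geo enrichment returning made-up data."""
    return {"country": "ZZ"}


def host_lookup(ip: str) -> Dict[str, str]:
    """Hypothetical host/asset enrichment returning made-up data."""
    return {"type": "workstation"}


def split(message: Dict[str, str],
          config: Dict[str, Dict[str, Enricher]]) -> List[Tuple[str, str, Enricher]]:
    """Splitter: one (field, enrichment name, function) task per configured enrichment."""
    return [(field, name, fn)
            for field, enrichers in config.items() if field in message
            for name, fn in enrichers.items()]


def join(message: Dict[str, str],
         results: List[Tuple[str, str, Dict[str, str]]]) -> Dict[str, str]:
    """Joiner: merge every enrichment result into a copy of the original message."""
    joined = dict(message)
    for field, name, values in results:
        for key, value in values.items():
            joined[f"enrichments.{name}.{field}.{key}"] = value
    return joined


def enrich(message: Dict[str, str],
           config: Dict[str, Dict[str, Enricher]]) -> Dict[str, str]:
    """Run all enrichment tasks concurrently, then join the results."""
    tasks = split(message, config)
    with ThreadPoolExecutor() as pool:
        futures = [(field, name, pool.submit(fn, message[field]))
                   for field, name, fn in tasks]
        results = [(field, name, future.result()) for field, name, future in futures]
    return join(message, results)


if __name__ == "__main__":
    config = {"ip_src_addr": {"geo": geo_lookup, "host": host_lookup}}
    print(enrich({"ip_src_addr": "10.0.0.1"}, config))
```

Keeping the split and join steps separate is what lets new enrichments be added through configuration alone: the joiner does not need to know which enrichments ran.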

Page 17: Apache metron meetup presentation at capital one

Personas

Page 18: Apache metron meetup presentation at capital one

Metron’s Key Functional Themes

Platform: Work done to harden the platform for performance, scale, extensibility and maintainability. This also includes capabilities around provisioning, managing and monitoring the application.

Data Collection: Set of data sources that Metron provides capabilities to stream, ingest and parse into the platform.

Data Processing: A set of Storm topologies to perform various actions in real time, including normalization of telemetry data, enrichment, cross-referencing with threat intel feeds, alerting, indexing, and persisting into historical stores.

UI: Set of portals, dashboards and user interfaces for the different personas.

Page 19: Apache metron meetup presentation at capital one

Target Personas and Themes for Apache Metron 0.1 Tech Preview 1 - Intro

Theme: Platform Theme: Data Collection

Theme: Data Processing Theme: UI

Security Platform Engineer

Security Platform Engineer

Security Platform Engineer

SOC Investigator Security Platform Engineer SOC Investigator

Forensic Investigator SOC Investigator

SOC Analyst SOC Manager

Page 20: Apache metron meetup presentation at capital one

Key Features of Apache Metron 0.1

Platform

• Fully automated vagrant install of Metron on a single VM
• Fully automated install of Metron on a multi-node HDP cluster via Ansible scripts, Ambari blueprints and APIs, including:
  • Multi-node Elasticsearch cluster
  • Metron-UI web application
  • Deployment of the Metron Storm topology
  • Deployment of telemetry sensors: PCAP, Bro, YAF (Netflow), Snort
• OpenSOC redesign (new topology structure, extensible enrichments, threat intel, data loads, configs, ease of adding new topologies)

Data Collection

• Ingestion of the following data sources: PCAP via pycapa or C++ DPDK probe, Bro, Netflow via YAF, Snort
• Parsers for the following data sources: PCAP, Bro, Netflow & Snort

Data Processing

• Support for the following enrichment services: Geo, WhoIs, Host
• Threat intelligence message enrichment: enrich messages with fields that match the threat intelligence data in HBase
• Support for the following persistence services: HDFS, HBase and Elasticsearch
• Indexing events and alerts into the Elasticsearch cluster
• Support for Soltra (CIF) Threat Aggregator Services via STIX and Taxii feed
• Ability to replay PCAP files for testing

UI

• Metron Investigator UI to search across indexed events and alerts for SOC Analysts & Investigators
• Histogram panels for each of the data sources (YAF, Bro, Snort)
• Table views for alerts (YAF, Bro, Snort)
• Customize new panels with different data sources and different panel types

Page 22: Apache metron meetup presentation at capital one

Why Metron? SOC Analyst Perspective

Everything you need to know in one place

Analyst workflow:
• Looking through alerts: 25%
• Collecting contextual data: 25%
• Formulating a hypothesis: 5%
• Investigate: 20%
• Remediate: 15%
• Update workflow: 5%
• Write report: 5%

• Alerts Relevancy Engine
• Smarter ML alerts
• Centralized Alerts Console
• Enriched with threat intel data

• Fully enriched messages
• Single pane of glass UI
• Centralized real-time search
• All logs in one place

• Granular access to PCAP
• Replay old PCAP against new signatures
• Tag behavior for modelling by data scientists
• Raw messages used as evidentiary store
• Mine investigation history
• Asset inventory as an enrichment
• User identity as an enrichment

• Workflow engine
• Ticket clustering

Page 23: Apache metron meetup presentation at capital one

Why Metron? Data Scientist Perspective

Reducing time from hypothesis to model

Data Science Workflow:
• Formulating a Hypothesis: 5%
• Finding Data: 20%
• Cleaning Data: 20%
• Munging Data: 20%
• Visualizing Data: 20%
• Modelling Data: 10%
• Validating Model: 5%

• All my data is in the same place
• Data exposed through a variety of APIs
• Standard Access Control Policies
• Quickly see what I have

• Metron normalizes objects
• Partial schema validation on ingest
• Tagging on ingest

• Automatic data enrichment
• Automatic application of class labels
• Common Metron Objects
• Massively parallel computation framework

• Reusable Zeppelin Dashboards
• Real-time search + UI
• Integration with Python/R
• Integration with analytics tools

Page 25: Apache metron meetup presentation at capital one

Use Case Setup

• Scenario
  • Customer Foo has installed Metron TP1 and is using the out-of-the-box data sources (PCAP, YAF/Netflow, Snort and Bro). They love Metron!
  • But now they want to add a new data source to the platform: Squid proxy logs.
• Customer Foo’s requirements are the following:
  1. Need to ingest the proxy events from Squid logs in real time
  2. The proxy logs have to be parsed into a standardized JSON structure that Metron can understand
  3. In real time, the Squid proxy event needs to be enriched with domain/whois information (domain, cert, country, company)
  4. In real time, the domain of the proxy event must be checked against threat intel feeds
  5. If there is a threat intel hit, an alert needs to be raised
  6. The end user must be able to see the new telemetry events and the alerts from the new data source
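As a rough illustration of requirements 3 to 5, the Python sketch below enriches an already-parsed proxy event with whois data and raises an alert flag on a threat intel hit. Everything in it is an assumption for illustration: the in-memory `WHOIS` and `THREAT_DOMAINS` tables hold made-up data, and the `whois.*` and `is_alert` field names are hypothetical rather than Metron's own. In Metron this work is done by the enrichment and threat intel topologies, not by application code like this.

```python
from typing import Any, Dict, Set

# Made-up reference data standing in for the WhoIs enrichment store
# and a threat intel feed.
WHOIS: Dict[str, Dict[str, str]] = {
    "example.com": {"country": "ZZ", "company": "Example Corp"},
}
THREAT_DOMAINS: Set[str] = {"malicious.example"}


def enrich_with_whois(event: Dict[str, Any]) -> Dict[str, Any]:
    """Requirement 3: attach whois fields for the event's domain, if known."""
    enriched = dict(event)
    for key, value in WHOIS.get(event["domain"], {}).items():
        enriched[f"whois.{key}"] = value
    return enriched


def check_threat_intel(event: Dict[str, Any]) -> Dict[str, Any]:
    """Requirements 4 and 5: flag the event as an alert on a threat intel hit."""
    labeled = dict(event)
    labeled["is_alert"] = event["domain"] in THREAT_DOMAINS
    return labeled


def process_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Enrich first, then label, mirroring the order of the requirements."""
    return check_threat_intel(enrich_with_whois(event))


if __name__ == "__main__":
    for domain in ("example.com", "malicious.example"):
        print(process_event({"domain": domain, "method": "GET"}))
```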

Page 26: Apache metron meetup presentation at capital one

Squid & its Telemetry Event

• What is Squid?
  • Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages.
• What does a Squid access log look like?
  • When you make an outbound HTTP connection to http://www.cnn.com, the following entry gets added to a file called access.log:

Callouts on the entry:
• 1461576382.642: Unix epoch time
• 98.220.218.158: IP of the host where the connection was made
• www.cnn.com: the domain name of the outbound connection

1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html
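The access.log entry above is whitespace-delimited, so its fields can be pulled apart with a few lines of Python. This is a sketch, not the parser used in the code lab: the slide names only the timestamp, the host IP and the domain, so the other field names below are assumptions based on the usual layout of Squid's native access.log format.

```python
from typing import Dict

SQUID_LINE = ("1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET "
              "http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")

# Illustrative field names; only the timestamp, client IP and URL are
# called out on the slide.
FIELDS = ["timestamp", "elapsed_ms", "client_ip", "result", "bytes",
          "method", "url", "user", "hierarchy", "content_type"]


def parse_squid(line: str) -> Dict[str, str]:
    """Split one access.log entry into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    event = dict(zip(FIELDS, parts))
    # The result and hierarchy columns each pack two values around a slash.
    event["action"], event["status"] = event.pop("result").split("/", 1)
    event["peer_status"], event["peer_ip"] = event.pop("hierarchy").split("/", 1)
    return event


if __name__ == "__main__":
    for name, value in parse_squid(SQUID_LINE).items():
        print(f"{name}: {value}")
```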

Page 27: Apache metron meetup presentation at capital one

What Metron does to the Squid Telemetry Event in Real-time

1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

• Convert from Unix epoch to timestamp
• Use Metron’s asset enrichment to enrich that IP (hostname, type of device)
• Use Metron’s WhoIs enrichment to look up domain name information (e.g:
• Use Metron’s Threat Intel services to cross-reference the IP with threat intel feeds to see if there is a hit
• Index the event into Elasticsearch and persist into HDFS (Security Data Vault)
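Two of the transformations on this slide are simple enough to sketch in Python: converting the Unix epoch value to a timestamp, and extracting the domain that a whois lookup would be keyed on. The helper names `epoch_to_iso` and `domain_of` are hypothetical, and the choice of UTC with millisecond precision is an assumption, since the slide does not say which timestamp format Metron produces.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def epoch_to_iso(epoch_seconds: str) -> str:
    """Convert a Unix epoch string (seconds, fractional) to an ISO 8601 UTC timestamp."""
    moment = datetime.fromtimestamp(float(epoch_seconds), tz=timezone.utc)
    return moment.isoformat(timespec="milliseconds")


def domain_of(url: str) -> str:
    """Return the host part of a URL, the key a whois lookup would use."""
    return urlparse(url).hostname or ""


if __name__ == "__main__":
    print(epoch_to_iso("1461576382.642"))
    print(domain_of("http://www.cnn.com/"))
```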

Page 28: Apache metron meetup presentation at capital one

Tracing the Squid Event across the Platform

[Diagram: Squid Logs enter the Real-time Processing Engine as the telemetry source. Stage labels and their contents: PROCESS (parse, normalize, tag, validate); ENRICH (user, asset, geo, WHOIS, conn); LABEL (STIX, flat files, aggregators, model as a service, cloud services); ALERT/PERSIST (PCAP store, alert, Security Data Vault). Data & Integration Services (real-time search, interactive dashboards, data modelling, integration layer, PCAP replay, security layer) sit alongside custom Metron UI/portals.]

Page 29: Apache metron meetup presentation at capital one

Step 1: Telemetry Ingest (Tracing an Event)

1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

Page 30: Apache metron meetup presentation at capital one

Step 2 – Process/Parse (Tracing an Event)

Page 31: Apache metron meetup presentation at capital one

Step 3 – Enrich (Tracing an Event)

Page 32: Apache metron meetup presentation at capital one

Enriching Data Architecture

Page 33: Apache metron meetup presentation at capital one

Step 4 – Label/Threat Intel (Tracing an Event)

Page 34: Apache metron meetup presentation at capital one

High level Steps – How to Add the New Telemetry

1. Create new Kafka topic for the new telemetry source called “squid”

2. Create and validate a grok statement file that parses the squid event log into a format that Metron can understand

3. Store that grok statement in HDFS

4. Create a new flux configuration for the new Squid parser Storm Topology.

5. Update Zookeeper with configuration to mark what fields in the telemetry to enrich and what fields to cross-reference with threat intel feeds.

6. Move the flux configuration to the host where you will deploy the topology.

7. Deploy the new squid Storm parser topology using the new flux configuration

8. Load WhoIs enrichment data and configure enrichment mapping

9. Load Threat Intel data and configure threat intel matching mapping

10. Use Apache NiFi to capture the squid events and push them into Metron

11. Create a new Panel in Kibana and see the telemetry events

Key Points

Easy Extensibility: the ability to add a new data source easily, without writing any code.

Repeatable Pattern: the steps above represent a repeatable pattern that you can apply to most data sources.