Fighting Cyber Fraud with Hadoop
Niel Dunnage
Senior Solutions Architect
©2014 Cloudera, Inc. All rights reserved.
Summary
• Big Data is an increasingly powerful enterprise asset. This talk explores the relationship between big data and cyber security, and how we preserve privacy while exploiting the advantages of data collection and processing. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services, and their rapid adoption has created tremendous social benefits. An unwanted side effect is the rich pickings available to those with malicious intentions: increasingly, the sophisticated cyber attacker can exploit the rich array of public data to build detailed profiles of their adversaries.
Agenda
• Data: the new oil
• Defend your data
• The security value of Big Data
Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media
Cyber Security: Data is a valuable commodity
• DDoS
• Data exfiltration
• Confidential customer records
• Transaction data
• Reputation attack
• False flag
• Fake data
• Insider threat
False flag: operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them (http://en.wikipedia.org/wiki/False_flag)
@security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.
The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.fftoday.com/), claiming that it was timed to coincide with the NFL draft.
Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.
Typical Security Layers
Type                     Example
Access                   Physical (lock and key), virtual (firewalls, VLANs)
Authentication           Logins: verify users are who they say they are
Authorization            Permissions: verify what a user can do
Encryption at rest       Data protection for files on disk
Encryption in transport  Data protection on the wire
Auditing                 Keep track of who accessed what
Policy / Procedure       Protect against human error and social engineering
Cloudera’s Approach to Hadoop Security
Comprehensive
• Standards-based authentication
• Centralized, granular authorization
• Native data protection
• End-to-end data audit and lineage
Compliance-Ready
• Meet compliance requirements
• HIPAA, PCI-DSS, …
• Encryption and key management
Transparent
• Security at the core
• Minimal performance impact
• Compatible with new components
• Insight with compliance
Defense: Security Features
• Hadoop security: Kerberos, with simplified deployment through Cloudera Manager
• Sentry: provides unified authorization with a single policy for Hive, Impala and Search
• HDFS extended ACLs and HBase cell-level access control
• Navigator Encrypt and Key Trustee deliver compliant data security (via the Gazzang acquisition)
• Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage
Kerberos Security
Perimeter Security
• Guarding access to the cluster itself
• Technical concepts:
  • Authentication
  • Network isolation
Kerberos
• Kerberos: a computer network authentication protocol that works on the basis of tickets to allow nodes to prove their identity to each other in a secure manner, using encryption extensively
• Messages are exchanged between:
  • Client
  • Server
  • Kerberos Key Distribution Center (KDC)
• Note that the KDC is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
• Passwords are not sent across the network; instead, passwords are used to compute encryption keys
• Authentication status is cached (no need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
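Two of the points above, deriving a key from the password instead of sending it, and the reliance on synchronized clocks, can be illustrated with a small Python sketch. This is a simplified stand-in and not the Kerberos protocol itself: real Kerberos encrypts an authenticator inside a ticket exchange with the KDC, whereas this sketch only tags a timestamp with an HMAC. The function names, the salt format and the 300-second skew limit are assumptions chosen for illustration.

```python
import hashlib
import hmac
from typing import Tuple

# Assumed tolerance for illustration; five minutes is a commonly used limit.
MAX_CLOCK_SKEW_SECONDS = 300

Authenticator = Tuple[str, float, bytes]


def derive_key(password: str, salt: str, iterations: int = 4096) -> bytes:
    """Compute a 256-bit key from a password; the password is never transmitted."""
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt.encode(), iterations, dklen=32)


def make_authenticator(key: bytes, principal: str, now: float) -> Authenticator:
    """Client side: prove knowledge of the key by tagging the principal and a timestamp."""
    message = f"{principal}|{now:.0f}".encode()
    return principal, now, hmac.new(key, message, hashlib.sha256).digest()


def verify_authenticator(key: bytes, authenticator: Authenticator, server_now: float) -> bool:
    """Server side: reject stale timestamps first, then check the tag."""
    principal, timestamp, tag = authenticator
    if abs(server_now - timestamp) > MAX_CLOCK_SKEW_SECONDS:
        return False  # clocks too far apart, or a replayed message
    message = f"{principal}|{timestamp:.0f}".encode()
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A request stamped ten minutes before the server's clock is rejected even when the key is correct, which is why unsynchronized clocks break authentication.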
Apache Sentry
Access Security: Sentry
• Access
  • Defining what users and applications can do with data
• Technical concepts:
  • Permissions
  • Authorization
• Sentry provides unified authorization across multiple access paths
• A single authorization policy is enforced for Impala, Hive and Search
• Role-based access at server, database, table or view granularity
• Multi-tenant: separate policies for each database / schema
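Role-based access at server, database or table granularity can be sketched as a lookup from groups to roles to privileges. The following Python is a minimal illustration of that idea and not Sentry's actual policy engine or API: the `Privilege` type, the action names and the helper functions are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Privilege(NamedTuple):
    action: str                     # "select", "insert" or "all" (assumed action names)
    database: Optional[str] = None  # None means the whole server
    table: Optional[str] = None     # None means every table in the database


def covers(granted: Privilege, action: str, database: str, table: str) -> bool:
    """True when a granted privilege is broad enough for the requested access."""
    if granted.action not in ("all", action):
        return False
    if granted.database is None:
        return True  # server-level grant
    if granted.database != database:
        return False
    return granted.table is None or granted.table == table


def is_allowed(user_groups: Iterable[str],
               group_roles: Dict[str, List[str]],
               role_privileges: Dict[str, List[Privilege]],
               action: str, database: str, table: str) -> bool:
    """Resolve groups to roles to privileges and check each one."""
    for group in user_groups:
        for role in group_roles.get(group, []):
            for privilege in role_privileges.get(role, []):
                if covers(privilege, action, database, table):
                    return True
    return False
```

Because the same policy data answers every request, any engine that calls `is_allowed` enforces identical rules, which is the point of a single authorization policy.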
Cloudera Navigator
Visibility: Cloudera Navigator
• Visibility
  • Reporting on where data came from and how it's being used
• Technical concepts:
  • Auditing
  • Lineage
• Auditing and access management
  • View, grant and revoke permissions across the Hadoop stack
  • Identify access to a data asset around the time of a security breach
  • Generate an alert when a restricted data asset is accessed
• Lineage
  • Given a data set, trace back to the original source
  • Understand the downstream impact of purging or modifying a data set
• Metadata tagging and discovery
  • Search through metadata to find data sets of interest
  • Given a data set, view schema, metadata and policies
• Lifecycle management
  • Automate periodic ingestion of data
  • Compress or encrypt a data set at rest
  • Purge a data set or replicate it to a remote site
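Two of the capabilities listed above, tracing a data set back to its original source and finding accesses around the time of a breach, reduce to a graph walk and a time-window filter. The Python below is a sketch under assumed data shapes (a parent map and `(timestamp, user, asset)` audit tuples); it does not use or represent the Navigator API.

```python
from typing import Dict, List, Set, Tuple

AuditEvent = Tuple[float, str, str]  # (timestamp, user, asset): an assumed record shape


def trace_sources(dataset: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Walk lineage upstream and return the data sets that have no recorded parents."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        upstream = parents.get(current, [])
        if not upstream:
            sources.add(current)  # nothing upstream, so this is an original source
        stack.extend(upstream)
    return sources


def accesses_near(events: List[AuditEvent], asset: str,
                  breach_time: float, window_seconds: float) -> List[AuditEvent]:
    """Return audit events for one asset within a window around the breach time."""
    return [event for event in events
            if event[2] == asset and abs(event[0] - breach_time) <= window_seconds]
```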
[Diagram: Encrypt client on a Linux server / VM (Linux file, directory; AES-256 encryption; process-based ACLs); GPG; Key Trustee Server on a separate Linux server / VM. Source: ©Gazzang, gazzang.com/products/cloudencrypt-for-aws]
Encryption at rest: Navigator Encrypt and Key Trustee
• Encrypt any file or directory
• AES-256 encryption
• Unique access controls
  • Process-based, NOT users / groups
• 100% transparent
• Separation of duties
• Key management
  • AES encryption keys stored on a separate Key Trustee server
  • Key manager breach: data is safe
  • Data server breach: data is safe
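The process-based access control mentioned above differs from ordinary file permissions in that the decision depends on which executable is asking, not which user. A minimal Python sketch of such a check follows; the rule format, paths and function names are hypothetical and do not reflect Navigator Encrypt's actual configuration.

```python
from typing import Dict, Set


def is_covered(path: str, protected_dir: str) -> bool:
    """True when path is the protected directory itself or lies beneath it."""
    root = protected_dir.rstrip("/")
    return path == root or path.startswith(root + "/")


def may_decrypt(path: str, process_executable: str, acl: Dict[str, Set[str]]) -> bool:
    """Allow access only when the requesting executable is trusted for this path.

    The user identity is deliberately not consulted: a privileged user running an
    untrusted binary is still refused. Paths with no matching rule are denied.
    """
    for protected_dir, trusted_executables in acl.items():
        if is_covered(path, protected_dir) and process_executable in trusted_executables:
            return True
    return False
```

With a rule such as `{"/data/hdfs": {"/usr/bin/java"}}`, the Java process can read the protected blocks while `/bin/cat` cannot, regardless of who runs it.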
Our Design Strategy: The Enterprise Data Hub
• One pool of data
• One metadata model
• One security framework
• One set of system resources
A fully integrated Hadoop ecosystem
• Integration: REST (WebHDFS), file (FUSE), Flume, Sqoop
• Resource management: YARN
• Storage: HDFS, HBase / Accumulo (text, RCFile, Parquet, Avro, etc. records)
• Engines
  • Batch processing: Spark, MapReduce, Hive and Pig
  • Stream processing: Spark Streaming
  • Interactive SQL: Cloudera Impala
  • Interactive search: Cloudera Search
  • Machine learning: Spark MLlib, Mahout, Oryx
  • Math and statistics: SAS, R
• Metadata, Navigator
• Security, Navigator, Sentry
graph.vertices.filter { case (id, _) => id == 13669222 }.collect
Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID where application_type IS Non_Critical
Enterprise Data Hub Use Cases
Operational Efficiency: perform existing workloads faster, cheaper, better
Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible
• ETL Acceleration
• EDW Optimization
• Active Archive
• OSINT Analysis
• Fraud Detection
• Deep Exploratory BI
• Historical Compliance
• Log Processing
• Performance Management
• Risk Management
©2013 Cloudera, Inc. All Rights Reserved.
Offence: Fraud Detection
Use Cases
• Distributed parallel execution with chained joins
• Historical processing at scale
• Machine learning: malware/anomaly detection, spam filters, etc.
• Combined real-time and batch predictors
Fully automated at scale
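As a toy illustration of the anomaly detection mentioned above, the Python sketch below flags values that lie far from the historical mean. The z-score rule and the threshold of 3 are assumptions picked for the example, not the method used in the talk; a production system would use richer features and models, with the history computed in batch and new events scored in near real time.

```python
from statistics import mean, pstdev
from typing import List


def flag_anomalies(history: List[float], candidates: List[float],
                   threshold: float = 3.0) -> List[float]:
    """Return candidates more than `threshold` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Constant history: anything different from it is unusual.
        return [value for value in candidates if value != mu]
    return [value for value in candidates if abs(value - mu) / sigma > threshold]
```

Here `history` plays the role of the batch view and `candidates` the stream of new transactions, mirroring the combined real-time and batch predictors listed above.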
Big Data Economics: Ask bigger questions
• Predictably process large data sets
• Linear scaling
• Robust and economic crypto security
• Creative fail-fast innovation
• Powers productivity insights
• Increasing infrastructure ROI
• Increasing business ROI
• Defeating fraudulent activity
• Evaluating risk
[Diagram labels: Ingest, Discover, Predict, Innovate]
Data Ingest
• NRT ingest
  • Flume
    • Optimized to flow real-time event data into the Hadoop cluster
  • Spark Streaming for near-real-time micro-batch aggregations
  • Twitter streaming
  • Kafka
  • Log
  • API
• Bulk load
  • Sqoop for structured data
  • FUSE file system access
  • API
  • Web / Hue
• Data enrichment
  • Flume interceptors
  • Kite Morphlines module
  • Configuration-based interceptors that can enrich data, for example by extracting facets, entity extraction, or applying regulatory tags
[Diagram: Clients send events to Agents; stages labelled collect, enrich, buffer, store]
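The enrichment step on this slide can be pictured as a chain of small functions, each adding headers to an event before it is stored. The Python below is a sketch of that pattern only: the event shape (a dict with `headers` and `body`), the two example interceptors and the `PCI` tag are hypothetical, and real Flume interceptors are configured Java components rather than Python functions.

```python
import re
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # assumed shape: {"headers": {...}, "body": "..."}
Interceptor = Callable[[Event], Event]

IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CARD_PATTERN = re.compile(r"\b\d{16}\b")  # crude stand-in for card-number detection


def extract_client_ip(event: Event) -> Event:
    """Entity extraction: copy the first IPv4-looking token into a header."""
    match = IP_PATTERN.search(event["body"])
    if match:
        event["headers"]["client_ip"] = match.group(0)
    return event


def tag_card_data(event: Event) -> Event:
    """Regulatory tagging: mark events that appear to contain a card number."""
    if CARD_PATTERN.search(event["body"]):
        event["headers"]["regulatory_tag"] = "PCI"
    return event


def enrich(event: Event, interceptors: List[Interceptor]) -> Event:
    """Pass the event through each interceptor in order."""
    for interceptor in interceptors:
        event = interceptor(event)
    return event
```

Tagging at ingest time means downstream authorization and audit rules can key off the header instead of re-scanning the payload.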
Near-Real-Time Access to Threats
• View the geographic distribution of Slowloris DDoS attacks, taken from Apache web server logs
• Help isolate unpatched servers
• Identify the source of attacks
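LogUtils.createStream(...)
  .filter(_.getText.contains("408 Error"))  // keep only request-timeout entries
  .countByWindow(Seconds(10))               // count them over a 10-second window
stream.join(historicCounts)
  .filter { case (word, (curCount, oldCount)) => curCount > oldCount }  // keep keys whose count exceeds the historic count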
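The logic of the snippet above (count timeouts in a short window, then keep only the keys whose count exceeds the historic figure) can be shown without a cluster. The plain-Python sketch below assumes pre-parsed log records of the form `(timestamp, source_ip, status)` and keys the counts by source address; both are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

LogRecord = Tuple[float, str, int]  # (timestamp, source_ip, http_status): assumed shape


def timeouts_per_source(records: Iterable[LogRecord], window_start: float,
                        window_seconds: float = 10) -> Counter:
    """Count HTTP 408 responses per source address inside one time window."""
    counts: Counter = Counter()
    window_end = window_start + window_seconds
    for timestamp, source_ip, status in records:
        if status == 408 and window_start <= timestamp < window_end:
            counts[source_ip] += 1
    return counts


def rising_sources(current: Dict[str, int], historic: Dict[str, int]) -> List[str]:
    """Inner-join current and historic counts and keep sources whose count has grown."""
    return sorted(ip for ip, count in current.items()
                  if ip in historic and count > historic[ip])
```

Sources returned by `rising_sources` are candidates for geolocation and blocking, which corresponds to the "identify the source of attacks" point on the slide.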
Machine Learning
Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
• Collaborative filtering and recommendation
• Classification and regression
• Clustering