How Secure is Hadoop?
Colin White
BI Research
December 3, 2014
Sponsors
Speakers
Colin White, President, BI Research
Anoop Dawar, Senior Director of Product Management, MapR Technologies
Sudeep Venkatesh, VP, Solutions Architecture, Voltage Security
Copyright © BI Research, 2014
Colin White
President, BI Research
TDWI, MapR and Voltage Security Webinar
December 2014
How Secure is Hadoop?
Topics
The Role of Hadoop in the Enterprise
Data Security and Privacy in the Enterprise
How Secure is Hadoop for Enterprise Use?
Why Hadoop in the Enterprise?
“Hadoop is a key component of the next generation data architecture”
“Hadoop enables big data applications for both operations and analytics”
Next Generation Data Architecture
Traditional EDW environment
Investigative computing platform
Data refinery
Data integration platform
Operational real-time environment
RT analysis platform
Other internal & external structured & multi-structured data
Real-time streaming data
Analytic tools & applications
Operational systems
RT BI services
Hadoop is Evolving Rapidly
Using Hadoop in the Enterprise
[Diagram: the next generation data architecture shown on the earlier slide]
Data Security and Privacy
“Data security and privacy deliver data protection across enterprises. Together, they comprise the people, process and technology required to prevent destructive forces and unwanted actions. Data security and privacy aren't a nice-to-have. They are required by more than 50 international legal and industry mandates, as well as business leaders.” IBM
“As data has become an increasingly valuable corporate asset, hackers and data thieves continue their relentless drive to thwart protection measures. A serious data breach causes immeasurable damage to corporate reputation.” Voltage Security
Data Breaches Are Expensive: Target Stores
Source: retailcustomerexperience.com
“Target’s fourth quarter profit decreased by 46 percent”
Attacks Are Becoming More Sophisticated
Attacks are becoming increasingly sophisticated and are constantly evolving
Hacked two different card processing companies
Counterfeited pre-paid debit cards
Reprogrammed cards with infinite balance
Organized mules to withdraw money at ATMs
Data Breaches Are On the Rise
A Decade of Data Breach: Top Trends /cont
Issue: Data is Becoming More Dispersed
Multiple user devices
Multiple output formats
Multiple deployment options
Multiple analytic tools
Multiple data sources
Increasing data volumes & data rates
DW historical data
Web & social content
Sensor data
Operational data
Text & media files
Decision management
Data management
Data integration
Data analysis
Decision management
Issue: Cloud Computing Complicates Matters
1. Data Breaches
2. Data Loss
3. Account/Service Traffic Hijacking
4. Insecure APIs
5. Denial of Service
6. Malicious Insiders
7. Abuse of Cloud Services
8. Insufficient Due Diligence
9. Shared Technology
Data Security vs Database Security
[Figure with three columns: Traditional Data Security, Data Flow, Threats to Data. Based on an image from Voltage Security]
Threats to data:
• Credential compromise
• Traffic interceptors
• SQL injection, malware
• Malware, insiders
• Malware, insiders
Traditional data security:
• Authentication
• Firewalls, Transport Layer Security (TLS)
• Database security (data access controls, encryption, masking, tokenization, auditing, etc.)
• Data storage encryption
Data security:
• Firewalls, Transport Layer Security (TLS)
Hadoop Data Security Requirements
Authentication
• Interactive users (Hive and SQL systems, non-relational systems)
• Desktop and mobile devices
• Batch jobs (Pig, MapReduce) and external systems
• Hadoop components
• Single sign-on
Authorization
• Granular access controls for files and database systems
Protection
• Data encryption, masking and tokenization for at-rest files and databases
and for in-flight data
Governance
• Auditing and data lineage reporting
• Data life cycle management, backup, recovery and disaster recovery
Hadoop Data Security Technologies (Examples)
Authentication
• Kerberos and LDAP/AD
• Apache Knox Gateway: single point of authentication for Hadoop clusters
(for REST/HTTP clients)
Authorization
• Apache Sentry: access level (role-based) security for data and metadata
on a Hadoop cluster
Protection
• Hadoop-provided data encryption options plus commercial products for
encryption, masking and tokenization of at-rest and in-flight data
Governance
• Apache Falcon: data life-cycle management, data retention, data lineage
Note: The Apache products outlined above are still in development and are
therefore immature
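As a rough illustration of what a "single point of authentication for REST/HTTP clients" can look like from the client side, the Python sketch below builds (but does not send) a WebHDFS directory-listing request routed through a Knox gateway. The host name, the port 8443, the topology name `default`, the HDFS path and the credentials are all assumptions made for this example, and real deployments may use a different URL layout or authentication scheme.

```python
import base64
import urllib.parse
import urllib.request


def build_knox_webhdfs_request(gateway_host, topology, hdfs_path,
                               user, password, port=8443):
    """Build (without sending) a WebHDFS LISTSTATUS request routed via Knox."""
    quoted_path = urllib.parse.quote(hdfs_path)
    url = (f"https://{gateway_host}:{port}/gateway/{topology}"
           f"/webhdfs/v1{quoted_path}?op=LISTSTATUS")
    # HTTP Basic credentials; the gateway is assumed to validate them
    # against LDAP/AD before forwarding the call into the cluster.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical host, topology, path and credentials, for illustration only.
request = build_knox_webhdfs_request(
    "knox.example.com", "default", "/data/raw", "analyst", "secret")
print(request.full_url)
```

The client only ever talks to the gateway address, so cluster host names and internal ports stay hidden behind it.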
Third-Party Product Example: Voltage Security
How Secure is Hadoop?
“There are still some barriers to adoption, with security and compliance
concerns being chief among them.”
Securing Hadoop (eWeek Article)
Plan for Information Security From the Start
Get In Early on Projects, Ask Questions About the Data
Tie Into Your Corporate Email and Identity System
Encrypt Your Data
Log Everything and Keep Backups
Set Up a Security Steering Committee
Identify and Tag Your Sensitive Data
Voice Your Security Requirements
Expect More From Your Commercial Hadoop Distribution
Empower and Layer Security, One Coat at a Time
Understand Data's Lineage
Protect All the Data
In my opinion, these requirements will vary by project type and the data being processed
Thank You
© 2014 MapR Technologies
MapR: Best Product, Best Business & Best Customers
Top Ranked
Exponential Growth
Premier Investors
Cloud Leaders
• 2X bookings year over year
• 80% of accounts expand 3X
• 90% software licenses
• < 1% lifetime churn
• >$1B in incremental revenue generated by 1 customer
FOUNDATION
Going Big Requires a Rock-Solid Architecture
FOUNDATION
Architecture Matters for Production Success
• Written in C/C++ vs. Java: No Garbage Collections
• Distributed Metadata: No NameNode
• Direct Disk Access: No Local File System Underneath
• Volume-based Management: Easy Scaling & Management
• Fully Read/Write File System: Supports File Updates & Appends
No Proprietary Lock-in: Same Data, Same Code
[Diagram: application code and data stored in native format on the MapR Distribution and on Distribution C/H; data is copied between distributions with distcp, and the same code runs on each]
Enterprise Openness: No Hadoop API Lock-in
Connect Hadoop to existing environments using industry standard APIs
MapR: Random Read/Write Capable POSIX Platform
• Standard interfaces: NFS; ODBC, JDBC; REST; LDAP, Linux PAM
• Connected environments: BI tools; security protocols; web integration; Linux commands; access tools; HPC jobs; 3rd party; custom code; web servers; pluggable services
Benefits of HP Vertica on MapR
[Diagram: three Vertica nodes access their Vertica files on the MapR Data Platform over NFS]
• Disaster recovery
• Improved disk usage
• Snapshots/Backup
• Reduced Complexity
• Lower operational cost
• Faster local file access
• Easy capacity expansion
• Dynamic storage utilization
Moving data costs money...
HP Vertica on MapR moves processing to data and utilizes the same hardware for both.
Unbiased Open Source: No Application Lock-in a la Linux
• Open source distribution is about providing choice to customers
– All Linux distributions include
• MySQL and PostgreSQL and SQLite
• Apache HTTP server and nginx and Lighttpd
• MapR - Only Hadoop distribution that provides unbiased choice
Capability | MapR Distribution for Hadoop | Distribution A | Distribution B
Spark stack | Spark and SparkSQL | Spark only | None
Interactive SQL | Multiple options (Shark & Impala & Drill & Hive/Tez) | One option (Impala) | One option (Hive/Tez)
Providing the freedom of choice to pick the right tool for the right job
MapR Distribution for Apache Hadoop
[Diagram: the MapR Distribution stack]
APACHE HADOOP AND OSS ECOSYSTEM
• Components shown: YARN, MapReduce v1 & v2, Tez, Pig, Cascading, Spark, Spark Streaming, Storm, HBase, Solr, Accumulo*, Hive, Impala, SparkSQL, Drill, Mahout, MLLib, GraphX, Juju, Savannah*, Sentry, Oozie, ZooKeeper, Sqoop, Whirr, Flume, HttpFS, Hue
• Categories shown: Execution Engines; Data Governance and Operations; Batch; Streaming; NoSQL & Search; SQL; ML, Graph; Provisioning & Coordination; Workflow & Data Governance; Data Integration & Access
• APIs: NFS, HDFS API, HBase API, JSON API
MapR Data Platform (Random Read/Write)
• MapR-FS (POSIX)
• MapR-DB (High-Performance NoSQL)
• Security
MapR Control System (Management and Monitoring)
• CLI
• GUI
• REST API
Make Hadoop Reliable, Fast and Operational
Making Hadoop Do More
Securing, Governing Hadoop??
MapR’s Security Offering
[Diagram: client, server and an intruder, "Dr. Evil"]
• Authentication: Who are you? Prove it.
• Authorization: Can you do this?
• Encryption: Can intruder see or change traffic?
• Auditing: Is there a record of what was done?
• Security with or without the use of Kerberos
• Standard Linux account integration
• Performance
• Ease of use
• Broad ecosystem support
Authentication
Who am I?
• Kerberos authentication
• MapR’s built-in authentication: simpler to set up
• Widest registry support
• Single sign-on
Authorization
What am I allowed to do?
• Uses authenticated identity in MapR ticket
• Sentry for Impala (possibly Hive)
• Cell level access control in HBase 0.98
• Comprehensive column, CF access control in MapR-DB
• Role based access control in MapR-DB
• MapReduce Access control lists
• Unix permission bits for files
• Job queue authorization
• Cluster and Volume Access control
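To make the idea of role-based, column-level authorization more concrete, here is a minimal Python sketch of a default-deny permission check. The `Role` class, the `table:family:column` scope strings and the sample roles are hypothetical and invented for this illustration; they are not the MapR-DB, HBase or Sentry APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    # Maps "table:family" or "table:family:column" to a set of allowed actions.
    grants: dict = field(default_factory=dict)


def is_allowed(user_roles, table, family, column, action):
    """Return True if any role grants the action at column or column-family level."""
    for role in user_roles:
        for scope in (f"{table}:{family}:{column}", f"{table}:{family}"):
            if action in role.grants.get(scope, set()):
                return True
    return False  # default deny: nothing is permitted unless a role grants it


# Hypothetical roles for illustration.
analyst = Role("analyst", {"customers:profile": {"read"}})
billing = Role("billing", {"customers:payment:card_number": {"read", "write"}})

print(is_allowed([analyst], "customers", "profile", "name", "read"))         # True
print(is_allowed([analyst], "customers", "payment", "card_number", "read"))  # False
print(is_allowed([analyst, billing], "customers", "payment", "card_number", "write"))  # True
```

The check assumes the caller's identity has already been authenticated; authorization only decides what that identity may do.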
Encryption
Transfer and store data securely
• Encrypt all traffic from client to cluster
• Encrypt all traffic within cluster
• Rotate keys periodically to reduce probability of cracking the key
• Extremely performant
• Utilizes Intel AES-NI technology when available
Auditing
Trace who did what, when?
• All job submissions are recorded in the JobTracker log
• Oozie audits who ran what as part of a workflow
• Hive captures changes that were made to the Hive metadata
• The MapR Control System records all admin actions
• maprcli logs all commands issued
• Firewalls limit access to the cluster and its logs
• OS shell logging captures commands
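The "who did what, when" question suggests a minimum set of fields for any audit entry. The Python sketch below writes one such entry as a JSON line; the field names and the sample values are assumptions made for illustration and do not reflect the log format of the JobTracker, Oozie, Hive or maprcli.

```python
import json
from datetime import datetime, timezone


def audit_record(user, action, resource, outcome, when=None):
    """Serialize one audit event (who, what, when, result) as a JSON line."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": when.isoformat(),  # when
        "user": user,                   # who
        "action": action,               # did what
        "resource": resource,           # to what
        "outcome": outcome,             # allowed or denied
    }, sort_keys=True)


# Hypothetical event for illustration.
line = audit_record("alice", "job.submit", "queue:etl", "allowed",
                    when=datetime(2014, 12, 3, 17, 0, tzinfo=timezone.utc))
print(line)
```

Recording denied attempts as well as successful ones makes the log more useful when investigating an incident.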
Partnerships
• Data Discovery
– Tell me what sensitive data I have on my cluster
• Data masking/tokenization/format-preserving encryption
– Transform the data so it still has semantic meaning but loses identity (and can still be joined to other tables for insights)
• Unified security policy for data center
– Data moving from RDBMS to Hadoop needs to carry the same policy
• Integrating with existing customer keystores
Voltage SecureData for Hadoop
© 2014 Voltage Security, Inc. Confidential
December, 2014
A History of Excellence
• Company: Founded in 2002 out of Stanford University; based in Cupertino, California
• Mission: To protect the world’s sensitive data
• By: Providing encryption and tokenization solutions that protect the data wherever it is used or stored
• Market Leadership:
– Used by leading enterprises – 6 of the top 8 U.S. payment processors, 7 of the top 10 U.S. banks, top 5 Internet Retailers
– Certified on MapR and other leading Hadoop distributions
– Contributes technology to multiple standards organizations
Format-Preserving Encryption
Tax ID: 934-72-2356
• AES: 8juYE%Uks&dDFa2345^WFLERG
• FPE: 345-753-5772
Record: First Name: Gunther Last Name: Robertson SSN: 934-72-2356 DOB: 20-07-1966
• AES: Ija&3k24kQotugDF2390^32 0OWioNu2(*872weW Oiuqwriuweuwr%oIUOw1@
• FPE: First Name: Uywjlqo Last Name: Muwruwwbp SSN: 253-67-2356 DOB: 18-06-1972
• Supports data of any format: Name, address, dates, numbers, etc.
• Preserves referential integrity
• Only applications that need the original value need change
• Used for production protection and data masking
• Currently in the NIST standardization process
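The format-preserving property can be illustrated with a toy cipher. The Python sketch below uses a small Feistel network over decimal digits, with HMAC-SHA256 as the round function, so that digits map to digits and separators stay where they were. This is a teaching sketch under stated assumptions: it is not AES FFX, not a NIST-specified mode and not Voltage's implementation, and it should not be used to protect real data. The names `protect` and `reveal`, the round count and the demo key are invented for the example.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so the two halves end up back in their original positions


def _prf(key, round_no, half):
    """Keyed pseudo-random integer derived from the round number and one half."""
    mac = hmac.new(key, f"{round_no}:{half}".encode("ascii"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def _feistel(key, digits, decrypt):
    if len(digits) < 2:
        raise ValueError("need at least two decimal digits")
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    if not decrypt:
        for i in range(ROUNDS):
            m = u if i % 2 == 0 else v
            c = (int(a) + _prf(key, i, b)) % 10 ** m
            a, b = b, str(c).zfill(m)
    else:
        for i in reversed(range(ROUNDS)):
            m = u if i % 2 == 0 else v
            prev_a = (int(b) - _prf(key, i, a)) % 10 ** m
            a, b = str(prev_a).zfill(m), a
    return a + b


def _transform(key, value, decrypt):
    # Only ASCII digits are transformed; dashes, spaces, etc. keep their place.
    positions = [i for i, ch in enumerate(value) if ch in "0123456789"]
    digits = "".join(value[i] for i in positions)
    out = list(value)
    for pos, ch in zip(positions, _feistel(key, digits, decrypt)):
        out[pos] = ch
    return "".join(out)


def protect(key, value):
    return _transform(key, value, decrypt=False)


def reveal(key, value):
    return _transform(key, value, decrypt=True)


key = b"demo-key-not-for-production"  # hypothetical key for illustration
token = protect(key, "934-72-2356")
print(token)               # keeps the NNN-NN-NNNN layout
print(reveal(key, token))  # 934-72-2356
```

Because the same key and input always give the same output, protected values can still be used as join keys, which is what "preserves referential integrity" refers to.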
Secure Stateless Tokenization (SST)
• Tokenization for PCI scope reduction
• Replaces token database with a smaller token mapping table
• Token values mapped using random numbers
• Numerous advantages over traditional tokenization:
– No database hardware, software, replication problems, etc.
Credit Card: 1234 5678 8765 4321
• SST: 8736 5533 4678 9453
• Partial SST: 1234 5633 4678 4321
• Obvious SST: 1234 56AZ UYTZ 4321
Tax ID: 934-72-2356
• SST: 347-982-8309
• Partial SST: 347-982-2356
• Obvious SST: AZS-UXD-2356
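A static, randomly generated mapping table is the core idea behind tokenizing without a per-token database. The Python sketch below is a simplified illustration of that idea only, in the style of the "Partial SST" example: it keeps the first six and last four digits of a 16-digit card number and swaps the middle six through a pre-generated random permutation. It is not Voltage's SST algorithm, whose internals the slides do not describe; the function names and table layout are assumptions.

```python
import random

MIDDLE_DIGITS = 6  # digits between the preserved first 6 and last 4


def build_token_table(rng):
    """Pre-generate a random one-to-one mapping over all 6-digit values."""
    forward = list(range(10 ** MIDDLE_DIGITS))
    rng.shuffle(forward)
    inverse = [0] * len(forward)
    for original, token in enumerate(forward):
        inverse[token] = original
    return forward, inverse


def _swap_middle(card, table):
    digits = card.replace(" ", "")
    if len(digits) != 16 or not digits.isdigit():
        raise ValueError("expected a 16-digit card number")
    middle = table[int(digits[6:12])]
    out = digits[:6] + str(middle).zfill(MIDDLE_DIGITS) + digits[12:]
    return " ".join(out[i:i + 4] for i in range(0, 16, 4))


def tokenize(card, forward):
    return _swap_middle(card, forward)


def detokenize(token, inverse):
    return _swap_middle(token, inverse)


# The table is generated once from the OS entropy source and then stays fixed.
forward, inverse = build_token_table(random.SystemRandom())
token = tokenize("1234 5678 8765 4321", forward)
print(token)                       # first six and last four digits unchanged
print(detokenize(token, inverse))  # 1234 5678 8765 4321
```

Because the table is fixed after generation, no per-card token record has to be stored or replicated; anyone holding the inverse table can detokenize, so the table itself must be protected.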
Data Protection with FPE (AES FFX) and SST
• Enables large amounts of sensitive data to be “de-identified” in Hadoop
• Majority of analysis, MapReduce jobs, etc. can occur on de-identified data
• Reduces insider threats and improves compliance
• Enables developers to test without exposure
• Enables Hadoop and cloud adoption
Original data:
Name | SS# | Credit Card # | Street Address | Customer ID
James Potter | 385-12-1199 | 37123 456789 01001 | 1279 Farland Avenue | G8199143
Ryan Johnson | 857-64-4190 | 5587 0806 2212 0139 | 111 Grant Street | S3626248
Carrie Young | 761-58-6733 | 5348 9261 0695 2829 | 4513 Cambridge Court | B0191348
Brent Warner | 604-41-6687 | 4929 4358 7398 4379 | 1984 Middleville Road | G8888767
Anna Berman | 416-03-4226 | 4556 2525 1285 1830 | 2893 Hamilton Drive | S9298273
De-identified data (Name, SS#, Street Address and Customer ID protected with FPE; Credit Card # with SST*):
Name | SS# | Credit Card # | Street Address | Customer ID
Kwfdv Cqvzgk | 161-82-1292 | 37123 48BTIR 51001 | 2890 Ykzbpoi Clpppn | S7202483
Veks Iounrfo | 200-79-7127 | 5587 08MG KYUP 0139 | 406 Cmxto Osfalu | B0928254
Pdnme Wntob | 095-52-8683 | 5348 92VK DEPD 2829 | 1498 Zejojtbbx Pqkag | G7265029
Eskfw Gzhqlv | 178-17-8353 | 4929 43KF PPED 4379 | 8261 Saicbmeayqw Yotv | G3951257
Jsfk Tbluhm | 525-25-2125 | 4556 25ZX LKRT 1830 | 8412 Wbbhalhs Ueyzg | B6625294
Use Case 1: Global Telecommunication Co.
Use Case 2: Health Care Insurance Company
Use Case 3: Global Financial Services Company
Conclusion
• Multi-platform enterprises adopting a data lake architecture need a cross-platform solution for protection of sensitive data
• The open source community has invested in building enterprise-grade security for Apache Hadoop, with core capabilities for perimeter security, authentication, authorization and auditing
• Voltage Security brings data-centric security across data stores including Hadoop, protecting data at rest, in use and in motion, and maintaining the value of the data for analytics
• Together these enable comprehensive security for the enterprise, and rapid and successful Hadoop adoption!
http://www.voltage.com/hadoop/
844-311-2111
Questions?
Contact Information
If you have further questions or comments:
Colin White, BI Research
Anoop Dawar, MapR Technologies
Sudeep Venkatesh, Voltage Security