protegrity platform presentation - big data · pdf file · 2016-01-294...
TRANSCRIPT
Data Security as a Business Enabler – Not a Ball & Chain
Big Data Everywhere
May 12, 2015
Les has over twenty years of experience in information
security. He has held the position of Chief Information
Security Officer (CISO) for a credit card company and
ILC bank, founded a computer training and IT
outsourcing company in Europe and helped several
security technology firms develop their initial product
strategy.
Les founded and managed Teradata’s Information
Security, Data Privacy and Regulatory Compliance
Center of Excellence and is currently Director of Data
Security Solutions for Protegrity.
Les holds a BS in MIS, CISSP, CISA, ITIL and other
relevant industry certifications.
Les McMonagle (CISSP, CISA, ITIL)
Protegrity - Director Data Security Solutions
Mobile: (617) 501-7144
Email: [email protected]
The cost of cybercrime is staggering:
• The annual cost to the global economy exceeds $400 billion.
• Businesses that are victims of cybercrime need an average of 18 days to
resolve the problem and suffer average costs of over $400K.
• The tangible and intangible costs associated with some of the recent
high-profile cases exceed $400M.
• Traditional network security, firewalls, IDS, SIEM, AV and monitoring
solutions do not offer the comprehensive security needed to protect the
target data against current, new and evolving threats.
The Problem . . .
http://eval.symantec.com/mktginfo/enterprise/white_papers/b-anatomy_of_a_data_breach_WP_20049424-1.en-us.pdf
Typical Phases of an Attack
Bad guys search for the easy targets
• Large repositories of valuable, un-protected data
• Systems with weaker controls and/or more access paths
• Financial Data or Personally Identifiable Information (PII)
Blurring of Network Boundaries
• Where does your company network end and another begin?
• BYOD
• Cloud
• IoT (Internet of Things)
Insider threats remain the biggest threat
Advanced Persistent Threats (APTs)
• Coordinated, comprehensive attack strategies
Factors to Consider
Types of Sensitive Data Potentially Stored in Hadoop
• SSN
• Credit Card PAN
• Best Practices
• Bank Account Numbers
• Employee Personnel Records
• R&D
• Pending Patents
• Health Records
• Accounts Payable
• Production Planning
• Sales Forecasts
• Order History
• Trade Secrets
• Payroll Data
• Prescriptions
• Accounts Receivable
• Customer Lists
• Customer Contact Information
• Home Addresses
• DOB
• Income Data
• Health History
• Passwords
• PIN
• Salary Data
• Location Data
• Project Plans
What to do about it?
Engage Information Security – CISO & InfoSec
Work with Legal and Compliance
[Diagram labels: Sponsors, Policy, Process]
Establish Good Data Governance Program
Apply consistent protection throughout the data flow
Limit access on a Need-to-Know basis
Protect the actual data itself (regardless of where it is)
De-identify data without losing analytics value
Engage InfoSec, Legal, Compliance, Privacy
Engage Information Security – rather than avoid them
CISOs and InfoSec ultimately have the same goals
Will help fund and implement effective data protection
Legal, Privacy and Compliance
• Identify/interpret regulatory and compliance requirements
• Help protect the business by identifying risks to consider
• Incorporate generally accepted Privacy Principles*
Data Governance Program
Establish good data governance program
• Identified Data Owners
• Identified Data Stewards
• Identified Data Custodians
• RACI – Roles and Responsibilities
Data Governance subject areas
• Data Ownership
• Data Quality
• Data Integration
• Metadata Management
• Master Data Management
• Data Architecture
• Data Security & Privacy
Protect sensitive data consistently wherever it goes
At Rest
In Transit
In Use
Ideally with a single, centralized enterprise solution
What Data to Tokenize or Encrypt?
Important questions to ask . . .
• What policy and regulatory compliance requirements apply?
• What risks must be mitigated?
• How/Why are protected columns accessed/used?
• What other mitigating controls are available?
• Appropriate balance between business and data privacy/security?
• When is Tokenization or Encryption most appropriate?
Utilization and access control limitations of Hadoop / Hive
Alternative protection options to consider
• Full Disk Encryption (FDE)
Important Data Security Architecture Questions
To Encrypt or Tokenize . . . This is the Question
[Figure: continuum from Tokenization to Encryption, with data sensitivity increasing along the vertical axis]
Axis labels as printed (left end to right end):
• Field size relative to width of lookup table: Large to Small
• Structured: More to Less
• Percent of access requiring clear text: Less to More
• Logic in portions of the data element: More to Less
Data elements plotted:
• SSN
• CC-PAN
• Bank Acct No.
• PIN, CID, CV2
• Password
• DOB
• Customer ID #
• Patient ID #
• X-Ray
• CAT Scan
• HIV-Pos*
• Diagnosis report
• Healthcare Records
* With Initialization Vector (IV)
Potential Additional Controls to Consider
Tokenization or Encryption farther upstream in Data Flow
Do not load unnecessary regulated data to Hadoop
Access Hadoop Hive Tables through Teradata (QueryGrid)
HDFS file-level access control
Accumulo cell level access control (Row/Column intersection)
Knox Gateway (authentication for multiple Hadoop clusters)
Coarse grained HDFS File Encryption
XASecure (now HDP Advanced Security)
Ambari (Hadoop Cluster Management)
Kerberos (Authentication) – all or nothing
Piecemeal independent security tools for Hadoop
Reduce Your Exposure and Risk
[Figure: populations of users with access to the SSN (full SSN versus last 4 digits)]
• Population of users who have access to SSN today
• Population of users who need access to the full SSN to perform their job function
• Population of users who can perform their job function with only the last 4 digits of the SSN
Vaultless Tokenization is a form of data protection that converts sensitive
data into fake data. The real data can be retrieved only by authorized users.
It is often a more usable form of protection than encryption.
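The idea can be sketched in Python. This is a toy illustration only: it is not Protegrity's algorithm and it is not cryptographically sound. It assumes ASCII digits in the input, derives one digit-substitution table per character position from a hypothetical secret, and uses made-up role names (full_ssn, last4) to mirror the user populations described above. Because the tables are derived from the secret rather than stored per value, no vault of token-to-value pairs is needed.

```python
import hashlib
import hmac
import random

# Hypothetical secret for illustration; real table material would be managed centrally.
SECRET = b"demo-only-secret"
DIGITS = "0123456789"


def _table(position: int) -> list:
    """Derive a fixed permutation of the digits 0-9 for one character position."""
    seed = hmac.new(SECRET, str(position).encode(), hashlib.sha256).digest()
    digits = list(range(10))
    random.Random(seed).shuffle(digits)
    return digits


def tokenize(value: str) -> str:
    """Replace each digit with its substitute; keep separators so the format survives."""
    return "".join(
        str(_table(i)[int(ch)]) if ch in DIGITS else ch
        for i, ch in enumerate(value)
    )


def detokenize(token: str) -> str:
    """Invert the per-position substitution to recover the original digits."""
    return "".join(
        str(_table(i).index(int(ch))) if ch in DIGITS else ch
        for i, ch in enumerate(token)
    )


def reveal(token: str, role: str) -> str:
    """Return what a given (hypothetical) role is allowed to see."""
    if role == "full_ssn":
        return detokenize(token)
    if role == "last4":
        return "***-**-" + detokenize(token)[-4:]
    return token  # every other role only ever sees the token


token = tokenize("123-45-6789")
print(reveal(token, "last4"))  # ***-**-6789
```

Users outside the two authorized roles receive the token unchanged, which is how the population with access to the real SSN shrinks without changing the shape of the stored data.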
Improve Security Posture Without Impacting Analytics Value
Critical Core Requirements:
Single Solution Across All Core Platforms
Scalable, Centralized Enterprise-class Solution
Segregation of Duties between DBA and Security Admin
Good Encryption Key or Token Lookup Table Management
Data Layer Solution
Tamper-proof Audit Trail
Transparent (as possible) to Authorized Users
High Availability (HA)
Optional In-database vs. Ex-database Encryption/Tokenization
What to look for in a good Enterprise Solution…
Other "nice to have" Features...
Flexible protection options (Encrypt, Tokenize, DTP/FPE, Masking)
Broadest possible support for a range of data types
Built in DR, Dual Active, Key and system recovery capability
Minimal performance impact to applications/end users
Optimized operations to minimize CPU utilization
Proven Implementation methodology
PCI-DSS compliant solution (meeting all relevant requirements)
Deep partnership with Teradata and other database providers
Minimal impact on system upgrades
Maintain consistent referential integrity and indexing capability
Low Total Cost of Ownership (TCO)
Coarse Grained and Fine Grained Protection Capability
• HDFS File Encryption, Multi-Tenant File Encryption, HDFS FP (HDFS Codec)
• Column/Field Level “Fine Grained” Protection
Multi-Tenant Row Level Protection
• Allow authorized users access to specific rows only
• Unprotect columns for authorized users only
Heterogeneous Protection Capabilities
• Protect Upstream sources of data and Downstream targets of data
• Vaultless Tokenization, often less intrusive than encryption, reversible protection
• Reversible – where masking is not
• Deployed on the (Data) Nodes
• Leverage MPP architecture of Hadoop
• Avoid Appliance based solutions that can slow down Hadoop
Tokenization capability for Hive access to HDFS Files/Tables
• Hive does not support VarByte data type (Encryption = Binary Ciphertext)
What to look for in a good solution for Hadoop…
Granularity of Protecting Sensitive Data
Coarse Grained Protection (File/Volume)
• Methods: File or Volume encryption
• “All or nothing” approach
• Does NOT secure file contents in use
• OS File System Encryption
• HDFS Encryption
• Secures data at rest and in transit
Fine Grained Protection (Data/Field)
• Operates at the individual field level
• Fine Grained Protection Methods:
• Vaultless Tokenization
• Masking
• Encryption (Strong, Format Preserving)
• Data is protected in use and wherever it goes
• Business logic can be retained
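A small Python sketch shows why field-level protection can keep business logic intact. It is an illustration under stated assumptions, not a vendor implementation: the key, the field names, and the sample records are hypothetical, and the protection shown is a keyed, non-reversible pseudonym (HMAC) rather than reversible tokenization. The point is only that a deterministic substitute lets grouping and joins on the protected field keep working while the other fields stay usable.

```python
import hashlib
import hmac
from collections import defaultdict

KEY = b"demo-only-key"  # hypothetical key for illustration


def pseudonymize(value: str) -> str:
    """Deterministic keyed substitute: the same input always gives the same output."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def protect_record(record: dict, sensitive_fields: set) -> dict:
    """Protect only the named fields; every other field is left as-is."""
    return {
        name: pseudonymize(value) if name in sensitive_fields else value
        for name, value in record.items()
    }


# Hypothetical sample data
orders = [
    {"ssn": "123-45-6789", "amount": 40},
    {"ssn": "987-65-4321", "amount": 25},
    {"ssn": "123-45-6789", "amount": 10},
]

protected = [protect_record(order, {"ssn"}) for order in orders]

# Aggregate per customer using only the protected values.
totals = defaultdict(int)
for row in protected:
    totals[row["ssn"]] += row["amount"]

print(sorted(totals.values()))  # [25, 50]
```

File or volume encryption could not support this: once the file is decrypted for the job, every field is in the clear.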
Data Security Platform
[Diagram: the Enterprise Security Administrator is connected to each protected platform by a Policy link and an Audit Log link. Platform labels: Applications; File Servers; RDBMS; Big Data; File and Cloud Gateway Servers; Protection Servers; Netezza; EDW; IBM Mainframe Protector]
Protegrity Confidential
Protegrity’s Big Data Protector for Hadoop
[Diagram: Hadoop Cluster / Hadoop Node. Labels: Hive, Pig, Other, MapReduce, YARN, HBase, HDFS, OS File System, Policy, Audit]
Protegrity Big Data Protector for Hadoop delivers protection at every node
and is delivered with our own cluster management capability.
All nodes are managed by the Enterprise Security Administrator, which delivers
policy and accepts audit logs.
The Protegrity Data Security Policy contains information about how data is
de-identified and who is authorized to have access to that data.
Policy is enforced at different levels of protection in Hadoop.
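As a rough mental model, a policy of this kind can be pictured as a mapping from data element to protection method and authorized roles, with every access decision written to an audit log. The Python sketch below is an assumption-laden illustration: the class names, role names, and log format are invented here and do not describe the actual Protegrity policy format or the Enterprise Security Administrator interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ElementRule:
    """How one data element is de-identified and who may see it in the clear."""
    method: str                  # e.g. "tokenize", "encrypt", "mask" (illustrative)
    authorized_roles: frozenset  # roles allowed to unprotect this element


@dataclass
class Policy:
    rules: dict                                   # element name -> ElementRule
    audit_log: list = field(default_factory=list)

    def may_unprotect(self, user: str, role: str, element: str) -> bool:
        """Decide access and record the decision, whether granted or denied."""
        rule = self.rules.get(element)
        allowed = rule is not None and role in rule.authorized_roles
        outcome = "granted" if allowed else "denied"
        self.audit_log.append((user, role, element, outcome))
        return allowed


# Hypothetical policy and users
policy = Policy(
    rules={
        "ssn": ElementRule("tokenize", frozenset({"fraud_analyst"})),
        "dob": ElementRule("mask", frozenset()),
    }
)

print(policy.may_unprotect("alice", "fraud_analyst", "ssn"))  # True
print(policy.may_unprotect("bob", "marketing", "ssn"))        # False
```

Keeping the rules in one object that every node consults is what makes the protection consistent across platforms; the audit entries are appended for denials as well as grants.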
Rich Security Layer over the Hadoop Ecosystem
[Diagram layers: HDFS, MapReduce, YARN, HBase, Pig / Hive, File System]
• UDF Support for Pig
• UDF Support for Hive
• Hive - Tokenization
• Java API Support for MapReduce
• HBase - Coprocessor support via UDFs
• Cassandra – UDT
• HDFS Encryption through the HDFS Codec
• HDFS Commands Extended for Security Functions
• HDFS Interface for Java Programs
• De-identify before Ingestion into HDFS
• OS File System Encryption; Folder/File or Volume
Coarse Grained Protection: File / Volume Encryption
[Diagram layers: HDFS, MapReduce, YARN, HBase, Pig / Hive, File System]
• File with identifiable data elements
• File System: Entire File is Encrypted
• HBase: All fields are in the clear
• Pig / Hive: All fields are in the clear
The volume encryption option encrypts the entire volume rather than the
individual files.
Coarse Grained with HDFS Staging Area
[Diagram layers: HDFS, MapReduce, YARN, HBase, Pig / Hive, File System]
[Diagram labels: Staging Area; Ingest into HDFS; MapReduce Jobs]
Coarse Grained Multi-Tenant Protection
[Diagram layers: HDFS, MapReduce, YARN, HBase, Pig / Hive, File System]
Tenants: T1, T2, T3
• T1 folder: Key 1
• T2 folder: Key 2
• T3 folder: Key 3
Ingest into HDFS: clear folder
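The per-tenant arrangement in this diagram amounts to choosing an encryption key by folder. The Python sketch below illustrates only that selection step, under assumptions: the directory names and paths are hypothetical, "Key 1" to "Key 3" are the labels from the diagram rather than key material, and no encryption is performed here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Tenant folder name -> key label, as drawn in the diagram.
TENANT_KEYS = {"T1": "Key 1", "T2": "Key 2", "T3": "Key 3"}


def key_for_path(hdfs_path: str) -> Optional[str]:
    """Return the key label for the tenant folder a path falls under, if any."""
    for part in PurePosixPath(hdfs_path).parts:
        if part in TENANT_KEYS:
            return TENANT_KEYS[part]
    return None  # e.g. the clear folder: no tenant key applies


print(key_for_path("/landing/T2/orders.csv"))     # Key 2
print(key_for_path("/landing/clear/orders.csv"))  # None
```

Because each tenant folder has its own key, a tenant holding Key 1 cannot read files written under the T2 or T3 folders even though all three share the same cluster.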
Fine Grained Protection
Encryption
• Reversible
• Policy Control (authorized / Unauthorized Access)
• Lacks Integration Transparency
• Not searchable or sortable
• Complex Key Management
• Example: !@#$%a^.,mhu7///&*B()_+!@
Masking
• Not reversible
• No Policy, Everyone Can Access the Data
• Integrates Transparently
• No Complex Key Management
• Example: Date of Birth 2/15/1967 masked as xx/xx/1967
Vaultless Tokenization / Pseudonymization
• Reversible
• Policy Control (Authorized / Unauthorized Access)
or
• Not Reversible
• No Complex Key Management
In either case
• Integrates Transparently
• Searchable and sortable
• Business Intelligence: 0389 3778 3652 0038
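The masking example above is easy to reproduce, and doing so shows why masking is not reversible. The Python sketch below assumes dates arrive as M/D/YYYY strings; the function name is made up for illustration.

```python
def mask_dob(dob: str) -> str:
    """Mask the month and day of an M/D/YYYY date string, keeping only the year."""
    _month, _day, year = dob.split("/")
    return "xx/xx/" + year


masked = mask_dob("2/15/1967")
print(masked)  # xx/xx/1967

# A different birthday in the same year produces the same masked value,
# so the original date cannot be recovered from the mask.
print(mask_dob("7/4/1967") == masked)  # True
```

A reversible token, by contrast, must map distinct inputs to distinct outputs, which is also what keeps tokenized values usable for searching and joining.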
Enterprise-wide Protection
[Diagram labels]
• Production Systems; Non-Production Systems
• Source Systems (Internal / External): Input File, Source, ETL
• Target Systems (Internal / External); Downstream Systems; Consumption BI Systems
• Database Server: Database, Database Protector
• Edge Node; File Protector; Java Program; Sqoop; Application Protector; FPG
• ESA: Policy Deployment, Audit Collection
• Hadoop Nodes with Ecosystem Components: Pig, Hive, MapReduce, YARN, HBase, HDFS, OS FS
If Edge Node is a Hadoop Node, Hadoop resources can be used
Traditional IT Environment: Protegrity Protection
[Diagram: Typical Enterprise Today, Inside the Firewall. Labels: EDW; Files; DBs; Arch; Apps; Hadoop; Apps; Internet]
Today’s IT Environment: Protegrity Protection
[Diagram: Typical Enterprise Today, Inside the Firewall. Labels: EDW; Files; DBs; Arch; Apps; Hadoop; Apps; HG Apps; Internet; ESA; Cloud; Protector; Gateway; File Protector; Gateway; Files]
In Summary
Establish Good Data Governance
Protect the actual data itself
Maintain referential integrity
De-identify data while maintaining analytics capability
Apply consistent protection throughout the data flow
Engage Information Security, Legal and Compliance
Build security in rather than bolt it on later