Introduction
Security Concerns/Challenges
Existing Controls/Limitations
Recommendation/Solutions
Summary
Holistic View of Big Data System
Courtesy/Copyright: VMware
‘V’s of Big Data
•Variability
•Visualization
•Value
Characteristics
• Parallel tasks
• Redundant
• Distributed
• Large data
• Mixed data
Processing
Features
Adoption of Big Data
• 73% of surveyed organizations (64% in 2013) are ready to invest in big data, according to Gartner
• 76% of senior-level IT and security professionals are concerned about their inability to secure data across big data initiatives.
• More than 50% admitted that these security concerns kept them from starting or finishing cloud or big data projects.
Frameworks
• Hadoop
  • HDFS
  • YARN
  • Common
• NoSQL – Cassandra, MongoDB, etc.
• Spark
• Etc.
Hadoop Stack
Key Questions?
Q1: Can I trust the sources?
Q2: How do I protect the sources and processes?
Q3: What am I collecting?
Q4: Rival exploitation?
Q5: Employee confidentiality?
Big Data Challenges
• Technical
• Legal/Privacy
• All weak points of traditional IT systems
• Authentication of Applications and nodes
• Audit and Logging
• Monitoring, Filtering and Blocking
• API Security
Architectural Security Issues
1. Different from traditional DBs and data warehouses
2. Deployment: highly distributed, redundant
3. Sharded data
4. Data access/ownership
5. Inter-node communication
6. Client interactions
7. No security built into the stacks
Deployment Issues
• Multi-node environment
• Patching/Application Configuration
• Updating the Hadoop stack
• Collecting trusted machine images
• Platform discrepancies
Operational Security Issues
• Rogue administrator or cloud service manager
• Access control: administrator access to data
• Configuration and Patch management
• Lack of Auditing
• Lack of Security gateways
• Lack of security in NoSQL
Key Security Facts
• Traditional security was built for small-scale, static data
• Locating the source of data in the cloud is hard
• Streaming data demands ultra-fast response times
Key Considerations
Scale along
Essential features
Data Security
Basic Functionality
Security of Environment
Traditional Limitations
Focus Areas by CSA (Cloud Security Alliance)
• Confidentiality
• Integrity
• Availability
What is the Approach?
• Technologies that address common threats
• The trick is selecting the options that work,
  • depending on data security requirements
  • depending on the risk appetite/profile of the organization
• A combination of
  • built-in security
  • add-on security products
Adopt a Framework
ISO 27001, NIST, etc.
Gap/Risk Assessment
• Threats, vulnerabilities, risks
• Controls: technical, process, people
• Fundamentals: least privilege, multilayer security, need to know/have
Monitor and Improve
• Continuous monitoring
• Identify and implement relevant improvements
Security Roadmap
Implement the Controls
Based on Priorities, Risks
Business Case
Cost-benefit, risk, etc.
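A gap/risk assessment should end in a prioritized list that drives the roadmap. As a minimal sketch, assuming the common qualitative model risk = likelihood × impact on 1 to 5 scales (a widely used convention, not something this deck prescribes), a risk register can be ranked against the organization's risk appetite. The `Risk` class, the `prioritize` function, the threshold and the register entries are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    impact: int      # 1 (negligible) .. 5 (severe)

    @property
    def score(self) -> int:
        # Qualitative model (an assumption): risk = likelihood x impact, range 1..25
        return self.likelihood * self.impact


def prioritize(risks, appetite: int):
    """Return the risks whose score exceeds the appetite, highest score first."""
    above = [r for r in risks if r.score > appetite]
    return sorted(above, key=lambda r: r.score, reverse=True)


# Hypothetical register entries, for illustration only
register = [
    Risk("Rogue node joins cluster", likelihood=3, impact=5),
    Risk("Unencrypted inter-node traffic", likelihood=4, impact=4),
    Risk("Missing audit logs", likelihood=4, impact=3),
    Risk("Unpatched NoSQL instance", likelihood=2, impact=4),
]

for r in prioritize(register, appetite=9):
    print(f"{r.score:>2}  {r.name}")
```

Risks at or below the appetite are not ignored; they simply fall outside the first round of the roadmap and are revisited during monitoring and improvement.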
Recommendations/Solutions
• Kerberos node authentication
• File encryption & key management
• Secure communication
• Deployment validation
• Log it
Use Kerberos
• Validates inter-service communication
• Keeps rogue nodes/applications out of the cluster
• Protects web console access
• Makes administrative access hard to compromise
• The most effective security control available
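Kerberos is switched on in Hadoop through configuration, so a simple automated check can confirm that a node is not still running in the default "simple" (trust-the-client) mode. The sketch below parses a `core-site.xml` document and checks the real Hadoop properties `hadoop.security.authentication` and `hadoop.security.authorization`; the helper functions are hypothetical, and a full Kerberos setup also needs per-service principals and keytabs, which this check does not cover.

```python
import xml.etree.ElementTree as ET

# Properties that move Hadoop from "simple" authentication to Kerberos
# and enable service-level authorization checks.
REQUIRED = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}


def load_properties(xml_text: str) -> dict:
    """Parse a Hadoop-style *-site.xml document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def kerberos_findings(xml_text: str) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    props = load_properties(xml_text)
    findings = []
    for key, expected in REQUIRED.items():
        actual = props.get(key)
        if actual is None:
            findings.append(f"{key} is not set; expected {expected}")
        elif actual.lower() != expected:
            findings.append(f"{key}={actual}, expected {expected}")
    return findings


# Example: a node still configured for simple authentication
sample = """<configuration>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
</configuration>"""
for finding in kerberos_findings(sample):
    print(finding)
```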
File-Level Encryption
• Protects against some attacker techniques, e.g. malicious users/admins who gain access to data nodes and directly inspect files or copied disk images
• Consistent protection across different platforms/OSs
• Some products also encrypt data in memory
• Both open-source and commercial options are available
User Key Management
• Encryption is ineffective if access to the encryption keys is not protected
• Keys stored on local disk are insecure
• Manage the distribution of keys and certificates
• Use different keys per group, application and user
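One way to give every group, application and user its own key without storing each key on local disk is to derive them from a single master key held by a key manager. The sketch below swaps in HKDF (RFC 5869, built here from Python's standard `hmac` module) as that derivation technique; it is one option, not the deck's prescribed method. `derive_key` and the scope/name strings are hypothetical, and in practice the master key would come from a KMS or HSM (for example Hadoop KMS) rather than `secrets.token_bytes`.

```python
import hashlib
import hmac
import secrets

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869 step 1: condense the input key material into a pseudorandom key
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int = 32) -> bytes:
    # RFC 5869 step 2: expand the PRK into `length` bytes bound to `info`
    if length > 255 * HASH_LEN:
        raise ValueError("requested key is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key(master_key: bytes, scope: str, name: str, salt: bytes = b"") -> bytes:
    """Derive a separate 256-bit key for one group, application or user."""
    prk = hkdf_extract(salt, master_key)
    return hkdf_expand(prk, f"{scope}:{name}".encode())


# Stand-in for a master key fetched from a KMS/HSM, never read from local disk
master = secrets.token_bytes(32)
group_key = derive_key(master, "group", "finance")
app_key = derive_key(master, "application", "etl-job")
user_key = derive_key(master, "user", "alice")
```

Because derivation is deterministic, only the master key has to be protected and distributed; compromising one derived key does not reveal the others.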
Secure Communication
• Secure communication between nodes/application
• Use SSL/TLS to protect all communication
• The processing burden is shared across all nodes
• Cloudera offers TLS, as do some other providers
• Otherwise, integrate TLS into the application stack
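Where TLS has to be integrated into the application stack, the key decisions are: require modern protocol versions and make both sides authenticate (mutual TLS), so that a rogue node cannot simply connect. A minimal sketch using Python's standard `ssl` module follows; the certificate and CA file paths are hypothetical placeholders, and Hadoop distributions configure TLS through their own settings rather than through code like this.

```python
import ssl
from typing import Optional


def server_context(cert_file: Optional[str] = None,
                   key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a node that accepts connections from other nodes."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy SSL/TLS versions
    ctx.verify_mode = ssl.CERT_REQUIRED           # mutual TLS: peers must present a cert
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)  # this node's own identity
    return ctx


def client_context(cert_file: Optional[str] = None,
                   key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a node or application connecting to another node."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables hostname checking and CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)  # client cert for mutual TLS
    return ctx


# Usage (paths are hypothetical):
# srv = server_context("/etc/pki/node.crt", "/etc/pki/node.key", "/etc/pki/cluster-ca.pem")
# cli = client_context("/etc/pki/app.crt", "/etc/pki/app.key", "/etc/pki/cluster-ca.pem")
```

Pointing `ca_file` at a cluster-internal CA, rather than the system trust store, limits which certificates count as valid cluster members.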
Deployment validation
• Use tools from the cloud provider, hypervisor vendor or third parties to automate pre-deployment tasks
• Machine images, patches, and configurations should be fully updated and validated prior to deployment
• Run validation tests, collect encryption keys and request access tokens before nodes become accessible to the cluster
• Use the service-level authentication built into Hadoop to segregate administrative responsibilities
• Ensures that each node comes online with baseline security in place
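The validation steps above amount to a gate: a node joins the cluster only if every baseline check passes. The sketch below assumes each check has already been run by some (hypothetical) tooling and recorded as a flag, and verifies the machine image against an allowlist of SHA-256 digests of reviewed images. `NodeCandidate`, `baseline_failures` and `admit` are illustrative names, not part of any Hadoop or cloud-provider API.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class NodeCandidate:
    name: str
    image_bytes: bytes         # stand-in for the machine image contents
    patches_current: bool      # result of a (hypothetical) patch-level scan
    config_validated: bool     # result of a (hypothetical) configuration check
    has_encryption_keys: bool  # keys collected from the key manager
    has_access_token: bool     # token/ticket issued for cluster services


def image_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def baseline_failures(node: NodeCandidate, trusted: set) -> list:
    """List every baseline check the node fails; an empty list means admit it."""
    failures = []
    if image_digest(node.image_bytes) not in trusted:
        failures.append("untrusted machine image")
    if not node.patches_current:
        failures.append("patches not up to date")
    if not node.config_validated:
        failures.append("configuration not validated")
    if not node.has_encryption_keys:
        failures.append("encryption keys not collected")
    if not node.has_access_token:
        failures.append("no access token")
    return failures


def admit(node: NodeCandidate, trusted: set) -> bool:
    return not baseline_failures(node, trusted)


# Hypothetical golden image and a node that has not yet obtained its token
golden_image = b"contents of a reviewed golden image"
trusted = {image_digest(golden_image)}
node = NodeCandidate("data-node-07", golden_image, patches_current=True,
                     config_validated=True, has_encryption_keys=True,
                     has_access_token=False)
print(baseline_failures(node, trusted))
```

Reporting every failure, rather than stopping at the first, gives operators the full list of fixes needed before the node can come online with baseline security in place. Real images would be hashed from disk in chunks rather than held in memory.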
Log it!
• Record of activity
• Big data is a natural fit for collecting and managing event data
• SIEM and log management tools already use big data
• No reason not to add logging
• Increase in storage and processing demands is small, but the data is indispensable
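Logging is most useful to a SIEM when each event is structured rather than free text. A minimal sketch with Python's standard `logging` module emits one JSON object per line; the logger name `bigdata.audit`, the `audit` field convention and the sample user/node/path values are assumptions for illustration.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, easy for SIEM ingestion."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields passed via extra={"audit": {...}}
        event.update(getattr(record, "audit", {}))
        return json.dumps(event, sort_keys=True)


def make_audit_logger(stream) -> logging.Logger:
    logger = logging.getLogger("bigdata.audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # avoid duplicate handlers on repeated setup
    return logger


# Usage: record who read which file on which node (hypothetical values)
buf = io.StringIO()
log = make_audit_logger(buf)
log.info("file_read", extra={"audit": {"user": "alice", "node": "dn-03",
                                        "path": "/data/claims"}})
print(buf.getvalue(), end="")
```

In production the `StreamHandler` would be replaced by a handler that ships events to the log management or SIEM pipeline.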
Some more security measures
1. Mask data that is uniquely tied to an individual
2. Use tokenization techniques to protect sensitive data
3. Leverage the cloud database controls
4. Use OS hardening
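Masking and tokenization solve slightly different problems: masking permanently hides most of a value for display, while tokenization replaces it with a surrogate that authorized services can map back. The sketch below shows simple static masking and vault-style tokenization with random tokens; `mask`, `TokenVault` and the `tok_` prefix are hypothetical, and a real vault would be a hardened, access-controlled service rather than an in-memory dict.

```python
import secrets


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Static masking: hide all but the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    return fill * max(len(value) - visible, 0) + value[-visible:]


class TokenVault:
    """Vault tokenization: random tokens; real values live only in the vault."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, value: str) -> str:
        # Same value -> same token, so joins and counts still work on tokens
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._by_token:  # guard against a (very unlikely) collision
            token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Restrict this call to authorized services only
        return self._by_token[token]


vault = TokenVault()
card = "4111111111111111"
print(mask(card))            # safe for display
print(vault.tokenize(card))  # safe to load into the cluster
```

Because the tokens are random, data loaded into the cluster reveals nothing about the originals even if the cluster itself is compromised; only the vault needs the strongest protection.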
Cloudera
• New role-based access control: Sentry
Hortonworks
• Voltage Security: protects data from any source, in any format, before it enters Hadoop
IBM
• BigInsights provides built-in security features that can be configured during installation
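Role-based access control such as Cloudera's Sentry is usually described as granting privileges to roles and roles to groups, with users inheriting access through group membership. The sketch below mimics only that general idea with a default-deny check; it is not Sentry's API, it ignores resource hierarchies (server, database, table), and the role, group and resource names are hypothetical.

```python
from collections import defaultdict


class RoleBasedAccess:
    """Privileges go to roles, roles go to groups, users belong to groups."""

    def __init__(self):
        self.role_privileges = defaultdict(set)  # role -> {(action, resource)}
        self.group_roles = defaultdict(set)      # group -> {role}
        self.user_groups = defaultdict(set)      # user -> {group}

    def grant_privilege(self, role: str, action: str, resource: str) -> None:
        self.role_privileges[role].add((action, resource))

    def grant_role(self, group: str, role: str) -> None:
        self.group_roles[group].add(role)

    def add_user(self, user: str, group: str) -> None:
        self.user_groups[user].add(group)

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        # Default deny: allowed only if some role reachable from the user grants it
        for group in self.user_groups.get(user, ()):
            for role in self.group_roles.get(group, ()):
                if (action, resource) in self.role_privileges.get(role, ()):
                    return True
        return False


acl = RoleBasedAccess()
acl.grant_privilege("analyst", "SELECT", "db.claims")
acl.grant_role("analytics-team", "analyst")
acl.add_user("alice", "analytics-team")
print(acl.is_allowed("alice", "SELECT", "db.claims"))
print(acl.is_allowed("alice", "INSERT", "db.claims"))
```

Granting to groups rather than individual users keeps least privilege manageable: access changes when someone joins or leaves a group, not by editing per-user grants.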
Summary
Final Word
Big Data Security: adopt standard frameworks, and identify and deploy more specific technological/process controls in addition to standard measures
Problem
• Lack of adequate security
• Cloud and distributed
• No definite standard
• Controls may impact functionality/performance
Future
• Built-in security controls
• Data-centric security
• Adoption of big data
• Effective security controls
• Adopt a framework
• Carry out a risk assessment
• Ensure the right processes
• Adequate technology
• Enhance awareness