© 2016 MapR Technologies | © 2016 Dataguise, Inc.
Best Practices for Protecting Sensitive Data Across the Big Data Platform
Mitesh Shah, MapR | Product Manager, Security & Data Governance
Venkat Subramanian, CTO | Dataguise
Business Intelligence Trend for 2016 onwards…
IT-led, System-of-Record
• Limited access
• Glacial speed of response
Pervasive, Business-led, Self-service Analytics
• Near Real-time
• Agile BI & Analytics
• Deeper Insights into Diverse Data
Rita Sallam (Gartner)*
Big Data Paradox
Data is the Biggest Asset
Data is also the Biggest Vulnerability
Secure Business Execution
The ability of an Enterprise to safely and responsibly leverage the value of all of their data assets to gain new business insights, maximize competitive advantage, and drive revenue growth.
MapR and Dataguise enable SECURE BUSINESS EXECUTION through Trusted Platform and Sensitive Data Management.
Big Data Platform Needs to be Trusted (not just secure)
Can we properly identify users?
Can we authorize access to data?
Can we plug in existing enterprise systems?
Is my data highly available?
Is there a proper paper trail?
Have others done this before?
Is multi-tenancy supported? Are apps supported across geographies and data centers?
Is my data governed?
TRUSTED
SECURE
Questions to Ask of Your Big Data Vendor. Verify the Platform is Trusted.
MapR Trust Model
Credibility
Vulnerability Management
Detection
Response
Compliance
Governance
Resilience
Four Pillars of Security
Auditing
Authorization
Data Protection
Authentication
What’s the (Big) Difference?
Flexibility
• Multiple execution engines: Hive, Spark, MapReduce, Drill…
Scale
• 1000s of users, groups and applications sharing the same cluster
• 100s of data sources
• PBs of data
Multi-Structured Data
• Multiple data formats: Parquet, JSON, CSV, MapR-DB tables
MapR Trust Model (Product Security)
Flexible Authentication
• Ticket-based authentication for all services in the cluster
• Integration with LDAP, Active Directory and other third-party directory services
• Kerberos or username/password authentication
Granular Authorization
• Access Control Expressions (ACEs)
• Protect files, tables, column families, columns, and management objects
• Extend to role-based access control (RBAC) with custom role functions
• Drill Views
Robust Auditing
• All events recorded immediately in JSON log files, with minimal performance impact
• Includes data access and administrative actions
• Ad hoc queries and custom reports on audit logs via SQL and standard BI tools
Ubiquitous Data Protection
• Encryption for data in motion
- Within a cluster
- Between clusters
- Between client and cluster
• Encryption for data at rest
- LUKS
- Self-encrypting disk
- Partners
• NSA-level cryptographic algorithms
Granular Authorization with MapR
The Problem with POSIX Permissions
POSIX Permissions: -rw-rw---- bruce dev-team (user, group, other)
Scenario 1: Sally needs to read the file. Options:
1. Change ownership of file to Sally.
2. Add Sally to dev-team group, even if she’s not a developer.
3. Allow ‘others’ to read the file.
Scenario 2: Groups ‘QA’ and ‘Support’ need to read the file. Options:
1. Allow ‘others’ to read the file.
2. Create a supergroup ‘Tech’, and include all members from dev, QA, and Support in that group. chgrp Tech <filename>
Scenario 3: All members that belong to both dev_team and managers.
???
POSIX Permissions Are Limiting
AUTHORIZATION
POSIX ACLs vs ACEs
r : user:sally | (group:dev_team & group:managers)
Access Control Lists
MapR Access Control Expressions
AUTHORIZATION
Which one is easier to set and understand?
Which one allows for higher granularity?
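To make the boolean semantics of the expression above concrete, here is a minimal Python sketch. It only models the logic of `user:sally | (group:dev_team & group:managers)`; the `Principal` class, the helper functions, and the users `dave` and `maria` are hypothetical names for illustration and are not MapR APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """A user plus the groups they belong to (hypothetical model)."""
    name: str
    groups: frozenset = field(default_factory=frozenset)


def user(name):
    """Term that matches one named user, like `user:sally`."""
    return lambda p: p.name == name


def group(name):
    """Term that matches any member of a group, like `group:dev_team`."""
    return lambda p: name in p.groups


def any_of(*terms):
    """Boolean OR, standing in for the `|` operator."""
    return lambda p: any(t(p) for t in terms)


def all_of(*terms):
    """Boolean AND, standing in for the `&` operator."""
    return lambda p: all(t(p) for t in terms)


# r : user:sally | (group:dev_team & group:managers)
read_ace = any_of(user("sally"), all_of(group("dev_team"), group("managers")))

sally = Principal("sally")
dave = Principal("dave", frozenset({"dev_team"}))
maria = Principal("maria", frozenset({"dev_team", "managers"}))

for p in (sally, dave, maria):
    print(p.name, "can read:", read_ace(p))
```

Sally is named directly, Maria is in both groups, and Dave is only in one group, so the sketch covers Scenario 1 and Scenario 3 from the POSIX slide without changing file ownership or creating a supergroup.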
MapR Has ACEs for Files and MapR-DB Records
Example: user:mary | (group:admins & group:VP) & user:!bob
Permissions on files, tables, column families, columns, JSON documents and sub-documents
AUTHORIZATION
Use Access Control Expressions (ACEs) to set granular permissions.
File ACEs – Key Features
Intuitive Inheritance: Subdirectories and files inherit permissions from the parent directory
Whole-Volume ACEs: Volume-level filter, useful in multitenant environments
Roles: Arbitrary grouping of users according to your business needs
High Performance: No performance hit
Boolean Operators: Allowing for ultra fine-grained permissions
AUTHORIZATION
File ACEs: Whole Volume ACE Example
Whole-Volume ACE: r: group:finance
Jane (Finance) grants read access to Bob (Developer).
File: /finance/final_report.csv r: user:bob
Bob cannot read the file /finance/final_report.csv because the whole-volume ACE is set to allow read access to finance only.
AUTHORIZATION
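The whole-volume example can be read as a two-stage check: the volume-level filter must pass before the file-level ACE is even considered. The following is a minimal sketch of that logic only. The dictionaries, function names, and the assumption that Jane keeps her own read access on the file are illustrative and not taken from MapR.

```python
def volume_ace(principal):
    """Whole-volume ACE from the slide: r: group:finance."""
    return "finance" in principal["groups"]


def file_ace(principal):
    """File ACE on /finance/final_report.csv: r: user:bob.

    Assumption for this sketch: Jane, who granted the access, can still read.
    """
    return principal["name"] in {"bob", "jane"}


def can_read(principal):
    """Read is allowed only if both the volume filter and the file ACE pass."""
    return volume_ace(principal) and file_ace(principal)


jane = {"name": "jane", "groups": {"finance"}}
bob = {"name": "bob", "groups": {"developers"}}

print("jane:", can_read(jane))
print("bob:", can_read(bob))
```

Bob passes the file ACE but fails the volume filter, which is why the grant from Jane has no effect in a multitenant volume restricted to finance.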
Robust Auditing with MapR
MapR Audits
• Who touched customer records outside of business hours?
• What actions did users take in the days before leaving the company?
• What operations were performed without following change control?
• Are users accessing sensitive files from protected/secured source IPs?
• Why do my reports look different, despite sourcing from same underlying data?
Monitoring
Incident Response
Security
AUDITING
Serving Security Analysts
MapR Audits – Key Features
Data Access
• Files
• MapR-DB Tables
Cluster Operations
• Administrative Operations
• Maprcli commands
Authentication Requests
Secure
High Performance
Flexible
• Retention Period
• Maxsize
• Coalesce Interval
• Selective Auditing
JSON Format
{"timestamp":"{$date=2015-06-01T05:24:58.231Z}","operation":"GETATTR","user":"root","uid":"0","ipAddress":"10.10.x.x","nfsServer":"10.10.x.x","srcPath":"/dbtest.0/","srcFid":"2147.16.2","VolumeName":"mktg_files","volumeId":"mktg_files","status":"0"}
AUDITING
Querying Audit Logs with SQL
Example: detect suspicious, failed commands
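The SQL query from this slide is not preserved in the transcript. As a stand-in, the sketch below applies the same idea in Python rather than SQL: parse JSON audit records shaped like the sample entry and keep the ones that failed. The second and third sample records, the user `mallory`, the status value `13`, and the rule that any status other than "0" means failure are all assumptions made for illustration.

```python
import json
from collections import Counter

# One record modeled on the sample audit entry, plus two invented failures.
SAMPLE_LINES = [
    '{"timestamp":"{$date=2015-06-01T05:24:58.231Z}","operation":"GETATTR",'
    '"user":"root","srcPath":"/dbtest.0/","status":"0"}',
    '{"timestamp":"{$date=2015-06-01T23:10:02.000Z}","operation":"DELETE",'
    '"user":"mallory","srcPath":"/finance/final_report.csv","status":"13"}',
    '{"timestamp":"{$date=2015-06-01T23:10:05.500Z}","operation":"CHPERM",'
    '"user":"mallory","srcPath":"/finance/","status":"13"}',
]


def failed_operations(lines):
    """Yield parsed audit records whose status is not "0" (assumed to mean failure)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("status") != "0":
            yield record


failures = list(failed_operations(SAMPLE_LINES))
failures_per_user = Counter(record["user"] for record in failures)

for user, count in failures_per_user.most_common():
    print(user, count)
```

Grouping failures by user is one simple way to surface the "suspicious, failed commands" pattern; a real deployment would run the equivalent query directly over the audit log files.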
Data Protection with MapR
Encryption at Rest (Today)
Many Options for Block-Level, Disk-Level, and Field-Level Encryption
Sensitive Data: SSN, Credit Card #, Health Records, Name + Age + Address
• Volume
• Self-Encrypting Disk
• Use Partners for Masking, Tokenization, Format Preserving Encryption
DATA PROTECTION
Sensitive Data Management with Dataguise
Cost of a Data Breach
“Hackers and criminal insiders cause the most data breaches…malicious attacks can take an average of 256 days to identify…The most costly breaches continue to occur in the US and Germany at $217 and $211 per compromised record…If a healthcare organization has a breach, the average cost could be as high as $363.”
Time and Financial Impact on Organizations
Ponemon Institute’s 2015
Secure Environment
Perimeter Security
• Physical security, Firewalls, IDS/IPS…
Volume/File-level Encryption
• Control over data access
• Meeting regulatory compliance…
Aren’t these enough?
YOU NEED BOTH…AND *MORE
PHI: Guidance for Data De-Identification
Sensitive/Privacy Data
• Name
• Address
• Dates: Birth, Death...
• Telephone Numbers
• Device Identifiers and Serial Numbers
• Email Addresses
• SSN
• Medical Record Numbers
• Account Numbers…
What Should We Do?
At a Granular (cell) Level:
• Precisely locate sensitive content across ALL repositories
• Protect those assets appropriately: masking, encryption
• Provide “controlled” access to data
• Enable employees, trusted partners to make data-driven decisions
RISKS
BREACH
SECURITY
COMPLIANCE
VALUE
REVENUE
DATA DRIVEN DECISIONS
BUSINESS INTELLIGENCE
DgSecure
DETECT: Where sensitive content is present in structured, unstructured & semi-structured data
AUDIT: Who has access to which sensitive data & identify misalignments and risk factors
PROTECT: Sensitive data at the element level; encrypt/decrypt with RBAC, mask or redact
MONITOR: Based on alert policies, track sensitive data access through a 360° dashboard
Across Hadoop, RDBMS, Files, NoSQL DB
DgSecure: On Premise, in the Cloud, or Hybrid
How do we do that in DgSecure?
Complex Sensitive Data Detection
SENSITIVE DATA DISCOVERY FOR COMPLEX ENVIRONMENTS
Patterns in “Strings”
• Digit Patterns: 4451 3340 0023 1200 8/16 B7127157 Expires 04-19-15
Patterns in “Grammar”
• August Thomson vs 1240 August Ave vs 12 August 1994
Patterns in Context (Dependent)
• Other data elements in horizontal or vertical vicinity: ‘94538’ near address elements
Patterns in Combination (Composite)
• CCN & Name, CCN, Name, Expiry not just CCN
Patterns in Knowledge
• Ontologies: HL7 Encoding, Financial Market Data
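Two of the pattern types above lend themselves to a small illustration. The sketch below is not DgSecure's detection logic; it is a generic Python example that assumes a 16-digit card format, uses the standard Luhn checksum to reduce false positives on digit patterns, and uses surrounding tokens to tell apart the three meanings of "August". The regular expressions and function names are hypothetical.

```python
import re

# Sixteen digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return (matched_text, passes_luhn) for each card-like digit pattern."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        hits.append((match.group(), luhn_valid(digits)))
    return hits


def classify_august(text: str) -> str:
    """Use neighboring tokens to decide what 'August' means in this text."""
    if re.search(r"\b\d{1,2} August \d{4}\b", text):
        return "date"
    if re.search(r"\b\d+ August (?:Ave|St|Rd|Blvd)\b", text):
        return "address"
    if re.search(r"\bAugust [A-Z][a-z]+\b", text):
        return "name"
    return "unknown"


print(find_card_numbers("Card 4111 1111 1111 1111 on file"))
for sample in ("August Thomson", "1240 August Ave", "12 August 1994"):
    print(sample, "->", classify_august(sample))
```

The order of the checks in `classify_august` matters: "1240 August Ave" would also match the name pattern, so the more specific date and address patterns are tried first.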
DISCOVERY FOR:
Data at Rest
• Hadoop (HDFS)
• DBMS
• Teradata
• Files
• SharePoint
Data in Motion
• Flume (into HDFS)
• FTP (into HDFS or between file systems)
• Sqoop (into HDFS)
• Kafka (Q3 2016)
Sensitive Data Protection
Masking
• Obfuscation, one-way operation
• Multiple options in DgSecure – fictitious but realistic values, X’ing out part of the content…
• Consistent masking to retain statistical distribution of data
Encryption
• Encrypted cell/row
• Accessible by authorized users only – Hive, bulk, via App
• Granular protection
Redaction
• X’ing out entire sensitive data cell
• Nullifying
Masking & Encryption in Hadoop
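"Consistent masking" means the same input always maps to the same masked value, so joins and frequency counts still work on the masked data. The sketch below shows one generic way to get that property, a keyed hash (HMAC-SHA256), alongside the two redaction styles listed above. It is an illustration under stated assumptions, not DgSecure's algorithm; the key, function names, and token length are hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would fetch this from a key manager.
SECRET_KEY = b"demo-key-change-me"


def consistent_mask(value: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """One-way, deterministic mask: equal inputs give equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def redact(value: str, nullify: bool = False):
    """Redaction: X out the entire cell, or replace it with None (nullify)."""
    if nullify:
        return None
    return "X" * len(value)


ssn_column = ["123-45-6789", "987-65-4321", "123-45-6789"]
masked = [consistent_mask(v) for v in ssn_column]

# Rows 0 and 2 held the same SSN, so they share a masked token.
print(masked[0] == masked[2], masked[0] != masked[1])
print(redact("123-45-6789"), redact("123-45-6789", nullify=True))
```

Unlike encryption, neither operation here can be reversed by an authorized user, which matches the "one-way operation" description of masking.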
Data Masking: Multiple Options (Examples)
Masking Option Applied | Original Value | Masked Value
Telephone – Random (realistic, fictitious) | (508) 850-0058 | (325) 418-0131
Telephone – Character (hide digits) | (508) 850-0058 | XXX-XXX-0058
Telephone – Intellimask (replace first 3 digits) | (508) 850-0058 | (451) 850-0058
Telephone – FPM, Format Preserving (replace characters & digits with same type) | (508) 850-0058, 508 850 0058, 508-850-0058 | (729) 432-9647, 729 432 9647, 729-432-9647
Telephone – Static Masking (replace all with (111) 222-3333) | (508) 850-0058 | (111) 222-3333
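The masking options in the table can be approximated with a few short functions. This is a rough Python sketch for intuition only: the function names are hypothetical, the random replacements are plain pseudo-random substitutions rather than DgSecure's Intellimask or a cryptographic format-preserving scheme, and the character option assumes a 10-digit phone number.

```python
import random
import re
import string


def mask_character(phone: str, keep_last: int = 4) -> str:
    """Hide all but the last digits, e.g. (508) 850-0058 becomes XXX-XXX-0058."""
    digits = re.sub(r"\D", "", phone)
    hidden = "X" * (len(digits) - keep_last) + digits[-keep_last:]
    return f"{hidden[:3]}-{hidden[3:6]}-{hidden[6:]}"


def mask_static(_phone: str) -> str:
    """Replace every value with the same constant."""
    return "(111) 222-3333"


def mask_first_three(phone: str, rng: random.Random) -> str:
    """Replace only the first three digits and keep the rest unchanged."""
    out, replaced = [], 0
    for ch in phone:
        if ch.isdigit() and replaced < 3:
            out.append(rng.choice(string.digits))
            replaced += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_format_preserving(value: str, rng: random.Random) -> str:
    """Swap each digit for a digit and each letter for a letter; keep punctuation."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # seeded so repeated runs give the same masked values
original = "(508) 850-0058"
print(mask_character(original))
print(mask_static(original))
print(mask_first_three(original, rng))
for variant in ("(508) 850-0058", "508 850 0058", "508-850-0058"):
    print(variant, "->", mask_format_preserving(variant, rng))
```

The format-preserving variant keeps parentheses, spaces, and hyphens in place, which is why each of the three input layouts in the table keeps its own layout after masking.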
Unstructured Data – Any Sensitive Elements?
RAW DATA
Masking Data in Hadoop (Cell Level)
MASKED DATA
Encrypting Data in Hadoop (Cell Level)
MASKED DATA
ENCRYPTED DATA
Decryption through Hive Queries
User WITHOUT access privileges for Names and SSN
Decryption through Hive Queries
User WITH access privileges for Names and SSN
BI Use Cases and Sensitive Elements
Use Cases: Brand Sentiment, Log Analysis, Customer Retention, Clinical Trial Analysis, Payments Risk Mgmt., Trading System Perf., Risk Modeling, Supply Chain Optimization, Smart Metering, Insurance Premiums, Process Efficiency, Person of Interest Discovery, Dynamic Pricing, IT Security Intelligence, Real-time Upsell, Monitoring Sensors
Analytic
Transactional
Sensitive Elements: Name, Address, Email Address, Customer Lifetime Value, IP Address, URL, Medical Record Number, Social Security Number, Telephone Number, Date of Birth (DOB), IP Addresses, Credit Card Number, Credit Limit, Purchase Amount, VIN, Device ID
Protection Policy: Encryption, Masking
Use Cases: Brand Sentiment, Log Analysis, Customer Retention, Clinical Trial Analysis, Payments Risk Mgmt., Trading System Perf., Risk Modeling, Supply Chain Optimization, Smart Metering, Insurance Premiums, Process Efficiency, Person of Interest Discovery, Dynamic Pricing, IT Security Intelligence, Real-time Upsell, Monitoring Sensors
Analytic
Transactional
Sensitive Elements: Name, Address, Email Address, Customer Lifetime Value, IP Address, URL, Medical Record Number, Social Security Number, Telephone Number, Date of Birth (DOB), Medical Test Results, Credit Card Number, Credit Limit, Purchase Amount, VIN, Device ID, Transaction Date
Mask
Encrypt
DgSecure Solution Workflow
DgSecure for Hadoop: Policy
DETECT AUDIT PROTECT REPORT
• Policy
• Per Data Feed?
• Protection Options
• Custom Elements
• Singleton
• Composite
• Dependent
• Domain Definition
• Key Management
DgSecure for Hadoop: Detection
• In-Flight
• Within HDFS
• Full vs. Incremental
• Structured, Semi, Unstructured
• Quick Scan
• Element Count
DETECT AUDIT PROTECT REPORT
DgSecure for Hadoop: Access Audit
Files/Directories
- Sensitive Elements
- Protected?
- Who has access?
Users
- What can they access?
DETECT AUDIT PROTECT REPORT
DgSecure for Hadoop: Protection
Domain Based
Masking
Redaction
Encryption
- Field or Record
- AES or FPE
DETECT AUDIT PROTECT REPORT
DgSecure for Hadoop: Reports
Job Level
- Sensitive elements
- Directories & Files
- Remediation applied
Dashboard
- Directory or by policy
- Drill-down
Audit report
- User actions
Notifications
DETECT AUDIT PROTECT REPORT
DgSecure Monitor
Precisely Focused on Monitoring Sensitive Data
• Where the sensitive content is and how much there is (density)
• How it is protected
• What data is accessed
• Who is accessing it
Across All Enterprise Repositories
• Hadoop and Cassandra
• Cloud support (AWS S3 and Azure Blob)
Continuous, Near-real-time Anomaly Behavior Detection
• Using machine learning to build user profiles
• Complex event processing to detect breaches
“Out of the Box” Templates
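The idea of profiling a user and flagging unusual access can be shown with a deliberately simple stand-in. The sketch below uses a mean and standard deviation baseline rather than the machine learning and complex event processing named on the slide; the function names, the threshold of 3, and the sample counts are all assumptions for illustration.

```python
from statistics import mean, pstdev


def build_profile(daily_counts):
    """Summarize a user's historical daily accesses to sensitive data."""
    return {"mean": mean(daily_counts), "std": pstdev(daily_counts)}


def is_anomalous(profile, todays_count, threshold=3.0):
    """Flag a day whose count sits more than `threshold` deviations above the mean."""
    spread = max(profile["std"], 1.0)  # floor keeps flat histories from over-triggering
    return (todays_count - profile["mean"]) / spread > threshold


# Invented history: sensitive-record reads per day for one user.
history = [10, 12, 11, 9, 13, 10, 11]
profile = build_profile(history)

print("normal day flagged:", is_anomalous(profile, 12))
print("bulk export flagged:", is_anomalous(profile, 250))
```

A per-user baseline like this catches sudden bulk reads; detecting slower or multi-step patterns is where the event-correlation approach on the slide would come in.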
DgSecure Monitor
[Architecture diagram: DgSecure monitoring data stores ON PREMISE (NoSQL, RDBMS, Hadoop) and in the CLOUD (S3, RDBMS, Blob Storage, Hadoop). Labeled elements: Sensitive Info, Detection, Data Access Information, Monitoring Engine, Monitoring Metadata Manager, Monitoring Metadata, DgSecure Repository.]
Secure Business Workflow
Enterprise Data Marketplace Use Case
Data Marketplace End-to-End Workflow
Multiple Data Feeds with their own Policies
Data Asset Marketplace: Data Assets (Indexed)
Access Granted upon Request per policy & compliance
[Workflow diagram, SECURE BUSINESS EXECUTION, steps 1 to 8: SOURCES, LANDING ZONE, DATA PROCESS, COLLECT METADATA, ANNOTATE & PUBLISH, BROWSE & REQUEST, APPROVE, ANALYZE & REPORT. Roles: Data Admin, Data Scientist. Labeled elements: Data Feeds 1 to 4, Set policy per feed, Data Lake, Set Access Control, Metadata Repository, Data Sets 1 to 4 (Q1 to Q4, Regions 1 to 4), Access Given, Access Denied.]
CISO/CPO: Set policy per data feed type
Data Asset Owner: Provenance metadata
Run Discovery to detect sensitive data
Metadata to repository
Mask/Encrypt to protect sensitive data
Metadata incl. lineage to repository
IT/Set Process: Use Metadata to set access control
Data Asset owner adds annotations & adds to Data Asset Index
Data Scientist browses available data sets and makes access request
Data owner approves request
Sets access control in Ranger
Data Scientist runs data mining/BI/Analytics
Other Data Sources
MapR + Dataguise: Comprehensive Data Security
[Diagram: Active Directory, Disk, Authentication, Authorization, Auditing, Data Protection, Incident Response, Compliance, Vulnerability Management]