Apache Atlas: Data Governance for Hadoop. Strata London 2015
TRANSCRIPT
Data Governance Technology
Transparency
Reproducibility
Auditability
Consistency
ETL/DQ
BPM
Business Analytics
Visualization & Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Common Governance Framework
Use Cases
Financial Reporting: Chain of custody, Lineage narratives
Healthcare: 30-day measures reporting
Retail: Point of sale analysis, Price optimization
Telco: Device log management, Correlation, Analysis & Mitigation
Atlas: Capabilities
● Data Classification
● Metadata Exchange
● Centralized Auditing
● Search & Lineage
● Policy Engine
● Security
Apache Atlas
Knowledge Store
Audit Store
Models
Type-System
Policy Rules
Taxonomies
Tag Based Policies
Data Lifecycle Management
Real Time Tag Based Access Control
REST API
Services: Search, Lineage, Exchange
Healthcare: HIPAA, HL7
Financial: SOX, Dodd-Frank
Energy: PPDM
Retail: PCI, PII
Other: CWM
Certification
● Metadata exchange
● Stability
● Interoperability
○ Low cost to switch
● Fosters innovation
Discovery
Tagging
Prep / Cleanse
ETL
Governance
BPM
Self Service
Visualization
Atlas: Knowledge Store
Metadata exchange
Flexible Taxonomy
● Data sets/objects
● Tables/Columns
● Logical Context
● Source/Destination
Tech: Titan with HBase
● Pluggable
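The idea of a graph-backed metadata store with source/destination context can be sketched in a few lines. This is a minimal in-memory illustration only: the `KnowledgeStore` class, its method names, and the sample table names are hypothetical and are not Atlas or Titan APIs.

```python
from collections import defaultdict, deque


class KnowledgeStore:
    """Toy in-memory metadata graph (hypothetical names, not an Atlas API)."""

    def __init__(self):
        self.entities = {}                # name -> {"type": ..., "tags": set()}
        self.upstream = defaultdict(set)  # destination -> set of sources

    def add_entity(self, name, type_name, tags=()):
        # Register a data set, table, or column with its type and tags.
        self.entities[name] = {"type": type_name, "tags": set(tags)}

    def add_flow(self, source, destination):
        # Record a source -> destination edge between two entities.
        self.upstream[destination].add(source)

    def lineage(self, name):
        """Return every upstream entity of `name`, nearest first (breadth-first)."""
        seen, order, queue = {name}, [], deque([name])
        while queue:
            current = queue.popleft()
            for src in sorted(self.upstream[current]):
                if src not in seen:
                    seen.add(src)
                    order.append(src)
                    queue.append(src)
        return order


store = KnowledgeStore()
store.add_entity("raw_sales", "table")
store.add_entity("clean_sales", "table")
store.add_entity("sales_report", "table", tags={"financial"})
store.add_flow("raw_sales", "clean_sales")
store.add_flow("clean_sales", "sales_report")
print(store.lineage("sales_report"))  # ['clean_sales', 'raw_sales']
```

A real deployment would persist the graph in a pluggable backend rather than in Python dictionaries.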
Atlas: Data Lifecycle Management
Focus on:
● Provenance
● Replication
● Data retention/eviction
● Late data handling
● Automation
Tech: Falcon
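Retention/eviction can be illustrated as a simple cutoff check over dated partitions. This is a sketch under assumed names (`partitions_to_evict`, `retention_days`); it is not Falcon's configuration format or API.

```python
from datetime import date, timedelta


def partitions_to_evict(partitions, retention_days, today):
    """Return the partition dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    # Anything strictly older than the cutoff is eligible for eviction.
    return sorted(p for p in partitions if p < cutoff)


parts = [date(2015, 1, 1), date(2015, 3, 1), date(2015, 4, 20)]
evict = partitions_to_evict(parts, retention_days=90, today=date(2015, 5, 1))
print(evict)  # [datetime.date(2015, 1, 1)]
```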
Atlas: Audit Store
Historical repository
● Security & Operational
● Indexed
● Searchable (DSL)
Tech:
● YARN ATS, HBase, Hive
● Solr, ElasticSearch
○ Pluggable
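An indexed, searchable historical repository can be sketched as an append-only event list with a small inverted index. The `AuditStore` class and its `record`/`search` methods below are hypothetical illustrations, not the Atlas audit API or its search DSL.

```python
from collections import defaultdict


class AuditStore:
    """Toy append-only audit log with a field/value index (hypothetical)."""

    def __init__(self):
        self._events = []                # events are only ever appended
        self._index = defaultdict(list)  # (field, value) -> event positions

    def record(self, user, action, resource):
        event = {"user": user, "action": action, "resource": resource}
        pos = len(self._events)
        self._events.append(event)
        for field, value in event.items():
            self._index[(field, value)].append(pos)
        return pos

    def search(self, **criteria):
        """Return events matching every field=value pair, in insertion order."""
        if not criteria:
            return list(self._events)
        position_sets = [set(self._index.get(item, [])) for item in criteria.items()]
        matches = set.intersection(*position_sets)
        return [self._events[i] for i in sorted(matches)]


audit = AuditStore()
audit.record("alice", "read", "sales_report")
audit.record("bob", "write", "sales_report")
audit.record("alice", "read", "hr_table")
print(audit.search(user="alice", action="read"))
```

In practice the index would live in a search backend such as the ones named above; the dictionary stands in for it here.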
Atlas: Policy Engine
Metadata driven
Rationalized at runtime
Geo/Time based rules
Prohibitions
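A metadata-driven rule check with geo/time conditions and prohibitions might look like the sketch below. The `Rule` fields, the `evaluate` function, and the default-allow behavior when no rule applies are all assumptions made for illustration; they do not describe the actual Atlas policy engine.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Rule:
    tag: str                             # metadata tag the rule applies to
    countries: frozenset = frozenset()   # empty means any country
    start: time = time.min               # start of the allowed time window
    end: time = time.max                 # end of the allowed time window
    prohibit: bool = False               # a prohibition denies outright


def evaluate(rules, asset_tags, country, at):
    """Decide access at request time from the asset's tags and the context."""
    applicable = [r for r in rules if r.tag in asset_tags]
    if any(r.prohibit for r in applicable):
        return False
    # Assumption: every applicable rule must pass; no applicable rule means allow.
    return all(
        (not r.countries or country in r.countries) and r.start <= at <= r.end
        for r in applicable
    )


rules = [
    Rule("PII", countries=frozenset({"UK"}), start=time(9), end=time(17)),
    Rule("restricted", prohibit=True),
]
print(evaluate(rules, {"PII"}, "UK", time(10)))                # True
print(evaluate(rules, {"PII"}, "US", time(10)))                # False
print(evaluate(rules, {"PII", "restricted"}, "UK", time(10)))  # False
```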
Atlas: Security
Enforces policies
Metadata driven
ABAC (not simple RBAC)
● Attribute-based access control
Tech: Ranger
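The difference between RBAC and ABAC can be shown with two small functions: one looks only at roles, the other at attributes of the user, the resource, and the action. The attribute names (`pii_trained`, `owner_department`) and both functions are hypothetical; this is not how Ranger expresses policies.

```python
def rbac_allows(user, required_role):
    """Role-based check: only the user's roles matter."""
    return required_role in user["roles"]


def abac_allows(user, resource, action):
    """Attribute-based check: user, resource, and action attributes all matter."""
    # Resources tagged PII require a user attribute, regardless of role.
    if "PII" in resource["tags"] and not user.get("pii_trained", False):
        return False
    # Writes are limited to the department that owns the resource.
    if action == "write" and user["department"] != resource["owner_department"]:
        return False
    return True


analyst = {"roles": {"analyst"}, "department": "finance", "pii_trained": False}
customers = {"tags": {"PII"}, "owner_department": "retail"}
ledger = {"tags": {"SOX"}, "owner_department": "finance"}

print(rbac_allows(analyst, "analyst"))           # True
print(abac_allows(analyst, customers, "read"))   # False
print(abac_allows(analyst, ledger, "write"))     # True
```

The same analyst role gets different answers depending on the tags attached to the data, which is the point of driving enforcement from metadata.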
Atlas: RESTful Interface
API everything
Atlas: Metadata Exchange
Metadata
MVP: ASF Incubated
● REST API
● UI
● Centralized Taxonomy
● Import / Export Metadata
● Documentation
2015 mid-year GA
● Policy Rules Engine
● Real-time Access Control
● Column Level Tagging
● Audit Store
2015 2H
● Enhanced Audit Store
○ Immutable File Format
○ Event Metadata Tagging
○ Advanced Reporting
● Advanced Policy Engine
● Row / Column Masking
● 3rd Party Metadata Exchange
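Column masking driven by column-level tags can be sketched as follows. The `mask_row` function, the fixed `"****"` mask, and the notion of per-user "clearances" are assumptions for illustration only, not a description of the planned feature.

```python
def mask_row(row, column_tags, user_clearances):
    """Mask every column carrying a tag the user is not cleared for."""
    masked = {}
    for column, value in row.items():
        tags = column_tags.get(column, set())
        # Any tag outside the user's clearances hides the value.
        masked[column] = "****" if tags - user_clearances else value
    return masked


row = {"name": "Ann", "card_number": "4111111111111111", "store": "London"}
column_tags = {"card_number": {"PCI"}, "name": {"PII"}}
print(mask_row(row, column_tags, {"PII"}))
# {'name': 'Ann', 'card_number': '****', 'store': 'London'}
```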