Apache Atlas: Data Governance for Hadoop (Strata London 2015)

Apache Atlas Data Governance for Hadoop Sean Roberts Partner Engineering London & EMEA @seano

Upload: sean-roberts

Post on 09-Aug-2015

TRANSCRIPT

Apache Atlas
Data Governance for Hadoop

Sean Roberts
Partner Engineering, London & EMEA
@seano

Data Governance

Availability
Usability
Integrity
Security

Data Governance Technology

Transparency
Reproducibility
Auditability
Consistency

ETL/DQ

BPM

Business Analytics

Visualization & Dashboards

ERP

CRM

SCM

MDM

ARCHIVE

Common Governance Framework

Use Cases

Financial Reporting: chain of custody, lineage narratives

Healthcare: 30-day measures reporting

Retail: point of sale analysis, price optimization

Telco: device log management, correlation, analysis & mitigation

Challenges in Hadoop ecosystem

Ecosystem

No holistic approach

Business Demand

Apache Atlas
Data Governance for Hadoop

Open & co-development with users!

wiki.apache.org/incubator/AtlasProposal

Apache Atlas

Atlas: Capabilities

● Data Classification
● Metadata Exchange
● Centralized Auditing
● Search & Lineage
● Policy Engine
● Security

Apache Atlas [architecture diagram]

● Knowledge Store
● Audit Store
● Models, Type-System
● Policy Rules, Taxonomies
● Tag Based Policies
● Data Lifecycle Management
● Real Time Tag Based Access Control
● REST API
● Services: Search, Lineage, Exchange
● Healthcare: HIPAA, HL7
● Financial: SOX, Dodd-Frank
● Energy: PPDM
● Retail: PCI, PII
● Other: CWM

Certification

● Metadata exchange
● Stability
● Interoperability
  ○ Low cost to switch
● Fosters innovation

Discovery

Tagging

Prep / Cleanse

ETL

Governance

BPM

Self Service

Visualization

Apache Atlas Components

Atlas: Knowledge Store

Metadata exchange
Flexible Taxonomy

● Data sets/objects
● Tables/Columns
● Logical Context
● Source/Destination

Tech: Titan with HBase
● Pluggable

Type System

Class
Struct
Trait
Primitives

Collections
● Map
● Array

Instances (Entity)
● Referenceable
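
To make the type-system vocabulary above concrete, here is a minimal Python sketch. It is not the Atlas API: `TypeDef`, `Entity`, `TypeRegistry`, the `array<T>` / `map<K,V>` spelling, and the `Table`, `Column` and `PII` names are all hypothetical, chosen only to show how classes, structs, traits, primitives, collections and referenceable instances might relate. The rule that only class types get instances with an identity is an assumption of this sketch.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical primitive set; a real type system would define its own.
PRIMITIVES = {"string", "int", "boolean"}


@dataclass
class TypeDef:
    name: str
    kind: str                                        # "class", "struct" or "trait"
    attributes: dict = field(default_factory=dict)   # attribute name -> type name


@dataclass
class Entity:
    type_name: str
    values: dict
    traits: set = field(default_factory=set)
    # The guid is what makes the instance referenceable from other entities.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


class TypeRegistry:
    def __init__(self):
        self.types = {}

    def known(self, type_name):
        # Collections are accepted when their element types are known.
        if type_name.startswith("array<") and type_name.endswith(">"):
            return self.known(type_name[6:-1])
        if type_name.startswith("map<") and type_name.endswith(">"):
            key, _, value = type_name[4:-1].partition(",")
            return self.known(key.strip()) and self.known(value.strip())
        return type_name in PRIMITIVES or type_name in self.types

    def register(self, typedef):
        if typedef.kind not in {"class", "struct", "trait"}:
            raise ValueError(f"unknown kind: {typedef.kind}")
        for attr, type_name in typedef.attributes.items():
            if not self.known(type_name):
                raise ValueError(f"{attr}: unknown type {type_name}")
        self.types[typedef.name] = typedef

    def create(self, type_name, **values):
        typedef = self.types[type_name]
        if typedef.kind != "class":
            raise TypeError("in this sketch only class types have instances")
        unknown = set(values) - set(typedef.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        return Entity(type_name, values)

    def tag(self, entity, trait_name):
        if self.types[trait_name].kind != "trait":
            raise TypeError(f"{trait_name} is not a trait")
        entity.traits.add(trait_name)


registry = TypeRegistry()
registry.register(TypeDef("PII", "trait"))
registry.register(TypeDef("Column", "class", {"name": "string", "type": "string"}))
registry.register(TypeDef("Table", "class", {"name": "string", "columns": "array<Column>"}))

col = registry.create("Column", name="ssn", type="string")
registry.tag(col, "PII")
table = registry.create("Table", name="patients", columns=[col])
```

Keeping traits as a set on the instance, rather than as part of the class definition, is what lets a tag such as `PII` be attached to one column without changing the `Column` type.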

Atlas: Data Lifecycle Management

Focus on:
● Provenance
● Replication
● Data retention/eviction
● Late data handling
● Automation

Tech: Falcon

Atlas: Audit Store

Historical repository
● Security & Operational
● Indexed
● Searchable (DSL)

Tech:
● YARN ATS, HBase, Hive
● Solr, ElasticSearch
  ○ Pluggable

Atlas: Policy Engine

Metadata driven
Rationalized at runtime
Geo/Time based rules
Prohibitions
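
As an illustration of the four points above, the following Python sketch evaluates tag-keyed rules at request time, with optional country and time-of-day conditions, and lets any matching prohibition override a permit. It is only a sketch under stated assumptions: `Rule`, `decide`, the tag names and the deny-by-default behaviour are hypothetical and are not taken from Atlas.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Rule:
    tag: str                             # metadata tag the rule is keyed on
    effect: str                          # "permit" or "prohibit"
    countries: frozenset = frozenset()   # empty means any country
    start: time = time(0, 0)             # inclusive time-of-day window
    end: time = time(23, 59, 59)

    def matches(self, tags, country, at):
        return (
            self.tag in tags
            and (not self.countries or country in self.countries)
            and self.start <= at <= self.end
        )


def decide(rules, tags, country, at):
    """Resolve all matching rules at request time; prohibitions win."""
    matched = [r for r in rules if r.matches(tags, country, at)]
    if any(r.effect == "prohibit" for r in matched):
        return "deny"
    if any(r.effect == "permit" for r in matched):
        return "allow"
    return "deny"  # assumption of this sketch: nothing matched means deny


rules = [
    Rule("PII", "permit", frozenset({"GB"}), time(9), time(17)),
    Rule("EXPORT_CONTROLLED", "prohibit"),
]

in_hours = decide(rules, {"PII"}, "GB", time(10, 30))
wrong_country = decide(rules, {"PII"}, "US", time(10, 30))
prohibited = decide(rules, {"PII", "EXPORT_CONTROLLED"}, "GB", time(10, 30))
```

Because the rules refer only to tags, retagging a data set changes the outcome of `decide` without editing any rule, which is one reading of "metadata driven".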

Atlas: Security

Enforces policies
Metadata driven
ABAC (not simple RBAC)
● Attribute-based access control

Tech: Ranger
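
The difference between ABAC and simple RBAC can be sketched in a few lines of Python. This is not Ranger's API or policy format: `Policy`, `is_allowed` and every attribute name below are hypothetical. The point is only that the decision is a predicate over attributes of the subject, the resource (including its metadata tags) and the action, rather than a lookup of the user's role.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Attrs = Mapping[str, object]


@dataclass(frozen=True)
class Policy:
    name: str
    # condition(subject attributes, resource attributes, action) -> bool
    condition: Callable[[Attrs, Attrs, str], bool]


def is_allowed(policies, subject, resource, action):
    """Allow when at least one policy's condition holds for this request."""
    return any(p.condition(subject, resource, action) for p in policies)


policies = [
    Policy(
        "clinicians read PHI in their own department",
        lambda s, r, a: a == "read"
        and "PHI" in r["tags"]
        and s["job"] == "clinician"
        and s["department"] == r["department"],
    ),
    Policy(
        "anyone may read untagged data",
        lambda s, r, a: a == "read" and not r["tags"],
    ),
]

alice = {"job": "clinician", "department": "cardiology"}
bob = {"job": "clinician", "department": "oncology"}
records = {"tags": {"PHI"}, "department": "cardiology"}
public = {"tags": set(), "department": "cardiology"}

alice_reads = is_allowed(policies, alice, records, "read")
bob_reads = is_allowed(policies, bob, records, "read")
```

Alice and Bob hold the same job, so a role-only check could not tell them apart; the department attribute and the `PHI` tag on the resource are what separate the two requests.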

Atlas: RESTful Interface

API everything

Atlas: Metadata Exchange

Metadata, Metadata, Metadata

Apache Atlas: Now & Future

MVP: ASF Incubated

● REST API
● UI
● Centralized Taxonomy
● Import / Export Metadata
● Documentation

2015 mid-year GA

● Policy Rules Engine
● Real-time Access Control
● Column Level Tagging
● Audit Store

2015 2H

● Enhanced Audit Store
  ○ Immutable File Format
  ○ Event Metadata Tagging
  ○ Advanced Reporting
● Advanced Policy Engine
● Row / Column Masking
● 3rd Party Metadata Exchange

Apache Atlas
Data Governance for Hadoop

Sean Roberts
@seano