Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work


Uploaded by denodo

Posted on 14-Apr-2017


TRANSCRIPT

Page 1: Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work

OCTOBER 18, 2016 SAN FRANCISCO BAY AREA, CA

#DenodoDataFest

RAPID, AGILE DATA STRATEGIES

For Accelerating Analytics, Cloud, and Big Data Initiatives.

Page 2

© 2016 Autodesk | Enterprise Information Services

The Governed Data Lake – Putting Big Data to Work

Mark Eaton

Enterprise Architect

DataFest 2016

Page 3

Challenge – Maximizing the Value of the Data Lake

Data in the lake is just data

Page 4

The Data Lake alone is not a panacea

Easy to put data in

Wait! What about my old data warehouse?

Harder to access and secure the data

Page 5

The Governed Data Lake – Putting Big Data to Work

Data Architecture

Bring structure and organization to your data lake

Logical Data Warehouse across big data and legacy data sources

Data Governance and Enterprise Access Point

Change control throughout the data architecture

Enterprise-level access controls – table, row, column

Build a Data Strategy!

Roadmap and information architecture to execute the strategy
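The idea of a Logical Data Warehouse spanning big data and legacy sources can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Denodo's API: the lake is assumed to hold raw usage events (here a plain list), the legacy warehouse is modeled as an in-memory SQLite table, and `LAKE_EVENTS`, `legacy_warehouse` and `account_usage_view` are hypothetical names. The point is that the combined view is computed at query time rather than copied into yet another store.

```python
import sqlite3
from collections import Counter

# Hypothetical "big data" source: raw product-usage events as they might land in the lake.
LAKE_EVENTS = [
    {"account_id": 1, "product": "A"},
    {"account_id": 1, "product": "B"},
    {"account_id": 2, "product": "A"},
]

def legacy_warehouse() -> sqlite3.Connection:
    """Hypothetical legacy warehouse, modeled as an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    return conn

def account_usage_view(conn: sqlite3.Connection) -> list:
    """Virtual view: combines both sources at query time; no data is copied into a new store."""
    usage = Counter(event["account_id"] for event in LAKE_EVENTS)
    rows = conn.execute("SELECT account_id, name FROM accounts ORDER BY account_id")
    # Counter returns 0 for accounts with no lake activity.
    return [(name, usage[account_id]) for account_id, name in rows]

if __name__ == "__main__":
    print(account_usage_view(legacy_warehouse()))
    # [('Acme', 2), ('Globex', 1), ('Initech', 0)]
```

Accounts with no activity in the lake still appear, with a count of 0. A real data virtualization layer would typically push filters and aggregations down to each source instead of pulling everything into memory as this sketch does.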

Page 6

The Autodesk Agile Data Architecture

Page 7

Autodesk Data Architecture

Page 8

Why Build the Logical Data Warehouse

Data virtualization can be used throughout your data pipeline!

Page 9

Autodesk Big Data Ecosystem

Page 10

Data Governance and Enterprise Access Point

Essential for Regulatory Compliance and Sensitive Data Handling

Page 11

Data Governance and Enterprise Access Point

Data Governance

Change control throughout the data architecture

Bring structure and organization to your data lake

Availability, usability, integrity, security…

Enterprise Access Point

Enterprise-level access controls – table, row, column

Named-user access only

Authorization entirely driven by LDAP roles

Audit all access
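The access-point rules on this slide (table-, row- and column-level controls, named users only, authorization driven by roles, and an audit record for every request) can be illustrated with a small policy check. This is a sketch under stated assumptions, not any product's API: `USER_ROLES` is a hypothetical stand-in for LDAP group membership, and `Grant`, `POLICY`, `ORDERS` and `query` are illustrative names.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")

# Hypothetical stand-in for LDAP group membership; a real deployment would query the directory.
USER_ROLES = {"alice": {"finance_analyst"}, "bob": {"marketing_analyst"}}

@dataclass(frozen=True)
class Grant:
    table: str                          # table-level control
    columns: frozenset                  # column-level control
    row_filter: Callable[[dict], bool]  # row-level control

# Hypothetical role -> grants policy.
POLICY = {
    "finance_analyst": [Grant("orders", frozenset({"order_id", "region", "amount"}), lambda r: True)],
    "marketing_analyst": [Grant("orders", frozenset({"order_id", "region"}), lambda r: r["region"] == "EMEA")],
}

ORDERS = [
    {"order_id": 1, "region": "EMEA", "amount": 100, "email": "a@example.com"},
    {"order_id": 2, "region": "AMER", "amount": 250, "email": "b@example.com"},
]

def query(user: str, table: str, rows: list) -> list:
    """Return only the rows and columns the user's roles allow, auditing every decision."""
    roles = USER_ROLES.get(user)
    if not roles:  # named-user access only: unknown users get nothing
        audit_log.info("DENY user=%s table=%s reason=unknown-user", user, table)
        raise PermissionError(f"{user!r} is not a named user")
    grants = [g for role in roles for g in POLICY.get(role, []) if g.table == table]
    if not grants:
        audit_log.info("DENY user=%s table=%s reason=no-grant", user, table)
        raise PermissionError(f"{user!r} has no grant on {table!r}")
    grant = grants[0]  # simplification: assume at most one grant per table per user
    cols = sorted(grant.columns)
    result = [{c: r[c] for c in cols} for r in rows if grant.row_filter(r)]
    audit_log.info("ALLOW user=%s table=%s rows=%d cols=%s", user, table, len(result), cols)
    return result
```

With this policy, `bob` sees only EMEA orders and never the `amount` column, `alice` sees every order but never `email`, and a user missing from the directory is rejected outright; each decision goes to the `audit` logger.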

Page 12

Regulatory Controls and Sensitive Data Handling

Regulatory Controls

SOX, SOC, Geo, contractual…

Requires defensible data security, specifically physical isolation of data

Sensitive Data Handling

PII, PCI, current-quarter financial data…

Obfuscate, mask or remove according to specific guidance

No data movement outside the compliant environment

Subset data required for specific use cases

Leverage tools rather than eyeballs to vet models
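One way to read "obfuscate, mask or remove according to specific guidance" is as a per-column rule table applied inside the compliant environment, so only the treated record is ever exposed. The sketch below assumes three hypothetical treatments: keyed hashing (HMAC-SHA-256) for fields that must stay joinable, partial masking for card numbers, and outright removal. `RULES`, `PSEUDONYM_KEY` and the column names are illustrative only.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a key-management service.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

# Hypothetical per-column guidance: which treatment each sensitive field gets.
RULES = {"email": "obfuscate", "card_number": "mask", "ssn": "remove"}

def obfuscate(value: str) -> str:
    """Keyed hash (HMAC-SHA-256): the same input always yields the same token."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def mask(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with '*'; short values are fully masked."""
    if keep <= 0 or len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]

def apply_rules(record: dict) -> dict:
    """Return a copy of `record` with every column treated according to RULES."""
    treated = {}
    for column, value in record.items():
        rule = RULES.get(column)
        if rule == "remove":
            continue  # drop the column entirely
        if rule == "obfuscate":
            treated[column] = obfuscate(str(value))
        elif rule == "mask":
            treated[column] = mask(str(value))
        else:
            treated[column] = value  # not sensitive: pass through unchanged
    return treated
```

Because the keyed hash is deterministic, obfuscated emails can still be used as join keys across data sets, while removal discards the value entirely.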

Page 13

Building and Executing a Data Strategy

“Though this be madness, yet there is method in ’t.”

Page 14

Data Strategy

Leverage Data / Deliver Value / Build Insight

Product Usage

Contact, Account, Product, Entitlement…

Product Adoption

Account Hierarchy Insights

Product Nurture

Customer Retention

Account/Contact Enrichment

Customer Interaction

Marketing Optimization

???

Can we aggregate the data?

Can we build canonical representations of this data?

How can we leverage this data?

Are we listening to everything from our customers?

Is our business growing?

Are our customers succeeding?

Are we competing successfully?

How are we doing on our marketing spend?

How well do we know the business of our customers?

???

“Leverage our data assets to deliver tangible business value en route to greater business insight”

Page 15

Data Strategy Execution – the Roadmap

Page 16

Data Strategy Execution – the Information Architecture

Identify enterprise data sources

Harder than you think

Highly available ingestion mechanism

Self-service or nearly so

Stream-based ingestion facilitates both batch and streaming data processing

Leverage highly redundant cloud storage for the data lake

e.g., Amazon S3

Leverage best-of-breed for individual components

Open source, selected commercial vendors

Develop canonical representations and derivations for your data sets

Freakin’ hard!

Build the Governed Data Lake

Use data virtualization to span native big data and legacy applications
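The point that stream-based ingestion facilitates both batch and streaming processing can be shown with one append-only log read in two ways. The sketch assumes a hypothetical in-memory `EventLog` standing in for a durable stream; the talk does not name a specific streaming product, and the `amount` field, `StreamingConsumer` and `batch_total` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for a durable, append-only stream."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Append an event and return its offset."""
        self.events.append(event)
        return len(self.events) - 1

    def read(self, start: int, max_events: Optional[int] = None) -> list:
        """Return events from offset `start`, optionally capped at `max_events`."""
        end = len(self.events) if max_events is None else start + max_events
        return self.events[start:end]

@dataclass
class StreamingConsumer:
    """Streaming mode: process only events past the last committed offset."""
    log: EventLog
    offset: int = 0
    running_total: int = 0

    def poll(self) -> int:
        new_events = self.log.read(self.offset)
        self.running_total += sum(e["amount"] for e in new_events)
        self.offset += len(new_events)  # "commit" the new position
        return self.running_total

def batch_total(log: EventLog) -> int:
    """Batch mode: replay the whole log from offset 0."""
    return sum(e["amount"] for e in log.read(0))

if __name__ == "__main__":
    log = EventLog()
    consumer = StreamingConsumer(log)
    log.append({"amount": 10})
    log.append({"amount": 20})
    print(consumer.poll())   # 30
    log.append({"amount": 5})
    print(consumer.poll())   # 35: only the new event was processed
    print(batch_total(log))  # 35
```

Both modes read the same ingested data, so the streaming running total and the batch replay agree; the only difference is where each starts reading.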

Page 17

Architecting the Data Virtualization Layer

[Diagram components: Data Consumer; DV Instance 1 … DV Instance n; Data Sources; Corporate LDAP; Source Code Repository; CI/CD; Logging Infrastructure (Audit)]

Page 18

Build an Information Architecture

Base views to abstract data sources

Layered derived views to reflect successively refined derivations

Create the notion of publication for curated, externally visible views

Expose services on top of views to make views more accessible

Separate namespaces (schemas) by project or subject area

Build the notion of commonality for views shared across schemas

Naming conventions for all objects

Data portal for one-stop shopping for data consumers
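Several of these guidelines (layered base, derived and published views; one namespace per project; a shared area for common views; naming conventions) are mechanical enough to check automatically, for example as a step in a CI/CD pipeline. The prefixes `bv_`, `dv_` and `pv_`, the `common` schema name, and the rules in the hypothetical `check` function are conventions invented for illustration, not ones taken from the talk.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention: <layer prefix>_<lower_snake_case_name>.
LAYER_PREFIX = {"base": "bv", "derived": "dv", "published": "pv"}
LAYER_RANK = {"base": 0, "derived": 1, "published": 2}
COMMON_SCHEMA = "common"  # hypothetical schema for views shared across projects

@dataclass(frozen=True)
class View:
    schema: str                       # namespace per project or subject area
    name: str
    layer: str                        # "base", "derived" or "published"
    depends_on: tuple = ()            # fully qualified "schema.name" references

def check(views: list) -> list:
    """Return violations of the (hypothetical) layering, namespace and naming rules."""
    by_ref = {f"{v.schema}.{v.name}": v for v in views}
    problems = []
    for v in views:
        ref = f"{v.schema}.{v.name}"
        if not re.fullmatch(LAYER_PREFIX[v.layer] + r"_[a-z0-9_]+", v.name):
            problems.append(f"{ref}: name breaks the {v.layer}-view convention")
        if v.layer == "base" and v.depends_on:
            problems.append(f"{ref}: base views should wrap a data source, not other views")
        for dep_ref in v.depends_on:
            dep = by_ref.get(dep_ref)
            if dep is None:
                problems.append(f"{ref}: unknown dependency {dep_ref}")
                continue
            if dep.schema not in (v.schema, COMMON_SCHEMA):
                problems.append(f"{ref}: cross-schema dependency on {dep_ref}")
            if LAYER_RANK[dep.layer] > LAYER_RANK[v.layer]:
                problems.append(f"{ref}: depends on higher-layer view {dep_ref}")
    return problems
```

In this sketch a derived view may build on base views in its own schema or in `common`, a published view sits on top of derived views, and anything else (a misnamed view, a dependency reaching into another project's schema, a derived view built on a published one) is reported.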

Page 19

Autodesk Information Architecture

Page 20

The Logical Data Warehouse is an essential component of the Governed Data Lake