Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work
Posted on 14-Apr-2017
October 18, 2016, San Francisco Bay Area, CA
#DenodoDataFest
RAPID, AGILE DATA STRATEGIES
For Accelerating Analytics, Cloud, and Big Data Initiatives.
© 2016 Autodesk | Enterprise Information Services
The Governed Data Lake – Putting Big Data to Work
Mark Eaton
Enterprise Architect
DataFest 2016
Challenge – Maximizing the Value of the Data Lake
Data in the lake is just data
The Data Lake alone is not a panacea
Easy to put data in
Wait! What about my old data warehouse?
Harder to access and secure the data
Data Architecture
Structure and organization to your data lake
Logical Data Warehouse across big data and legacy data sources
Data Governance and Enterprise Access Point
Change control throughout the data architecture
Enterprise-level access controls – table, row, column
Build a Data Strategy!
Roadmap and information architecture to execute the strategy
The Governed Data Lake – Putting Big Data to Work
The Autodesk Agile Data Architecture
Autodesk Data Architecture
Why Build the Logical Data Warehouse
Data virtualization can be used throughout your data pipeline!
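To make the idea of a logical data warehouse concrete, here is a minimal Python sketch (not Denodo's implementation): a "virtual view" that combines a legacy relational source, an in-memory SQLite table standing in for the old warehouse, with raw data-lake records at query time, without copying either source. All table names, fields and sample values are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


# Hypothetical legacy warehouse: a relational table in an in-memory SQLite database.
def make_legacy_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.commit()
    return conn


# Hypothetical data-lake records, e.g. usage events parsed from files in cloud storage.
LAKE_EVENTS: List[Dict] = [
    {"account_id": 1, "product": "product_a", "minutes": 30},
    {"account_id": 1, "product": "product_a", "minutes": 45},
    {"account_id": 2, "product": "product_b", "minutes": 20},
]


def usage_by_account(warehouse: sqlite3.Connection, lake_events: List[Dict]) -> List[Tuple[str, int]]:
    """Virtual view: both sources are read when the view is queried; nothing is persisted."""
    minutes: Dict[int, int] = {}
    for event in lake_events:
        minutes[event["account_id"]] = minutes.get(event["account_id"], 0) + event["minutes"]
    rows = warehouse.execute("SELECT account_id, name FROM accounts ORDER BY account_id")
    # Join legacy account names to lake aggregates; accounts with no events get 0.
    return [(name, minutes.get(account_id, 0)) for account_id, name in rows]
```

A production data virtualization layer would typically push filters and aggregations down to each source instead of pulling raw rows into the middle tier; this sketch does not attempt that.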
Autodesk Big Data Ecosystem
Data Governance and Enterprise Access Point
Essential for Regulatory Compliance and Sensitive Data Handling
Data Governance
Change control throughout the data architecture
Structure and organization to your data lake
Availability, usability, integrity, security…
Enterprise Access Point
Enterprise-level access controls – table, row, column
Named-user access only
Authorization entirely driven by LDAP roles
Audit all access
Data Governance and Enterprise Access Point
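The access rules on this slide (table, row and column controls; named users only; authorization driven by LDAP roles; every access audited) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Autodesk or Denodo implement it: the role names, users, policies and the `USER_GROUPS` dictionary (a hypothetical stand-in for an LDAP group lookup) are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    table: str                          # table-level control
    columns: frozenset                  # column-level control
    row_filter: Callable[[dict], bool]  # row-level control


# Hypothetical role -> policy mapping; roles would come from corporate LDAP groups.
POLICIES: Dict[str, List[Policy]] = {
    "finance_analyst": [Policy("orders", frozenset({"order_id", "region", "amount"}), lambda r: True)],
    "emea_sales": [Policy("orders", frozenset({"order_id", "region"}), lambda r: r["region"] == "EMEA")],
}

# Hypothetical stand-in for an LDAP lookup of a named user's groups.
USER_GROUPS: Dict[str, List[str]] = {"alice": ["finance_analyst"], "bob": ["emea_sales"]}

TABLES: Dict[str, List[dict]] = {
    "orders": [
        {"order_id": 1, "region": "EMEA", "amount": 100.0},
        {"order_id": 2, "region": "AMER", "amount": 250.0},
    ]
}

audit_log: List[dict] = []  # every call is recorded, granted or not


def query(user: str, table: str) -> List[dict]:
    """Return only the rows and columns the user's roles allow; audit every call."""
    policies = [p for g in USER_GROUPS.get(user, []) for p in POLICIES.get(g, []) if p.table == table]
    if not policies:  # unknown users and users without a matching role are denied
        audit_log.append({"user": user, "table": table, "granted": False})
        raise PermissionError(f"{user} has no access to {table}")
    result = []
    for row in TABLES[table]:
        cols = set()
        for p in policies:  # union of the columns every matching policy allows for this row
            if p.row_filter(row):
                cols |= p.columns
        if cols:
            result.append({c: row[c] for c in sorted(cols)})
    audit_log.append({"user": user, "table": table, "granted": True, "rows": len(result)})
    return result
```

Keeping authorization entirely in the role mapping means granting or revoking access is an LDAP group change rather than a code change, which is the property the slide emphasizes.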
Regulatory Controls
SOX, SOC, Geo, contractual…
Requires defensible data security, specifically physical isolation of data
Sensitive Data Handling
PII, PCI, current quarter financial data…
Obfuscate, mask, or remove according to specific guidance
No data movement outside compliant environment
Subset data required for specific use cases
Leverage tools rather than eyeballs to vet models
Regulatory Controls and Sensitive Data Handling
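Applying "obfuscate, mask, or remove" guidance per field might look like the Python sketch below. The field names, the rule table and the key are hypothetical. Pseudonymization here uses a keyed HMAC-SHA-256, so the same input always maps to the same token (joins across data sets keep working) while the raw value is not exposed. It is a sketch of the technique, not a vetted PII or PCI control.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be managed inside the compliant environment.
PSEUDONYM_KEY = b"replace-with-managed-secret"

# Hypothetical per-field guidance: how each sensitive column must be treated.
RULES = {"email": "mask_email", "customer_id": "pseudonymize", "card_number": "remove"}


def pseudonymize(value: str) -> str:
    """Keyed hash: deterministic token, not reversible without the key."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, e.g. for support or geo analysis."""
    local, _, domain = email.partition("@")
    return (local[:1] + "***@" + domain) if domain else "***"


def apply_guidance(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        action = RULES.get(key, "keep")
        if action == "remove":
            continue  # the field never leaves the source system
        if action == "mask_email":
            out[key] = mask_email(value)
        elif action == "pseudonymize":
            out[key] = pseudonymize(value)
        else:
            out[key] = value
    return out
```

Driving the treatment from a rule table rather than ad hoc code is one way to "leverage tools rather than eyeballs": the table can be reviewed and tested on its own.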
Building and Executing a Data Strategy
“Though this be madness, yet there is method in ’t.”
Data Strategy
Leverage Data | Deliver Value | Build Insight
Product Usage
Contact, Account, Product, Entitlement…
Product Adoption
Account Hierarchy Insights
Product Nurture
Customer Retention
Account/Contact Enrichment
Customer Interaction
Marketing Optimization
???
Can we aggregate the data?
Can we build canonical representations of this data?
How can we leverage this data?
Are we listening to everything from our customers?
Is our business growing?
Are our customers succeeding?
Are we competing successfully?
How are we doing on our marketing spend?
How well do we know the business of our customers?
???
“Leverage our data assets to deliver tangible business value en route to greater business insight”
Data Strategy Execution – the Roadmap
Data Strategy Execution – the Information Architecture
Identify enterprise data sources
Harder than you think
Highly available ingestion mechanism
Self-service, or nearly so
Stream-based ingestion facilitates both batch and streaming data processing
Leverage highly redundant cloud storage for the data lake
e.g., Amazon S3
Leverage best-of-breed tools for individual components
Open source and selected commercial vendors
Develop canonical representations and derivations for your data sets
Freakin’ hard!
Build the Governed Data Lake
Use data virtualization to span native big data and legacy applications
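Why stream-based ingestion serves both batch and streaming processing can be illustrated with an append-only log: a streaming consumer reads forward from its own offset, while a batch job simply replays the log from the beginning. The Python sketch below uses an in-memory list as a hypothetical stand-in for a durable stream; class and field names are illustrative only.

```python
from collections import Counter
from typing import List


class EventLog:
    """Minimal append-only log (hypothetical stand-in for a durable stream)."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[dict]:
        return self._events[offset:]


class StreamingCounter:
    """Streaming consumer: processes only events it has not seen, tracking its own offset."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0
        self.counts: Counter = Counter()

    def poll(self) -> None:
        new_events = self.log.read_from(self.offset)
        for event in new_events:
            self.counts[event["product"]] += 1
        self.offset += len(new_events)


def batch_count(log: EventLog) -> Counter:
    """Batch job: replays the whole log from offset 0."""
    return Counter(e["product"] for e in log.read_from(0))
```

Because both paths read the same log, the incremental (streaming) result and the full recomputation (batch) agree, so one ingestion mechanism can feed both kinds of processing.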
Architecting the Data Virtualization Layer
[Diagram components: Data Consumer, Corporate LDAP, Data Sources, DV Instance 1 … DV Instance n, Source Code Repository, CI/CD, Logging Infrastructure, Audit]
Build an Information Architecture
Base views to abstract data sources
Layered derived views to reflect successively refined derivations
Create the notion of publication for curated, externally visible views
Expose services on top of views to make views more accessible
Separate namespaces (schemas) by project or subject area
Build the notion of commonality for views shared across schemas
Naming conventions for all objects
Data portal for one-stop shopping for data consumers
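The view layering on this slide can be sketched with SQLite views from Python. The `bv_`/`dv_`/`pub_` prefixes and the `<layer>_<subject area>_<name>` pattern are a hypothetical naming convention chosen for this example, not necessarily the one Autodesk or Denodo uses. Since the sketch keeps everything in one SQLite database, the subject-area segment of each name plays the role of a separate namespace.

```python
import re
import sqlite3

# Hypothetical convention: bv = base view, dv = derived view, pub = published view.
NAME_PATTERN = re.compile(r"^(bv|dv|pub)_[a-z]+_[a-z0-9_]+$")


def build_layers(conn: sqlite3.Connection) -> None:
    # Physical source table (hypothetical sample data).
    conn.execute("CREATE TABLE src_orders (id INTEGER, region TEXT, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?, ?)",
        [(1, "EMEA", 100.0, "closed"), (2, "AMER", 250.0, "closed"), (3, "EMEA", 40.0, "open")],
    )
    # Base view: abstracts the physical source (renames columns, nothing more).
    conn.execute(
        "CREATE VIEW bv_sales_orders AS "
        "SELECT id AS order_id, region, amount, status FROM src_orders"
    )
    # Derived view: a successive refinement built only on the base view.
    conn.execute(
        "CREATE VIEW dv_sales_closed_orders AS "
        "SELECT order_id, region, amount FROM bv_sales_orders WHERE status = 'closed'"
    )
    # Published view: the curated, externally visible result consumers query.
    conn.execute(
        "CREATE VIEW pub_sales_revenue_by_region AS "
        "SELECT region, SUM(amount) AS revenue FROM dv_sales_closed_orders GROUP BY region"
    )


def nonconforming_views(conn: sqlite3.Connection) -> list:
    """List views whose names break the naming convention."""
    names = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")]
    return [n for n in names if not NAME_PATTERN.match(n)]
```

Because consumers only query `pub_` views, the base and derived layers can be refactored (for example, when a legacy source moves into the data lake) without breaking published contracts.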
Autodesk Information Architecture
The Logical Data Warehouse is an essential component of the Governed Data Lake