Smart Data Lake or Data Landfill? The Difference May Be "Semantic"

www.ovum.com © Copyright Ovum 2015. All rights reserved. Finding order in chaos: Building smarter data lakes with semantics Surya Mukherjee, Senior Analyst, Information Management surya.mukherjee@ovum.com @SuryaatOvum

Upload: cambridge-semantics

Post on 09-Jan-2017


TRANSCRIPT

Page 1: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"

www.ovum.com

© Copyright Ovum 2015. All rights reserved.

Finding order in chaos: Building smarter data lakes with semantics

Surya Mukherjee, Senior Analyst, Information Management

surya.mukherjee@ovum.com

@SuryaatOvum

Page 2: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Surya Mukherjee

• Leads Ovum’s analytics practice

• Keynote speaker at several global analytics events and independent analytics thought-leader

• Advisor to numerous small to large enterprises on analytics and data

• Independent product, vendor, and market evaluator

• Experience in both working for and advising the lines of business

Page 3: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


‘Smart’ data lake or data landfill? The difference may be ‘semantic’

Data lakes are fast becoming a front-burner issue as early Hadoop adopters plan or consider implementation

They are attractive for several reasons: independence from a fixed schema, commodity hardware, an economical alternative for archiving, and cross-platform insights

In principle, data lakes should be transparent, manageable, and governable, even if only incrementally; without these qualities, organizations may be exposed to risks and lower ROI

There are many approaches from data platform and integration providers to making data lakes governable, each with positives and tradeoffs

The semantic approach, which is driven by taxonomies and ontologies, can be extremely helpful for industries such as financial services and healthcare

In today’s webinar, we explore the world of semantics and how it can be used to make your data lake ‘smart’

Page 4: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Agenda

Data Lake enters the enterprise agenda

Data landfill versus a ‘smart’ data lake

Key components to a smart data lake

The semantic approach to data lakes

Recommendations for enterprises

Page 5: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Data Lake enters the enterprise agenda

Everyone’s talking about data lakes, because:

Hypermarket for all data types, speeds, and sizes

No need for joining everything now

Batch, real-time, or in-betweens

Expert/scientists, data analysts, business users

Cost

Re-use of skills and software

Not only for web-scale companies!

Page 6: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Many audiences, one lake

Page 7: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


But what makes a lake, a lake?

Ovum's definition of a data lake is a governed, tagged, workable repository that becomes the default ingest point for raw data.

We strongly believe that without governance, a data store, whether structured, unstructured, or both, cannot be called a data lake and is better labelled a data swamp or landfill.

Our requirements for data lakes are therefore more stringent than those of many others, who classify any solution that can store multi-structured data as a data lake.

Page 8: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Data landfill versus smart data lake

Stages of Hadoop adoption

Page 9: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Key components to a “smart” data lake

Analysis-ready

Page 10: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


The semantic approach to data lakes

What is it?

Developed by the W3C for the World Wide Web

Primarily three technical standards: RDF (Resource Description Framework), SPARQL (SPARQL Protocol and RDF Query Language), and OWL (Web Ontology Language)

Subject | Property | Object

Darth Vader | IsAlso | Anakin Skywalker
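
The subject, property, object shape above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library rather than a real RDF toolkit. The `http://example.org/` namespace, the `Triple` type, and the `to_ntriples` and `match` helpers are assumptions made for this example; `match` only imitates the idea of a single SPARQL triple pattern, where an unspecified position acts as a wildcard.

```python
from typing import List, NamedTuple, Optional

EX = "http://example.org/"  # hypothetical namespace, not taken from the slides


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Render a triple of IRIs in N-Triples style: <s> <p> <o> ."""
    return f"<{t.subject}> <{t.prop}> <{t.obj}> ."


def match(store: List[Triple],
          subject: Optional[str] = None,
          prop: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (prop is None or t.prop == prop)
        and (obj is None or t.obj == obj)
    ]


# The single statement from the slide, expressed as one triple.
store = [Triple(EX + "DarthVader", EX + "isAlso", EX + "AnakinSkywalker")]

# "Who is Darth Vader also?" leaves the object position open.
aliases = match(store, subject=EX + "DarthVader", prop=EX + "isAlso")
line = to_ntriples(store[0])
```

Keeping each statement as one small, uniform record is what lets data from unrelated sources be merged into a single store without agreeing on a table layout first.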

Page 11: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Relationship depiction in RDF

Page 12: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Benefits of a semantic approach

Creating linked and contextualized content that depicts interrelationships between data entities, enabling deeper meaning, insight, and action

Shortening of time taken to massage data for analysis

Additional layer of metadata

Combined with technologies such as graph databases, easy to visualize relationships and explore data

Once data is meta-tagged, very easy to analyze, and extremely flexible

Mature security environment

Easier inventory management

Source/target based integration/operation
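
The benefit of exploring relationships once data is linked can be illustrated with a small traversal. The sketch below is not a graph database and not any vendor's API; it is a plain breadth-first walk over triples, and the entity names, property names, and the `related` helper are all hypothetical.

```python
from collections import deque

# Hypothetical linked data: (subject, property, object)
triples = [
    ("AcmeHoldings", "owns", "AcmeBank"),
    ("AcmeBank", "owns", "AcmeLeasing"),
    ("AcmeBank", "tradesWith", "NorthwindFund"),
]


def related(triples, start, max_hops):
    """Map each entity reachable from start (outgoing links only) to its hop count."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)

    del seen[start]  # report only the related entities, not the start itself
    return seen


one_hop = related(triples, "AcmeHoldings", 1)
two_hops = related(triples, "AcmeHoldings", 2)
```

Widening the hop limit surfaces indirect relationships (here, a subsidiary's counterparty) without writing a new join for each additional level.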

Page 13: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Recommendations for enterprises

Keep it focused on a business pain point or use case

Start small, and grow

Get executive sponsorship early

Requires a team effort from both business and IT

Thank you!

Page 14: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"

©2015 Cambridge Semantics Inc. All rights reserved.

The Anzo Smart Data Platform for Linking and Contextualizing Large, Diverse Datasets

Cambridge Semantics Contact:

Marty Loughlin

Vice President, Financial Services

Cambridge Semantics

141 Tremont St., 6th Floor, Boston, [email protected]

(o) 617.855.9565

Page 15: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"

©2016 Cambridge Semantics Inc. All rights reserved. Company Confidential Page 15

The Anzo Smart Data Platform

• An agile, end-to-end platform for tackling diverse information challenges

• Link and contextualize information for search, analytics, visualization and collaboration

Page 16: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


State Street Bank / D&B / EDM Council FIBO Solution Architecture

[Architecture diagram. Labels: Internal Data Sources; External Data Sources; Front Arena Data; Dun & Bradstreet Data; Derivatives Data; Entity & Corp. Hierarchy Data; Map & Load (QA); Link & Query (Classification, analytics); Reports & Analytics]

Page 17: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


Project Deliverables

1. Load & operationalize FIBO in Anzo

2. Map data sources onto FIBO

3. Load, harmonize, QA and classify data

4. Configure analytic dashboards
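
The mapping and harmonizing deliverables can be sketched, under heavy assumptions, as a column-to-property lookup applied to each source record. Nothing here reproduces FIBO or the Anzo product: the `ex:` property names are placeholders rather than real FIBO terms, the source column names and sample values are invented, and `harmonize` is a hypothetical helper.

```python
# Hypothetical mappings from source columns to shared ontology properties.
# The "ex:" names are placeholders, NOT actual FIBO identifiers.
FRONT_ARENA_MAP = {
    "cpty_name": "ex:hasLegalName",
    "notional": "ex:hasNotionalAmount",
}
DNB_MAP = {
    "business_name": "ex:hasLegalName",
    "parent_duns": "ex:hasParentIdentifier",
}


def harmonize(record, mapping, source):
    """Rename mapped columns to shared properties; report unmapped ones for QA."""
    harmonized = {"ex:source": source}
    unmapped = []
    for column, value in record.items():
        prop = mapping.get(column)
        if prop is None:
            unmapped.append(column)  # QA signal: this column has no agreed meaning yet
        else:
            harmonized[prop] = value
    return harmonized, unmapped


# Invented sample records from two differently shaped sources.
trade, trade_gaps = harmonize(
    {"cpty_name": "Acme Bank", "notional": 5000000, "desk": "NY"},
    FRONT_ARENA_MAP,
    "front_arena",
)
entity, entity_gaps = harmonize(
    {"business_name": "Acme Bank", "parent_duns": "000000000"},
    DNB_MAP,
    "dnb",
)

# After harmonizing, both sources answer the same question with the same property.
same_name = trade["ex:hasLegalName"] == entity["ex:hasLegalName"]
```

Because both sources end up using one property for the legal name, a single query or dashboard can span them, and the list of unmapped columns gives a concrete QA backlog.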

Page 18: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"


• Business understandable models describe data and transformations

• Searchable Catalog of Data Sources, Maps & Metadata

• Query model for data lineage, impact analysis, data quality

Anzo Smart Data Lake

Anzo Smart Data Lake Server

Anzo Enterprise Server

• Standardized reports and self-service data discovery for diverse use cases

• Data curation, annotation and application workflow

Anzo Graph Query Engine

• Load, transform and harmonize diverse internal and external data sources

• Link to business meaning (e.g., FIBO)

Data Store/File System

Third party BI/Analytics

Data Providers

Structured Sources

Unstructured Sources

Page 19: Smart Data Lake or Data Landfill? The Difference May Be "Semantic"

