smart data lake or data landfill? the difference may be "semantic"

Post on 09-Jan-2017

www.ovum.com

© Copyright Ovum 2015. All rights reserved.

Finding order in chaos: Building smarter data lakes with semantics

Surya Mukherjee, Senior Analyst, Information Management

surya.mukherjee@ovum.com

@SuryaatOvum

Surya Mukherjee

• Leads Ovum’s analytics practice

• Keynote speaker at several global analytics events and independent analytics thought leader

• Advisor to numerous small to large enterprises on analytics and data

• Independent product, vendor, and market evaluator

• Experience in both working for and advising the lines of business

‘Smart’ data lake or data landfill? The difference may be ‘semantic’

Data lakes are fast becoming a front-burner issue as early Hadoop adopters plan or consider implementation

They are attractive for several reasons: independence from a fixed schema, commodity hardware, an economical alternative for archiving, and cross-platform insights

In principle, data lakes should be transparent, manageable, and governable, even if incrementally, without which organizations may be exposed to risks and lower ROI

There are many approaches from data platform and integration providers to making data lakes governable, each with positives and tradeoffs

The semantic approach, which is driven by taxonomies and ontologies, can be extremely helpful for industries such as financial services and healthcare

In today’s webinar, we explore the world of semantics and how it can be used to make your data lake ‘smart’

Agenda

Data Lake enters the enterprise agenda

Data landfill versus a ‘smart’ data lake

Key components to a smart data lake

The semantic approach to data lakes

Recommendations for enterprises

Data Lake enters the enterprise agenda

Everyone’s talking about data lakes, because:

Hypermarket for all data types, speeds, and sizes

No need for joining everything now

Batch, real-time, or in-betweens

Expert/scientists, data analysts, business users

Cost

Re-use of skills and software

Not only for web-scale companies!

Many audiences, one lake

But what makes a lake, a lake?

Ovum's definition of a data lake is a governed, tagged, workable repository that becomes the default ingest point for raw data.

We strongly believe that without governance, a data store, whether structured, unstructured, or both, cannot be called a data lake and is better labelled a data swamp or landfill.

Our requirements for data lakes are therefore more stringent than those of many others, who classify any solution that can store multi-structured data as a data lake.

Data landfill versus smart data lake

Stages of Hadoop adoption

Key components to a “smart” data lake

Analysis-ready

The semantic approach to data lakes

What is it?

Developed by the W3C for the World Wide Web

Primarily three technical standards: RDF (Resource Description Framework), SPARQL (SPARQL Protocol and RDF Query Language), and OWL (Web Ontology Language)

Subject: Darth Vader

Property: IsAlso

Object: Anakin Skywalker
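
The subject/property/object structure above can be sketched in a few lines of plain Python. This is only an illustration of the triple idea, not RDF or SPARQL themselves: the `Triple` type and the `match` helper are hypothetical names, and the second triple is made-up sample data that does not appear on the slide.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, property, object (mirrors the slide's columns)."""
    subject: str
    prop: str
    obj: str


def match(triples: List[Triple],
          subject: Optional[str] = None,
          prop: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (prop is None or t.prop == prop)
        and (obj is None or t.obj == obj)
    ]


triples = [
    Triple("Darth Vader", "IsAlso", "Anakin Skywalker"),  # from the slide
    Triple("Anakin Skywalker", "IsA", "Jedi"),            # illustrative only
]

# "What do we know about Darth Vader?"
for t in match(triples, subject="Darth Vader"):
    print(t.subject, t.prop, t.obj)
```

A query language such as SPARQL generalizes this kind of pattern matching, with variables in place of the `None` wildcards.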

Relationship depiction in RDF

Benefits of a semantic approach

Creating linked and contextualized content that depicts interrelationships between data entities, enabling deeper meaning, insight, and action

Shortening of time taken to massage data for analysis

Additional layer of metadata

Combined with technologies such as graph databases, easy to visualize relationships and explore data

Once data is meta-tagged, very easy to analyze, and extremely flexible

Mature security environment

Easier inventory management

Source/target based integration/operation
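
As a rough sketch of why a metadata layer makes relationships easy to explore, the snippet below treats metadata triples as a graph and walks outward from one item. Everything here is assumed for illustration: the file names, the property names, and the `related_entities` helper are hypothetical and are not taken from any product.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple


def related_entities(triples: Iterable[Tuple[str, str, str]],
                     start: str,
                     max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first walk over the triples, ignoring edge direction.

    Returns every entity reachable from `start` within `max_hops`,
    mapped to its distance in hops.
    """
    graph = defaultdict(set)
    for subject, _prop, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)

    del hops[start]
    return hops


# Hypothetical metadata tags linking raw files to business concepts.
metadata = [
    ("trades.csv", "describes", "Trade"),
    ("Trade", "hasCounterparty", "LegalEntity"),
    ("entities.json", "describes", "LegalEntity"),
]

print(related_entities(metadata, "trades.csv", max_hops=3))
```

With three hops allowed, the walk from `trades.csv` reaches `entities.json` through the shared `LegalEntity` concept, which is the kind of link that untagged files in a landfill would not expose.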

Recommendations for enterprises

Keep it focused on business pain points and use cases

Start small, and grow

Get executive sponsorship early

Requires team efforts from both business and IT

Thank you!

©2015 Cambridge Semantics Inc. All rights reserved.

The Anzo Smart Data Platform for Linking and Contextualizing Large, Diverse Datasets

Cambridge Semantics Contact:
Marty Loughlin
Vice President, Financial Services
Cambridge Semantics
141 Tremont St., 6th Floor, Boston, MA
www.cambridgesemantics.com
marty@cambridgesemantics.com
(o) 617.855.9565

The Anzo Smart Data Platform

• An agile, end-to-end, platform for tackling diverse information challenges

• Link and contextualize information for search, analytics, visualization and collaboration

State Street Bank/D&B/EDM Council

FIBO Solution Architecture

[Architecture diagram. Labels: Internal Data Sources; External Data Sources; Front Arena Data; Dun & Bradstreet Data; Derivatives Data; Entity & Corp. Hierarchy Data; Map & Load (QA); Link & Query (Classification, analytics); Reports & Analytics]

Project Deliverables

1. Load & operationalize FIBO in Anzo

2. Map data sources onto FIBO

3. Load, harmonize, QA and classify data

4. Configure analytic dashboards
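
The mapping and harmonization steps listed here can be illustrated with a generic sketch. It is not how Anzo works and it does not use real FIBO identifiers: the `ex:` terms, the source names, the column names, and the sample records are all hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical mappings from each source's column names to shared model terms.
# The "ex:" terms are placeholders, not actual FIBO identifiers.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "front_arena": {"cpty_name": "ex:legalName", "cpty_id": "ex:entityId"},
    "dnb": {"business_name": "ex:legalName", "duns": "ex:entityId"},
}

# Terms every harmonized record is expected to carry (a simple QA rule).
REQUIRED = {"ex:legalName", "ex:entityId"}


def harmonize(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's fields to the shared terms; unmapped fields are dropped."""
    mapping = MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def qa_issues(harmonized: Dict[str, str]) -> List[str]:
    """Return the required terms that are missing from a harmonized record."""
    return sorted(REQUIRED - set(harmonized))


# Made-up sample records from two differently shaped sources.
a = harmonize("front_arena", {"cpty_name": "Acme Ltd", "cpty_id": "A-1", "desk": "FX"})
b = harmonize("dnb", {"business_name": "Acme Ltd"})

print(a, qa_issues(a))
print(b, qa_issues(b))
```

Once both sources use the same terms, records can be compared and linked directly, and the QA check flags the second record as missing `ex:entityId`.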

• Business understandable models describe data and transformations

• Searchable Catalog of Data Sources, Maps & Metadata

• Query model for data lineage, impact analysis, data quality

Anzo Smart Data Lake

Anzo Smart Data Lake Server

Anzo Enterprise Server

• Standardized reports and self-service data discovery for diverse use cases

• Data curation, annotation and application workflow

Anzo Graph Query Engine

• Load, transform and harmonize diverse internal and external data sources

• Link to business meaning (e.g., FIBO)

Data Store/File System

Third party BI/Analytics

Data Providers

Structured Sources

Unstructured Sources
