How to Build a Smart Data Lake Using Semantics

©2015 Cambridge Semantics Inc. All rights reserved. How to Build a Smart Data Lake™ Using Semantics. Presenters: Marty Loughlin, Vice President; Curt Wright, Director Engineering, Cambridge Semantics. Contact: Marty Loughlin, Vice President, Cambridge Semantics, 141 Tremont St., 6th Floor, Boston, MA, www.cambridgesemantics.com, [email protected], (o) 617.855.9565

Upload: cambridge-semantics

Post on 09-Jan-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: How to Build a Smart Data Lake Using Semantics

©2015 Cambridge Semantics Inc. All rights reserved.

How to Build a Smart Data Lake™ Using Semantics

Presenters:

Marty Loughlin, Vice President

Curt Wright, Director Engineering

Cambridge Semantics

Contact:

Marty Loughlin, Vice President, Cambridge Semantics

141 Tremont St., 6th Floor, Boston, MA

www.cambridgesemantics.com

[email protected]

(o) 617.855.9565

Page 2: How to Build a Smart Data Lake Using Semantics


Introduction to Cambridge Semantics Inc (CSI)

The Anzo Smart Data Platform is used to create data analytics and management solutions with diverse data from varied sources

Company:

• Founded in 2007 by a senior team from IBM’s Advanced Internet Technology Group

• Complemented by the database team responsible for Netezza & ParAccel

Software:

• Market-leading Anzo software suite is built on open Semantic Web standards

• Currently the 3rd generation of the product is in production use

Business Intelligence / Analytics Solutions

2013 (Winner), 2014 (Finalist), 2015 (Finalist)

2014 Innovation Showcase

Page 3: How to Build a Smart Data Lake Using Semantics


The State of the Data Lake

• Great way to rapidly and inexpensively assemble large volumes of unfiltered data

• However, challenging to identify and link data

• Getting value requires harmonization of meaning across diverse sources and making it accessible to business users

• And, you also need good data governance, quality, lineage and security

Leading organizations are looking to Semantic Models and Tools to address these challenges

Source: 2015 EDM Council Benchmarking Study

Page 4: How to Build a Smart Data Lake Using Semantics


Why Semantic Models & Tools?

• Semantic models enable harmonization of data through common business meaning

• Model-driven transformation enables active metadata management

• Business analyst tools for cataloging, harmonizing, transforming, discovering and analyzing data

• Agile, flexible and adaptable to new requirements
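The harmonization idea in the first bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, column names, and model terms (`securityId`, `quantity`, `price`) are hypothetical and are not taken from Anzo or from any published model. Each source declares how its local fields map to one shared business term, and rows from different sources become comparable once they are renamed.

```python
from typing import Dict

# Hypothetical common model: one agreed term per business concept.
# Each source declares how its local column names map onto those terms.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "trading": {"sec_id": "securityId", "qty": "quantity", "px": "price"},
    "holdings": {"SecurityID": "securityId", "Units": "quantity"},
}


def harmonize(source: str, row: Dict[str, str]) -> Dict[str, str]:
    """Rename a row's source-specific fields to the common model terms.

    Columns without a mapping are dropped rather than guessed at.
    """
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


trade = harmonize("trading", {"sec_id": "US0378331005", "qty": "100", "px": "110.5"})
holding = harmonize("holdings", {"SecurityID": "US0378331005", "Units": "250"})

# Both rows now use the same terms, so they can be linked on the shared key.
linked = trade["securityId"] == holding["securityId"]
```

Because the mapping is data rather than code, adding a new source means adding one dictionary entry, which is the kind of adaptability the last bullet refers to.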

Page 5: How to Build a Smart Data Lake Using Semantics


Anzo Analytics and Data Integration Suite

[Diagram: Source Data (Reference Data, Holdings Data, Trading Data); stages labeled "Map & Load" and "Link & Query"]

Page 6: How to Build a Smart Data Lake Using Semantics


• Business-understandable models describe data and transformations

• Searchable Catalog of Data Sources, Maps & Metadata

• Query model for data lineage, impact analysis, data quality

Anzo Smart Data Lake

Anzo Smart Data Integration Server

Anzo Enterprise Server

• Standardized reports and self-service data discovery for diverse use cases

• Data curation, annotation and application workflow

Anzo Graph Query Engine

• Load, transform and harmonize diverse internal and external data sources

• Link to business meaning (e.g., FIBO)

[Diagram labels: Data Store; Third party BI/Analytics; Data Providers; Structured Sources; Unstructured Sources]
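The slide's bullet about querying the model for data lineage and impact analysis can be illustrated with a small graph walk. This is a sketch only: the triples, the `derivedFrom` predicate, and the item names are hypothetical, and a real deployment would presumably query a graph store rather than a Python list.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata triples: (subject, predicate, object).
METADATA: List[Triple] = [
    ("report:Exposure", "derivedFrom", "dataset:HarmonizedHoldings"),
    ("dataset:HarmonizedHoldings", "derivedFrom", "source:HoldingsDB"),
    ("dataset:HarmonizedHoldings", "derivedFrom", "source:ReferenceCSV"),
    ("report:TradeBlotter", "derivedFrom", "source:TradingDB"),
]


def lineage(item: str, triples: List[Triple]) -> Set[str]:
    """Return everything `item` is derived from, directly or transitively."""
    found: Set[str] = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "derivedFrom" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def impact(item: str, triples: List[Triple]) -> Set[str]:
    """Return everything that depends on `item` (the same walk, reversed)."""
    reversed_triples = [(obj, pred, subj) for subj, pred, obj in triples]
    return lineage(item, reversed_triples)
```

Lineage answers "where did this report's data come from?", while impact answers "what breaks if this source changes?". Both are the same traversal over the metadata, run in opposite directions.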

Page 7: How to Build a Smart Data Lake Using Semantics


Build a Smart Data Lake with Anzo Smart Data Integration Server

1. Catalog Sources

• Register sources & capture configurable metadata

• Automatically retrieve/create schema and sample data

• Supports databases (JDBC), CSV, TSV, Hadoop HDFS, RDF

2. Map Data

• Smart mapper based on familiar Excel interface

• Supports mapping and transformation

3. Load, Link, Transform

• Create jobs by combining maps

• Generate code for Apache Spark, Informatica, Pentaho

• Ingest data or transform in lake

4. Self-Service Analytics

• Self-service data catalog and in-memory analytics

• Data Lineage

• Configurable model-driven reports and analytics
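The four steps on this slide can be walked through with a toy example using only the Python standard library. It is a sketch under stated assumptions: the sample CSV, the `model:` property names, and every function name are hypothetical, and it does not reproduce Anzo's interfaces or the Spark, Informatica, or Pentaho code generation the slide mentions.

```python
import csv
import io
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Stand-in for a registered CSV source (hypothetical sample data).
RAW_CSV = "sec_id,qty\nUS0378331005,100\nUS5949181045,40\n"


def catalog(text: str, sample_size: int = 1) -> Dict[str, object]:
    """Step 1: retrieve the schema (header row) and a small data sample."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {"columns": list(reader.fieldnames or []), "sample": rows[:sample_size]}


# Step 2: map source columns to properties of a (hypothetical) business model.
COLUMN_MAP: Dict[str, str] = {"sec_id": "model:securityId", "qty": "model:quantity"}


def load(text: str, column_map: Dict[str, str]) -> List[Triple]:
    """Step 3: load rows as (subject, property, value) triples using the map."""
    triples: List[Triple] = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        subject = f"holding:{index}"
        for column, prop in column_map.items():
            triples.append((subject, prop, row[column]))
    return triples


def total_quantity(triples: List[Triple]) -> int:
    """Step 4: a simple analytic expressed against model terms, not columns."""
    return sum(int(value) for _, prop, value in triples if prop == "model:quantity")


entry = catalog(RAW_CSV)
graph = load(RAW_CSV, COLUMN_MAP)
total = total_quantity(graph)
```

The analytic in step 4 refers only to `model:quantity`, so it would keep working if a second source with different column names were mapped to the same property.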

Page 8: How to Build a Smart Data Lake Using Semantics
