TRANSCRIPT
©2015 Cambridge Semantics Inc. All rights reserved.
How to Build a Smart Data Lake™ Using Semantics
Presenters:
Marty Loughlin, Vice President
Curt Wright, Director Engineering

Cambridge Semantics Contact:
Marty Loughlin
Vice President
Cambridge Semantics
141 Tremont St., 6th Floor, Boston, [email protected]
(o) 617.855.9565
Introduction to Cambridge Semantics Inc (CSI)
The Anzo Smart Data Platform is used to create data analytics and management solutions with diverse data from varied sources
Company:
• Founded in 2007 by senior team from IBM’s Advanced Internet Technology Group
• Complemented by database team responsible for Netezza & ParAccel
Software:
• Market-leading Anzo software suite is built on open Semantic Web standards
• Currently 3rd generation of the product in production use
Business Intelligence / Analytics Solutions
2013 (Winner), 2014 (Finalist), 2015 (Finalist)
2014 Innovation Showcase
The State of the Data Lake
• Great way to rapidly and inexpensively assemble large volumes of unfiltered data
• However, challenging to identify and link data
• Getting value requires harmonization of meaning across diverse sources and making it accessible to business users
• And, you also need good data governance, quality, lineage and security
Leading organizations are looking to Semantic Models and Tools to address these challenges
Source: 2015 EDM Council Benchmarking Study
Why Semantic Models & Tools?
• Semantic models enable harmonization of data through common business meaning
• Model-driven transformation enables active metadata management
• Business analyst tools for cataloging, harmonizing, transforming, discovering and analyzing data
• Agile, flexible and adaptable to new requirements
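As a rough illustration of harmonization through common business meaning, the sketch below renames fields from two differently shaped sources onto one shared vocabulary. The source names, column names, and the COMMON_MODEL mapping are all assumptions made for illustration; this is not Anzo's API or data model.

```python
# Hypothetical sketch: harmonize two sources through one shared model.
# All names below (sources, columns, model properties) are illustrative assumptions.

# Each source maps its own column names to common business terms.
COMMON_MODEL = {
    "trading": {"sec_id": "securityId", "qty": "quantity"},
    "holdings": {"instrument": "securityId", "position": "quantity"},
}


def harmonize(source, record):
    """Rename a record's source-specific fields to common model properties."""
    mapping = COMMON_MODEL[source]
    return {mapping[col]: value for col, value in record.items() if col in mapping}


trade = harmonize("trading", {"sec_id": "ABC123", "qty": 100})
holding = harmonize("holdings", {"instrument": "ABC123", "position": 250})

# Both records now share the same property names, so they can be linked.
linked = trade["securityId"] == holding["securityId"]
```

Because the mapping is data rather than code, supporting a new source in this sketch means adding one more entry to COMMON_MODEL, which is one way to read the "agile, flexible and adaptable" point above.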
Anzo Analytics and Data Integration Suite
[Diagram: Source Data (Reference Data, Holdings Data, Trading Data); Map & Load; Link & Query]
Anzo Smart Data Lake
• Business-understandable models describe data and transformations
• Searchable catalog of data sources, maps & metadata
• Query model for data lineage, impact analysis, data quality
Anzo Smart Data Integration Server
Anzo Enterprise Server
• Standardized reports and self-service data discovery for diverse use cases
• Data curation, annotation and application workflow
Anzo Graph Query Engine
• Load, transform and harmonize diverse internal and external data sources
• Link to business meaning (e.g., FIBO)
Data Store
Third party BI/Analytics
Data Providers, Structured Sources, Unstructured Sources
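To make the idea of querying the model for data lineage and impact analysis more concrete, here is a minimal sketch that treats mapping metadata as plain records and answers both questions from it. The MAPS records, source names, and function names are hypothetical and do not reflect Anzo's metadata format or query interface.

```python
# Hypothetical sketch: lineage and impact analysis over mapping metadata.
# The map records below are illustrative assumptions, not Anzo's metadata format.

MAPS = [
    {"source": "trades.csv", "column": "sec_id", "target": "securityId"},
    {"source": "holdings_db", "column": "instrument", "target": "securityId"},
    {"source": "trades.csv", "column": "qty", "target": "quantity"},
]


def lineage(target):
    """Return the (source, column) pairs that feed a model property."""
    return [(m["source"], m["column"]) for m in MAPS if m["target"] == target]


def impact(source):
    """Return the model properties affected if a source changes."""
    return sorted({m["target"] for m in MAPS if m["source"] == source})


security_lineage = lineage("securityId")
trades_impact = impact("trades.csv")
```

Lineage reads the maps from target back to source, while impact analysis reads the same records in the other direction.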
Build a Smart Data Lake with Anzo Smart Data Integration Server
1. Catalog Sources
• Register sources & capture configurable metadata
• Automatically retrieve/create schema and sample data
• Supports databases (JDBC), CSV, TSV, Hadoop HDFS, RDF
2. Map Data
• Smart mapper based on familiar Excel interface
• Supports mapping and transformation
3. Load, Link, Transform
• Create jobs by combining maps
• Generate code for Apache Spark, Informatica, Pentaho
• Ingest data or transform in lake
4. Self-Service Analytics
• Self-service data catalog and in-memory analytics
• Data lineage
• Configurable model-driven reports and analytics
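The cataloging step (register a source, then retrieve its schema and sample data) might look roughly like the following for a CSV source. This is a minimal sketch using only Python's standard csv module; the file content, the catalog_csv helper, and the layout of the catalog entry are assumptions for illustration, not the product's behavior.

```python
import csv
import io

# Hypothetical sketch of step 1 (catalog a source): read a CSV's header as
# its schema and keep a few rows as sample data. The file content and the
# catalog entry layout are illustrative assumptions.

CSV_TEXT = "sec_id,qty,trade_date\nABC123,100,2015-03-01\nXYZ789,40,2015-03-02\n"


def catalog_csv(name, text, sample_size=1):
    """Build a catalog entry with the column names and a small sample."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {
        "name": name,
        "format": "CSV",
        "schema": reader.fieldnames,
        "sample": rows[:sample_size],
    }


entry = catalog_csv("trades.csv", CSV_TEXT)
```

The resulting entry holds the column names that a later mapping step would tie to model properties.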