Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014

Uploaded by: jaroslav-gergic

Posted on: 01-Jul-2015

Category: Data & Analytics

DESCRIPTION

The recent boom in big data processing and the democratization of the big data space have been enabled by the fact that most of the concepts originating in the research labs of companies such as Google, Amazon, Yahoo and Facebook are now available as open source. Technologies such as Hadoop and Cassandra let businesses around the world become more data-driven and tap into their massive data feeds to mine valuable insights. At the same time, these new big data technologies, and the big data technology stack as a whole, are still relatively early on their maturity curve. Many of the technologies originated from a particular use case, and attempts to apply them more generically are hitting the limits of their technological foundations. In some areas, several technologies compete for the same set of use cases, which increases the risks and costs of big data implementations. We will show how GoodData solves the entire big data pipeline today, from raw data feeds all the way up to actionable business insights. All of this is provided as a hosted, multi-tenant environment that lets customers solve their own analytical use case, or many analytical use cases for thousands of their own customers, all on the same platform and tools, while abstracting them away from the technological details of the big data stack.

TRANSCRIPT

Page 1: Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014

© 2014 GoodData Corporation. All Rights Reserved.

GoodData – the Case Study #2: Big Data Pipeline for Analytics at Scale

DB Technologies for Big Data @ FIT CVUT, November 19, 2014

Page 2

GoodData Corporation

Page 3

End to End, Analytics Platform as a Service

Traditional BI:
• Data Visualization: Tableau, QlikView, Spotfire, etc.
• Analytics Engine: Cognos, Oracle, Business Objects, etc.
• Data Marts: MySQL, PostgreSQL, etc.
• Data Warehouse: Oracle, Teradata, Netezza, Microsoft, etc.
• ETL: Informatica, DataStage, Boomi, SnapLogic, etc.
• Infrastructure: Servers, Storage, Networking, etc.

GoodData platform:
• Data Collaboration
• Data Visualization
• Analytics Engine
• Data Marts
• Data Warehouse
• ELT / ETL
• Infrastructure

Page 4

One Platform. Two Markets.

• For Your Customers: Powered By GoodData Partner Program
  ○ for disruptive ISVs including Zendesk, Switchfly, and Phizzle
• For Your Business: Drive your business with your data.
  ○ Experience and accelerators for Social, Sales, Marketing, Yammer

Page 5

Our Focus

Page 6

Our Customers

Page 7

What The End Users See...

Page 8

What The End Users See...

Page 9

What Is In The Box...

Page 10

End to End, Analytics Platform as a Service

Traditional BI:
• Data Visualization: Tableau, QlikView, Spotfire, etc.
• Analytics Engine: Cognos, Oracle, Business Objects, etc.
• Data Marts: MySQL, PostgreSQL, etc.
• Data Warehouse: Oracle, Teradata, Netezza, Microsoft, etc.
• ETL: Informatica, DataStage, Boomi, SnapLogic, etc.
• Infrastructure: Servers, Storage, Networking, etc.

GoodData platform:
• Data Collaboration
• Data Visualization
• Analytics Engine
• Data Marts
• Data Warehouse
• ELT / ETL
• Infrastructure

Page 11

GoodData Platform Zoom-In

End to End, Analytics Platform as a Service
• Data Collaboration
• Data Visualization
• Analytics Engine
• Data Marts
• Data Warehouse
• ELT / ETL
• Infrastructure

Page 12

GoodData Analytics Platform - The Data Pipeline

Page 13

Let’s Start With The Outcome - The Insights

Page 14

Let’s Start With The Outcome - The Insights

• User Experience
  ○ Visual Appeal
  ○ Ease of Use
  ○ Performance
• Analytical Power
• Many Data Sources
  ○ Need to cross-analyze all of them
  ○ Need to add/remove sources as needed
• Cost Efficiency
  ○ Computational density allowed by multi-tenancy

Page 15

Let’s Start With The Outcome - The Insights

• Analytical Engine / MAQL
• Exploration, Visualization and Distribution Layer
• Pluggable Database Backends
• 10s of GB up to TBs

Page 16

Behind The Scenes - The Big Data Pipeline

• Large Data Throughput
  ○ Close to Real-time Updates
• Many Data Sources
  ○ Need to cross-analyze all of them
  ○ Need to add/remove sources as needed
• Agility
  ○ Capture all data without knowing the analytical use case in advance
• Cost Efficiency
  ○ Computational density allowed by multi-tenancy
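The agility requirement on this slide, capturing all data before the analytical use case is known, is commonly met with a schema-on-read approach: raw records are persisted untouched and a schema (which fields to extract and how to type them) is applied only when a question is asked. The minimal sketch below assumes newline-delimited JSON input; the feed records and the `project` helper are hypothetical and do not describe GoodData's actual implementation.

```python
import json
from typing import Any, Iterable

# Hypothetical raw feed, persisted exactly as received; no schema at write time.
RAW_FEED = [
    '{"user": "a", "event": "login", "ts": "2014-11-19T10:00:00"}',
    '{"user": "b", "event": "purchase", "amount": "19.90", "ts": "2014-11-19T10:05:00"}',
    '{"user": "a", "event": "purchase", "amount": "5.10", "ts": "2014-11-19T11:00:00", "coupon": "X1"}',
]


def project(raw_lines: Iterable[str], schema: dict[str, type]) -> list[dict[str, Any]]:
    """Apply a schema at read time: pick the requested fields and cast them.

    Fields missing from a record become None; extra fields are ignored, so new
    attributes appearing in the feed never break existing analyses.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        })
    return rows


# A question nobody anticipated at ingestion time: revenue per user.
purchases = [
    r for r in project(RAW_FEED, {"user": str, "event": str, "amount": float})
    if r["event"] == "purchase"
]
revenue: dict[str, float] = {}
for row in purchases:
    revenue[row["user"]] = revenue.get(row["user"], 0.0) + row["amount"]
print(revenue)
```

The trade-off is that parsing and casting happen on every read rather than once at load time, which is one reason the later pipeline stages materialize cleansed, relational copies of the data.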

Page 17

Behind The Scenes - The Big Data Pipeline

• Big Data Store
  ○ 100s of TBs per customer
  ○ Persist All Incoming Data
  ○ CSV, XML, JSON, ...
• Immutable
  ○ Append Only
  ○ Keep Ingestion History
• Technologies
  ○ Amazon S3
  ○ Cloud Files
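The "immutable, append only, keep ingestion history" properties of the big data store can be sketched as a store whose only write operation adds a new, sequenced object and never overwrites an existing one. The in-memory `AppendOnlyStore` class is a hypothetical stand-in for an object store such as Amazon S3, and the key layout (`source/batch-NNNNNN`) is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Batch:
    source: str      # hypothetical source name, e.g. "crm"
    sequence: int    # monotonically increasing per source
    payload: bytes   # raw CSV/XML/JSON exactly as received


@dataclass
class AppendOnlyStore:
    """Hypothetical in-memory stand-in for an immutable raw-data store."""
    _objects: dict[str, Batch] = field(default_factory=dict)
    _next_seq: dict[str, int] = field(default_factory=dict)

    def append(self, source: str, payload: bytes) -> str:
        seq = self._next_seq.get(source, 0)
        key = f"{source}/batch-{seq:06d}"
        if key in self._objects:  # defensive: existing objects are never replaced
            raise RuntimeError(f"refusing to overwrite {key}")
        self._objects[key] = Batch(source, seq, payload)
        self._next_seq[source] = seq + 1
        return key

    def history(self, source: str) -> list[Batch]:
        """All batches ever ingested for a source, oldest first."""
        return sorted(
            (b for b in self._objects.values() if b.source == source),
            key=lambda b: b.sequence,
        )


store = AppendOnlyStore()
store.append("crm", b"id,name\n1,Acme\n")
store.append("crm", b"id,name\n1,Acme Corp\n")  # a correction is a new batch, not an update
for batch in store.history("crm"):
    print(batch.sequence, batch.payload)
```

Because corrections arrive as new batches, any earlier state of a source can be replayed from its history, which makes it safe to rebuild downstream stages from scratch.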

Page 18

Behind The Scenes - The Big Data Pipeline

• Agile Data Warehouse
  ○ 10s of TBs per customer
  ○ Relational Model
  ○ Semi-Cleansed
  ○ Complete History Captured
• Technologies
  ○ HP Vertica
  ○ GoodData BI Integration Services
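The slide does not say how the warehouse captures complete history. One common technique is effective-dated rows (a "type 2 slowly changing dimension"): a change closes the current version of a row and appends a new one, so past states remain queryable. The sketch below shows that technique with hypothetical names (`VersionedRow`, `upsert_with_history`, `value_as_of`); it is an assumption for illustration, not GoodData's or Vertica's mechanism.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VersionedRow:
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "current version"


def upsert_with_history(table: list[VersionedRow], key: str, value: str, as_of: date) -> None:
    """Type 2 style update: close the current version and append a new one."""
    for row in table:
        if row.key == key and row.valid_to is None:
            if row.value == value:
                return  # no change, nothing new to record
            row.valid_to = as_of
    table.append(VersionedRow(key, value, as_of))


def value_as_of(table: list[VersionedRow], key: str, day: date) -> Optional[str]:
    """Answer "what was true on this day?" from the captured history."""
    for row in table:
        if row.key == key and row.valid_from <= day and (row.valid_to is None or day < row.valid_to):
            return row.value
    return None


customers: list[VersionedRow] = []
upsert_with_history(customers, "c1", "Prague", date(2014, 1, 1))
upsert_with_history(customers, "c1", "Brno", date(2014, 6, 1))
print(value_as_of(customers, "c1", date(2014, 3, 1)))    # Prague
print(value_as_of(customers, "c1", date(2014, 11, 19)))  # Brno
```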

Page 19

Behind The Scenes - The Big Data Pipeline

• Combine Input Stage Data Sets
  ○ Mapping, Cleansing
• Perform Data Transformations in Data Warehouse
  ○ Benchmarking, Snapshotting, Sampling
• Generate Data Mart Input Data
  ○ Data Warehouse : Data Mart relation is typically 1 : N
  ○ 10s of thousands of Data Marts in the PbG (Powered by GoodData, OEM) use case!
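The 1 : N relation between one data warehouse and its data marts can be illustrated as a fan-out: one warehouse table is split into a separate input data set per data mart. The minimal sketch below assumes, hypothetically, that each warehouse row carries a `tenant_id` column and that each tenant gets its own data mart; the function name and sample rows are illustrative only.

```python
from collections import defaultdict
from typing import Any


def generate_data_mart_inputs(
    warehouse_rows: list[dict[str, Any]],
    tenant_column: str = "tenant_id",
) -> dict[str, list[dict[str, Any]]]:
    """Fan one warehouse table out into one input data set per data mart.

    Assumes every row names the tenant it belongs to, and that each tenant
    has its own data mart (the 1 : N relation). The tenant column is dropped
    because each mart only ever contains a single tenant's data.
    """
    marts: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in warehouse_rows:
        tenant = row[tenant_column]
        marts[tenant].append({k: v for k, v in row.items() if k != tenant_column})
    return dict(marts)


warehouse = [
    {"tenant_id": "shop-a", "ticket": 1, "status": "open"},
    {"tenant_id": "shop-b", "ticket": 7, "status": "closed"},
    {"tenant_id": "shop-a", "ticket": 2, "status": "closed"},
]
for tenant, rows in generate_data_mart_inputs(warehouse).items():
    print(tenant, rows)
```

At scale only the number of keys grows; the sketch does not address how the resulting per-mart loads would be scheduled or executed.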

Page 20

Behind The Scenes - The Big Data Pipeline

• GoodData BI Integration Services
  ○ CloudConnect Runtime
  ○ Ruby Runtime
  ○ Data Integration Console

Over 2M ETL jobs per week!
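To put the weekly job count in perspective, it converts to an average rate as follows. This assumes, as a simplification, that jobs are spread evenly over the week; real load is likely burstier, so peak rates would be higher.

```latex
\frac{2\,000\,000\ \text{jobs}}{7 \times 24\ \text{h}}
  = \frac{2\,000\,000}{168}\ \text{jobs/h} \approx 11\,900\ \text{jobs/h},
\qquad
\frac{2\,000\,000\ \text{jobs}}{7 \times 24 \times 3600\ \text{s}}
  = \frac{2\,000\,000}{604\,800}\ \text{jobs/s} \approx 3.3\ \text{jobs/s}
```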

Page 21

The Wrap-Up - The Big Data Pipeline

Progression Through:
• Big Data Store
• Data Warehouse
• Data Marts

As a means to satisfy the end user:
• User Experience
• Analytical Power
• Many Data Sources
• Cost Efficiency

Page 22

Questions?

Page 23

Thank you!