modern data architecture


Upload: ed-thewlis

Post on 17-Jul-2015


Bramhope

a modern data architecture for BI

© the DataShed Limited 2015

a (not so) long time ago in a galaxy far, far away…

The most complex system you had to handle was an old AS/400

Publishing data weekly was adequate for most users

Analysts didn’t really exist

Most queries took hours to run – but that was ok


what the hell happened?


data explosion

Sources: CSC (http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode); Gartner (http://www.gartner.com/technology/research/it-spending-forecast/)

Growth between 2010 and 2020:

Data: 500%

Budget: 16%

[Chart: global data volume (zettabytes) and global IT budget ($ trillion) by year, 2010 to 2020, with exponential trend lines for data growth and IT budget growth]


help is at hand…


Hadoop & Big Data tools

First incarnation in 2005

Highly-scalable data processing, based on a distributed file system (HDFS)

Ability to handle petabyte-scale workloads

Becoming more mature – including:

ANSI-SQL compliant data warehousing tools (Hive & Stinger.next)

Batch processing (MapReduce/Tez, Pig)

Operations management (Ambari)

Security (Knox)

Governance (HCatalog)


…or is it?


too small?

Small cluster: 5–50 nodes

Assuming a single node:

24GB RAM

Single socket quad core

4–6 × 2 TB SATA drives

Total storage ≈ 10 TB

How many of us need to process 10TB of data?

Source: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/conclusion.html
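The node sizing above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the replication factor of 3 is the HDFS default rather than a figure from the slide, and it ignores space reserved for the operating system and temporary files.

```python
def node_raw_tb(drives: int, drive_tb: float = 2.0) -> float:
    """Raw disk capacity of one node, in TB."""
    return drives * drive_tb


def cluster_usable_tb(nodes: int, drives_per_node: int,
                      drive_tb: float = 2.0, replication: int = 3) -> float:
    """Approximate usable capacity once every block is stored `replication` times.

    replication=3 is the HDFS default; it is an assumption here, not a slide figure.
    """
    return nodes * node_raw_tb(drives_per_node, drive_tb) / replication


if __name__ == "__main__":
    # The slide's range of 4-6 drives per node brackets its "≈ 10 TB" figure.
    for drives in (4, 5, 6):
        print(f"{drives} drives per node: {node_raw_tb(drives):.0f} TB raw")

    # The slide's "small cluster" range of 5-50 nodes, at 5 drives per node.
    for nodes in (5, 50):
        usable = cluster_usable_tb(nodes, drives_per_node=5)
        print(f"{nodes} nodes: about {usable:.0f} TB usable")
```

Even the smallest cluster in that range holds more data than many BI workloads ever see, which is the point the slide is making.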


if not Hadoop, then what?

Big Data has driven innovation in both technology and tools.

If you can’t adopt the tools, you can still adopt some of the principles:

Design for scale-out rather than scale-up

ELT vs ETL

Lambda data architecture

… amongst other things!


prepare to scale out

[Diagram: a four-layer BI stack, shown once with several specific, small data marts (each fed by its own data integration and ETL tools) and once with data marts built on an EDW]

Data Storage: CRM, ERP, Transactional System

Data Integration: ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator)

Data Marts & Cubes: specific, small data marts, or data marts constructed on top of an EDW. Cubes present views of this data to business users

Business Intelligence Apps:

Executives: Dashboards

Managers & Stakeholders: Reports

Business/Data Analysts: Cubes & Direct Access


ELT vs ETL

Key considerations:

Metadata & data lineage

How real-time is real-time?

How long does it take you to get data to analysts?

How powerful is your presentation server?

Can you use both?


Schema on read? Or schema on write?
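To make the ETL/ELT and schema-on-write/schema-on-read contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, sample rows and helper functions are all hypothetical, and a real pipeline would use a proper integration tool or warehouse rather than an in-memory database.

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as messy text.
RAW_ROWS = [
    ("2015-07-01", " 19.50 "),
    ("2015-07-02", "5.00"),
    ("2015-07-02", "n/a"),
]


def parse_amount(text):
    """Return the amount as a float, or None if the text is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def etl(conn, rows):
    """ETL: transform in the pipeline, then load only clean, typed rows
    (schema on write)."""
    conn.execute("CREATE TABLE sales_etl (day TEXT, amount REAL)")
    clean = [(day, amt) for day, raw in rows
             if (amt := parse_amount(raw)) is not None]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)


def elt(conn, rows):
    """ELT: load the raw text untouched, then transform inside the database
    through a view (schema on read)."""
    conn.execute("CREATE TABLE sales_raw (day TEXT, amount_text TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE VIEW sales_elt AS "
        "SELECT day, CAST(TRIM(amount_text) AS REAL) AS amount "
        "FROM sales_raw WHERE TRIM(amount_text) GLOB '[0-9]*'"
    )


def daily_totals(conn, table):
    """Total amount per day from either the ETL table or the ELT view."""
    sql = f"SELECT day, SUM(amount) FROM {table} GROUP BY day ORDER BY day"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn, RAW_ROWS)
    elt(conn, RAW_ROWS)
    print(daily_totals(conn, "sales_etl"))
    print(daily_totals(conn, "sales_elt"))
```

Both routes give analysts the same totals. The difference is where the work happens and what survives: ELT keeps the rejected "n/a" row available in the raw table, which helps with lineage, but pushes the transformation cost onto the presentation server at query time.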


lambda data architecture
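The original slide showed only a diagram, so the following is a generic sketch of the pattern as it is usually described, not the author's design: a batch layer periodically recomputes a view from the full master dataset, a speed layer keeps an incremental view of events that arrived after the last batch run, and a serving step merges the two at query time. The `Event` type, the page-view counting example and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int  # hypothetical integer clock
    page: str


def build_batch_view(master, cutoff):
    """Batch layer: recompute page counts from all events up to the cutoff."""
    return Counter(e.page for e in master if e.timestamp <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally count only events newer than the last batch run."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.view = Counter()

    def ingest(self, event):
        if event.timestamp > self.cutoff:
            self.view[event.page] += 1


def query(page, batch_view, speed_view):
    """Serving layer: merge the batch and real-time views for one page."""
    return batch_view[page] + speed_view[page]


if __name__ == "__main__":
    events = [Event(1, "home"), Event(2, "pricing"),
              Event(3, "home"), Event(4, "home")]
    cutoff = 2  # the last batch run covered timestamps 1 and 2

    batch_view = build_batch_view(events, cutoff)
    speed = SpeedLayer(cutoff)
    for event in events:
        speed.ingest(event)

    print("home views:", query("home", batch_view, speed.view))
```

When the next batch run completes, the cutoff moves forward and the speed layer's view is discarded, so any error in the fast path is eventually corrected by the batch recomputation.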


…most importantly, think differently


You don’t need big data to use Big Data tools

Example: Prediction.io (http://prediction.io/)

Open source machine learning server, built on Hadoop, HBase, Spark and Elasticsearch


any questions?


ed thewlis

tech director – the DataShed

@[email protected]
