Conflict in the Cloud – Issues & Solutions for Big Data

Conflict in the Cloud: Big data and cloud computing. Keith Peterson, CEO, Halo BI. ©2014 Halo Business Intelligence | All Rights Reserved

Upload: halobi

Post on 08-Jul-2015


Category:

Software



DESCRIPTION

Halo BI CEO, Keith Peterson, presents at the 6th Annual Cloud Computing Conference - AITP San Diego: Conflict in the Cloud – Issues & Solutions for Big Data. Cloud services make money based on the volume of data stored in the cloud – and big data delivers that volume. But companies seeking to use big data are looking for economies of scale from the Cloud.

TRANSCRIPT

Page 1: Conflict in the Cloud – Issues & Solutions for Big Data

Conflict in the Cloud

Big data and cloud computing

Keith Peterson, CEO

Halo BI

©2014 Halo Business Intelligence | All Rights Reserved

Page 2: Conflict in the Cloud – Issues & Solutions for Big Data

Starting points

• Information management and analytics issues hurting business objectives

• Taking days and weeks to get to the data

• Multiple copies of data around the organization

• No shared view of the truth

• ETL and data warehouse unable to handle loads

• BI and reporting eating up capacity

• Data volumes growing but budgets static

• Desire to leverage new machine data sources


Page 3: Conflict in the Cloud – Issues & Solutions for Big Data

The “Big Data” challenge (Executive View)…


Page 4: Conflict in the Cloud – Issues & Solutions for Big Data

The “Big Data” challenge (Business View)…


Big Data Is…

• Ever escalating volumes

• Expanding sources, such as Internet Of Things

• Increasingly high velocities

• With a widening variety of unstructured formats and semantic contexts

Big Data Is Not Really Useful

Unless insight can be gleaned through analytics...with a reasonable effort!

Page 5: Conflict in the Cloud – Issues & Solutions for Big Data

Three Strategies to Deal with Big Data

1. Ignore It

• Don’t jump on the bandwagon

• You have better things to focus on

2. Archive It

• Just collect and store it

• You can always analyze it when resources free up

3. Analyze It

• With a clear business problem and ROI

• Invest in infrastructure to derive the insights needed

Page 6: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data

• Google: 1 trillion web pages per year

• Facebook: 1 million GB of disk storage

• Yelp!: 100 GB of log data per day

• Youtube.com: 20 petabytes of new video per year

...

• Regional medical center (patient sensors): 25 TB

• Mid-market retailer (POS): 10 TB

• Mid-market manufacturer (machine sensors): 6 TB

http://www.google.com/trends/explore#q=%2Fm%2F04y7lrx%2C%20Amazon%20Aws%2C%20Rackspace&cmpt=q

Page 7: Conflict in the Cloud – Issues & Solutions for Big Data

Five Big Data Questions

1. Left Behind?

• Everyone is doing it…you need to.

• Really?

• Or will cost exceed benefit?

2. Cloud?

• Big Data requires Big Compute

• Outsourcing risks: loss of control, platform reliability, privacy, security

3. Data?

• Which data and sources?

• Too much to handle

• Some or all?

4. Tools?

• All vendors have a Big Data suite

• Which one?

5. Usefulness?

• How to query the data

• Skills needed

• Machine Learning

Page 8: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data and Cloud Computing

Commodity computing to execute distributed queries across multiple data sets

Rent commodity server instances to execute computation remotely

Cloud hosting for $10/TB/Mo


Page 9: Conflict in the Cloud – Issues & Solutions for Big Data

Traditional BI Architecture: On Premise or Cloud

• Data Volumes = 100 GB – 5 TB

• Manageable on-premise or in the cloud

[Diagram labels: Operational Data, Data Warehouse]

Page 10: Conflict in the Cloud – Issues & Solutions for Big Data

Add Big Data

• Data volumes = 6 TB +

[Diagram labels: Big Data, Logs, Image, Cluster]

Page 11: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data in the Cloud: The “Traditional” Approach

[Diagram labels: Data Platform, Traditional RDBMS, Commodity Storage, Client, Familiar BI Tools, MPP, SQL, SSAS, Sharepoint, BI, Stream, Machine Learning, Browser, SQOOP, HIVE ODBC]

• Use Amazon Redshift, Azure HDInsight or similar

• Use Blob storage to persist big data

• Spin up Compute clusters as needed

• Keep Data in Cloud perpetually

Page 12: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data Storage Costs

Sources:

http://calculator.s3.amazonaws.com/index.html

http://azure.microsoft.com/en-us/pricing/calculator/

As of Nov 2014


Provider / Type                 Cost per TB per year    Cost per PB per year

Amazon EBS SSD storage          $1,229                  $1,258,291

Amazon EBS Magnetic Storage     $614                    $629,145

Amazon S3 Storage               $411                    $420,372

Azure Tables & Queues           $792                    $811,302

Azure Blob Storage              $288                    $294,912
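To get a feel for what these list prices mean at a given volume, the per-TB figures in the table can be multiplied out. The following is a minimal sketch, assuming cost scales linearly with volume and ignoring transfer, request, and compute charges. The function name `yearly_storage_cost` is illustrative only, and the 25 TB example volume is the threshold mentioned elsewhere in this deck.

```python
# Per-TB yearly list prices copied from the table (as of Nov 2014).
COST_PER_TB_YEAR = {
    "Amazon EBS SSD storage": 1229,
    "Amazon EBS Magnetic Storage": 614,
    "Amazon S3 Storage": 411,
    "Azure Tables & Queues": 792,
    "Azure Blob Storage": 288,
}


def yearly_storage_cost(provider, terabytes):
    """Return the yearly storage cost in dollars, assuming linear pricing."""
    return COST_PER_TB_YEAR[provider] * terabytes


if __name__ == "__main__":
    # Compare all five options at an example volume of 25 TB.
    for provider in COST_PER_TB_YEAR:
        cost = yearly_storage_cost(provider, 25)
        print(f"{provider}: ${cost:,.0f} per year for 25 TB")
```

Real bills will differ, since tiered pricing and data transfer are not modeled here.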

Page 13: Conflict in the Cloud – Issues & Solutions for Big Data

Hosting Considerations

• What if you host big data on-premise?

• Cost of managing hundreds of servers, expensive processing power

• Costs can be hidden in data center budget until too late

• What is your Big Data output?

• Beyond about 25 TB of data, cloud hosting costs become significant.

• Data Transfer costs must be considered as well

• Inbound is usually free

• Outbound can cost thousands of dollars per month

• Direct connect or physically ship

• For audit purposes, data may need to be kept for up to 7 years

• Factor this into your storage costs

• Location

• Will regulations impact the ability to store or process data on machines in different countries?
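The retention point above compounds storage cost: if data must be kept for up to 7 years, each year's new data adds to what is already stored. A rough sketch of that effect follows, assuming a constant yearly ingest and a flat per-TB yearly price. Both function names and the example inputs are hypothetical and not taken from the talk.

```python
def retained_tb_by_year(new_tb_per_year, retention_years, horizon_years):
    """Volume held at the end of each year, once data older than the
    retention window is dropped."""
    return [
        new_tb_per_year * min(year, retention_years)
        for year in range(1, horizon_years + 1)
    ]


def cumulative_storage_cost(new_tb_per_year, retention_years,
                            horizon_years, cost_per_tb_year):
    """Total storage spend over the horizon, in dollars."""
    volumes = retained_tb_by_year(new_tb_per_year, retention_years, horizon_years)
    return sum(volume * cost_per_tb_year for volume in volumes)


if __name__ == "__main__":
    # Example: 5 TB of new data per year, 7-year retention, $288 per TB per year.
    print(retained_tb_by_year(5, 7, 9))
    print(cumulative_storage_cost(5, 7, 7, 288))
```

Under these assumptions the stored volume keeps growing until the retention window is full, so the first year's bill understates the steady-state cost.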


Page 14: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data Considerations

Databases

• High speed analysis of transactional data

• Multi-step computations

• Interactive querying

• Lots of updates (adds/deletes/mods)

MapReduce HDFS

• Low cost storage and compute

• High performance queries on large data

• Complex data, simple queries

• Simple scaling
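The "complex data, simple query" pattern on the MapReduce side can be illustrated with a toy example. This is a single-process sketch of the map, shuffle, and reduce steps in plain Python, not Hadoop code. The log format and machine names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical machine log lines: "<timestamp> <machine_id> <status>".
LOG_LINES = [
    "2014-11-01T10:00:00 press-01 OK",
    "2014-11-01T10:00:05 press-02 ERROR",
    "2014-11-01T10:00:09 press-01 ERROR",
    "2014-11-01T10:00:12 press-02 ERROR",
]


def map_line(line):
    """Map step: emit (machine_id, 1) for each error line, nothing otherwise."""
    _timestamp, machine_id, status = line.split()
    if status == "ERROR":
        yield machine_id, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the values emitted for one key."""
    return key, sum(values)


def count_errors(lines):
    """Run map, shuffle, and reduce over the lines and return error counts."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_errors(LOG_LINES))
```

In a real cluster the map and reduce functions would run on many nodes over data stored in HDFS. The sketch only shows the shape of the computation.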


Note: Ideas in this slide are borrowed and adapted from “Running, Managing, and Adapting Hadoop at Sears,” by Andy McNalis, Senior Manager, Hadoop Infrastructure, Sears Holdings.

Page 15: Conflict in the Cloud – Issues & Solutions for Big Data

Cloud Considerations

• Big Data needs Big Compute

• Which cloud services will you choose?

• Time, effort and skills will vary considerably

• Microsoft Azure

• Amazon EC2

• Google Cloud Platform

• Verizon Cloud

• Rackspace


http://online.wsj.com/articles/little-space-remains-for-rackspace-ahead-of-the-tape-1415557510

Page 16: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data in the Cloud: The “Traditional” Approach

[Diagram labels: Data Platform, Traditional RDBMS, Commodity Storage, Client, Familiar BI Tools, MPP, SQL, SSAS, Sharepoint, BI, Stream, Machine Learning, Browser, SQOOP, HIVE ODBC]

Page 17: Conflict in the Cloud – Issues & Solutions for Big Data

Big Data in the Cloud: Premise-Cloud Hybrid Approach

[Diagram labels: Data Platform, Traditional RDBMS, Commodity Storage, Client, Familiar BI Tools, MPP, SQL, SSAS, Sharepoint, BI, Stream, Machine Learning, Browser, SQOOP, HIVE ODBC]

• ETL and pre-aggregate on-premise

• Analyze and visualize in the cloud
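The hybrid approach on this slide (ETL and pre-aggregate on-premise, then analyze and visualize in the cloud) reduces the volume that has to be stored and moved. One possible sketch of the pre-aggregation step follows, assuming raw sensor readings arrive as (ISO timestamp, sensor id, value) tuples and that hourly summaries are enough for the cloud-side reports. The function name and the summary fields are illustrative choices, not part of the original talk.

```python
from collections import defaultdict
from statistics import mean


def pre_aggregate(readings):
    """Collapse raw readings into one summary row per sensor per hour.

    readings: iterable of (iso_timestamp, sensor_id, value) tuples,
    with timestamps like "2014-11-01T10:05:00".
    """
    buckets = defaultdict(list)
    for timestamp, sensor_id, value in readings:
        hour = timestamp[:13]  # keep "YYYY-MM-DDTHH"
        buckets[(hour, sensor_id)].append(value)

    return [
        {
            "hour": hour,
            "sensor_id": sensor_id,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for (hour, sensor_id), values in sorted(buckets.items())
    ]


if __name__ == "__main__":
    raw = [
        ("2014-11-01T10:05:00", "s1", 2.0),
        ("2014-11-01T10:45:00", "s1", 4.0),
        ("2014-11-01T11:00:00", "s1", 6.0),
    ]
    for row in pre_aggregate(raw):
        print(row)  # only these summary rows would be uploaded
```

The trade-off is that detail dropped on-premise cannot be recovered in the cloud, so the aggregation grain has to match the questions the business wants answered.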

Page 18: Conflict in the Cloud – Issues & Solutions for Big Data

Enterprise Data Hub

• On-premise Hadoop clusters

• Data warehouse accelerator

• Cloud hosting

• Cloud BI reporting and analytics

Page 19: Conflict in the Cloud – Issues & Solutions for Big Data

ROI Strategies

Finding critical applications

Cost of Labor

• Use lower-skill, lower-cost resources

• Avoid extra headcount

• Share experiences among plants

• Move experienced talent to higher value activity

Cost of Capital

• Use under-resourced equipment / assets more efficiently

• Make equipment last longer, run more efficiently

• Avoid more equipment purchases

Cost of Materials

• Use fewer raw materials

• Improve quality of raw materials sourced

• Improve delivery and inventory

Cost of Overheads

• Reduce transportation costs

• Reduce or optimize energy and resource costs

• Reduce management layers

Cost of Lost Opportunities

• Reduce time to market

• Improve product end-of-life

• Reduce downtime

• Reduce order to cash

Cost of Reputation

• Reduce product defects

• Anticipate customer reactions

• Tailor service and response profiles

More available: [email protected]

Page 20: Conflict in the Cloud – Issues & Solutions for Big Data

Warehouse Operations

Machine sensor data for inventory and labor optimization

• $300K

• Cases per man hour

• Picking accuracy

Page 21: Conflict in the Cloud – Issues & Solutions for Big Data

Drought Management for Growers

Smarter water use

• $475K potential

• Water per output

Page 22: Conflict in the Cloud – Issues & Solutions for Big Data

Retail Promotions

Demand forecasting, sentiment analysis, and pricing

• $6.2M

• Sales per square foot

• Returns rate

Page 23: Conflict in the Cloud – Issues & Solutions for Big Data

Summary

• The value of investing in Big Data in the Cloud depends on your use case

• Cost is an issue – beyond about 25 TB, cloud hosting costs become significant

• Skills are an issue – steep learning curves

• Process is an issue – requires change in the way people think and operate

• Partners are an issue – do you want a large or niche provider?

• Database design is important
