Conflict in the Cloud – Issues & Solutions for Big Data
DESCRIPTION
Halo BI CEO, Keith Peterson, presents at the 6th Annual Cloud Computing Conference - AITP San Diego: Conflict in the Cloud – Issues & Solutions for Big Data. Cloud services make money based on the volume of data stored in the cloud – and big data delivers that volume. But companies seeking to use big data are looking for economies of scale from the Cloud.
TRANSCRIPT
Conflict in the Cloud
Big data and cloud computing
Keith Peterson, CEO
Halo BI
©2014 Halo Business Intelligence | All Rights Reserved
Starting points
• Information management and analytics issues hurting business objectives
• Taking days and weeks to get to the data
• Multiple copies of data around the organization
• No shared view of the truth
• ETL and data warehouse unable to handle loads
• BI and reporting eating up capacity
• Data volumes growing but budgets static
• Desire to leverage new machine data sources
The “Big Data” challenge (Executive View)…
The “Big Data” challenge (Business View)…
Big Data Is…
• Ever escalating volumes
• Expanding sources, such as Internet Of Things
• Increasingly high velocities
• With a widening variety of unstructured formats and semantic contexts
Big Data Is Not Really Useful
Unless insight can be gleaned through analytics...with a reasonable effort!
Three Strategies to Deal with Big Data
1. Ignore It
• Don’t jump on the bandwagon
• You have better things to focus on
2. Archive It
• Just collect and store it
• You can always analyze it when resources free up
3. Analyze It
• With a clear business problem and ROI
• Invest in infrastructure to derive the insights needed
Big Data
Google 1 Trillion Web Pages per Year
Facebook 1 Million GB of Disk Storage
Yelp! 100 GB of log data per day
Youtube.com 20 Petabytes new video per year
.
.
Regional medical center – patient sensors 25 TB
Mid-market retailer – POS 10 TB
Mid-market manufacturer – machine sensors 6 TB
http://www.google.com/trends/explore#q=%2Fm%2F04y7lrx%2C%20Amazon%20Aws%2C%20Rackspace&cmpt=q
Five Big Data Questions
1. Left Behind?
• Everyone is doing it…you need to.
• Really?
• Or will cost exceed benefit?
2. Cloud?
• Big Data requires Big Compute
• Outsourcing risks: Loss of Control, Platform Reliability, Privacy, Security
3. Data?
• Which data and sources?
• Too much to handle
• Some or all?
4. Tools?
• All vendors have a Big Data suite
• Which one?
5. Usefulness?
• How to query the data
• Skills needed
• Machine Learning
Big Data and Cloud Computing
Commodity computing to execute distributed queries across multiple data sets
Rent commodity server instances to execute computation remotely
Cloud hosting for $10/TB/Mo
Traditional BI Architecture
On Premise Or Cloud
• Data Volumes = 100 GB – 5 TB
• Manageable on-premise or in the cloud
Operational Data
Data Warehouse
Add Big Data
Data volumes = 6 TB +
Big Data
Logs
image
Cluster
Big Data in the Cloud
The “Traditional” Approach
[Diagram labels: Data Platform; Traditional RDBMS; Commodity Storage; Client; Familiar BI Tools; MPP; SQL; SSAS; SharePoint BI; Stream; Machine Learning; Browser; SQOOP; HIVE ODBC]
• Use Amazon Redshift, Azure HDInsight or similar
• Use Blob storage to persist big data
• Spin up Compute clusters as needed
• Keep Data in Cloud perpetually
Big Data Storage Costs
Sources:
http://calculator.s3.amazonaws.com/index.html
http://azure.microsoft.com/en-us/pricing/calculator/
As of Nov 2014
Provider Type                 Cost per TB per year   Cost per PB per year
Amazon EBS SSD Storage        $1,229                 $1,258,291
Amazon EBS Magnetic Storage   $614                   $629,145
Amazon S3 Storage             $411                   $420,372
Azure Tables & Queues         $792                   $811,302
Azure Blob Storage            $288                   $294,912
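The per-PB figures appear to be the per-TB rates scaled by 1,024 TB per PB. The Azure Blob Storage row satisfies this exactly ($288 × 1,024 = $294,912); the other rows seem to be derived from unrounded per-TB rates, so they differ slightly. A minimal Python sketch of the conversion follows. The function names are illustrative, and the 1,024 factor is inferred from the table rather than stated on the slide.

```python
# Assumption: 1 PB = 1,024 TB, inferred from the Azure Blob Storage row.
TB_PER_PB = 1024


def cost_per_pb_per_year(cost_per_tb_per_year):
    """Scale an annual per-TB storage rate up to an annual per-PB rate."""
    return cost_per_tb_per_year * TB_PER_PB


def cost_per_tb_per_month(cost_per_tb_per_year):
    """Convert an annual per-TB storage rate to a monthly per-TB rate."""
    return cost_per_tb_per_year / 12


azure_blob_tb_year = 288  # dollars per TB per year, from the table (Nov 2014)
print(cost_per_pb_per_year(azure_blob_tb_year))   # 294912
print(cost_per_tb_per_month(azure_blob_tb_year))  # 24.0
```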
Hosting Considerations
• What if you host big data on-premise?
• Cost of managing hundreds of servers, expensive processing power
• Costs can be hidden in data center budget until too late
• What is your Big Data output?
• Beyond about 25 TB of data, cloud hosting costs become significant.
• Data Transfer costs must be considered as well
• Inbound is usually free
• Outbound can cost thousands of dollars per month
• Direct connect or physically ship
• For audit purposes, data may need to be kept for up to 7 years
• Factor this into your storage costs
• Location
• Will regulations impact the ability to store or process data on machines in different countries?
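To see how a 7-year retention requirement compounds storage costs, here is a rough Python sketch. It assumes a hypothetical organization that adds 25 TB of new data each year and keeps all of it, priced at the Azure Blob Storage rate of $288 per TB per year quoted in the storage cost table. The function name, the flat pricing, and the constant yearly volume are all illustrative assumptions; outbound transfer costs are not included.

```python
def retention_storage_cost(tb_added_per_year, rate_per_tb_year, years=7):
    """Total storage spend when every year's data is kept for the whole period.

    Assumes a flat price per TB per year and a constant yearly data volume.
    """
    total_cost = 0.0
    stored_tb = 0.0
    for _ in range(years):
        stored_tb += tb_added_per_year             # this year's data joins the archive
        total_cost += stored_tb * rate_per_tb_year  # pay for everything retained so far
    return total_cost


# Hypothetical: 25 TB added per year, $288 per TB per year, kept for 7 years.
print(retention_storage_cost(25, 288))  # 201600.0
```

Under these assumptions the archive reaches 175 TB by year 7, and the final year alone costs 175 × $288 = $50,400, seven times the first year's bill.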
Big Data Considerations
Databases
• High speed analysis of transactional data
• Multi-step computations
• Interactive querying
• Lots of updates (adds/deletes/mods)
MapReduce / HDFS
• Low cost storage and compute
• High performance queries on large data
• Complex data, simple queries
• Simple scaling
Note: Ideas in this slide are borrowed and adapted from “Running, Managing, and Adapting Hadoop at Sears,” by Andy McNalis, Senior Manager,
Hadoop Infrastructure, Sears Holdings.
Cloud Considerations
• Big Data needs Big Compute
• Which cloud services will you choose?
• Time, effort and skills will vary considerably
• Microsoft Azure
• Amazon EC2
• Google Cloud Platform
• Verizon Cloud
• Rackspace
http://online.wsj.com/articles/little-space-remains-for-rackspace-ahead-of-the-tape-1415557510
Big Data in the Cloud
Premise-Cloud Hybrid Approach
[Diagram labels: Data Platform; Traditional RDBMS; Commodity Storage; Client; Familiar BI Tools; MPP; SQL; SSAS; SharePoint BI; Stream; Machine Learning; Browser; SQOOP; HIVE ODBC]
• ETL and pre-aggregate on-premise
• Analyze and visualize in the cloud
Enterprise Data Hub
On-premise Hadoop Clusters
Data Warehouse Accelerator
Cloud Hosting
Cloud BI Reporting and Analytics
ROI Strategies
Cost of Labor
Use lower skill-lower cost resources
Avoid extra headcount
Share experiences among plants
Move experienced talent to higher value activity
Cost of Capital
Use under-resourced equipment / assets more efficiently
Make equipment last longer, run more efficiently
Avoid more equipment purchases
Finding critical applications
Cost of Materials
Use fewer raw materials
Improve quality of raw materials sourced
Improve delivery and inventory
Cost of Overheads
Reduce transportation costs
Reduce or optimize energy and resource costs
Reduce management layers
Cost of Lost Opportunities
Reduce time to market
Improve product end-of-life
Reduce downtime
Reduce order to cash
Cost of Reputation
Reduce product defects
Anticipate customer reactions
Tailor service and response profiles
More available: [email protected]
Warehouse Operations
Machine sensor data for inventory and labor optimization
$300K Cases per man hour
Picking accuracy
Drought Management for Growers
Smarter water use
$475K potential Water per output
Retail Promotions
Demand forecasting, sentiment analysis, and pricing
$6.2M Sales per Square Foot
Returns Rate
Summary
• The value of investing in Big Data in the Cloud depends on your use case
• Cost is an issue – cloud hosting costs become significant beyond about 25 TB
• Skills are an issue – steep learning curves
• Process is an issue – requires change in the way people think and operate
• Partners are an issue – do you want a large or niche provider?
• Database design is important