big data in the cloud
DESCRIPTION
Adopting Hadoop to manage your Big Data is an important step, but not the end-solution to your Big Data challenges. Here are some of the additional considerations you must face: Choosing the right cloud for the job: The massive computing and storage resources that are needed to support Big Data applications make cloud environments an ideal fit, and more than ever, there is a growing number of choices of cloud infrastructure types and providers. Given the diverse options, and the dynamic environments involved, it becomes ever more important to maintain the flexibility for all your IT needs. Big Data is a complex beast: It involves many and different moving parts, in large clusters, and is continually growing and evolving. Managing such an environment manually is not a viable option. The question is, how can you achieve automation of all this complexity? The world beyond Hadoop: Big Data is not just Hadoop – there is a whole rapidly growing ecosystem to contend with, including NoSQL, data processing, analytics tools… As well as your own application services. How can you manage deployment, configuration, scaling and failover of all the different pieces, in a consistent way? In this session, you’ll learn how to deploy and manage your Hadoop cluster on any Cloud, as well as manage the rest of your big data application stack using a new open source framework called Cloudify.TRANSCRIPT
3
Big Data In The Cloud
2.7 ZB
0.5 Petabytes
66%
Global Digital Data
Two years tweets
Will or plan to use Big Data in the cloud
43% think that data
analytics could be improved in their organization if data analytics was part of
cloud services
Large ISV Case Study
• Application– Call Center surveillance
• Background– Previously – voice data
• Goal for a new system– Monitor data & voice– Multiple data sources – Advanced correlations
The Challenge
Cost Business Impact
Lower Margins
Competiveness
Time to Market
Customer Satisfaction
Infrastructure
Operational
Big Data in the Cloud- 3 Reasons
• Skills– Do you really need/want this all in-
house?• Huge amounts of external data. – Does it make sense to move and
manage all this data behind your firewall?
• Focus on the value of your data– Instead of big data management.
Holger Kisker
Managing Big Data on the
Cloud
• Auto start VMs• Install and configure
app components • Monitor • Repair • (Auto) Scale• Burst…
Big Data in the Cloud..
Reduce the Infrastructure Cost
Choose the Right Cloud for the Job
Running Bare-Metal for high I/O workloads, Public cloud for sporadic workloads..
Big Data in the Cloud ..
Reducing The Operational Complexity
• Consistent Management
• Automation Through the Entire Stack
Cloud Portability
Facts
Zynga moved ~80% of their workload from Amazon to their private zCloud
“own the base, rent the spike”
http://code.zynga.com/2012/02/the-evolution-of-zcloud/
Cloud Portability
Facts Started with Linode, then moved to RackSpace, then to AWS
http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/
Cloud Portability
Facts
• You want the flexibility to choose what’s right for you, when it’s right for you
• Based on pricing, features, availability, performance, etc.
Cloud APIs, Today
Standard APIs (?)OCCIVCloud
OSS FrameworksOpenStackCloudStackEucalyptus
Abstraction frameworksJCloudsDeltacloudFogLibvirt
Cloud APIs, Today
Standard APIsNot practical in the foreseeable future
OSS Projects Need a couple more years to converge &
mature
Abstraction FrameworksProbably the only
practical (near-term) option
Realization:
What You Really Care
about Is App
Portability
OS is the same on any cloud
Most clouds have compute & storage
Elasticity & scaling have same effects on the app, regardless of the cloud
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved
25
Consistent ManagementRecipes consistent description for running any app:
What middleware services to run Dependencies between services How to install services Where application and service binaries are When to spawn or terminate instances How to monitor each of the services.
® Copyright 2012 GigaSpaces Ltd. All Rights Reserved
27
Choosing the Right Cloud for the Jobcompute { template "SMALL_LINUX"}
SMALL_LINUX : template imageId "us-east-1/ami-76f0061f“ remoteDirectory "/home/ec2-user/gs-files“ machineMemoryMB 1600 hardwareId "m1.small" locationId "us-east-1" localDirectory "upload" keyFile "myKeyFile.pem"
options ([ "securityGroups" : ["default"]as
String[], "keyPair" : "myKeyFile"])
overrides (["jclouds.ec2.ami-query":"",
"jclouds.ec2.cc-ami-query":""])privileged true
}
SMALL_LINUX : template{ imageId "1234" machineMemoryMB 3200 hardwareId "103" remoteDirectory "/root/gs-files" localDirectory "upload" keyFile "gigaPGHP.pem" options ([ "openstack.securityGroup" : "default", "openstack.keyPair" : "gigaPGHP"
])privileged true
}
Automation across the stack1 Upload your recipe.
2 Cloudify creates VM’s & installs agents
3 Agents install and manage your app
4 Cloudify automate the scaling
Try a simple demo yourself
The app
The Cloudify dashboardlaunch.cloudifysource.org/d
Large ISV Case Study
• Application– Call Center surveillance system
• Background– Previously – voice data
• Goal for a new systemMonitor data & voiceMultiple data sources Advanced correlations Mission
Accomplished
Additional Benefits
• True Cloud Economics
• One product -> any Customer Environment
• Increased Agility
Thank You!
References: http://www.cloudifysource.org http://github.com/CloudifySource