gracehopper 2015, cluster management for big data in the cloud
2015
Cloudman: Cluster Management for Big Data in the Cloud
Swati Singhi
December 3, 2015
#GHCI15

Challenges of On-premise Big Data Infra
▪ Fixed pre-provisioned capacity
▪ Variable and unpredictable workloads
▪ Do not scale well
▪ Expensive
▪ On-site IT team

Big Data in the Cloud
Cloud offers salvation...
▪ Stretches with the workload
▪ Pay-as-you-go
...but brings its own challenges
▪ Moving data to the cloud
▪ Security/Privacy

Qubole as Big Data Service
▪ Enables Big Data on the cloud
▪ Enterprise-ready deployments
▪ On major public clouds
▪ Simple and Fast

Cloudman
▪ Qubole’s cluster management software
▪ Launches half a million nodes per month
▪ Works across AWS, GCE and Azure
▪ Provides higher level APIs

Cloudman Goals
▪ Automated cluster provisioning
▪ Configure Big Data Stack
▪ Manage cluster lifecycle
▪ Highly optimized cost of compute

Layers of Big Data as a Service
[Diagram labels: UI, SDK, API, Cloudman]

Architecture

Challenges
▪ Autoscale based on workload
▪ Abstractions to address differences in the behavior of each cloud provider
Examples:
  − Image creation and registration
  − Configuring clusters
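
A minimal Python sketch of what such an abstraction might look like, using image handling as the example. The names `CloudProvider`, `resolve_image` and `launch_cluster` are hypothetical and are not Cloudman's actual API; the AWS and Azure behaviors are simplified from the image registration slides (public images in AWS, images copied to the user's account in Azure).

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider interface (an assumption, not Cloudman's API)."""

    name: str

    @abstractmethod
    def resolve_image(self, image_id: str, account: str) -> str:
        """Return an image reference that `account` can boot from."""


class AwsProvider(CloudProvider):
    name = "aws"

    def resolve_image(self, image_id, account):
        # A public image can be used directly from any account.
        return image_id


class AzureProvider(CloudProvider):
    name = "azure"

    def __init__(self):
        self.copied = {}  # (account, image_id) -> name of the copied image

    def resolve_image(self, image_id, account):
        # The image is copied into the user's account once, then reused.
        key = (account, image_id)
        if key not in self.copied:
            self.copied[key] = f"{account}/{image_id}-copy"
        return self.copied[key]


def launch_cluster(provider, image_id, account, size):
    """Provider-independent launch logic; returns placeholder node names."""
    image = provider.resolve_image(image_id, account)
    return [f"{provider.name}:{image}:node{i}" for i in range(size)]
```

The caller only ever sees `launch_cluster`; each provider-specific quirk stays inside its own subclass.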

Autoscaling Clusters
▪ Launched automatically when needed
▪ Expands automatically if the load is high
▪ Terminates the cluster when no jobs are running
▪ Removes nodes at the billing boundary
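
The billing-boundary rule can be sketched as follows. This is an illustration only: it assumes hourly billing and a five-minute removal window, and neither value comes from the talk. The function names are hypothetical.

```python
BILLING_PERIOD_S = 3600  # assumption: instances are billed per started hour
REMOVAL_WINDOW_S = 300   # assumption: act in the last 5 minutes of a paid hour


def seconds_to_boundary(launch_time, now, period=BILLING_PERIOD_S):
    """Seconds left until the node's current billing period ends."""
    elapsed = now - launch_time
    return period - (elapsed % period)


def should_remove(node_idle, launch_time, now):
    """Remove an idle node only when its paid period is almost used up."""
    return node_idle and seconds_to_boundary(launch_time, now) <= REMOVAL_WINDOW_S
```

Removing a node earlier would give up compute time that has already been paid for, which is why an idle node is kept until it is close to the boundary.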

Cloudman: AutoScaling
insert overwrite table dest select … from ads join campaigns on … group by …;
[Diagram labels: Map Tasks, Reduce Tasks, Demand, Supply, Progress, Master, Slaves, Job Tracker, Cloudman]
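
One way to read the demand and supply labels in this diagram is as a comparison between pending tasks and available task slots. The sketch below is an assumption about that comparison, not the algorithm Cloudman uses; it ignores job progress, and `nodes_to_add` and its parameters are hypothetical names.

```python
import math


def nodes_to_add(pending_tasks, free_slots, slots_per_node, current_nodes, max_nodes):
    """Estimate how many nodes to launch so supply catches up with demand."""
    shortfall = pending_tasks - free_slots  # demand minus supply
    if shortfall <= 0:
        return 0  # the cluster already has enough capacity
    wanted = math.ceil(shortfall / slots_per_node)
    headroom = max_nodes - current_nodes  # never exceed the configured maximum
    return max(0, min(wanted, headroom))
```

For example, 100 pending tasks against 20 free slots, at 8 slots per node, calls for 10 more nodes if the cluster maximum allows it.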

Image creation and registration
Image registration in AWS vs. Azure

Image creation and registration
▪ Image creation
▪ Public images in AWS
▪ Not well supported in Azure
▪ Images copied to user’s account in Azure

Cluster Configuration
▪ Configure credentials
  − Storage and Compute keys
▪ Configure the big data stack
  − Start the appropriate software, for example JobTracker and NameNode on Master and TaskTracker and DataNode on Slaves
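
The two configuration steps can be sketched as a small plan builder. The daemon placement follows the slide; the function name, the dictionary layout and the key values are hypothetical.

```python
# Daemon placement as listed on the slide.
DAEMONS_BY_ROLE = {
    "master": ["JobTracker", "NameNode"],
    "slave": ["TaskTracker", "DataNode"],
}


def configuration_plan(role, storage_key, compute_key):
    """Describe what a node of the given role needs at configuration time."""
    if role not in DAEMONS_BY_ROLE:
        raise ValueError(f"unknown role: {role}")
    return {
        "credentials": {"storage": storage_key, "compute": compute_key},
        "start": list(DAEMONS_BY_ROLE[role]),
    }
```

Calling `configuration_plan("slave", "s-key", "c-key")` yields the credentials plus the TaskTracker and DataNode daemons to start.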

Optimizing cost of compute in Cloud
▪ Utilize ephemeral compute instances to lower cost
  − AWS Spot Instances
  − GCE Preemptible VMs
▪ Challenges
  − Data loss
  − Big data job failures
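
The slide names the challenges but not how they are handled. One common mitigation, assumed here and not confirmed by the talk, is to cap the share of ephemeral nodes so that a core of stable nodes survives if the ephemeral ones are reclaimed. `plan_node_mix` is a hypothetical helper.

```python
import math


def plan_node_mix(total_nodes, max_ephemeral_fraction):
    """Split a cluster into stable and ephemeral (spot/preemptible) nodes."""
    if not 0.0 <= max_ephemeral_fraction <= 1.0:
        raise ValueError("max_ephemeral_fraction must be between 0 and 1")
    # Round down so the ephemeral share never exceeds the cap.
    ephemeral = math.floor(total_nodes * max_ephemeral_fraction)
    return {"stable": total_nodes - ephemeral, "ephemeral": ephemeral}
```

A lower fraction trades some of the cost saving for a smaller blast radius when instances are taken away.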

Demo

Key Takeaways
▪ Highly efficient cluster management system
▪ Proven at scale in production
▪ Works on multiple clouds

Got Feedback?
Rate and review the session on our mobile app – Convene
For all details visit: http://ghcindia.anitaborg.org

Appendix

Architecture
▪ QDS has a user interface, Python and Java SDKs, and APIs that allow users to talk to QDS and analyze data sets without knowing about cluster management.
▪ A QDS user can submit primitive commands to logical clusters.
▪ The middleware layer communicates with the cloud orchestration layer, called Cloudman.
▪ Cloudman is responsible for spinning up clusters in the relevant cloud.
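
The flow described above (command, logical cluster, middleware, Cloudman) can be sketched with two toy classes. Everything here is a stand-in: the class names mirror the layers, but the methods, the label-to-cloud mapping and the returned strings are hypothetical, and no real cloud or QDS API is called.

```python
class Cloudman:
    """Toy stand-in for the orchestration layer."""

    def __init__(self):
        self.running = {}  # logical cluster label -> physical cluster name
        self.launches = 0

    def ensure_cluster(self, label, cloud):
        # Spin up a cluster only if this logical cluster has none running.
        if label not in self.running:
            self.running[label] = f"{cloud}-cluster-for-{label}"
            self.launches += 1
        return self.running[label]


class Middleware:
    """Toy stand-in for the layer between the UI/SDK/API and Cloudman."""

    def __init__(self, cloudman, cloud_by_label):
        self.cloudman = cloudman
        self.cloud_by_label = cloud_by_label

    def submit(self, label, command):
        cluster = self.cloudman.ensure_cluster(label, self.cloud_by_label[label])
        return f"{command!r} dispatched to {cluster}"
```

The user names only a logical cluster label; whether a physical cluster already exists is decided below the middleware.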

Image creation and registration
▪ One such example is image creation and registration
▪ Procedure
  − Pre-create a machine image with all the software to be deployed baked into it
  − We start the cluster machines using this as the underlying image
  − Saves us the time of deploying the software on the nodes after they are up
▪ This process differs considerably across cloud providers

Cluster Configuration
▪ Another operation that had to be implemented differently for each cloud
▪ Startup scripts are used to programmatically customize virtual machine instances
▪ AWS and Google Cloud had support for this
▪ Azure did not support automatic execution of this script at VM boot time on CentOS VMs
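
A sketch of how the difference might be absorbed in code. The support table reflects the bullets above; the fallback for Azure (running the script after the VM is reachable) is an assumption, since the slide does not say how the gap was handled. Step names are hypothetical labels, not real API calls.

```python
# Whether the cloud runs a startup script automatically at boot,
# as stated on the slide.
SUPPORTS_BOOT_SCRIPT = {"aws": True, "gce": True, "azure": False}


def bootstrap_steps(cloud, script):
    """Return the ordered steps needed to get `script` run on a new VM."""
    if SUPPORTS_BOOT_SCRIPT[cloud]:
        # The cloud executes the script itself during boot.
        return [("launch_with_boot_script", script)]
    # Assumed fallback: launch first, then run the script once the VM is up.
    return [
        ("launch", None),
        ("wait_until_reachable", None),
        ("run_after_boot", script),
    ]
```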

Autoscaling Clusters
▪ Hadoop clusters in QDS come up automatically when applications that require them are launched
▪ If the load on the cluster is high, then the cluster automatically expands.
▪ Cloudman automatically launches additional nodes which eventually join the running cluster and are able to pick up part of the workload