gracehopper 2015, cluster management for big data in the cloud

2015

Cloudman: Cluster Management for Big Data in the Cloud

Swati Singhi

December 3, 2015

#GHCI15

Ajay Bhave

You are thinking here only from the perspective of Cloudman and how it differs from hustler. However, for this audience this is the first look at Qubole's cluster management stack. So, talk about all the key challenges that were solved to build hustler + Cloudman. For example, auto-scaling is a key differentiator. I see you have a slide for auto-scaling later. IMO that is the biggest USP and a hard problem, so it should be discussed first. You can then describe the other major challenge: building an abstraction that is cloud-provider agnostic. And say that this abstraction required addressing the differences in behaviors/capabilities of each cloud provider, for example differences in image creation and configuring clusters. So, two major challenges to be discussed: autoscaling efficiently and being agnostic to the cloud provider (AWS/GCE/Azure).

Ajay Bhave

Instead of a separate slide, this can be talking points for your slide 6, when you introduce Cloudman and explain how existing offerings were considered but did not meet requirements. If you think this is important to discuss separately, then add some bullets on why these off-the-shelf solutions were inadequate. Also, then move this slide closer to the Cloudman slide.

Swati Singhi

Actually, it makes sense to remove it. I do not have a lot of talking points around this one, so I will remove it.

Challenges of On-premise Big Data Infra

▪ Fixed pre-provisioned capacity
▪ Variable and unpredictable workloads
▪ Do not scale well
▪ Expensive
▪ On-site IT team

Big Data in the Cloud

Cloud offers salvation...
▪ Stretches with the workload
▪ Pay-as-you-go

...but brings its own challenges
▪ Moving data to the cloud
▪ Security/Privacy

Qubole as Big Data Service

▪ Enables Big Data on the cloud
▪ Enterprise ready deployments
▪ On major public clouds
▪ Simple and Fast

Cloudman

▪ Qubole’s cluster management software
▪ Launches half a million nodes per month
▪ Works across AWS, GCE and Azure
▪ Provides higher level APIs

Cloudman Goals

▪ Automated cluster provisioning
▪ Configure Big Data Stack
▪ Manage cluster lifecycle
▪ Highly optimized cost of compute

Layers of Big Data as a Service

[Diagram: UI, SDK and API layers, with Cloudman beneath]

Architecture

Challenges

▪ Autoscale based on workload
▪ Abstractions to address differences in behaviors of each cloud provider. Examples:
  − Image creation and registration
  − Configuring clusters
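One way to picture the second challenge is a small provider interface, sketched below. This is only an illustration: `CloudProvider`, `image_for_account`, `launch_plan` and the image naming scheme are hypothetical names invented here, not Cloudman's actual API. The only behavior taken from the slides is that AWS supports public images while on Azure images are copied to the user's account.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider-agnostic interface (not Cloudman's real API)."""

    name: str

    @abstractmethod
    def image_for_account(self, base_image: str, account: str) -> str:
        """Return a reference to an image that `account` can boot from."""

    def launch_plan(self, base_image: str, account: str, nodes: int) -> dict:
        # Shared logic is written once, against the abstraction only.
        return {
            "provider": self.name,
            "image": self.image_for_account(base_image, account),
            "nodes": nodes,
        }


class AwsProvider(CloudProvider):
    name = "aws"

    def image_for_account(self, base_image: str, account: str) -> str:
        # From the slides: public images are available in AWS, so every
        # account can reference the same image directly.
        return base_image


class AzureProvider(CloudProvider):
    name = "azure"

    def image_for_account(self, base_image: str, account: str) -> str:
        # From the slides: images are copied to the user's account in Azure.
        # The "<account>/<image>-copy" naming is made up for this sketch.
        return f"{account}/{base_image}-copy"


for provider in (AwsProvider(), AzureProvider()):
    print(provider.launch_plan("hadoop-image", "acct1", nodes=3))
```

Callers only see `launch_plan`; the per-cloud difference stays inside each subclass.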

Autoscaling Clusters

▪ Launched automatically when needed
▪ Expands automatically if the load is high
▪ Terminates the cluster when no jobs are running
▪ Removes nodes at the billing boundary
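The four rules on this slide can be sketched as a single decision step. This is a simplified illustration, not Cloudman's algorithm: the `Node` type, the slot model, the 60-minute billing period and the 5-minute window are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    minutes_since_launch: int
    busy: bool


def autoscale_step(demand, nodes, running_jobs, slots_per_node=4,
                   billing_period_min=60, window_min=5):
    """Return (action, payload) for one pass. All parameters are assumptions.

    demand: number of task slots the current workload needs.
    """
    # Rule: terminate the cluster when no jobs are running.
    if running_jobs == 0:
        return ("terminate_cluster", None) if nodes else ("idle", None)

    # Rules: launch when needed / expand when the load is high.
    supply = len(nodes) * slots_per_node
    if demand > supply:
        return ("add_nodes", math.ceil((demand - supply) / slots_per_node))

    # Rule: remove nodes only near the end of a billing period they have
    # already paid for, and never more than the spare capacity allows.
    spare_nodes = (supply - demand) // slots_per_node
    at_boundary = [
        n.name for n in nodes
        if not n.busy
        and n.minutes_since_launch % billing_period_min
        >= billing_period_min - window_min
    ]
    removable = at_boundary[:spare_nodes]
    return ("remove_nodes", removable) if removable else ("hold", None)


cluster = [Node("a", 58, False), Node("b", 30, False), Node("c", 10, True)]
print(autoscale_step(demand=4, nodes=cluster, running_jobs=1))
```

Waiting for the billing boundary means an idle node keeps serving as spare capacity for time that has already been paid for.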

Cloudman: AutoScaling

[Diagram: the query "insert overwrite table dest select … from ads join campaigns on … group by …;" runs as Map Tasks and Reduce Tasks. Labels: Demand, Supply, Progress, Master, Slaves, Job Tracker, Cloudman.]

Image registration in AWS vs. Azure

Image creation and registration

▪ Image creation
▪ Public images in AWS
▪ Not well supported in Azure
▪ Images copied to user’s account in Azure


Cluster Configuration

▪ Configure credentials
  − Storage and Compute keys
▪ Configure the big data stack
  − Start the appropriate software, for example JobTracker and NameNode on the Master, and TaskTracker and DataNode on the Slaves
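The role-based configuration on this slide can be expressed as a small lookup. The service placement comes from the slide; the `node_config` function and the shape of the returned dictionary are hypothetical and only illustrate the idea.

```python
# Service placement as listed on the slide.
SERVICES_BY_ROLE = {
    "master": ["NameNode", "JobTracker"],
    "slave": ["DataNode", "TaskTracker"],
}


def node_config(role, storage_key, compute_key):
    """Build a per-node configuration (hypothetical structure)."""
    if role not in SERVICES_BY_ROLE:
        raise ValueError(f"unknown role: {role}")
    return {
        "role": role,
        # Step 1 on the slide: configure storage and compute credentials.
        "credentials": {"storage": storage_key, "compute": compute_key},
        # Step 2 on the slide: start the software that matches the role.
        "services": list(SERVICES_BY_ROLE[role]),
    }


print(node_config("master", "<storage-key>", "<compute-key>")["services"])
print(node_config("slave", "<storage-key>", "<compute-key>")["services"])
```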

Optimizing cost of compute in Cloud

▪ Utilize ephemeral compute instances to lower cost
  − AWS Spot Instances
  − GCE Preemptible VMs
▪ Challenges
  − Data loss
  − Big data job failures

Demo

Key Takeaways

▪ Highly efficient cluster management system
▪ Proven at scale in production
▪ Works on multiple clouds

Got Feedback?

Rate and review the session on our mobile app – Convene

For all details visit: http://ghcindia.anitaborg.org

Appendix

Architecture

▪ QDS has a user interface, Python and Java SDKs, and APIs that allow users to talk to QDS and analyze data sets without knowing about cluster management.
▪ A QDS user can submit primitive commands to logical clusters.
▪ The middleware layer communicates with the cloud orchestration layer, called Cloudman.
▪ Cloudman is responsible for spinning up clusters in the relevant cloud.

▪ One such example is image creation and registration
▪ Procedure
  − Pre-create a machine image with all the software to be deployed baked into it
  − Start the cluster machines using this as the underlying image
  − This saves the time of deploying the software on the nodes after they are up
▪ This process is very different on each cloud provider


Cluster Configuration

▪ Another operation that had to be implemented differently for each cloud
▪ Startup scripts are used to programmatically customize virtual machine instances
▪ AWS and Google Cloud had support for this
▪ Azure did not support automatic execution of this script at VM boot time on CentOS VMs
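A sketch of how this difference might surface in code is shown below. The support table reflects what the slide states for 2015. The fallback for Azure (launch, wait for boot, then run the script explicitly) is an assumption made for illustration; the slides do not say how Cloudman handled it, and `bootstrap_plan` and the step names are hypothetical.

```python
# Whether the cloud runs a startup script automatically at boot,
# as stated on the slide (Azure entry refers to CentOS VMs).
BOOT_SCRIPT_SUPPORT = {"aws": True, "gce": True, "azure": False}


def bootstrap_plan(provider, script):
    """Return the ordered steps needed to get `script` run on a new node."""
    if provider not in BOOT_SCRIPT_SUPPORT:
        raise ValueError(f"unknown provider: {provider}")
    if BOOT_SCRIPT_SUPPORT[provider]:
        # The cloud executes the script itself when the VM boots.
        return [("launch_with_startup_script", script)]
    # Assumed fallback: run the script explicitly once the VM is up.
    return [
        ("launch", None),
        ("wait_for_boot", None),
        ("run_script_remotely", script),
    ]


for cloud in ("aws", "gce", "azure"):
    print(cloud, bootstrap_plan(cloud, "configure_node.sh"))
```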

Autoscaling Clusters

▪ Hadoop clusters in QDS come up automatically when applications that require them are launched.
▪ If the load on the cluster is high, the cluster automatically expands.
▪ Cloudman automatically launches additional nodes, which eventually join the running cluster and pick up part of the workload.
