effective spark on multi-tenant clusters

© Cloudera, Inc. All rights reserved.

Effective Spark on Multi-Tenant Clusters
Kostas Sakellis


Me

• Spark Tech Lead Manager at Cloudera
• Contributed to Apache Spark
• Previously, a stint on Cloudera Manager


Challenges

• Predictable execution time of Spark jobs
• Preventing starvation
• Optimal cluster utilization
• Secure data access
• Configuration management


Spark on YARN


Why YARN?

• Spark supports pluggable cluster managers
  • local, Standalone, YARN and Mesos
• YARN provides a proper resource manager
  • Enables multi-platform jobs
• Spark on YARN is mature, with an active community


Running an application

spark-submit --master yarn-cluster \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  --class <your-class> <your-app.jar>


System Architecture

[Diagram: the Resource Manager runs on host-a.mydomain.com, with Node Managers on host-a, host-b and host-c.mydomain.com. Each application gets YARN containers: one for its App Master, which hosts the Driver, and one per executor (Exec1, Exec2, Exec3).]


Gotchas

• Ensure the YARN configuration is compatible with your executor requests
  • yarn.nodemanager.resource.[memory-mb|cpu-vcores]
  • yarn.scheduler.maximum-allocation-[vcores|mb]
  • ...
• Remember the overhead memory
  • spark.yarn.executor.memoryOverhead
  • Default of max(10% of executor memory, 384 MB) since Spark 1.4
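A minimal Java sketch of the arithmetic behind this gotcha, assuming the Spark 1.4+ default overhead (max of 10% of --executor-memory and 384 MB) and that spark.yarn.executor.memoryOverhead is not set explicitly. It ignores YARN's rounding of requests up to the scheduler's allocation increment. Class and method names are illustrative only.

```java
// Sketch: estimate the YARN container size Spark requests per executor.
// Assumes the Spark 1.4+ default overhead: max(10% of executor memory, 384 MB).
class ExecutorContainerSize {
    static final double OVERHEAD_FACTOR = 0.10; // default fraction since Spark 1.4
    static final int OVERHEAD_MIN_MB = 384;     // floor applied to the overhead

    /** Default value of spark.yarn.executor.memoryOverhead for a given heap size. */
    static int defaultOverheadMb(int executorMemoryMb) {
        return Math.max((int) (OVERHEAD_FACTOR * executorMemoryMb), OVERHEAD_MIN_MB);
    }

    /** Memory requested from YARN: heap (--executor-memory) plus overhead. */
    static int requestedMb(int executorMemoryMb) {
        return executorMemoryMb + defaultOverheadMb(executorMemoryMb);
    }

    /** True if the request fits under yarn.scheduler.maximum-allocation-mb. */
    static boolean fits(int executorMemoryMb, int maxAllocationMb) {
        return requestedMb(executorMemoryMb) <= maxAllocationMb;
    }

    public static void main(String[] args) {
        int heapMb = 2048; // --executor-memory 2g
        System.out.println("overhead MB:  " + defaultOverheadMb(heapMb));
        System.out.println("requested MB: " + requestedMb(heapMb));
        System.out.println("fits a 2048 MB max allocation? " + fits(heapMb, 2048));
    }
}
```

With --executor-memory 2g the container request is 2432 MB, so a 2 GB limit on either side (YARN maximum allocation or container size) is not enough.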


Otherwise…

Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]


System Architecture

[Diagram: the same cluster shared by three applications, each with its own Driver and executors competing for the Node Managers on host-a, host-b and host-c.mydomain.com.]


How do we share a common resource?

Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg


Resource Management

• YARN can create resource queues
  • Priorities can be set per queue
• Preemption is also available
  • Spark handles preempted executors correctly since Spark 1.6 (SPARK-8167)
  • yarn.scheduler.fair.preemption
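The idea behind weighted queues can be sketched as weighted max-min fair sharing ("water-filling"): a queue that asks for less than its weighted slice gets exactly its demand, and what is left is split among the others by weight. This is a simplified model, not the Fair Scheduler's code (which also honours per-queue min/max shares); the deficit helper only illustrates which queue preemption would try to help. All names are hypothetical.

```java
import java.util.Arrays;

// Simplified model of weighted fair shares across YARN queues (not YARN's actual code).
class FairShare {
    /** Weighted max-min shares; capacity is e.g. vcores, weights/demands are per queue. */
    static double[] compute(double capacity, double[] weights, double[] demands) {
        int n = weights.length;
        double[] share = new double[n];
        boolean[] satisfied = new boolean[n];
        while (true) {
            // Capacity not yet given to satisfied queues, and the weight still competing for it.
            double remaining = capacity;
            double activeWeight = 0;
            for (int i = 0; i < n; i++) {
                if (satisfied[i]) remaining -= share[i];
                else activeWeight += weights[i];
            }
            if (activeWeight == 0) break;
            double perWeight = remaining / activeWeight;
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                // A queue wanting less than its weighted slice just gets its demand.
                if (!satisfied[i] && demands[i] <= perWeight * weights[i]) {
                    share[i] = demands[i];
                    satisfied[i] = true;
                    changed = true;
                }
            }
            if (!changed) {
                // Every remaining queue wants more: split what is left by weight.
                for (int i = 0; i < n; i++) {
                    if (!satisfied[i]) share[i] = perWeight * weights[i];
                }
                break;
            }
        }
        return share;
    }

    /** How far a queue is below its fair share; preemption aims to close such gaps. */
    static double deficit(double fairShare, double currentUsage) {
        return Math.max(0, fairShare - currentUsage);
    }

    public static void main(String[] args) {
        // 100 vcores, queue weights 1:1:2, the first queue only needs 10.
        double[] shares = compute(100, new double[] {1, 1, 2}, new double[] {10, 100, 100});
        System.out.println(Arrays.toString(shares));
        System.out.println("deficit of queue 2 using 5 vcores: " + deficit(shares[1], 5));
    }
}
```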


Running an application

spark-submit --master yarn-cluster \
  --queue my-special-queue \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  --class <your-class> <your-app.jar>


How about locality?

Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
Courtesy of: https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg


Task Scheduling

[Diagram: inside the Driver, the Spark Context submits Jobs to the DAG Scheduler, which splits each job into Stages at Shuffle boundaries; the Task Scheduler then sends each stage's Tasks to cores on the Executors.]
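A simplified Java model of how the DAG Scheduler cuts a job into stages: operations are pipelined until one needs a shuffle, which starts a new stage. This assumes a linear chain of operations named by hypothetical strings; the real scheduler walks the RDD dependency graph backwards from the final RDD and splits on shuffle dependencies, and the map side of an operation like reduceByKey actually runs at the end of the preceding stage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model: a linear chain of RDD operations cut into stages at shuffles.
class StageSplitter {
    /** Each operation listed in shuffleOps starts a new stage; the others are pipelined. */
    static List<List<String>> split(List<String> ops, Set<String> shuffleOps) {
        List<List<String>> stages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String op : ops) {
            if (shuffleOps.contains(op) && !current.isEmpty()) {
                stages.add(current); // previous stage ends by writing shuffle output
                current = new ArrayList<>();
            }
            current.add(op);
        }
        if (!current.isEmpty()) stages.add(current);
        return stages;
    }

    public static void main(String[] args) {
        List<String> job = List.of("textFile", "map", "reduceByKey", "filter", "groupByKey", "count");
        Set<String> shuffles = Set.of("reduceByKey", "groupByKey");
        List<List<String>> stages = split(job, shuffles);
        for (int i = 0; i < stages.size(); i++) {
            System.out.println("Stage " + i + ": " + stages.get(i));
        }
    }
}
```

Each resulting stage then runs as one task per partition on the executor cores.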


Locality

[Diagram: the blocks of two HDFS files, hdfs://x (x:B1, x:B2, x:B3) and hdfs://y (y:B1, y:B2, y:B3), are each replicated on two of the three hosts (host-a, host-b and host-c.mydomain.com), while the application's Driver, Exec1 and Exec2 occupy only some of those hosts.]


Spark creates executors before executing code!
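Because executors are placed before Spark knows which files a job will read, some tasks may find no executor on a host that stores their block. A minimal Java sketch of that check, using a hypothetical replica layout for file x (two replicas per block, as in the diagram); the host names and placement are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: which HDFS blocks have no executor on a host that stores a replica.
class LocalityCheck {
    /** blockHosts maps block id -> hosts holding a replica. */
    static List<String> nonLocalBlocks(Map<String, Set<String>> blockHosts, Set<String> executorHosts) {
        List<String> remote = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(blockHosts).entrySet()) {
            // No shared host: the task for this block can at best run RACK_LOCAL or ANY.
            if (Collections.disjoint(e.getValue(), executorHosts)) remote.add(e.getKey());
        }
        return remote;
    }

    public static void main(String[] args) {
        // Hypothetical replica placement for file x.
        Map<String, Set<String>> x = Map.of(
            "x:B1", Set.of("host-a", "host-c"),
            "x:B2", Set.of("host-a", "host-b"),
            "x:B3", Set.of("host-b", "host-c"));
        System.out.println(nonLocalBlocks(x, Set.of("host-a")));           // [x:B3]
        System.out.println(nonLocalBlocks(x, Set.of("host-a", "host-b"))); // []
    }
}
```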


Underutilized Clusters

Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG


Dynamic Allocation

• Spark applications scale the number of executors up and down based on load
  • Removes the need for --num-executors
  • Idle executors are released
• First supported in CDH 5.4
• Ideal for:
  • Long ETL jobs with large shuffles
  • Shell applications: Hive and the Spark shell
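The scale-up side follows the policy described in the Spark documentation: while tasks stay pending, executor requests are made in rounds, each round asking for twice as many as the previous one (1, 2, 4, 8, ...), capped by spark.dynamicAllocation.maxExecutors and by the number of executors needed to run every pending task. A minimal Java sketch, assuming the backlog stays constant and that tasks per executor equals executor cores divided by spark.task.cpus; it omits the backlog timeouts that pace the rounds.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of dynamic allocation's exponential ramp-up while a task backlog persists.
class DynamicAllocationRampUp {
    /** Target executor count after each request round, assuming pendingTasks stays constant. */
    static List<Integer> targets(int pendingTasks, int tasksPerExecutor,
                                 int initialExecutors, int maxExecutors) {
        int needed = (pendingTasks + tasksPerExecutor - 1) / tasksPerExecutor; // ceiling division
        int cap = Math.min(needed, maxExecutors);
        List<Integer> rounds = new ArrayList<>();
        int target = initialExecutors;
        int toAdd = 1;
        while (target < cap) {
            target = Math.min(target + toAdd, cap);
            rounds.add(target);
            toAdd *= 2; // each round asks for twice as many executors as the last
        }
        return rounds;
    }

    public static void main(String[] args) {
        // 100 pending tasks, 2 cores per executor, maxExecutors = 20.
        System.out.println(targets(100, 2, 0, 20)); // [1, 3, 7, 15, 20]
    }
}
```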


Task Scheduling

[Diagram: the Driver's DAG Scheduler and Task Scheduler ask the RM (Resource Manager) for executors as Tasks queue up; Exec1, Exec2 and Exec3 start on the Node Managers of host-a, host-b and host-c.mydomain.com and pick up the pending Tasks.]


Dynamic Allocation Configuration

• Many knobs:
  • spark.dynamicAllocation.enabled
  • spark.dynamicAllocation.[min|max|initial]Executors
  • spark.dynamicAllocation.executorIdleTimeout
  • spark.dynamicAllocation.cachedExecutorIdleTimeout
  • ...
• --num-executors disables dynamic allocation (Spark 1.x; since Spark 2.0 it sets the initial number of executors instead)
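With this many knobs, a quick sanity check helps. A minimal Java sketch over a plain map of Spark properties, assuming the rules of the Spark 1.x/2.x era: dynamic allocation needs the external shuffle service (spark.shuffle.service.enabled), --num-executors maps to spark.executor.instances, and the executor bounds should satisfy min <= initial <= max (initialExecutors defaults to minExecutors). The class and messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: sanity checks for a dynamic allocation setup (Spark 1.x/2.x-era rules).
class DynamicAllocationCheck {
    static List<String> problems(Map<String, String> conf) {
        List<String> out = new ArrayList<>();
        if (!Boolean.parseBoolean(conf.getOrDefault("spark.dynamicAllocation.enabled", "false"))) {
            out.add("spark.dynamicAllocation.enabled is not true");
            return out;
        }
        // Executors can only be removed safely if an external service keeps serving their shuffle files.
        if (!Boolean.parseBoolean(conf.getOrDefault("spark.shuffle.service.enabled", "false"))) {
            out.add("spark.shuffle.service.enabled should be true");
        }
        // --num-executors sets spark.executor.instances.
        if (conf.containsKey("spark.executor.instances")) {
            out.add("spark.executor.instances (--num-executors) is set; Spark 1.x disables dynamic allocation");
        }
        int min = Integer.parseInt(conf.getOrDefault("spark.dynamicAllocation.minExecutors", "0"));
        int max = Integer.parseInt(conf.getOrDefault("spark.dynamicAllocation.maxExecutors",
                String.valueOf(Integer.MAX_VALUE)));
        int initial = Integer.parseInt(conf.getOrDefault("spark.dynamicAllocation.initialExecutors",
                String.valueOf(min)));
        if (min > initial || initial > max) {
            out.add("expected minExecutors <= initialExecutors <= maxExecutors");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
                "spark.dynamicAllocation.enabled", "true",
                "spark.executor.instances", "3");
        problems(conf).forEach(System.out::println);
    }
}
```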


Dynamic Allocation Limitations

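• Still required to specify cores
  • --executor-cores
• Still required to specify memory
  • --executor-memory
  • The YARN container also includes the JVM overhead
• Caching
  • Executors holding cached data are kept until spark.dynamicAllocation.cachedExecutorIdleTimeout expires (infinite by default)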
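The scale-down rule can be sketched the same way: an executor with no running tasks is released once it has been idle for spark.dynamicAllocation.executorIdleTimeout (60 s by default), unless it holds cached blocks, in which case cachedExecutorIdleTimeout applies (infinite by default), and never below minExecutors. A minimal Java sketch; NEVER is a hypothetical stand-in for an infinite timeout, and the real allocator tracks these times per executor rather than taking them as arguments.

```java
// Sketch of when dynamic allocation releases an idle executor.
class IdleExecutorPolicy {
    static final long NEVER = Long.MAX_VALUE; // stands in for an "infinity" timeout

    /** idleMs: how long the executor has had no running tasks. */
    static boolean shouldRemove(long idleMs, boolean hasCachedBlocks,
                                long executorIdleTimeoutMs, long cachedExecutorIdleTimeoutMs,
                                int liveExecutors, int minExecutors) {
        if (liveExecutors <= minExecutors) return false; // never shrink below minExecutors
        // Executors holding cached blocks use the cached timeout instead of the plain one.
        long timeout = hasCachedBlocks ? cachedExecutorIdleTimeoutMs : executorIdleTimeoutMs;
        return timeout != NEVER && idleMs >= timeout;
    }

    public static void main(String[] args) {
        long sixtySeconds = 60_000; // default executorIdleTimeout
        System.out.println(shouldRemove(61_000, false, sixtySeconds, NEVER, 5, 0));   // true
        System.out.println(shouldRemove(3_600_000, true, sixtySeconds, NEVER, 5, 0)); // false
    }
}
```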


The Future of Dynamic Allocation

• Only a "task size" would be needed: --task-size
• Eliminates:
  • --executor-cores
  • --num-executors
  • --executor-memory
• Leads to better cluster utilization


Dynamic Allocation respects Locality!


Security, oh no!

Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg


Security

• Shared resources -> shared data
• Security has many facets:
  • Encryption
  • Authentication
  • Authorization
• Encryption is especially relevant on multi-tenant clusters


Encryption

Who’s looking at the data?


Data Flow in Spark

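[Diagram: Spark Submit launches the Driver, which exchanges data with the Executors over several channels: the control plane, file distribution, shuffle blocks and the UI. Executors also write spilled and shuffle blocks to local Disk.]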


Prior to Spark 1.6

Different channel, different method:

• Control plane: SSL
• File distribution: SSL
• Shuffle blocks: SASL encryption
• User UI / REST API: no encryption
• Spilled/shuffle blocks on disk: use encfs (or equivalent)


What is wrong with SSL?


Why not SSL?

• SSL can be hard to set up
  • Certificates must be readable on every node
  • Sharing certificates is less secure
  • Hard to have per-user certificates


Spark 1.6

• Standardizes on a common transport library
  • Replaces Akka RPC (SPARK-6028)
  • Replaces the HTTP file server (SPARK-11140)
  • Uses the Netty transport library with SASL encryption
• But...
  • The Web UI still has no encryption
  • Shuffle/spilled blocks on disk still require filesystem-level encryption
  • SASL in the JVM is restricted to 3DES, which is slow and not very strong
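Unlike SSL, the SASL-encrypted channels are switched on through configuration alone; on YARN, Spark generates and distributes the shared secret for each application when spark.authenticate is true. A minimal Java sketch that renders the two relevant entries in spark-defaults.conf style, assuming the property names from the Spark security documentation of this era; check your version's docs, since coverage and related settings (such as the external shuffle service's own encryption switch) changed in later releases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: spark-defaults.conf entries for authenticated, SASL-encrypted channels.
class SaslEncryptionConf {
    static Map<String, String> settings() {
        Map<String, String> conf = new LinkedHashMap<>(); // keeps insertion order for output
        conf.put("spark.authenticate", "true");                      // SASL encryption builds on authentication
        conf.put("spark.authenticate.enableSaslEncryption", "true"); // encrypt channels that support SASL
        return conf;
    }

    /** Renders entries as "key value" lines, the format used by spark-defaults.conf. */
    static String render(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(settings()));
    }
}
```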


Spark 2.0

• REPL class distribution uses the transport library (SPARK-11563)
• HTTPS support for the Web UI (SPARK-2750)
• Encryption of spilled blocks is almost available (SPARK-5682)
  • Depends on the third-party Chimera library for encryption
  • Work is under way to add Chimera to Apache Commons
• Future:
  • Use Chimera to encrypt data over the wire
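The idea behind encrypting spilled blocks can be sketched with stream encryption: a random IV is written in clear at the start of the spill file and everything after it is AES/CTR ciphertext. This minimal Java 11+ sketch uses the JDK's javax.crypto instead of Chimera, and the per-file key handling is an assumption for illustration, not how Spark manages keys.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Sketch: encrypting a spill file with AES/CTR using only the JDK (not Chimera).
class EncryptedSpill {
    static final String TRANSFORMATION = "AES/CTR/NoPadding";
    static final int IV_BYTES = 16;

    static void write(Path file, byte[] data, SecretKey key) throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv); // fresh IV per file; CTR must never reuse key + IV
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        try (OutputStream raw = Files.newOutputStream(file);
             CipherOutputStream out = new CipherOutputStream(raw, cipher)) {
            raw.write(iv);   // the IV is stored in clear at the start of the file
            out.write(data); // everything after it is ciphertext
        }
    }

    static byte[] read(Path file, SecretKey key) throws IOException, GeneralSecurityException {
        try (InputStream raw = Files.newInputStream(file)) {
            byte[] iv = raw.readNBytes(IV_BYTES);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
            try (CipherInputStream in = new CipherInputStream(raw, cipher)) {
                return in.readAllBytes();
            }
        }
    }

    /** Encrypts text to a temp file, decrypts it again, and returns the result. */
    static String roundTrip(String text) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            SecretKey key = gen.generateKey();
            Path spill = Files.createTempFile("spill", ".bin");
            try {
                write(spill, text.getBytes(StandardCharsets.UTF_8), key);
                return new String(read(spill, key), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(spill);
            }
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("spill round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("spilled records"));
    }
}
```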


Gateways: launching Spark Application

Courtesy of: http://www.gottardo2016.ch/sites/default/files/styles/hero/public/parallax_story_8_tunnelsystem.jpg?itok=p2Mtg5be


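Spark Gateway

[Diagram: Bob's client connects over SSH to gateway-a.mydomain.com, which holds the client configs and a Spark install. From there, applications are submitted to the Resource Manager and run Drivers and executors (Exec1, Exec2) on the Node Managers of host-b and host-c.mydomain.com, communicating over random ports.]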


Gateway Considerations

• Gateway hosts must be actively managed by administrators
  • Updates to client configurations and Spark installs
• Users need to tunnel into the network
  • Difficult to put users behind a firewall
• YARN allows different Spark versions
  • spark.yarn.jar or spark.yarn.archive
  • Shared Spark services make this difficult


Shared Services

[Diagram: the same gateway setup, plus shared Spark services (SS) such as the History Service running alongside the cluster.]


Alternative

Livy: an open source, Apache-licensed REST web service that manages long-running Spark contexts in your cluster


Livy Architecture

[Diagram: a Client talks HTTP to the Livy Rest Server, which uses the Cluster Manager to run long-lived Spark contexts (Context 1, Context 2) inside the managed cluster, each consisting of a Driver and its Executors.]


Case 1: Spark Application JAR Submission

• Enables Spark applications to be submitted without a local Spark installation
  • Basically a wrapper around spark-submit

% curl -X POST -H "Content-Type: application/json" localhost:8998/batches -d '{ "file": "<path_to_file>", "className": "com.foo.bar.." ...}'
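The same batch submission can be made from Java without curl. A minimal Java 11+ sketch that builds (but does not send) the POST to Livy's /batches endpoint with the "file" and "className" fields shown above; the Livy URL, jar path and class name in main are hypothetical, and the JSON escaping only covers quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: building a Livy batch submission request (POST /batches).
class LivyBatchRequest {
    /** Minimal JSON string quoting; handles only backslashes and double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String body(String file, String className) {
        return "{\"file\": " + quote(file) + ", \"className\": " + quote(className) + "}";
    }

    static HttpRequest build(String livyUrl, String file, String className) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(file, className)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical jar and main class.
        HttpRequest req = build("http://localhost:8998", "hdfs:///user/bob/app.jar", "com.example.MyApp");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(body("hdfs:///user/bob/app.jar", "com.example.MyApp"));
        // Sending it requires a running Livy server, e.g.:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Per the Livy REST API, the response describes the new batch, including its id and state, and GET /batches/{id} can be polled to follow it.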


How do you retrieve results?


Case 2: Fine-grained job submission

• Programmatic submission of Spark jobs to a long-running application
  • A thin Java (and Scala) client is available for easier integration
  • Provides automatic serialization/deserialization
• Enables web/mobile applications to use Spark as a backend


Case 2: Example

    // Create a Livy client
    LivyClient client = new LivyClientBuilder(false)
        .setURI(new URI("<uri>"))
        .setAll(<config>)
        .build();

    // A JobHandle allows monitoring of jobs
    JobHandle<Long> handle = client.submit(new YourJob());

    // Block until results are returned
    Long result = handle.get(TIMEOUT, TimeUnit.SECONDS);

    // Close connections (true also shuts down the remote Spark context)
    client.stop(true);


Case 2: Example

    private static class YourJob implements Job<Long> {
      @Override
      public Long call(JobContext jc) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> rdd = jc.sc().parallelize(list);
        return rdd.count(); // long, autoboxed to Long
      }
    }

    // Job interface to implement
    public interface Job<T> extends Serializable {
      T call(JobContext jc) throws Exception;
    }


Contributions Welcome!

• http://livy.io/
• Code: https://github.com/cloudera/livy
• JIRA: https://issues.cloudera.org/browse/LIVY
• Users: http://groups.google.com/a/cloudera.org/group/livy-user
• Dev: http://groups.google.com/a/cloudera.org/group/livy-dev



Thank you
