How to deploy Apache Spark in a multi-tenant, on-premises environment

Upload: bluedata-inc

Post on 14-Feb-2017

TRANSCRIPT

Page 1: How to deploy Apache Spark in a multi-tenant, on-premises environment

HOW TO DEPLOY APACHE SPARK IN A MULTI-TENANT, ON-PREMISES ENVIRONMENT

Page 2: How to deploy Apache Spark in a multi-tenant, on-premises environment

Adoption of Apache Spark is accelerating

• Spark adoption is growing rapidly – The number of contributors and end users is increasing at a substantial rate

• Spark is expanding beyond Hadoop – Spark is an integral component of new big data platforms, with support for pipelines, streaming and statistical analysis, SQL, and more

• A variety of use cases are being implemented – Use cases include recommendation systems, data warehousing, log processing, and more

• Programming paradigm is expanding – Languages supported include Java, Scala, Python, SQL, R, and more

Source: Spark Survey Report, 2015 (Databricks)

Page 3: How to deploy Apache Spark in a multi-tenant, on-premises environment

Top roles using Spark in the enterprise

• Data engineers: 41%

• Data scientists: 22.2%

• Architects: 17.2%

• Management: 10.6%

• Academia: 6.2%

• Other: 2.4%

Source: Spark Survey Report, 2015 (Databricks)

Page 4: How to deploy Apache Spark in a multi-tenant, on-premises environment

Spark infrastructure patterns

• Individual developers or data scientists who build their own infrastructure from VMs or bare metal machines

• A bottom-up approach where everyone gets the same infrastructure/platform irrespective of their skill or use case

Page 5: How to deploy Apache Spark in a multi-tenant, on-premises environment

Developers / data scientists and Spark

• Mostly self-starters who identify a use case

• They build their own systems on laptops, VMs, or servers

• The complexity soon overwhelms them and restricts adoption

• They need help to scale deployment beyond the initial use case

Page 6: How to deploy Apache Spark in a multi-tenant, on-premises environment

Rigid on-premises infrastructure

• Infrastructure is often built by IT for generic use cases

• Flexibility to cater to different usage scenarios is lost

• Spark users' needs are always changing

• Upgrades become a challenge

Page 7: How to deploy Apache Spark in a multi-tenant, on-premises environment

Common Deployment Patterns

Most Common Spark Deployment Environments (Cluster Managers)

• Standalone mode: 48%

• YARN: 40%

• Mesos: 11%

Source: Spark Survey Report, 2015 (Databricks)

Page 8: How to deploy Apache Spark in a multi-tenant, on-premises environment

Scalable, self-service infrastructure

• IT controls machines, network, storage, and security

• Users create their own tenants and Spark clusters

• Teams can upgrade and scale their clusters independently
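
The split of responsibilities above can be sketched in a few lines. This is only an illustration under stated assumptions, not BlueData's implementation: the `Tenant` class, the `core_quota` field, and the cluster names are all hypothetical. The idea is that IT fixes a per-tenant quota once, and each team then resizes its own clusters without affecting other tenants.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Tenant:
    """Hypothetical tenant record: IT sets core_quota, the team manages clusters."""
    name: str
    core_quota: int                                           # set by IT
    clusters: Dict[str, int] = field(default_factory=dict)    # cluster name -> cores

    def used_cores(self) -> int:
        """Total cores currently allocated across this tenant's clusters."""
        return sum(self.clusters.values())

    def resize_cluster(self, cluster: str, cores: int) -> None:
        """Create or resize one cluster, staying within the tenant's quota."""
        if cores < 0:
            raise ValueError("cores must be non-negative")
        # Cores held by the tenant's other clusters are unaffected by this resize.
        other_cores = self.used_cores() - self.clusters.get(cluster, 0)
        if other_cores + cores > self.core_quota:
            raise ValueError(
                f"tenant {self.name!r} would exceed its quota of {self.core_quota} cores"
            )
        self.clusters[cluster] = cores


if __name__ == "__main__":
    marketing = Tenant(name="marketing", core_quota=16)
    marketing.resize_cluster("spark-dev", 8)    # create a cluster
    marketing.resize_cluster("spark-dev", 12)   # scale it up independently
    print(marketing.used_cores())
```

Keeping the quota check inside the tenant object means one team's scaling decision never needs to inspect another tenant's clusters.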

Page 9: How to deploy Apache Spark in a multi-tenant, on-premises environment

Big Data New Realities

Big Data Traditional Assumptions

• Bare-metal

• Data locality

• HDFS on local disks

Big Data New Realities

• Containers and VMs

• Compute and storage separation

• In-place access on remote data stores (e.g. NFS, Object)

New Benefits and Value

• Big-Data-as-a-Service

• Agility and cost savings

• Faster time-to-insights

Page 10: How to deploy Apache Spark in a multi-tenant, on-premises environment

BlueData EPIC Software Platform

• IOBoost™ - Extreme performance and scalability

• ElasticPlane™ - Self-service, multi-tenant clusters

• DataTap™ - In-place access to enterprise data stores

[Diagram labels: BlueData EPIC 2.0 Platform; Marketing, R&D, Sales, Manufacturing, Support; BI/Analytics Tools; Local HDFS, NFS, Gluster, Object Store, Remote HDFS, CEPH]

Page 11: How to deploy Apache Spark in a multi-tenant, on-premises environment

Deployment flexibility for Spark

• Physical Machines or VMs as hosts

• Docker containers as nodes

• Networking and security enabled

• Standalone or YARN-based deployment
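
The choice between standalone and YARN in the last bullet comes down to the `--master` option of `spark-submit`. A minimal sketch that builds the command line for either mode follows; the host name `spark-master.example.internal` and the application file `etl_job.py` are placeholders, not names taken from the slides.

```python
from typing import List, Optional


def spark_submit_args(app: str, mode: str,
                      master_host: Optional[str] = None,
                      executor_memory: str = "4g") -> List[str]:
    """Build a spark-submit argument list for a standalone or YARN cluster."""
    if mode == "standalone":
        if not master_host:
            raise ValueError("standalone mode needs the master host name")
        # A standalone master listens on port 7077 by default.
        master = f"spark://{master_host}:7077"
    elif mode == "yarn":
        # On YARN the ResourceManager address is read from the Hadoop
        # configuration (HADOOP_CONF_DIR or YARN_CONF_DIR), not from the URL.
        master = "yarn"
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "client",
        "--executor-memory", executor_memory,
        app,
    ]


if __name__ == "__main__":
    # Placeholder host and application names, for illustration only.
    print(" ".join(spark_submit_args("etl_job.py", "standalone",
                                     master_host="spark-master.example.internal")))
    print(" ".join(spark_submit_args("etl_job.py", "yarn")))
```

The same application file is submitted in both cases; only the master URL changes, which is what lets a platform offer both deployment styles side by side.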

Page 12: How to deploy Apache Spark in a multi-tenant, on-premises environment

Support for all types of Spark users

• Integrated web-based notebook support for data analysts

• Command line support for data engineers and data scientists

• API support for building customer pipelines

• Multiple language support including SQL, R, Streaming

• JDBC support for business intelligence tools
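
For the JDBC bullet, one common route (assumed here, the slides do not say which one the platform uses) is the Spark Thrift JDBC/ODBC server, which speaks the HiveServer2 protocol and listens on port 10000 by default. A small sketch of the connection URL a BI tool would be given; the host name is a placeholder.

```python
def thrift_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Return a HiveServer2-style JDBC URL for a Spark Thrift server."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Placeholder host name, for illustration only.
    print(thrift_jdbc_url("spark-sql.example.internal"))
```

A BI tool configured with this URL and a Hive JDBC driver can then query tables registered in Spark SQL.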

Page 13: How to deploy Apache Spark in a multi-tenant, on-premises environment

Simple and easy Spark cluster creation

Page 14: How to deploy Apache Spark in a multi-tenant, on-premises environment

Instant Spark analysis and visualization

• Web-based notebook with integrated Spark cluster

• Support for various languages and Zeppelin interpreters

• Fully provisioned Hadoop File System (HDFS)

• Support for persistent tables

• Iterative analysis and visualization

Page 15: How to deploy Apache Spark in a multi-tenant, on-premises environment

App Store for Spark and Big Data tools

Page 16: How to deploy Apache Spark in a multi-tenant, on-premises environment

One-click Big Data app deployment

Page 17: How to deploy Apache Spark in a multi-tenant, on-premises environment

www.bluedata.com