
Alluxio: Unify Data at Memory Speed

Product Overview

September 26, 2017


Confidential © Alluxio, Inc. All Rights Reserved. 2

Agenda

1. Why we built Alluxio
2. Alluxio’s innovations
3. Use cases


Data Ecosystem Yesterday

• One Compute Framework
• Single Storage System
• Co-located

[Diagram; labels: ETL, ETL, ETL]


Data Ecosystem Today

• Many Compute Frameworks
• Multiple Storage Systems
• Most not co-located


Data Ecosystem Issues

• Each application manages multiple data sources

• Adding/removing data sources requires application changes

• Storage optimizations require application changes

• Lower performance due to lack of locality


Data Ecosystem Challenges

1 Speed & Complexity
• Many storage & compute systems
• Integration and interoperability issues (on prem, hybrid, cloud)
• Many departments & groups

2 Data Freshness
• Real time data?
• Cross-network movement is slow
• Each ETL creates more lag

3 Cost
• Data and App explosion driving cost up
• Data duplication

4 Security & Governance
• Data security & governance is increasingly complex

Heavy integrations create painful organizational drag


This is why we built Alluxio

A unified data solution for the digital economy


Data Ecosystem with Alluxio

• Apps only talk to Alluxio

• Simple Add/Remove

• No App Changes

• Highest performance in Memory

• No Lock in

Interfaces shown in the diagram:
• Native File System, Hadoop Compatible File System, REST Web Service, Key-Value Interface
• HDFS Interface, Amazon S3 Interface, Swift Interface, NFS Interface


Fastest Growing Big Data Open Source Project

[Chart: Open Source Contributors by Month (GitHub). Vertical axis: Number of Contributors, 0 to 500. Horizontal axis: 0 to 45. Series: Alluxio, Spark, Kafka, Redis, HDFS, Cassandra, Hive.]

Fastest Growing open-source project in the big data ecosystem

Running world’s largest production clusters

600+ Contributors from 100+ organizations


Selection of customers


Alluxio Design Principles

1 Big Data & Machine Learning
• Interoperability with leading projects
• Large scale data sets
• High IO

2 Data Sharing
• Don’t own the data
• Multiple apps sharing common data
• Data stored in multiple, hybrid systems

3 High Speed Data Access
• Remote data
• Hot/warm/cold data
• Temporary data
• Read/write support

4 Enterprise Class
• Distributed architecture
• Commodity hardware
• Service-oriented
• High availability
• Security


Alluxio Innovation: Unified Namespace

Enables effective data management across different Under Stores

Uses Mounting with Transparent Naming


Alluxio Innovation: Unified Namespace

Create a catalog of available data sources for Data Scientists

alluxio://
  /finance/customer-transactions/
  /finance/vendor-transactions/
  /operations/device-logs/
  /operations/phone-call-recordings/
  /operations/check-images/
  /research/us-economic-data/
  /research/intl-economic-data/
  /marketing/advertising-dataset/
  /marketing/marketing-funnel-dataset/
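The mounting idea behind the unified namespace can be illustrated with a minimal Python sketch. This is not Alluxio's implementation or API: the `MountTable` class, its methods, and the under-store URIs (`hdfs://finance-nn:8020/...`, `s3a://ops-bucket/...`) are hypothetical names chosen for illustration. It only shows how one logical path could be resolved to a location in a different storage system by longest-prefix match.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MountTable:
    """Hypothetical mount table: logical paths -> under-store URIs."""
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, logical_path: str, under_store_uri: str) -> None:
        # Normalize so "/finance" and "/finance/" name the same mount point.
        self.mounts[logical_path.rstrip("/")] = under_store_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest-prefix match: the most specific mount point wins.
        path = logical_path.rstrip("/")
        for point in sorted(self.mounts, key=len, reverse=True):
            if path == point or path.startswith(point + "/"):
                return self.mounts[point] + path[len(point):]
        raise KeyError(f"no mount covers {logical_path}")

    def catalog(self) -> List[str]:
        # The list of mount points doubles as a simple data catalog.
        return sorted(self.mounts)


# Assumed under stores, for illustration only.
table = MountTable()
table.mount("/finance/customer-transactions", "hdfs://finance-nn:8020/txn/customers")
table.mount("/operations/device-logs", "s3a://ops-bucket/device-logs")

print(table.resolve("/finance/customer-transactions/2017/09.csv"))
print(table.catalog())
```

The application only ever sees the logical path; swapping the under store behind a mount point would not change the path the application uses.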


Alluxio Innovation: Server-side API Translation

Convert from Client-side Interface to Native Storage Interface

Client side: HDFS Interface

Storage side: HDFS Interface, S3A Interface, Swift Interface, Google Cloud Interface
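As a rough illustration of the translation idea, the sketch below routes one client-facing `read` call to a per-scheme storage backend. It is a simplified stand-in, not Alluxio code: `InMemoryStore` and `TranslatingServer` are hypothetical classes, the URIs are made up, and real translation would call each store's native client rather than a dictionary lookup.

```python
from urllib.parse import urlparse


class InMemoryStore:
    """Hypothetical stand-in for a native storage client (HDFS, S3A, ...)."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.requests = []  # native-side calls this store received

    def read(self, uri):
        self.requests.append(uri)
        return self.objects[uri]


class TranslatingServer:
    """Exposes one read() call and dispatches it by URI scheme."""

    def __init__(self):
        self.backends = {}

    def register(self, scheme, store):
        self.backends[scheme] = store

    def read(self, uri):
        scheme = urlparse(uri).scheme
        if scheme not in self.backends:
            raise ValueError(f"no backend registered for scheme {scheme!r}")
        # The client never sees which native interface serves the request.
        return self.backends[scheme].read(uri)


# Assumed stores and objects, for illustration only.
hdfs = InMemoryStore({"hdfs://namenode:8020/logs/a.txt": b"from hdfs"})
s3a = InMemoryStore({"s3a://bucket/logs/b.txt": b"from s3"})

server = TranslatingServer()
server.register("hdfs", hdfs)
server.register("s3a", s3a)

print(server.read("hdfs://namenode:8020/logs/a.txt"))
print(server.read("s3a://bucket/logs/b.txt"))
```

Because the dispatch happens on the server side, adding a new storage system in this sketch means registering one more backend, with no change to the calling application.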


Alluxio Innovation: Server-side API Translation

Convert between different versions of HDFS

Client side: HDFS 2.7 Interface

Storage side: HDP 2.4 Interface, CDH 5.6 Interface, MAPR 5.2 Interface


Alluxio Innovation: Intelligent Cache

Local performance from remote data using native multi-tier storage

Tier: RAM | SSD | HDD
Data: Hot | Warm | Cold

Read & Write Buffering

Transparent to App

Policies for pinning, promotion/demotion, TTL
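The policies named on this slide can be sketched as a toy three-tier cache. This is an assumption-laden illustration rather than Alluxio's eviction logic: the `TieredCache` class, least-recently-used demotion, per-tier block counts, and the choice to let a tier overflow when every block in it is pinned are all simplifications introduced here.

```python
import time
from collections import OrderedDict

TIERS = ["RAM", "SSD", "HDD"]  # hot, warm, cold


class TieredCache:
    """Toy multi-tier cache with pinning, promotion/demotion, and TTL."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity  # max blocks per tier, e.g. {"RAM": 2, ...}
        self.clock = clock
        self.tiers = {name: OrderedDict() for name in TIERS}  # LRU first
        self.pinned = set()
        self.expiry = {}

    def put(self, key, value, ttl=None, pin=False):
        self._forget(key)
        if pin:
            self.pinned.add(key)
        if ttl is not None:
            self.expiry[key] = self.clock() + ttl
        self._place(0, key, value)  # new data starts in the hot tier

    def get(self, key):
        if key in self.expiry and self.clock() >= self.expiry[key]:
            self._forget(key)  # TTL elapsed
            return None
        for name in TIERS:
            if key in self.tiers[name]:
                value = self.tiers[name].pop(key)
                self._place(0, key, value)  # promotion on access
                return value
        return None

    def tier_of(self, key):
        for name in TIERS:
            if key in self.tiers[name]:
                return name
        return None

    def _place(self, level, key, value):
        tier = self.tiers[TIERS[level]]
        tier[key] = value
        if len(tier) <= self.capacity[TIERS[level]]:
            return
        # Demote the least recently used block that is not pinned.
        victim = next((k for k in tier if k not in self.pinned), None)
        if victim is None:
            return  # simplification: an all-pinned tier may overflow
        victim_value = tier.pop(victim)
        if level + 1 < len(TIERS):
            self._place(level + 1, victim, victim_value)
        else:
            self._forget(victim)  # falls out of the coldest tier

    def _forget(self, key):
        for tier in self.tiers.values():
            tier.pop(key, None)
        self.pinned.discard(key)
        self.expiry.pop(key, None)


cache = TieredCache({"RAM": 2, "SSD": 2, "HDD": 2})
for name in ["a", "b", "c"]:
    cache.put(name, name.upper())
print(cache.tier_of("a"))  # "a" was demoted when "c" arrived
cache.get("a")
print(cache.tier_of("a"))  # reading "a" promoted it again
```

In this sketch a block that falls out of the coldest tier is simply dropped from the cache; the assumption is that a copy still lives in the under store, which is why the application can stay unaware of where a block currently sits.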


Alluxio Innovation: Intelligent Cache

Maintain read & write operations in the event of an outage

Tier: RAM | SSD | HDD
Data: Hot | Warm | Cold

Read & Write Buffering

Transparent to App

Policies for pinning, promotion/demotion, TTL

[Diagram: an X marks the outage]


Where to use Alluxio

Finding high-fit Alluxio use-cases

Compute Zone: Standalone or managed with Mesos or Yarn

Storage in Different Availability Zone: Either on-prem or cloud

Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.

[Diagram labels: Spark, TensorFlow, Presto (compute); HDFS (storage)]

Guidelines
✓ Compute separated from storage
✓ Distributed compute
✓ I/O or network latency exists
✓ Unification of many storage systems
✓ Applications sharing long lived data

More checks result in higher fit applications


Where to use Alluxio

Finding high-fit Alluxio use-cases


Example First Projects
✓ Big Data Hybrid Storage
✓ Common Data Catalog
✓ Data Center Containerization
✓ Cloud Migration
✓ ETL Alternative


Alluxio Offerings

[Chart: vertical axis Capability/Value]

Alluxio Open Source (AOS), for Technology Validation
• Open Source

Alluxio Enterprise Edition (AEE), for Enterprise Deployment
• Open Source
• Alluxio Manager
• Kerberos Authentication
• LDAP Integration
• Encryption
• Data Replication
• Fast Durable Write
• Support


Use Cases


Next Gen Analytics Platform

Leading US Technology Company


HPC/Deep Learning Partnership -

Alluxio maximizes GPU investment:

• Self-serve data access for data scientists

• Rapid integration of new data sources

• Improved memory management & performance


Machine Learning Case Study –

Challenge – Slow training of a model for algorithmic trading at a $46B data-driven hedge fund

Data access was slow, costing them $$ in compute and lowering modeler productivity

Solution – With Alluxio, data access is 10-30X faster

Impact – Increased efficiency in training the ML algorithm, lower compute cost, and higher modeler productivity, resulting in a 14-day ROI for Alluxio

[Diagram, two panels; labels in each: SPARK, HDFS, MESOS, Public Internet]


Consumer Intelligence Use Case – Top 3 Telco

Challenge – Desired a central view of consumer information in near real time for proactive support.

Many HDFS deployments, different distributions, many incompatible versions. On-prem & cloud. Integration through heavy ETL.

Solution – Alluxio integrates data into a central catalog for fast access to consumer interaction records.

Impact –
• Reduced integration time
• Faster data speed & freshness

[Diagram labels: HADOOP, ML, ETL, HDFS; distributions HDP, CDH, and MAPR, each with HDFS]


Big Data Case Study – Top 3 Retailer

Challenge – Bottleneck in trend analysis of mission-critical daily sales and inventory management

Queries were slow / not interactive, resulting in operational inefficiency

Solution – With Alluxio, data queries are 10X faster

Impact – Higher operational efficiency

Use case: http://bit.ly/2ook8Nh

[Diagram, two panels; labels in each: SPARK, HDFS]


Big Data Case Study –

Challenge – Gain an end-to-end view of the business with a large volume of data

Queries were slow / not interactive, resulting in operational inefficiency

Solution – ETL data from Teradata to Alluxio

Impact – Faster time to market: “Now we don’t have to work Sundays”

Use case: http://bit.ly/2oMx95W

[Diagram, two panels; labels in each: SPARK, TERADATA]


Enabling Next Gen Big Data Analytics

1. Unified Storage Bridge
2. Unified Cache Management
3. Security & Governance


Social Media
Twitter.com/alluxio
Linkedin.com/alluxio

Website
www.alluxio.com

[email protected]

Thank You!