A Virtual Distributed Storage System // Haoyuan Li, Alluxio [FirstMark's Data Driven]

Alluxio (formerly Tachyon), A Memory Speed Virtual Distributed Storage

Haoyuan (HY) Li, CEO @ Alluxio Inc.

April 11, 2016

Upload: firstmark-capital

Post on 12-Apr-2017




Alluxio Inc

• Founded by Alluxio (formerly Tachyon) open source project creators and top committers

• $7.5 million Series A by Andreessen Horowitz

• Committed to the Alluxio Open Source Project

• Company Website: www.alluxio.com

• We are hiring!

Who Am I?

• Haoyuan Li
– Co-creator of Alluxio (formerly Tachyon)
– CEO @ Alluxio Inc
– Ph.D. Candidate @ AMPLab, UC Berkeley
– Founding Committer of Apache Spark


Outline

• What is Alluxio

• Why Alluxio

• Alluxio Use Cases


Alluxio: Open Source Memory Speed Virtual Distributed Storage

Memory Speed

• Memory-centric architecture designed for memory I/O

Virtual

• Unified Namespace abstracts persistent storage from applications

Distributed

• Designed to scale with nothing but commodity hardware

Open Source

• One of the fastest growing project communities
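To make the "Unified Namespace" point concrete, here is a minimal sketch of the idea, assuming a simple mount table that maps virtual path prefixes to under-storage locations. It is not Alluxio's API: the class `UnifiedNamespace`, its `mount` and `resolve` methods, and the example URIs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedNamespace:
    """Toy mount table (hypothetical, not Alluxio's API).

    Maps virtual path prefixes to under-storage URIs so that an
    application only ever sees one path space.
    """
    mounts: dict = field(default_factory=dict)

    def mount(self, virtual_prefix: str, under_uri: str) -> None:
        # Store prefixes without a trailing slash so matching is uniform.
        self.mounts[virtual_prefix.rstrip("/")] = under_uri.rstrip("/")

    def resolve(self, virtual_path: str) -> str:
        # Collect every mount point that is a path-wise prefix of the request.
        candidates = [
            prefix for prefix in self.mounts
            if virtual_path == prefix or virtual_path.startswith(prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount point covers {virtual_path!r}")
        # Longest prefix wins, so nested mounts override their parents.
        best = max(candidates, key=len)
        return self.mounts[best] + virtual_path[len(best):]


# Example URIs below are made up for illustration.
ns = UnifiedNamespace()
ns.mount("/warehouse", "hdfs://namenode:9000/warehouse")
ns.mount("/logs", "s3://example-bucket/logs")
ns.mount("/logs/archive", "glusterfs://example-volume/archive")

print(ns.resolve("/warehouse/sales/part-0"))
print(ns.resolve("/logs/2016/04/11.log"))
print(ns.resolve("/logs/archive/2015.tar"))
```

In this sketch the application only uses paths such as `/logs/2016/04/11.log`; which storage system actually holds the bytes is decided by the mount table, so the backing store can change without touching application code.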


Background

• Started at UC Berkeley AMPLab
– From summer 2012
– The same lab produced Apache Mesos and Apache Spark

• Open sourced
– April 2013
– Apache License 2.0
– Latest Release: Version 1.0.1 (February 2016)

• Deployed at > 100 companies

Contributor Growth

• Close to 250 Contributors
– 3x growth over the last year


Organizations

• Over 50 Organizations


Memory Speed Virtual Distributed Storage System


What is Alluxio


Why Alluxio?

Performance Trend: Memory is Fast

• RAM throughput increasing exponentially

• Disk throughput increasing slowly


Memory-locality key to interactive response times

Price Trend: Memory is Cheaper


source: jcmit.com

Realized by many…


Is the Problem Solved?


Missing a Solution for the Storage Layer


Take a Look at the Ecosystem


Big Data Ecosystem


Big Data Ecosystem

Problems

• Costly Eco-system Integration

• Costly ETL

• Expensive Data Duplication

• Data Silo

• Nightmare Data Management

• Long Cycle from Data to Value


Ecosystem

Alluxio: Any Application accesses Any Data from Any Storage at Memory Speed


Alluxio Powers Up Your Workloads, Both in the Cloud and on Premise

• Enable new workloads across storage systems

• Work with the framework of your choice

• Scale storage and compute independently

Alluxio Case Study

• Framework: Spark

• Under Storage: Baidu’s File System

• Storage Media: MEM + HDD

• 200+ nodes deployment

• 2PB+ managed space

Alluxio Case Study

• Framework: Spark

• Storage Media: MEM

• Improvement from Hours to Seconds

Use Case: Qunar [NASDAQ:QUNR]

• Framework: Spark Streaming & Batch

• Under Storage: HDFS & Ceph

• Storage Media: MEM + HDD

• 200 nodes deployment

Use Case: an Oil Company

• Framework: Spark

• Under Storage: GlusterFS

• Storage Media: MEM only

• Analyzing data in traditional storage

Use Case: a SaaS Company

• Framework: Impala

• Under Storage: S3

• Storage Media: MEM + SSD

• 15x Performance Improvement

Use Case: a Biotechnology Company

• Framework: Spark & MapReduce

• Under Storage: GlusterFS

• Storage Media: MEM and SSD

Use Case: a SaaS Company

• Framework: Spark

• Under Storage: S3

• Storage Media: SSD only

• Elastic Alluxio deployment

Use Case: a Retail Company

• Framework: Spark & MapReduce

• Under Storage: HDFS

• Storage Media: MEM

• Alluxio Project: www.alluxio.org

• Alluxio Inc: www.alluxio.com

• Development: www.github.com/Alluxio/alluxio

• Meet Friends: www.meetup.com/Alluxio

• Contact: [email protected]