a virtual distributed storage system // haoyuan li, alluxio [firstmark's data driven]
TRANSCRIPT
Alluxio (formerly Tachyon),A Memory Speed Virtual
Distributed StorageHaoyuan (HY) Li
CEO @ Alluxio Inc.April 11, 2016
2
Alluxio Inc
• Founded by Alluxio (formerly Tachyon) open source project creators and top committers
• $7.5 million Series A by Andreessen Horowitz• Committed to the Alluxio Open Source Project• Company Website: www.alluxio.com• We are hiring!
Who Am I?• Haoyuan LI– Co-creator of Alluxio (formerly Tachyon)– CEO @ Alluxio Inc– Ph.D. Candidate @ AMPLab, UC Berkeley– Founding Committer of Apache Spark
Memory Speed• Memory-centric architecture designed for memory I/O
Virtual• Unified Namespace abstracts persistent storage from applications
Distributed• Designed to scale with nothing but commodity hardware
Open Source• One of the fastest growing project communities
6
Background• Started at UC Berkeley AMPLab
– From summer 2012– The same lab produced Apache Mesos and Apache
Spark
• Open sourced– April 2013– Apache License 2.0– Latest Release: Version 1.0.1 (February 2016)
• Deployed at > 100 companies7
Performance Trend: Memory is Fast
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
13
Memory-locality key to interactive response times
Problems
• Costly Eco-system Integration• Costly ETL• Expensive Data Duplication• Data Silo• Nightmare Data Management• Long Cycle from Data to Value
Alluxio: Any Application accesses Any Data from Any Storage at
Memory Speed
23
• Enable new workloads across storage systems• Work with the framework of your choice• Scale storage and compute independently
Alluxio Case Study
• Framework: Spark• Under Storage: Baidu’s File System• Storage Media: MEM + HDD• 200+ nodes deployment• 2PB+ managed space
Alluxio Case Study
• Framework: Spark• Storage Media: MEM• Improvement from Hours to Seconds
Use Case: Qunar [NASDAQ:QUNR]
• Framework: Spark Streaming & Batch
• Under Storage: HDFS & Ceph
• Storage Media: MEM + HDD
• 200 nodes deployment
Use Case: an Oil Company
• Framework: Spark
• Under Storage: GlusterFS
• Storage Media: MEM only
• Analyzing data in traditional storage
Use Case: a SAAS Company
• Framework: Impala
• Under Storage: S3
• Storage Media: MEM + SSD
• 15x Performance Improvement
Use Case: a Biotechnology Company
• Framework: Spark & MapReduce
• Under Storage: GlusterFS
• Storage Media: MEM and SSD
Use Case: a SAAS Company
• Framework: Spark
• Under Storage: S3
• Storage Media: SSD only
• Elastic Alluxio deployment
Use Case: a Retail Company
• Framework: Spark & MapReduce
• Under Storage: HDFS
• Storage Media: MEM
• Alluxio Project: www.alluxio.org
• Alluxio Inc: www.alluxio.com
• Development: www.github.com/Alluxio/alluxio
• Meet Friends: www.meetup.com/Alluxio
• Contact: [email protected]