A Reliable Memory-Centric Distributed Storage System
Haoyuan Li
October 16 @ Strata & Hadoop World NYC
Website: tachyon-project.org Meetup: www.meetup.com/Tachyon
UC Berkeley
Outline
• Overview
– Feature 1: Memory Centric Storage Architecture
– Feature 2: Lineage in Storage
• Open Source
• Roadmap
Projects UC Berkeley
• Design next generation data analytics stack: Berkeley Data Analytics Stack (BDAS)
– Mesos: a cluster manager making it easy to write and deploy distributed applications.
– Spark: a parallel computing system supporting general and efficient in-memory execution.
– Tachyon: a reliable distributed memory-centric storage enabling memory-speed data sharing.
Why Tachyon?
Memory is King
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
Memory-locality key to interactive response time
Realized by many…
• Frameworks already leverage memory
Problem solved?
Missing a Solution for Storage Layer
An Example: Spark
• Fast in-memory data processing framework
– Keep one in-memory copy inside JVM
– Track lineage of operations used to derive data
– Upon failure, use lineage to recompute data
[Figure: Lineage Tracking (operators shown: map, filter, map, join, reduce)]
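The lineage idea above can be illustrated with a toy model in plain Scala. This is only a sketch under stated assumptions: `Dataset`, `Source`, `Mapped` and `Filtered` are hypothetical names invented for this example and are not Spark's RDD API. Each dataset records the operation and parent it was derived from, so a lost in-memory copy can be rebuilt by replaying that chain.

```scala
object LineageSketch {
  // A dataset is either source data or the result of applying a recorded
  // operation to a parent dataset. The recorded chain is the lineage.
  // (Hypothetical toy model, not Spark's implementation.)
  sealed trait Dataset {
    // In-memory copy; None models data that was lost in a failure.
    var cache: Option[Vector[Int]] = None

    // How to derive this dataset from its parent (or from the source data).
    def compute(): Vector[Int]

    // Return the cached copy, or rebuild it from lineage if it is missing.
    def get(): Vector[Int] = cache.getOrElse {
      val rebuilt = compute()
      cache = Some(rebuilt)
      rebuilt
    }
  }

  final class Source(data: Vector[Int]) extends Dataset {
    def compute(): Vector[Int] = data
  }

  final class Mapped(parent: Dataset, f: Int => Int) extends Dataset {
    def compute(): Vector[Int] = parent.get().map(f)
  }

  final class Filtered(parent: Dataset, p: Int => Boolean) extends Dataset {
    def compute(): Vector[Int] = parent.get().filter(p)
  }

  def main(args: Array[String]): Unit = {
    val source  = new Source(Vector(1, 2, 3, 4))
    val doubled = new Mapped(source, _ * 2)
    val large   = new Filtered(doubled, _ > 4)

    println(large.get())  // first access: computed through the lineage
    large.cache = None    // simulate losing the in-memory copies
    doubled.cache = None
    println(large.get())  // recomputed from lineage, same result as before
  }
}
```

Only one in-memory copy of each dataset exists in this model; fault tolerance comes from recomputation rather than from replication.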
Issue 1
Data sharing is the bottleneck in analytics pipeline: slow writes to disk
[Figure: two Spark tasks, each with its own Spark memory block manager (blocks 1 and 3), sharing data through HDFS / Amazon S3 (blocks 1 to 4). Label: storage engine & execution engine same process (slow writes).]
[Figure: a Spark task (Spark memory block manager, blocks 1 and 3) and Hadoop MR on YARN sharing data through HDFS / Amazon S3 (blocks 1 to 4). Label: storage engine & execution engine same process (slow writes).]
Issue 2
Cache loss when process crashes.
[Figure, shown in three steps: a Spark task with its Spark memory block manager (blocks 1 and 3) above HDFS / Amazon S3 (blocks 1 to 4); the task crashes; only the HDFS / Amazon S3 blocks remain. Label: execution engine & storage engine same process.]
Issue 3
In-memory Data Duplication & Java Garbage Collection
[Figure: two Spark tasks, each with its own Spark memory block manager holding blocks 1 and 3, above HDFS / Amazon S3 (blocks 1 to 4). Label: execution engine & storage engine same process (duplication & GC).]
Tachyon
Reliable data sharing at memory-speed within and across cluster frameworks/jobs
Solution Overview
Basic idea
• Feature 1: memory-centric storage architecture
• Feature 2: push lineage down to storage layer
Facts
• One data copy in memory
• Recomputation for fault-tolerance
Stack
Computation Frameworks (Spark, MapReduce, Impala, H2O, …)
Tachyon
Existing Storage Systems (HDFS, S3, GlusterFS, …)
Memory-Centric Storage Architecture
Issue 1 revisited
Memory-speed data sharing among jobs in different frameworks
[Figure: a Spark task (Spark memory, block 1) and Hadoop MR on YARN sharing data through Tachyon in-memory (blocks 1, 3, 4), above HDFS / Amazon S3 on disk (blocks 1 to 4). Label: execution engine & storage engine same process (fast writes).]
Issue 2 revisited
Keep in-memory data safe, even when a job crashes.
[Figure, shown in three steps: a Spark task with its Spark memory block manager (block 1) above Tachyon in-memory (blocks 1, 3, 4) and HDFS / Amazon S3 on disk (blocks 1 to 4); the task crashes; the Tachyon in-memory blocks and the HDFS / Amazon S3 blocks remain. Label: execution engine & storage engine same process.]
Issue 3 revisited
No in-memory data duplication, much less GC
[Figure: two Spark tasks, each with its own Spark memory, above Tachyon in-memory (blocks 1, 3, 4) and HDFS / Amazon S3 on disk (blocks 1 to 4). Label: execution engine & storage engine same process (no duplication & GC).]
Lineage in Storage (alpha)
Comparison with in-memory HDFS
Workflow Improvement
Performance comparison for a realistic workflow. The workflow ran 4x faster on Tachyon than on MemHDFS. In case of node failure, applications in Tachyon still finish 3.8x faster.
Further Improve Spark’s Performance
Grep Program
How easy / hard to use Tachyon?
Spark/MapReduce/Shark without Tachyon
• Spark
– val file = sc.textFile("hdfs://ip:port/path")
• Hadoop MapReduce
– hadoop jar hadoop-examples-1.0.4.jar wordcount hdfs://localhost:19998/input hdfs://localhost:19998/output
• Shark
– CREATE TABLE orders_cached AS SELECT * FROM orders;
Spark/MapReduce/Shark with Tachyon
• Spark
– val file = sc.textFile("tachyon://ip:port/path")
• Hadoop MapReduce
– hadoop jar hadoop-examples-1.0.4.jar wordcount tachyon://localhost:19998/input tachyon://localhost:19998/output
• Shark
– CREATE TABLE orders_tachyon AS SELECT * FROM orders;
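In the Spark and MapReduce examples, the with-Tachyon commands differ from the without-Tachyon ones only in the URI scheme of the paths. A minimal sketch of that rewrite, assuming a hypothetical helper `withScheme` (not part of Tachyon, Spark, or Hadoop) built on the standard `java.net.URI` class:

```scala
import java.net.URI

object SchemeSwap {
  // Hypothetical helper for illustration: replace the scheme of a storage
  // path and keep user info, host, port, path, query and fragment unchanged.
  def withScheme(path: String, scheme: String): String = {
    val u = new URI(path)
    new URI(scheme, u.getUserInfo, u.getHost, u.getPort,
            u.getPath, u.getQuery, u.getFragment).toString
  }

  def main(args: Array[String]): Unit = {
    // Paths taken from the MapReduce example on the slides.
    println(withScheme("hdfs://localhost:19998/input", "tachyon"))
    println(withScheme("hdfs://localhost:19998/output", "tachyon"))
  }
}
```

The application logic stays the same; only the path handed to the framework changes.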
Spark on Tachyon
    ./bin/spark-shell
    sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")
    // Load input from Tachyon
    val file = sc.textFile("tachyon://localhost:19998/LICENSE")
    file.count(); file.take(10)
    // Store RDD OFF_HEAP in Tachyon
    import org.apache.spark.storage.StorageLevel
    file.persist(StorageLevel.OFF_HEAP)
    file.count(); file.take(10)
    // Save output to Tachyon
    file.flatMap(line => line.split(" ")).map(s => (s, 1)).reduceByKey((a, b) => a + b).saveAsTextFile("tachyon://localhost:19998/LICENSE_WC")
History
Started at UC Berkeley AMPLab in the summer of 2012
• Reliable, Memory Speed Storage for Cluster Computing Frameworks (UC Berkeley EECS Tech Report)
• Haoyuan Li, Ali Ghodsi, Matei Zaharia, Ion Stoica, Scott Shenker
Open Source Status
• Apache License 2.0, Version 0.5.0 (July 2014)
• Deployed at tens of companies
• 20+ Companies Contributing
• Spark and MapReduce applications can run without any code change
Release Growth
• Tachyon 0.1 (Dec '12): 1 contributor
• Tachyon 0.2 (Apr '13): 3 contributors
• Tachyon 0.3 (Oct '13): 15 contributors
• Tachyon 0.4 (Feb '14): 30 contributors
• Tachyon 0.5 (July '14): 46 contributors
Open Community
[Figure: Berkeley Contributors vs. Non-Berkeley Contributors]
Thanks to our Code Contributors!
Aaron Davidson, Achal Soni, Ali Ghodsi, Andrew Ash, Anurag Khandelwal, Aslan Bekirov, Bill Zhao, Brad Childs, Calvin Jia, Chao Chen, Cheng Chang, Cheng Hao, Colin Patrick McCabe, David Capwell, David Zhu, Du Li, Fei Wang, Gerald Zhang, Grace Huang, Haoyuan Li, Henry Saputra, Hobin Yoon, Huamin Chen, Jey Kottalam, Joseph Tang, Juan Zhou, Jun Aoki, Lin Xing, Lukasz Jastrzebski, Manu Goyal, Mark Hamstra, Mingfei Shi, Mubarak Seyed, Nick Lanham, Orcun Simsek, Pengfei Xuan, Qianhao Dong, Qifan Pu, Raymond Liu, Reynold Xin, Robert Metzger, Rong Gu, Sean Zhong, Seonghwan Moon, Shivaram Venkataraman, Srinivas Parayya, Tao Wang, Timothy St. Clair, Thu Kyaw, Vamsi Chitters, Xi Liu, Xiang Zhong, Xiaomin Zhang, Zhao Zhang
Thanks to Redhat!
Commercially supported by [logo not preserved] and running in dozens of their customers’ clusters
Tachyon is the Default Off-Heap Storage Solution for [logo not preserved].
Exchange Data Between Spark and H2O
Believe from Industry
Reaching wider communities: e.g. GlusterFS
Under Filesystem Choices (Big Data, Cloud, HPC, Enterprise)
Features
• Memory Centric Storage Architecture
• Lineage in Storage (alpha)
• Hierarchical Local Storage
• Data Serving
• Different hardware
• More…
• Your Requirements?
Short Term Roadmap (0.6 Release)
• Ceph Integration (Ceph Community, Redhat)
• Hierarchical Local Storage (Intel)
• Performance Improvement (Yahoo)
• Multi-tenancy (AMPLab)
• Mesos Integration (Mesos Community, Mesosphere)
• Network Sub-system Improvement (Pivotal)
• Many more from AMPLab and Industry Contributors
Goal?
Better Assist Other Components
[Figure: Tachyon between computation frameworks (Spark, MapReduce, Spark SQL, H2O, GraphX, Impala, …) and storage systems (HDFS, S3, GlusterFS, OrangeFS, NFS, Ceph, …)]
Welcome Collaboration!
Thanks! Questions?
• More Information:
– Website: http://tachyon-project.org
– Github: https://github.com/amplab/tachyon
– Meetup: http://www.meetup.com/Tachyon
• Email: [email protected]