IMCa: A High-Performance Caching Front-end for GlusterFS on InfiniBand
Ranjit Noronha and Dhabaleswar K. Panda
Network Based Computing Lab
The Ohio State University
{noronha, panda}@cse.ohio-state.edu
Outline of the Talk
• Background and Motivation
• Architecture and Design of IMCa
• Experimental Evaluation of IMCa
• Conclusions and Future Work
Background
• Large Scale Scientific and Commercial Workloads
• Petascale Computers have arrived
• High-Performance access to the I/O data is crucial
  – Parallel applications are often limited by I/O
• Clustered/Parallel File Systems have evolved to meet this challenge
• File System performance still dependent on disk performance
• Single Server Bandwidth Drop With Multiple Clients
• Parallel I/O Bandwidth From Multiple Servers
[Figure: two panels (4GB Server Memory, 8GB Server Memory) plotting Bandwidth (MegaBytes/s) against Number of clients (1 to 8) for RDMA, IPoIB, and GigE]
• Performance for Small Files
  – Generally difficult to achieve
  – Many environments with a large number of small files
  – Storing on the same disk block provides limited benefit
  – Striping does not provide benefit
  – Store on different servers
• Cache Coherency Problems
  – Client side cache provides good performance
  – Non-coherent client cache limited when there is sharing
  – Limited Scalability of coherent caches
• Server Load Problems
  – RDMA reduces overhead from TCP/IP
  – RDMA based transport protocols cannot reduce copying costs within the file system
Problem Statement
• Which file-system operations are potential targets for caching?
• What are the alternatives to the traditional client cache/server cache architecture?
• What are the advantages and disadvantages of alternate cache architectures?
• How do we provide the performance of the non-coherent client cache without the scalability problems of the coherent client cache?
Potential File System Operations That May Be Cached
• Potential Targets For Caching
  – Should be something the client reads
  – Should be possible to uniquely identify cache target
  – Should be possible to chunk the data element
• Small Operations: Stat, Create, Delete, Open
• Stat
  – Read by the client
  – Used as a form of update by many applications
  – Should be used
  – Should be updated on read/write operations on the server
• Create/delete
  – Not read by the client
  – Delete should invalidate previous cache entries
• File Open
  – Not a target for caching, but may be used for prefetching
• Data Transfer Operations
  – Reads and Writes
  – Blocks Needed
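The stat rules above (refresh the entry on server-side reads and writes, invalidate it on delete) can be sketched as follows. This is a minimal illustration, not the IMCa implementation: the class name `StatCacheSketch`, its method names, and the stored fields are all hypothetical.

```python
class StatCacheSketch:
    """Hypothetical in-memory stat cache keyed by file path."""

    def __init__(self):
        self._entries = {}

    def lookup(self, path):
        # Returns the cached attributes, or None on a miss.
        return self._entries.get(path)

    def update(self, path, size, mtime):
        # Called after a read or write is handled, so the cached
        # attributes follow the latest state of the file.
        self._entries[path] = {"size": size, "mtime": mtime}

    def invalidate(self, path):
        # Called on delete so that stale attributes are never served.
        self._entries.pop(path, None)


cache = StatCacheSketch()
cache.update("/data/a.dat", size=4096, mtime=100)
cache.update("/data/a.dat", size=8192, mtime=200)  # a later write
cache.invalidate("/data/a.dat")                    # the file is deleted
```

Create does not populate the cache in this sketch, matching the slide's point that create is not something the client reads.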
Intermediate Cache Architecture (IMCa)
• Easy to maintain coherency
• Extensible
• Can multiple Cache nodes provide benefit?
[Figure: IMCa architecture diagram with the labels FS Client, CMCache, SMCache, Underlying FS Cache, ext3, and cache nodes Cache1, Cache2, ..., Cachen. Each cache is a node (MCD array). Both CMCache and SMCache use a hash function (CRC32) to find the cache server.]
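Locating a cache server with CRC32 can be sketched as below. The slides name the hash function but not the key format or how the hash is reduced to a server index, so the `path:block` key and the modulo step are assumptions, and the server names are made up.

```python
import zlib
from typing import Sequence


def pick_cache_server(key: str, servers: Sequence[str]) -> str:
    """Map a cache key to one server: CRC32 of the key, modulo the server count."""
    if not servers:
        raise ValueError("at least one cache server is required")
    checksum = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return servers[checksum % len(servers)]


# Hypothetical server names and key format, for illustration only.
servers = ["mcd-0", "mcd-1", "mcd-2", "mcd-3"]
key = "/data/results.dat:0"
print(pick_cache_server(key, servers))
```

Because the client side and the server side compute the same function over the same key, they agree on the target node without exchanging any directory information.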
Need for Blocks In IMCa
• Most file systems store data on the disk as blocks
• Parallel file-systems stripe data across multiple servers
• IMCa uses a fixed block size to store data across the cache servers
  – Block size should provide good performance for most small files
  – Should avoid excessive fragmentation
[Figure: file data segmented by the IMCa block size, with the labels Requested Data, Data Block Boundaries, and Extra data]
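The block arithmetic implied by the figure can be sketched as follows: a request that does not line up with block boundaries touches whole blocks, so some extra data moves with it. The function name and the return shape are hypothetical.

```python
def blocks_for_request(offset: int, length: int, block_size: int):
    """Return (first_block, last_block, extra_bytes) for a byte range.

    extra_bytes counts the bytes inside the touched blocks that lie
    outside the requested range.
    """
    if offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0; length and block_size must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    fetched = (last - first + 1) * block_size
    return first, last, fetched - length


# A 1500-byte request at offset 3000 with 2 KB blocks touches blocks 1 and 2.
print(blocks_for_request(offset=3000, length=1500, block_size=2048))  # (1, 2, 2596)
```

A larger block size raises the extra data carried by small requests, while a smaller one splits each request across more blocks, which is the trade-off the bullets describe.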
Design-Read Operations (Hit)
[Figure: read path on a cache hit; diagram labels: Client, CMCache, SMCache, Underlying FS Cache, ext3, Cache1, Cache2, ..., Cachen]
Design-Read Operations (Miss)
[Figure: read path on a cache miss; diagram labels: Client, CMCache, SMCache, Underlying FS Cache, ext3, Cache1, Cache2, ..., Cachen]
Design-Write Operations
[Figure: write path; diagram labels: Client, CMCache, SMCache, Underlying FS Cache, ext3, Cache1, Cache2, ..., Cachen]
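The design diagrams are not reproduced in this transcript, so the exact message flow cannot be read off them here. The sketch below shows one arrangement consistent with the labels: reads try the cache first and fall back to the server, while writes go to the server, which then refreshes the cache. That flow, the class name, and the dictionaries standing in for the MCD array and the file server are all assumptions.

```python
class IntermediateCacheSketch:
    """Toy model: a dict-backed 'server' fronted by a dict-backed cache."""

    def __init__(self):
        self.backing_store = {}   # stands in for the file server and its disk
        self.cache = {}           # stands in for the cache nodes
        self.server_requests = 0  # counts requests that reach the server

    def read(self, key):
        if key in self.cache:              # hit: the server is not contacted
            return self.cache[key]
        self.server_requests += 1          # miss: fall back to the server
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value        # populate the cache for later reads
        return value

    def write(self, key, value):
        self.server_requests += 1
        self.backing_store[key] = value    # the server copy stays authoritative
        self.cache[key] = value            # refresh the cached copy


fs = IntermediateCacheSketch()
fs.write("blk0", b"hello")
fs.read("blk0")    # served from the cache
fs.cache.clear()   # simulate losing a cache node
fs.read("blk0")    # still correct, but costs one more server request
```

Keeping the server copy authoritative is what lets a cache failure cost only performance, not correctness.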
Advantages/Disadvantages of IMCa
• Fewer Requests Hit the Server
• Latency for requests read from the cache is lower
• MCDs are self-managing
• Failures in MCDs do not impact correctness
• Additional node elements needed especially for caching
• Cold Misses are expensive
• Additional Blocks/Data Transfers Needed
• Overhead and delayed updates
GlusterFS File System
• Clustered File System
• Client and Server in userspace
• Uses the FUSE interface to translate FS calls from the kernel to the user daemons
• No striping; data is distributed across servers
• Possible to apply translators at the server and client to perform different functions
• WWW.glusterfs.org
Experimental Setup
• 64-node cluster
  – 8-core Intel Clovertown
  – 8 GB memory
• InfiniBand DDR is the interconnect
• GlusterFS file-system
• The data servers each have 8 RAID highpoint disks
• Communication protocol is IPoIB in Reliable Connected (RC) mode
• MCDs run on independent nodes and use up to 6GB of memory
• CMCache and SMCache use a CRC32 hash function for locating data on the MCDs
• Lustre 1.6.4.3 is used with a socklnd for comparison
Experiment-stat
– Consists of two stages
– First stage (untimed)
  • 262144 files created by a single node
– Second stage (timed)
  • Each node tries to perform a stat on each of the 262144 files sequentially
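A single-node version of the two-stage stat benchmark can be sketched as follows. It runs against a local temporary directory rather than a GlusterFS mount and uses a small file count; the function name and file naming are hypothetical, and the talk's benchmark used 262144 files with several nodes.

```python
import os
import tempfile
import time


def stat_benchmark(num_files: int) -> float:
    """Create num_files empty files (untimed), then time a stat of each one."""
    with tempfile.TemporaryDirectory() as root:
        paths = [os.path.join(root, f"f{i:06d}") for i in range(num_files)]
        for path in paths:              # stage 1: create, untimed
            with open(path, "w"):
                pass
        start = time.perf_counter()     # stage 2: stat, timed
        for path in paths:
            os.stat(path)
        return time.perf_counter() - start


elapsed = stat_benchmark(1024)
print(f"stat of 1024 files took {elapsed:.6f} s")
```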
Stat performance
• Time to stat 262144 different files
• Benchmark has two phases: create (untimed), followed by stat (timed)
• 82% improvement at 64 nodes
[Figure: two panels plotting Time (seconds) against Number of Nodes (16, 32, 64 in one panel; 1, 2, 4, 8 in the other) for No Cache, 1 Cache Server, 2 Cache Servers, 4 Cache Servers, 6 Cache Servers, and Lustre-4DS]
Experiment-Write Single Client
– One Client
– Writes 1,024 records of size r sequentially to the file
– Measure time for this to complete
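The single-client write test can be sketched as below, run against a local temporary file rather than a GlusterFS mount. The function name is hypothetical, and dividing the total time by the record count to get a per-record latency is an assumption; the slide only says the time to complete is measured.

```python
import os
import tempfile
import time


def write_latency_us(record_size: int, num_records: int = 1024) -> float:
    """Write num_records records of record_size bytes sequentially and
    return the mean time per record in microseconds."""
    record = b"x" * record_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(num_records):
            os.write(fd, record)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / num_records * 1e6


for r in (1, 1024, 32768):
    print(r, write_latency_us(r))
```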
Write – Single Client
[Figure: Write Latency (microseconds) against I/O Record Size (bytes), 1 to 32768, for No Cache, IMCa (2K), and IMCa (Server Threads)]
• 2KB block size
• Server thread helps performance
Experiments-Read
• Single Client Read
  – Follows Write component of the benchmark
  – Move file pointer to the beginning of the file
  – Read 1,024 records of size r sequentially from the file
  – Measure time for this to complete
• Multiple Client Read
  – Each client uses a separate file
• Multiple Client Read Shared
  – Same file used by every client
• Lustre configurations
  – Cold Client Cache: Unmount between Write and Read
  – Warm Client Cache: No unmount between Write and Read
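The single-client read test can be sketched as follows: an untimed write phase, a rewind to the start of the file, then a timed sequential read. The function name is hypothetical, and the run uses a local temporary file. Since the file is not closed or unmounted between the phases, the reads are likely served from memory, which is closer to the warm configuration than the cold one.

```python
import tempfile
import time


def read_after_write_seconds(record_size: int, num_records: int = 1024):
    """Write the records (untimed), seek to offset 0, then time reading them back.

    Returns (elapsed_seconds, total_bytes_read).
    """
    record = b"r" * record_size
    with tempfile.TemporaryFile() as f:
        for _ in range(num_records):   # write phase, untimed
            f.write(record)
        f.flush()
        f.seek(0)                      # move file pointer to the beginning
        total = 0
        start = time.perf_counter()
        for _ in range(num_records):   # timed sequential read
            total += len(f.read(record_size))
        elapsed = time.perf_counter() - start
    return elapsed, total


elapsed, total = read_after_write_seconds(2048)
print(f"read {total} bytes in {elapsed:.6f} s")
```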
Read Latency (Single Client)
[Figure: two panels plotting Latency (us) against Bytes (1 to 65536) for No Cache, Cache (256), Cache (2K), Cache (8K), Lustre-1DS (Cold), Lustre-4DS (Cold), and Lustre-4DS (Warm)]
• Lustre shows best latency
• Cache provides benefit for small message sizes
Read Multiple Client (32 clients)
[Figure: Time (microseconds) against Bytes (1 to 65536) for NoCache, IMCa (1), IMCa (2), IMCa (4), Lustre (Cold), and Lustre (Warm)]
• 51% improvement in latency at 16K
• Multiple MCDs help reduce capacity misses
Iozone throughput
[Figure: Throughput (MegaBytes/second) with 1, 2, 4, and 8 threads for No Cache, Cache (1), Cache (2), Cache (4), and Lustre-1DS (Cold)]
• 1, 2, 4, 8 IOzone threads, 1GB files, 2KB block size
• 325 MB/s (NoCache) -> 868 MB/s (4 MCDs)
Read-Shared Latency
[Figure: Time (microseconds) against Number of nodes (2, 4, 8, 16, 32) for No Cache, Lustre-1DS (Cold), and MCD (1)]
• IMCa helps improve performance over NoCache case
Conclusions and Future Work
• Proposed, Designed and Evaluated an Intermediate Cache for GlusterFS
• Good improvement in stat performance
• Improvement in latency/throughput of read operations
• Depends on block size
• Would like to evaluate the performance with RDMA
• Would like to evaluate distribution algorithms
Acknowledgements
Our research is supported by the following organizations
Thank you
{noronha, panda}@cse.ohio-state.edu
Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/