shredder gpu-accelerated incremental storage and computation pramod bhatotia §, rodrigo rodrigues...
TRANSCRIPT
![Page 1: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/1.jpg)
ShredderGPU-Accelerated Incremental
Storage and Computation
Pramod Bhatotia§, Rodrigo Rodrigues§, Akshat Verma¶
§MPI-SWS, Germany¶IBM Research-India
USENIX FAST 2012
![Page 2: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/2.jpg)
Pramod Bhatotia 2
Handling the data deluge• Data stored in data centers is growing at a fast
pace
• Challenge: How to store and process this data ?• Key technique: Redundancy elimination
• Applications of redundancy elimination• Incremental storage: data de-duplication• Incremental computation: selective re-execution
![Page 3: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/3.jpg)
Pramod Bhatotia 3
Redundancy elimination is expensive
For large-scale data, chunking easily becomes a bottleneck
Content-based chunking [SOSP’01]
ContentMarker
ChunkingFile Chunks Hash Yes
No
Duplicate
Hashing Matching
0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 0 1 0 1 0
FileFingerprint
![Page 4: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/4.jpg)
Pramod Bhatotia 4
Accelerate chunking using GPUs
GPUs have been successfully applied to compute-intensive tasks
Multicore GPU based design0
0.5
1
1.5
2
2.5
3
GBp
s
2X
20Gbps (2.5GBps)
Using GPUs for data-intensive tasks presents new challenges
?
StorageServers
![Page 5: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/5.jpg)
Pramod Bhatotia 5
Rest of the talk
• Shredder design• Basic design• Background: GPU architecture & programming
model• Challenges and optimizations
• Evaluation• Case studies
• Computation: Incremental MapReduce• Storage: Cloud backup
![Page 6: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/6.jpg)
Pramod Bhatotia 6
Shredder basic design
GPU (Device)CPU (Host)
Reader
Transfer
Store
Chunkingkernel
Data for chunking
Chunkeddata
![Page 7: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/7.jpg)
Pramod Bhatotia 7
GPU architecture
CPU(Host)
GPU (Device)
Multi-processor N
Multi-processor 2Multi-processor 1
SP SP SPSP
PCI
Host memory
Device global
memory
Shared memory
![Page 8: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/8.jpg)
Pramod Bhatotia 8
GPU programming model
CPU(Host)
GPU (Device)
Multi-processor N
Multi-processor 2Multi-processor 1
SP SP SPSP
PCI
Host memory
Device global
memory
Shared memory
Input
Output
Threads
![Page 9: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/9.jpg)
Pramod Bhatotia 9
Scalability challenges
1. Host-device communication bottlenecks
2. Device memory conflicts
3. Host bottlenecks(See paper for details)
![Page 10: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/10.jpg)
Pramod Bhatotia 10
Host-device communication bottleneck
Challenge # 1
GPU (Device)
Device global
memoryPCI
Synchronous data transfer and kernel execution • Cost of data transfer is comparable to kernel
execution• For large-scale data it involves many data transfers
CPU (Host)
Mainmemory Chunking
kernel
I/O
Transfer
Reader
![Page 11: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/11.jpg)
Pramod Bhatotia 11
Asynchronous execution
TimeCopy to Buffer 1
GPU (Device)Asynchronous
copy
Buffer 1
Device global memoryMain
memoryTransfer Buffer 2
CPU (Host)
ComputeBuffer 2
ComputeBuffer 1
Copy to Buffer 2
Pros: + Overlaps communication with computation+ Generalizes to multi-buffering
Cons:- Requires page-pinning of buffers at host side
![Page 12: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/12.jpg)
Pramod Bhatotia 12
Circular ring pinned memory buffers
GPU (Device)
CPU (Host)
Memcpy
Pinned circular Ring buffers
Pageablebuffers
Device global memory
Asynchronouscopy
![Page 13: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/13.jpg)
Pramod Bhatotia 13
Device memory conflictsChallenge # 2
GPU (Device)
Device global
memoryPCI
CPU (Host)
Mainmemory Chunking
kernel
I/O
Transfer
Reader
![Page 14: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/14.jpg)
Pramod Bhatotia 14
Accessing device memory
Thread-1 Thread-2 Thread-4
Device global memory
Multi-processor
SP-1 SP-2 SP-3 SP-4
Thread-3400-600Cycles
Fewcycles
Device shared memory
![Page 15: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/15.jpg)
Pramod Bhatotia 15
Device global memory
Un-coordinated accesses to global memory lead to a large number of memory bank conflicts
Memory bank conflicts
Multi-processor
SP
Device shared memory
SP SP SP
![Page 16: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/16.jpg)
Pramod Bhatotia 16
Accessing memory banks
Bank 0 Bank 3Bank 1 Bank 2
048
15
26
37
MSBs LSBs
Address
Memoryaddress
Chipenable
OR
Data out
Interleaved memory
![Page 17: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/17.jpg)
Pramod Bhatotia 17
Memory coalescingDevice global memory
Device shared memory
Memorycoalescing
Tim
e1 2 4Thread # 3
![Page 18: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/18.jpg)
Pramod Bhatotia 18
Processing the data
Thread-1 Thread-2 Thread-4
Device shared memory
Thread-3
![Page 19: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/19.jpg)
Pramod Bhatotia 19
Outline
• Shredder design• Evaluation• Case-studies
![Page 20: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/20.jpg)
Pramod Bhatotia 20
Evaluating Shredder
• Goal: Determine how Shredder works in practice• How effective are the optimizations?• How does it compare with multicores?
• Implementation• Host driver in C++ and GPU in CUDA• GPU: NVidia Tesla C2050 cards• Host machine: Intel Xeon with12 cores
(See paper for details)
![Page 21: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/21.jpg)
Pramod Bhatotia 21
Shredder vs. Multicores
Multicore GPU Basic GPU Async GPU Async + Coalescing
0
0.5
1
1.5
2
2.5
GBp
s 5X
![Page 22: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/22.jpg)
Pramod Bhatotia 22
Outline
• Shredder design• Evaluation• Case studies
• Computation: Incremental MapReduce• Storage: Cloud backup (See paper for details)
![Page 23: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/23.jpg)
Pramod Bhatotia 23
Read input
Map tasks
Reduce tasks
Write output
Incremental MapReduce
![Page 24: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/24.jpg)
Pramod Bhatotia 24
Unstable input partitions
Read input
Map tasks
Reduce tasks
Write output
![Page 25: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/25.jpg)
Pramod Bhatotia 25
GPU accelerated Inc-HDFS
Input file
ShredderHDFS Client
copyFromLocal
Split-1 Split-2 Split-3
Content-based chunking
![Page 26: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/26.jpg)
Pramod Bhatotia 26
Related work
• GPU-accelerated systems• Storage: Gibraltar [ICPP’10], HashGPU [HPDC’10]• SSLShader[NSDI’11], PacketShader[SIGCOMM’10],
…
• Incremental computations• Incoop[SOCC’11], Nectar[OSDI’10],
Percolator[OSDI’10],…
![Page 27: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/27.jpg)
Pramod Bhatotia 27
Conclusions
• GPU-accelerated framework for redundancy elimination• Exploits massively parallel GPUs in a cost-
effective manner
• Shredder design incorporates novel optimizations• More data-intensive than previous usage of GPUs
• Shredder can be seamlessly integrated with storage systems • To accelerate incremental storage and
computation
![Page 28: Shredder GPU-Accelerated Incremental Storage and Computation Pramod Bhatotia §, Rodrigo Rodrigues §, Akshat Verma ¶ § MPI-SWS, Germany ¶ IBM Research-India](https://reader035.vdocuments.us/reader035/viewer/2022062619/5518bea5550346a61f8b5491/html5/thumbnails/28.jpg)
Thank You!