An Evaluation of Using Deduplication in Swappers
Weiyan Wang, Chen Zeng
Motivation
Deduplication detects duplicate pages in storage
NetApp, Data Domain: a billion-dollar business
We explore another direction: using deduplication in swappers
Our experimental results indicate that using deduplication in swappers is beneficial when many duplicate pages are present
What is a swapper?
A mechanism to expand the usable address space
Swap out: move a page from memory to the swap area
Swap in: move a page from the swap area back to memory
The swap area is on disk
[Figure: page P1 moving between memory and the swap area; labels: pte’, Free, Used]
Why is deduplication useful?
Writes to disk are slow: disk accesses are much slower than memory accesses!
When duplicate pages exist: do we really need to swap out all of them? If a duplicate page already appears in the swap area, we can save one I/O.
[Figure: pages P1, P2, P3 in memory and P1 in the swap area]
Architecture
Swap out a page:
Compute the checksum
Look it up in the dedup cache
YES (found): skip pageout
NO (not found): page out, and add the checksum to the dedup cache
Computing Checksum
SHA-1 checksum (160 bits): collision probability of one in 2^80
Only the first 32 bits are used (one in 2^16), because of how the dedup cache is implemented
Only the checksum is stored
We assume two pages are identical if their checksums are equal: we trade consistency for performance
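The two figures on this slide read like birthday bounds. With an n-bit checksum, a collision among k distinct pages becomes likely once k approaches 2^(n/2). That gives 2^80 for n = 160 and 2^16 for n = 32. The worked example below uses the standard approximation. The page count k = 2^16 (256 MB of 4 KB pages) is only an illustration.

```latex
p(k) \approx 1 - \exp\left(-\frac{k(k-1)}{2^{n+1}}\right)

n = 32,\; k = 2^{16}: \quad \frac{2^{16}(2^{16}-1)}{2^{33}} \approx \frac{1}{2},
\qquad p \approx 1 - e^{-1/2} \approx 0.39

n = 160,\; k = 2^{16}: \quad \frac{2^{16}(2^{16}-1)}{2^{161}} < 2^{-129},
\qquad p \approx 0
```

With 32-bit keys, treating equal checksums as equal pages therefore becomes a real risk once the swap area holds tens of thousands of distinct pages. That is the consistency being traded away.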
Dedup Cache
The dedup cache is a radix tree: checksum -> dedup_entry_t
A trie with O(|key|) lookup and update overhead
Already well implemented in the kernel
Keys in the radix tree are 32 bits, so we keep only the first 32 bits of a checksum as the key
Entries in Dedup Cache
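The index of the page in the swap area
The number of duplicate pages with a given checksum
A lock for consistency
typedef struct {
    swp_entry_t base;   /* index of the page in the swap area */
    atomic_t count;     /* number of duplicate pages with this checksum */
    spinlock_t lock;    /* lock for consistency */
} dedup_entry_t;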
Changes to Linux Kernel
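Swap cache: swp_entry_t -> page
Avoids repeatedly swapping in the same page
This happens when a swapped-out page is shared by multiple processes
Example:
Processes A and B share the page P
P is swapped out, and the PTEs in A and B are updated
A wants to access P: P is read from disk and added to the swap cache
B wants to access P: P is found in the swap cache, so it is not read from disk again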
Will the dedup cache grow without bound?
Swap counter for each swp_entry_t: the number of references to it in memory
counter++ when:
one more PTE contains the swp_entry_t
it is in the swap cache
it is in the dedup cache
counter-- when a page is swapped in
When counter = 2, remove the swp_entry_t from the dedup cache and the swap cache (at that point only the two caches still refer to it; no PTE does)
Reference Counters
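[Figure: reference counter of a swap-area slot referenced by processes A and B, the swap cache, and the dedup cache; counter values (4) and (2)]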
Changes to Swap Cache
The swap cache maintains the mapping between a swap entry and a page
We change that mapping to one between a swap entry and a list of pages with the same contents
Why do we need a list?
Possible Inconsistency
Swap out page P1 to swap_entry e1Swap out page P2, a duplicate of P1
The mapping of e1->P2 can not be added to swap cache
Swap in P1: mapping is deleted Swap in P2: Ooops!
Swap Cache
E1 -> P1
Our Solution
Swap out page P1 to swap entry E1
Swap out page P2, a duplicate of P1
The mapping E1 -> P2 is added to the list
Swap in P1: only P1 is removed from the list
Swap in P2: delete E1 -> P2
Swap cache: E1 -> P1, then E1 -> P1, P2, then E1 -> P2
Experimental Evaluation
We run our experiments on VMware with Linux 2.6.26
Our test program sequentially accesses an array
Each element is 4 KB in size
We vary the percentage of duplicate pages in the array
All of the pages are duplicates
Deduplication significantly reduces the access time
No Duplicate Pages
However, deduplication also incurs significant overhead
Overheads in Deduplication
Major overheads:
Calculating checksums: 35 µs
We calculate the checksum every time a page is swapped in or swapped out
Maintaining the reference counter: the explicit locks it requires impose significant overhead, an average of 65 µs in our experiments
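These numbers can be read as a rough break-even estimate. As a simplifying assumption, suppose each swap-out pays both costs above, and each duplicate saves one disk write of latency T_io. Deduplication then pays off only when the fraction d of swapped-out pages that are duplicates satisfies the inequality below. The 5 ms write latency is a hypothetical figure, not a measurement from these experiments.

```latex
d \cdot T_{\text{io}} > T_{\text{checksum}} + T_{\text{lock}}
\quad\Longleftrightarrow\quad
d > \frac{T_{\text{checksum}} + T_{\text{lock}}}{T_{\text{io}}}

\text{e.g. } \frac{35\,\mu\text{s} + 65\,\mu\text{s}}{5000\,\mu\text{s}} = 0.02
```

With no duplicates (d = 0), the left-hand side is zero and the whole overhead is a net cost, which matches the "No Duplicate Pages" result.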
Conclusion
Deduplication is a double-edged sword in swappers
When many duplicate pages are present, deduplication reduces the access time by orders of magnitude
When few duplicate pages are present, its overhead is non-negligible