![Page 1: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/1.jpg)
Cost-effective Hybrid Storages
Flash Group, Cao Qingling
![Page 2: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/2.jpg)
Motivation
• Get things done with the least money!
![Page 3: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/3.jpg)
Motivation
• SSD drawbacks: high cost, low density, low reliability.
• Replacing HDD with SSD outright is not recommended, especially when the data volume is very large.
• SSD as a cache for HDD bridges the gap between RAM and HDD, but hurts the SSD's lifetime.
• SSD as a permanent store at the same level as HDD, holding some special data.
![Page 4: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/4.jpg)
![Page 5: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/5.jpg)
Introduction
• Proposes a hybrid HBase with SSD.
• Stores the HBase system components on SSD, which sits at the same level as HDD.
• A quantitative assessment shows that hybrid HBase performs 1.5-2 times better.
![Page 6: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/6.jpg)
HBase
• Column-based key-value store.
• Each region server has a write-ahead log (WAL).
• A write goes first to the WAL and then to the in-memory memstore.
• A region is a horizontal partition of a table.
• A region can split.
Data on disk is stored as log-structured merge (LSM) trees.
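The write path on this slide can be sketched in a few lines. This is a toy model, not HBase code: the class name `RegionServerSketch`, the `flush_threshold` parameter, and the use of in-memory lists in place of files are all assumptions made for illustration.

```python
import bisect


class RegionServerSketch:
    """Toy model of the write path: WAL first, then memstore, then sorted runs."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # append-only log (stand-in for the WAL file)
        self.memstore = {}       # in-memory buffer of recent writes
        self.store_files = []    # immutable sorted runs, newest last (LSM style)
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def put(self, key, value):
        self.wal.append((key, value))   # 1) log the write for recovery
        self.memstore[key] = value      # 2) apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as one sorted, immutable run.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal.clear()  # flushed edits no longer need the log (simplified)

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # Search runs from newest to oldest so the latest value wins.
        for run in reversed(self.store_files):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

With a threshold of 3, the third `put` triggers a flush; a later `put` of an existing key stays in the memstore and shadows the older value in the sorted run.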
![Page 7: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/7.jpg)
HBase System Component
Zookeeper:
• Clients contact it for the -ROOT- table.
• The master contacts it to learn which region servers are available.
• Region servers contact it in a heartbeat keep-alive mechanism.
• Zookeeper is I/O intensive.
Catalog Tables:
• -ROOT- and .META. tables.
• Mostly read intensive and not updated frequently.
![Page 8: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/8.jpg)
HBase System Component
Write-ahead log (WAL):
• Any write is first done on the WAL.
• The size grows with: i) the WAL commit interval; ii) the write rate; iii) the size of a key-value pair.
Temporary Storage:
• Used when a region is split or merged.
• Read and written sequentially.
![Page 9: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/9.jpg)
Assessment
SSD space needed: 1% of the database size. Performance gain: more than 10%.
Price: 1:10
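The figures on this slide can be combined into a rough cost-per-throughput estimate. The sketch below rests on two assumptions not stated on the slide: that the 1:10 ratio is the HDD to SSD price per gigabyte, and that 1% of the data simply moves from HDD to SSD. The function name is hypothetical.

```python
def relative_storage_cost(ssd_fraction, ssd_price_ratio):
    """Storage cost relative to an all-HDD setup (all-HDD = 1.0).

    ssd_fraction: share of the data placed on SSD (assumed 0.01 here).
    ssd_price_ratio: SSD price per GB divided by HDD price per GB (assumed 10).
    """
    return (1 - ssd_fraction) + ssd_fraction * ssd_price_ratio


cost = relative_storage_cost(ssd_fraction=0.01, ssd_price_ratio=10)
throughput = 1.10  # lower bound from the slide: more than 10% gain
cost_per_throughput = cost / throughput

print(f"relative cost: {cost:.2f}")
print(f"relative cost per throughput: {cost_per_throughput:.3f}")
```

Under these assumptions storage cost rises by about 9%, so a gain of more than 10% already lowers the cost per unit of throughput.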
![Page 10: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/10.jpg)
Experimental Evaluation
• Experiment: Intel processor (4 cores and 4 threads at 3 GHz) with 8 GB RAM, Western Digital 1 TB HDD, Kingston 128 GB SSD.
• Yahoo! Cloud Serving Benchmark (YCSB).
• Workloads: 1 million queries on a database with 60 million records. Record size is 1 KB. 72 regions in total.
![Page 11: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/11.jpg)
Experimental Evaluation
![Page 12: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/12.jpg)
Experimental Evaluation
![Page 13: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/13.jpg)
![Page 14: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/14.jpg)
Introduction
• Approximate membership query (AMQ) data structures, e.g. the Bloom filter.
• When the filter grows larger than RAM, performance decays.
• Quotient Filter: better data locality, sequential operations, supports deletion, can be dynamically resized, space-saving.
• Buffered Quotient Filter (BQF) and Cascade Filter (CF) are designed for flash.
![Page 16: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/16.jpg)
Quotient Filter
• Fingerprint: $f = f_q 2^r + f_r$
• $f_r = f \bmod 2^r$
• $f_q = \lfloor f / 2^r \rfloor$
• $T[f_q] = f_r$
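As a small illustration of the quotient/remainder split, the sketch below splits a fingerprint and stores the remainder in the slot indexed by the quotient. The helper name `split_fingerprint` and the chosen bit widths are assumptions for the example; collisions between slots are ignored here.

```python
def split_fingerprint(f, r):
    """Split fingerprint f into (quotient, remainder) with an r-bit remainder."""
    fq = f >> r                 # floor(f / 2^r)
    fr = f & ((1 << r) - 1)     # f mod 2^r
    return fq, fr


q, r = 5, 3                     # hypothetical widths: 8-bit fingerprints
T = [None] * (1 << q)           # one slot per possible quotient

f = 0b10110101
fq, fr = split_fingerprint(f, r)
T[fq] = fr                      # store only the remainder

# The fingerprint can be rebuilt from the slot index and its content.
assert fq * (1 << r) + T[fq] == f
```

Only the r-bit remainder is stored; the q-bit quotient is implied by the slot position, which is where the space saving comes from.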
![Page 17: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/17.jpg)
Quotient Filter
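• is_occupied: set if some stored fingerprint has fq = i, namely slot i is the canonical slot of stored data.
• is_shifted: set if the remainder fr stored in slot i does not belong to slot i (it was shifted from its canonical slot).
• is_continuation: set if the remainder in slot i belongs to the same run as the one in slot i-1.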
[Figure: physical storage, showing a run]
![Page 18: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/18.jpg)
Quotient Filter
• Check if f is in the QF:
 step 1:
 step 2: scan back to the beginning of the cluster.
 step 3: scan forward to the start of the run.
 step 4: search for f.
• Insert an f.
• Delete an f.
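The lookup steps can be sketched as follows. Step 1 is not spelled out on the slide; this sketch assumes it is the is_occupied check of the standard quotient filter lookup. To keep it short, the table is bulk-built from a sorted set of fingerprints instead of using incremental insertion, and a few slack slots replace wrap-around. `Slot`, `build_table`, `may_contain`, and `slack` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    remainder: Optional[int] = None
    is_occupied: bool = False
    is_continuation: bool = False
    is_shifted: bool = False


def build_table(fingerprints, q, r, slack=8):
    """Bulk-build a quotient filter layout (simplified: no wrap-around)."""
    table = [Slot() for _ in range((1 << q) + slack)]
    next_free = 0
    for f in sorted(set(fingerprints)):        # sorted by quotient, then remainder
        fq, fr = f >> r, f & ((1 << r) - 1)
        first_in_run = not table[fq].is_occupied
        table[fq].is_occupied = True
        pos = max(next_free, fq)               # canonical slot or first free one
        if pos >= len(table):
            raise OverflowError("not enough slack slots")
        table[pos].remainder = fr
        table[pos].is_continuation = not first_in_run
        table[pos].is_shifted = pos != fq
        next_free = pos + 1
    return table


def may_contain(table, f, r):
    fq, fr = f >> r, f & ((1 << r) - 1)
    # Step 1 (assumed): no stored fingerprint has this quotient.
    if not table[fq].is_occupied:
        return False
    # Step 2: scan back to the beginning of the cluster.
    b = fq
    while table[b].is_shifted:
        b -= 1
    # Step 3: scan forward, run by run, to the start of the run for fq.
    s = b
    while b != fq:
        s += 1
        while table[s].is_continuation:        # skip the rest of the current run
            s += 1
        b += 1
        while not table[b].is_occupied:        # next quotient that owns a run
            b += 1
    # Step 4: search the run for the remainder.
    while True:
        if table[s].remainder == fr:
            return True
        s += 1
        if s >= len(table) or not table[s].is_continuation:
            return False
```

For example, with `q = r = 3`, storing three fingerprints with quotient 1 pushes the runs for quotients 2 and 4 out of their canonical slots, and the lookup still finds them by walking the cluster.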
![Page 19: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/19.jpg)
Quotient Filters on Flash
• Buffered Quotient Filter
 - BQF: one QF as the buffer, another on SSD.
 - Optimized for lookup performance.
• Cascade Filter
 - Optimized for insertion.
 - Offers a trade-off between lookup and insertion.
![Page 20: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/20.jpg)
Quotient Filters on Flash
• Cascade Filter
 - Based on the cache-oblivious lookahead array (COLA).
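The COLA-style organization can be illustrated with a toy model: a small in-RAM buffer plus levels of doubling capacity, where a full buffer is merged down into the first level that can hold everything above it. This is only a sketch of the merging pattern. A Python set and sorted lists stand in for the quotient filters, and `CascadeSketch` and `ram_capacity` are hypothetical names.

```python
import bisect


class CascadeSketch:
    """Toy COLA-like cascade: RAM buffer plus levels of doubling capacity."""

    def __init__(self, ram_capacity=4):
        self.ram_capacity = ram_capacity
        self.ram = set()     # stand-in for the in-RAM quotient filter
        self.levels = []     # stand-ins for on-flash filters (sorted lists)

    def _capacity(self, i):
        return self.ram_capacity * 2 ** (i + 1)

    def insert(self, fp):
        self.ram.add(fp)
        if len(self.ram) >= self.ram_capacity:
            self._flush()

    def _flush(self):
        # Find the first level that can absorb the buffer and all levels above it.
        merged = set(self.ram)
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            merged |= set(self.levels[i])
            if len(merged) <= self._capacity(i):
                break
            i += 1
        for j in range(i):
            self.levels[j] = []              # upper levels are emptied
        self.levels[i] = sorted(merged)      # one sequential write of level i
        self.ram.clear()

    def may_contain(self, fp):
        if fp in self.ram:
            return True
        for level in self.levels:            # a lookup may touch every level
            k = bisect.bisect_left(level, fp)
            if k < len(level) and level[k] == fp:
                return True
        return False
```

Insertions are cheap because they only touch RAM until a flush, and each flush is one sequential merge; lookups pay for this by possibly checking every level, which is the trade-off between lookup and insertion.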
![Page 21: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/21.jpg)
Evaluation
![Page 22: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/22.jpg)
Conclusions
• Bloom filters are widely used in key-value storage.
• Change the way of thinking.
• Gain inspiration from traditional database algorithms.
• Design the hybrid system according to the application.
![Page 23: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/23.jpg)
![Page 24: Cost-effective Hybrid Storages](https://reader036.vdocuments.us/reader036/viewer/2022062309/568158ec550346895dc62da1/html5/thumbnails/24.jpg)
Bloom Filter
• Initial state:
• Insert: H(1), H(b).
• Cannot be expanded, does not support deletion, poor data locality.
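A minimal Bloom filter sketch makes these limitations concrete. The bit-array size `m`, the number of hash functions `k`, and the salted SHA-256 hashing are assumptions chosen for illustration, not parameters from the slides.

```python
import hashlib


class BloomFilter:
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m              # initial state: all bits are zero

    def _positions(self, item):
        # Derive k bit positions by salting one hash function (an assumption).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def may_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

A set bit may be shared by several items, so clearing it to delete one item could create false negatives for others; the array size `m` is fixed at construction; and the `k` positions of one item are scattered across the array, which explains the poor data locality.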