FastBit for Allele Data. Dave Matthews, USDA-ARS, Cornell University, Ithaca, NY. 10 April 2012
![Page 1: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/1.jpg)
FastBit for Allele Data
Dave Matthews
USDA-ARS, Cornell University
Ithaca, NY
10 April 2012
![Page 2: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/2.jpg)
A Lightning-Fast Index Drives Massive Data Analysis
http://www.scidacreview.org/0904/html/fastbit.html
FastBit significantly improves the speed of a searching operation on both high- and low-cardinality values with a number of techniques, including a vertical data organization, an innovative bitmap compression technique, and several new bitmap encoding methods... The ability to index high-cardinality data is unique to FastBit and is not supported by other bitmap indexing methods.
![Page 3: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/3.jpg)
Allele Data Variables
Allele = f(Marker, Line, Experiment)

Size: Allele 10^9, Marker 10^4, Line 10^4, Experiment 10^1
Cardinality: Allele 2; Marker, Line, Experiment = (same as size)
![Page 4: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/4.jpg)
Bitmap Indexing
![Page 5: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/5.jpg)
The FastBit Technologies
1. vertical data organization = 'vertical partitioning'. Only a few of the (hundreds of) variables in each partition.
2. bitmap compression: Word-Aligned Hybrid Compression
3. two-level bitmap encoding
![Page 6: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/6.jpg)
Word-aligned Hybrid Compression
• run-length encoding
• 31-bit groups
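A minimal sketch of how run-length encoding over 31-bit groups can work, assuming the commonly described WAH layout of 32-bit words: a literal word has its top bit clear and carries 31 bitmap bits, while a fill word has its top bit set, the next bit giving the fill value and the low 30 bits counting how many identical 31-bit groups it stands for. The function names are hypothetical and this is not FastBit's actual code; padding of the final partial group is simplified.

```python
GROUP = 31                    # bitmap bits carried by one 32-bit word
MAX_RUN = (1 << 30) - 1       # largest group count a fill word can hold
ALL_ONES = (1 << GROUP) - 1

def wah_encode(bits):
    """Encode a sequence of 0/1 ints into a list of 32-bit words (simplified WAH)."""
    # Pad with zeros to a whole number of 31-bit groups (simplification).
    padded = list(bits) + [0] * (-len(bits) % GROUP)
    words = []
    for start in range(0, len(padded), GROUP):
        group = padded[start:start + GROUP]
        value = int("".join(map(str, group)), 2)
        if value in (0, ALL_ONES):
            # Uniform group: emit or extend a fill word.
            header = (1 << 31) | ((1 if value else 0) << 30)
            same_fill = bool(words) and (words[-1] >> 30) == (header >> 30)
            if same_fill and (words[-1] & MAX_RUN) < MAX_RUN:
                words[-1] += 1            # one more group in the current run
            else:
                words.append(header | 1)  # new run of length 1
        else:
            words.append(value)           # literal word, top bit is 0
    return words

def wah_decode(words, nbits):
    """Expand the words back into the first nbits bits."""
    bits = []
    for w in words:
        if w >> 31:                       # fill word
            fill_bit = (w >> 30) & 1
            bits.extend([fill_bit] * ((w & MAX_RUN) * GROUP))
        else:                             # literal word, most significant bit first
            bits.extend((w >> (GROUP - 1 - k)) & 1 for k in range(GROUP))
    return bits[:nbits]

# A sparse bitmap: one set bit, a long run of zeros, then five set bits.
example = [1] + [0] * 92 + [1] * 5
encoded = wah_encode(example)
```

Here the 98-bit bitmap fits in three words: a literal, one fill word covering two all-zero groups, and a final literal. Because every word is a whole machine word, logical operations can work word by word without bit-level unpacking.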
![Page 7: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/7.jpg)
Two-level Bitmap Encoding
• Approximate solution, then refine.
• Bin the values into groups, e.g. A to G, H to P, Q to Z.
• Encode the bin identifiers as bitmap.
• Encodings: equality, range, interval.
  – Interval has half the number of bitmap indexes.
• Multicomponent encoding: Bin the bins to reduce number of bitmap indexes.
• Multi-level encoding: hierarchy of bins, coarse to fine. Use interval encoding for coarse, equality for fine.
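The "approximate, then refine" idea can be sketched as follows, using the slide's example bins (A to G, H to P, Q to Z). This is only an illustration under stated assumptions: Python integers stand in for bitmaps, the function names and sample line names are hypothetical, values are assumed to start with a letter, and only equality and range encodings are shown (interval encoding is omitted).

```python
from bisect import bisect_right

# Bin boundaries from the slide's example: A-G, H-P, Q-Z.
BIN_STARTS = ["A", "H", "Q"]

def bin_of(value):
    """Return the bin index (0, 1 or 2) of a string, judged by its first letter."""
    return max(bisect_right(BIN_STARTS, value[0].upper()) - 1, 0)

def equality_bitmaps(values):
    """Equality encoding: one bitmap per bin; bit i is set when row i is in that bin."""
    bitmaps = [0] * len(BIN_STARTS)
    for row, v in enumerate(values):
        bitmaps[bin_of(v)] |= 1 << row
    return bitmaps

def range_bitmaps(eq):
    """Range encoding: bitmap j marks rows whose bin index is <= j.
    The last cumulative bitmap would be all ones, so it is not stored."""
    out, acc = [], 0
    for b in eq[:-1]:
        acc |= b
        out.append(acc)
    return out

def rows_less_than(values, eq, bound):
    """Rows with value < bound. Bins entirely below the bound are sure hits
    (approximate answer); only the boundary bin is checked against raw values (refine)."""
    edge = bin_of(bound)
    hits = 0
    for b in eq[:edge]:
        hits |= b
    candidates = eq[edge]
    for row, v in enumerate(values):
        if (candidates >> row) & 1 and v.upper() < bound.upper():
            hits |= 1 << row
    return [r for r in range(len(values)) if (hits >> r) & 1]

# Hypothetical line names, one per row.
names = ["Apex", "Jagger", "Kanqueen", "Harry", "Zebra", "Glenn", "Quest"]
eq = equality_bitmaps(names)
rng = range_bitmaps(eq)
below_k = rows_less_than(names, eq, "K")
```

For the query "name < K", the A to G bin is accepted without reading any raw values, and only the three rows in the H to P bin need the candidate check.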
![Page 8: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/8.jpg)
Indexing Bin Identifiers
![Page 9: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/9.jpg)
Querying on more than one variable
FastBit performs extremely well on multi-variable queries because the intersection between the search results on each variable is a simple AND operation over the resulting bitmaps.
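A toy illustration of that AND step, assuming one equality-encoded bitmap per distinct value of each variable. Python integers stand in for (uncompressed) bitmaps, and the table contents and helper names are hypothetical, not FastBit's API.

```python
from collections import defaultdict

def build_index(column):
    """Map each distinct value to a bitmap; bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return index

def rows_of(bitmap):
    """List the row numbers whose bits are set."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows

# Hypothetical allele table, one entry per row.
markers = ["m1", "m1", "m2", "m2", "m1", "m2"]
lines   = ["L1", "L2", "L1", "L2", "L1", "L1"]
alleles = ["A",  "B",  "A",  "A",  "B",  "A"]

marker_idx = build_index(markers)
line_idx = build_index(lines)
allele_idx = build_index(alleles)

# marker == "m2" AND line == "L1" AND allele == "A":
# each condition yields one bitmap, and the answer is their bitwise AND.
hits = marker_idx["m2"] & line_idx["L1"] & allele_idx["A"]
matching_rows = rows_of(hits)
```

Each extra condition costs one more bitmap lookup and one more AND, with no need to scan the table rows themselves.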
![Page 10: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/10.jpg)
Performance
![Page 11: FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012](https://reader033.vdocuments.us/reader033/viewer/2022061304/5513bfbb5503463a298b48e4/html5/thumbnails/11.jpg)
Instructions
http://crd-legacy.lbl.gov/~kewu/fastbit/doc/quickstart.html