Effect of Node Size on the Performance of Cache-Conscious B+ Trees
![Page 1: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/1.jpg)
Effect of Node Size on the Performance of Cache-Conscious B+ Trees
Written by: R. Hankins and J. Patel
Presented by: Ori Calvo
![Page 2: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/2.jpg)
Introduction
• Who cares about cache improvement?
• Traditional databases are designed to reduce I/O accesses.
• But… chips are cheap. Chips are big.
• Why not store the whole database in memory?
• Reducing main memory accesses is the next challenge.
![Page 3: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/3.jpg)
Objectives
• Introduction to cache-conscious B+Trees.
• Provide a model to analyze the effect of node size.
• Examine “real-life” results against our model’s conclusions.
![Page 4: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/4.jpg)
B+Tree Refresher
• A B+Tree of order d has between d and 2d keys in each node.
• The root has between 1 and 2d keys.
• Every node must be at least half full.
• 2*(d+1)^(h-1) <= N <= (2d+1)^h
• Fill percentage is usually ln 2 ≈ 69%.
![Page 5: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/5.jpg)
B+Tree Refresher (Cont…)
• Good search performance.
• Good incremental performance.
• Better cache behavior than T-Tree.
• What is the optimal node size?
![Page 6: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/6.jpg)
Improving B+Tree
Question: Assuming node size = cache line size, how can we make the B+Tree algorithm utilize the cache better?
Hint: Locality!
![Page 7: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/7.jpg)
Pointer Elimination
• Node size = cache line size.
• Only half of a node is used for storing keys.
• Get rid of pointers and store more keys.
• Instead of pointers to child nodes, use offsets.
![Page 8: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/8.jpg)
Introducing CSB+Tree
• Balanced search tree.
• Each node contains m keys, where d <= m <= 2d and d is the order of the tree.
• All child nodes of a node are put into a node group.
• Nodes within a node group are stored contiguously.
• Each node holds:
  • pFirstChild - pointer to first child
  • nKeys - number of keys
  • arrKeys[2d] - array of keys
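A minimal sketch of such a node in C may make the layout concrete. The field names follow the slide; everything else is an assumption of this sketch: 4-byte keys, a 64-byte node, and a 32-bit index into a node pool standing in for the first-child pointer (`CSB_MAX_KEYS`, `CsbNode` and `csb_child` are hypothetical names).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSB_MAX_KEYS 14  /* 2d for a 64-byte node with 4-byte fields (assumption) */

/* One CSB+Tree node. Field names follow the slide; the 32-bit pool index
 * used in place of a raw pointer is an assumption of this sketch. */
typedef struct {
    uint32_t pFirstChild;            /* pool index of the first node in the child group */
    uint32_t nKeys;                  /* number of keys currently stored */
    int32_t  arrKeys[CSB_MAX_KEYS];  /* sorted keys */
} CsbNode;

/* The children of a node are stored contiguously, so the i-th child is
 * reached by arithmetic on the single stored index. */
static inline const CsbNode *csb_child(const CsbNode *pool,
                                       const CsbNode *node, uint32_t i)
{
    assert(i <= node->nKeys);        /* a node with n keys has n + 1 children */
    return &pool[node->pFirstChild + i];
}
```

With these field sizes the struct occupies exactly 64 bytes, and finding any child costs one addition instead of one stored pointer per child.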
![Page 9: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/9.jpg)
CSB+Tree
[Figure: CSB+Tree layout. Every node is drawn as P N K1 K2 (pointer to first child, number of keys, two keys); below the root, nodes appear in contiguous groups of three.]
![Page 10: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/10.jpg)
CSB+Tree vs. B+Tree
Assuming node size = 64B:
• B+Tree: 7 keys + 8 pointers + 1 counter
• CSB+Tree: 1 pointer + 1 counter + 14 keys
Results:
• A cache line can satisfy almost one more level of comparisons.
• The fan-out is larger.
• Less space.
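The key counts above can be reproduced with a little arithmetic. The sketch below assumes every key, pointer and counter occupies one equally sized field (4 bytes reproduces the 64B figures); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* B+Tree node: 1 counter, k keys and k + 1 child pointers, so 2k + 2 slots. */
static size_t bplus_keys_per_node(size_t node_bytes, size_t field_bytes)
{
    size_t slots = node_bytes / field_bytes;
    assert(slots >= 4);              /* room for at least one key */
    return (slots - 2) / 2;
}

/* CSB+Tree node: 1 counter and 1 first-child pointer, every other slot is a key. */
static size_t csb_keys_per_node(size_t node_bytes, size_t field_bytes)
{
    size_t slots = node_bytes / field_bytes;
    assert(slots >= 3);              /* room for at least one key */
    return slots - 2;
}
```

For a 64-byte node with 4-byte fields this gives 7 keys for the B+Tree and 14 for the CSB+Tree, matching the slide.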
![Page 11: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/11.jpg)
CSS Tree
Can we do more elimination?
![Page 12: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/12.jpg)
Shaking our foundations
• Should node size be equal to cache line size?
• What about instruction count?
• How can we measure the effect of node size on the overall performance?
![Page 13: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/13.jpg)
Building Execution Time Model
We need to take into account:
• Instructions executed.
• Data cache misses.
• Instruction cache misses (only 0.5%).
• Mis-predicted branches.
Model the above during an equality search.
Should be independent of implementation and platform details, but …
![Page 14: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/14.jpg)
Execution Time Model
T = I*cpi + M*miss_latency + B*pred_penalty

| Variable | Description | Value | Depends upon |
|---|---|---|---|
| cpi | Processor clock cycles per instruction executed | 0.63 (P3) | Platform |
| miss_latency | Processor clock cycles per L2 cache miss | 78 (P3) | Platform |
| pred_penalty | Processor clock cycles to correct a mis-predicted branch | 15 (P3) | Platform |
| I | Instruction count | | Implementation |
| M | Data cache miss count | | Implementation |
| B | Mis-predicted branch count | | Implementation |
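As a rough illustration, the model is a three-term weighted sum and can be written directly in C. The Pentium 3 constants are the ones quoted on the slide; the type and function names (`Platform`, `exec_cycles`) are hypothetical.

```c
#include <assert.h>

/* Platform-specific variables of the model. */
typedef struct {
    double cpi;           /* clock cycles per instruction executed */
    double miss_latency;  /* clock cycles per L2 cache miss */
    double pred_penalty;  /* clock cycles to correct a mis-predicted branch */
} Platform;

/* Values quoted on the slide for the Pentium 3. */
static const Platform PENTIUM3 = { 0.63, 78.0, 15.0 };

/* T = I*cpi + M*miss_latency + B*pred_penalty, in clock cycles. */
static double exec_cycles(const Platform *p, double I, double M, double B)
{
    assert(I >= 0.0 && M >= 0.0 && B >= 0.0);
    return I * p->cpi + M * p->miss_latency + B * p->pred_penalty;
}
```

For example, a search that executes 100 instructions, misses the cache twice and mis-predicts four branches is estimated at 63 + 156 + 60 = 279 cycles, which shows how quickly the miss term can dominate.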
![Page 15: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/15.jpg)
CPI – 0.63?
• Can be extracted from a processor’s design manual, but…
• Modern processors are very complex.
• Some instructions require more time to retire than others.
• On Pentium 3, CPI ranges from 0.33 to 14.
![Page 16: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/16.jpg)
Other PSV – Where do they come from?
• miss_latency: same problems as CPI.
• pred_penalty: the manual provides tight upper and lower bounds.
![Page 17: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/17.jpg)
PSV Experiment
    for (i = 0; i < Queries; i++) {
        address = origin + <random offset>;
        val = *address;
        for (j = 0; j < Instructions; j++) {
            /* Computation involving “val” */
        }
    }
![Page 18: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/18.jpg)
PSV Results
![Page 19: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/19.jpg)
Calculate I
• I depends upon the actual implementation of the CSB+Tree.
• Two main components:
  • I_search - searching inside a node
  • I_trav - node traversals
• Analyzing the code leads to the following conclusions:
  • I_search ~ 5
  • I_trav ~ 30
![Page 20: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/20.jpg)
Calculate I_search
    BinarySearch:
        middle = (p1+p2)/2;
        comp *middle,key;
        jle less;
        p1 = middle;
        jump BinarySearch;
    less:
        p2 = middle;
        jump BinarySearch;
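For reference, the same loop can be written as a complete C function. This is a sketch, not the presenters' code: it adds the termination test the slide leaves out and assumes a lower-bound convention (return the first key not smaller than the search key, or the end pointer); `binary_search` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the first key >= key in the sorted range [begin, end),
 * or end if every key is smaller than key. */
static const int *binary_search(const int *begin, const int *end, int key)
{
    assert(begin <= end);
    while (begin < end) {
        const int *middle = begin + (end - begin) / 2;  /* midpoint, no overflow */
        if (*middle < key)
            begin = middle + 1;      /* answer lies to the right of middle */
        else
            end = middle;            /* middle itself may be the answer */
    }
    return begin;
}
```

Each iteration is a midpoint computation, a comparison, a conditional branch and an assignment, which is consistent with the estimate of roughly five instructions per step.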
![Page 21: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/21.jpg)
Calculate I_trav
    Node *Find(Node *pNode, int key) {
        int *pKeysBegin = pNode->Keys;                              /* (1) */
        int *pKeysEnd = pNode->Keys + pNode->nKeys;                 /* (3) */
        int *pFoundKey, foundKey;

        pFoundKey = BinarySearch(pKeysBegin, pKeysEnd, key);        /* (8) ? */

        if (pFoundKey < pKeysEnd) { foundKey = *pFoundKey; }        /* (3,1) */
        else { foundKey = INFINITE; }                               /* (1) */

        int offset = (int)(pFoundKey - pKeysBegin);                 /* (2) */
        Node *pChild = NULL;
        if (key < foundKey) { pChild = pNode->pChilds + offset; }   /* (4,1) */
        else { pChild = pNode->pChilds + offset + 1; }              /* (3) */

        return pChild;
    }                                                               /* total: (23-25) */
![Page 22: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/22.jpg)
Calculate I (Finishing)
$I = h \cdot I_{search} \cdot \log_2(f \cdot e + 1) + h \cdot I_{trav}$
• h - Height of the tree
• f - Fill percentage
• e - Max number of keys in a node
![Page 23: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/23.jpg)
Calculate M
M_node – cache misses while searching inside a node:
$\log_2(L)$
$\log_2(nKeys)$
where L is the number of cache lines inside a node.
![Page 24: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/24.jpg)
Calculate M (Cont…)
Cache misses per tree traversal are bounded by:
TreeHeight * M_node
What about q traversals?
![Page 25: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/25.jpg)
Calculate M for q traversals
• Let’s assume there are no cache conflicts and no capacity misses.
• On the first traversal there are M_node cache misses per node access.
• On subsequent traversals:
  • Nodes near the root have a high probability of being found in the cache.
  • Leaf nodes have a substantially lower probability.
![Page 26: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/26.jpg)
Calculate M for q traversals (Cont..)
Suppose,
• q is the number of queries
• b is the number of blocks
Then, the number of unique blocks (UB) that are visited is:
$UB(b, q) = b \cdot \left(1 - (1 - 1/b)^{q}\right)$
![Page 27: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/27.jpg)
Calculate M for q traversals (Finishing)
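Assuming q*M_node block accesses are performed at each level of the tree over the q traversals, M is the sum of UB over the levels of the tree (with b_i blocks at level i), averaged over the q queries:
$M = \frac{1}{q} \sum_{i=1}^{h} UB(b_i,\; q \cdot M_{node})$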
![Page 28: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/28.jpg)
Calculate B
$B = \frac{h \cdot \log_2(f \cdot e + 1)}{2}$
• h - Height of the tree
• f - Fill percentage
• e - Max number of keys in a node
![Page 29: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/29.jpg)
Mid year evaluation
• We built a simple model:
  T = I*cpi + M*miss_latency + B*pred_penalty
• Now, we want to use it.
![Page 30: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/30.jpg)
Our model’s prediction
We want to look at the performance behavior that our model predicts on Pentium 3.
The following parameters are used:
• 10,000,000 items
• Number of queries = 10,000
• Fill percentage = 67%
• Cache line size = 32 bytes
![Page 31: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/31.jpg)
Effect of node size on cache misses count
![Page 32: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/32.jpg)
Effect of node size on instructions count
![Page 33: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/33.jpg)
Effect of node size on execution time
![Page 34: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/34.jpg)
Numbers
• Best cache utilization at small node sizes: 64-256 bytes.
• For larger node sizes there are fewer instructions executed; the minimum is reached at 1632 bytes.
• Optimal node size is 1632 bytes, performing 26% faster than a node size of 32 bytes.
![Page 35: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/35.jpg)
Our Model Conclusions
Conventional wisdom suggests: node size = cache line size.
We show: using a large node size can result in better search performance.
![Page 36: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/36.jpg)
Experimental Setup
• Pentium 3
  • 768MB of main memory
  • 16KB of L1 data cache
  • 512KB of L2 data/instruction cache
    • 4-way set associative
    • 32-byte cache line
• Linux, kernel version 2.4.13
• 10,000,000 entries in database
• The database is queried 10,000 times
![Page 37: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/37.jpg)
Effect of node size on cache misses count
![Page 38: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/38.jpg)
Effect of node size on instructions count
![Page 39: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/39.jpg)
Effect of node size on execution time
![Page 40: Effect of Node Size on the Performance of Cache-Conscious B+ Trees](https://reader036.vdocuments.us/reader036/viewer/2022062520/568159d1550346895dc7229e/html5/thumbnails/40.jpg)
Final Conclusions
• We investigated the performance of CSB+Tree.
• We introduced first-order analytical models.
• We showed that cache misses and instruction count must be balanced.
• A node size of 512 bytes performs well.
• Larger node sizes suffer from poor insert performance.