join processing in database systems with large main memories (part 2)
DESCRIPTION
Join Processing in Database Systems with Large Main Memories (part 2). Yongjoo (Young) Park [1] Query Optimization for Parallel Execution, Ganguly et al. [2] Logic per track devices, Slotnick [3] Implementing a relational database by means of specialzed hardware, Bobb - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/1.jpg)
04/22/2023 EECS 584, Fall 2009 1
Join Processing in Database Systems with Large Main Memories (part 2)
Yongjoo (Young) Park• [1] Query Optimization for Parallel Execution, Ganguly et al.• [2] Logic per track devices, Slotnick• [3] Implementing a relational database by means of specialzed hardware, Bobb• [4] Database System Concepts, 5ed, p.861-862• [5] Netezza_Appliance_Architecture_WP.pdf
![Page 2: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/2.jpg)
04/22/2023 EECS 584, Fall 2009 2
Contents
Comparison of Four Join Methods Partition Overflow Other Tools Memory Management Strategy
![Page 3: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/3.jpg)
04/22/2023 EECS 584, Fall 2009 3
First
Comparison of Four Join Methods– Merge, Simple Hash, GRACE, Hybrid
Partition Overflow Other Tools Memory Management Strategy
![Page 4: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/4.jpg)
COMPARISON OF FOUR JOIN METHODS
Assumptions Made
No relations ordered or indexed Minimum memory size
– for Sort-merge– for Hash
Fixed real memory size
04/22/2023 EECS 584, Fall 2009 4
![Page 5: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/5.jpg)
COMPARISON OF FOUR JOIN METHODS
CPU + I/O Times
04/22/2023 EECS 584, Fall 2009 5
Try to guess what’s whatamong..• GRACE
• Simple hash
• Sort Merge
• Hybrid hash
![Page 6: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/6.jpg)
COMPARISON OF FOUR JOIN METHODS
CPU + I/O Times
04/22/2023 EECS 584, Fall 2009 6
(a) Sort-merge
(b) Simple hash
(c) GRACE
(d) Hybrid hash
Answer is
![Page 7: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/7.jpg)
COMPARISON OF FOUR JOIN METHODS
CPU + I/O Times
04/22/2023 EECS 584, Fall 2009 7
Large memory area
• Simple hash dominates• Hybrid acts like Simple hash
![Page 8: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/8.jpg)
COMPARISON OF FOUR JOIN METHODS
CPU + I/O Times
04/22/2023 EECS 584, Fall 2009 8
Small memory area
• GRACE dominates• Hybrid generally acts like
GRACE• It also use remaining memory
![Page 9: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/9.jpg)
COMPARISON OF FOUR JOIN METHODS
Comparisons Made
In Paper– Hybrid > Simple– Hybrid > GRACE– Hybrid > Merge– GRACE > Merge
1st and 2nd are covered here
04/22/2023 EECS 584, Fall 2009 9
![Page 10: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/10.jpg)
COMPARISON OF FOUR JOIN METHODS
Hybrid vs. Simple
04/22/2023 EECS 584, Fall 2009 10
![Page 11: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/11.jpg)
COMPARISON OF FOUR JOIN METHODS
GRACE and Hybrid
They behave slightly diff, when there’s remaining memory
Comparison on next slides with:– |R| = 1000, F = 1.4, |M| = 100
04/22/2023 EECS 584, Fall 2009 11
![Page 12: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/12.jpg)
COMPARISON OF FOUR JOIN METHODS
GRACE and Hybrid
GRACE (1st phase):
04/22/2023 EECS 584, Fall 2009 12
= 38 …
Used as output buffer
Each
![Page 13: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/13.jpg)
COMPARISON OF FOUR JOIN METHODS
GRACE and Hybrid
GRACE (1st phase modified):
04/22/2023 EECS 584, Fall 2009 13
= 38 = 62 …
Used to save passed-overUsed as output buffer
![Page 14: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/14.jpg)
COMPARISON OF FOUR JOIN METHODS
GRACE and Hybrid
Hybrid (1st phase):
04/22/2023 EECS 584, Fall 2009 14
= 86= 14 …
Used to build hash table
Used as Output buffer
![Page 15: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/15.jpg)
COMPARISON OF FOUR JOIN METHODS
GRACE and Hybrid
GRACE:– Saved = 62 blocks
Hybrid:– Saved = 86 blocks
04/22/2023 EECS 584, Fall 2009 15
![Page 16: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/16.jpg)
COMPARISON OF FOUR JOIN METHODS
Other Advantage of Hash over Merge
Your thinking?
04/22/2023 EECS 584, Fall 2009 16
![Page 17: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/17.jpg)
COMPARISON OF FOUR JOIN METHODS
Other Advantage of Hash over Merge
Good response time– Easier to pipeline to next join [1]
04/22/2023 EECS 584, Fall 2009 17
R1 R2 R3
Not ordered
Wait until J1 finish,if Merge join
J1
J2
![Page 18: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/18.jpg)
COMPARISON OF FOUR JOIN METHODS
Hybrid Always Win?
Scenario that Sort-merge is better than Hybrid?
04/22/2023 EECS 584, Fall 2009 18
![Page 19: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/19.jpg)
COMPARISON OF FOUR JOIN METHODS
Sort-merge Worth to Consider
Scenario that Sort-merge is better than Hybrid– Either relation is sorted– Hard to estimate data distribution
04/22/2023 EECS 584, Fall 2009 19
![Page 20: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/20.jpg)
04/22/2023 EECS 584, Fall 2009 20
Second
Comparison of Four Join Methods Partition Overflow Other Tools Memory Management Strategy
![Page 21: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/21.jpg)
PARTITION OVERFLOW
Partition Overflow
Previous assumption– Ideal hash function that partition into same
size of sets GRACE handle overflow by ‘tuning’
– Begin with small partitions– Combine them when size is known
04/22/2023 EECS 584, Fall 2009 21
![Page 22: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/22.jpg)
PARTITION OVERFLOW
Other Approach to Prevent O/F
Hash function with random property– Uniformly distribute data
Store statistics about distribution– System catalog in previous paper– Use of histogram
These are Not Enough!
04/22/2023 EECS 584, Fall 2009 22
![Page 23: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/23.jpg)
PARTITION OVERFLOW
On Disk Overflow
Partition is too large to fit in memory– Reprocess the partition into smaller ones– One that exactly fit in memory, plus add
smaller one to another partition– Relation S should use same function too
04/22/2023 EECS 584, Fall 2009 23
![Page 24: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/24.jpg)
PARTITION OVERFLOW
In Memory Overflow
Currently hashed subset is too large– Reassign some buckets onto disk– Use different hash function– Relation S should use same function too
04/22/2023 EECS 584, Fall 2009 24
![Page 25: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/25.jpg)
04/22/2023 EECS 584, Fall 2009 25
Third
Comparison of Four Join Methods Partition Overflow Other Tools
– Disk filters– Bobb array– Semijoin
Memory Management Strategy
![Page 26: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/26.jpg)
OTHER TOOLS
Database Filter
Mechanism to process records as they come off disk
Use per-track wired logic circuits [2] Company Netezza use similar approach
with FPGAs [5]– “The Netezza architecture is based on … to filter out
extraneous data as early in the data stream as possible and as fast as data streams off the disk.”
04/22/2023 EECS 584, Fall 2009 26
![Page 27: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/27.jpg)
OTHER TOOLS
Bobb Array
Researched as an extension to CAFS– Content Addressable File Store (CAFS), company product– More on Wikipedia : )
Disk-head-attached logic device [3] Highly dependent on layout of data
04/22/2023 EECS 584, Fall 2009 27
![Page 28: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/28.jpg)
OTHER TOOLS
Semijoin
How to calculate– Construct– Join and , which is semijoin of
and– Join semijoin to
Good in distributed environment and with low selectivity of [4]
04/22/2023 EECS 584, Fall 2009 28
( )R( )R SS
R
R
R
![Page 29: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/29.jpg)
04/22/2023 EECS 584, Fall 2009 29
Fourth
Comparison of Four Join Methods Partition Overflow Other Tools Memory Management Strategy
– Hot set + VM– Why bad utilization
![Page 30: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/30.jpg)
MEMORY MANAGEMENT STRATEGY
Challenges in Memory Mgmt
No fixed amount of memory
OS needs to consider future processes
What if several processes request
04/22/2023 EECS 584, Fall 2009 30
![Page 31: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/31.jpg)
MEMORY MANAGEMENT STRATEGY
Hot Set + Virtual Memory
All requests in virtual memory– Compete for real memory
Hot set wired down to real memory– Hot set size is estimated by the access
planner
04/22/2023 EECS 584, Fall 2009 31
![Page 32: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/32.jpg)
MEMORY MANAGEMENT STRATEGY
Good Case
If – Only two sets and are constructed– Read back in opposite order– All blocks in real memory are processed
before paged out
04/22/2023 EECS 584, Fall 2009 32
* 2F R M
0R 1R
![Page 33: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/33.jpg)
MEMORY MANAGEMENT STRATEGY
Bad Case
If– Only blocks are read directly
• is ratio of ‘in memory’– See next slide for calculation
04/22/2023 EECS 584, Fall 2009 33
* 2F R M2C Q
Q
![Page 34: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/34.jpg)
MEMORY MANAGEMENT STRATEGY
C1, C2, C3 in C– C1 were on disk, and processed– C2 were in memory, and processed– C3 were in memory, and not processed
Bad Case
04/22/2023 EECS 584, Fall 2009 34
![Page 35: Join Processing in Database Systems with Large Main Memories (part 2)](https://reader036.vdocuments.us/reader036/viewer/2022062301/56815d06550346895dcb07ff/html5/thumbnails/35.jpg)
04/22/2023 EECS 584, Fall 2009 35
Thank You