Parallel Database System: The Future of High Performance Database Systems
Presented by: Suresh Babu L
![Page 1: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/1.jpg)
![Page 2: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/2.jpg)
Outline
- Why Parallel Databases?
- Scale-up and Speedup
- Parallel DB Architectures
- Parallel Data Flow
- Data Partitioning
- Parallelism with Relational Operators
- The State of the Art
![Page 3: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/3.jpg)
Why Parallel Databases?
Edgar F. Codd
![Page 4: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/4.jpg)
Parallel Access to Data
- 1 Terabyte at 10 MB/s: 1.2 days to scan.
- 1 Terabyte, 1,000 x parallel (10 GB/s aggregate bandwidth): 100-second scan.
- Parallelism: divide a big problem into many smaller ones to be solved in parallel.
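
The scan times on this slide follow from dividing the data size by the available bandwidth. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function and variable names are illustrative, not from the slides:

```python
# Back-of-the-envelope scan times (decimal units assumed).
TERABYTE = 10**12  # bytes
MB = 10**6         # bytes


def scan_seconds(data_bytes: int, bandwidth_bytes_per_s: int, streams: int = 1) -> float:
    """Time to scan data_bytes when `streams` devices each deliver the given bandwidth."""
    return data_bytes / (bandwidth_bytes_per_s * streams)


serial = scan_seconds(TERABYTE, 10 * MB)          # one 10 MB/s stream
parallel = scan_seconds(TERABYTE, 10 * MB, 1000)  # 1,000 streams, 10 GB/s aggregate

print(f"serial:   {serial:,.0f} s ({serial / 86400:.2f} days)")
print(f"parallel: {parallel:,.0f} s")
```

The serial case comes to 100,000 seconds, about 1.16 days, which the slide rounds to 1.2 days; the 1,000-way parallel case comes to 100 seconds.
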
![Page 5: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/5.jpg)
Parallel DBMS: Intro
- Pipeline parallelism: the output of any sequential program streams into the next sequential program.
- Partitioned parallelism: outputs split N ways and inputs merge M ways, so copies of any sequential program can run on separate partitions.
![Page 6: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/6.jpg)
Pipelined and Partitioned Parallelism
Both are natural in a DBMS!
- Pipeline parallelism: Source Data -> Scan -> Sort
- Partitioned data allows partitioned parallelism: four Source Data partitions, each with its own Scan -> Sort, feeding one Merge
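
The scan, sort, and merge dataflow in the figure can be sketched in a few lines. This is an illustrative sketch only: the operator names mirror the figure, the integer records and thread pool are assumptions, and CPython threads will not give real CPU parallelism here. Because a sort must see all of its input before emitting anything, the scan and sort stages are chained but do not truly overlap.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List


def scan(partition: Iterable[int]) -> Iterator[int]:
    """Scan operator: stream the records of one partition."""
    for record in partition:
        yield record


def sort_run(records: Iterable[int]) -> List[int]:
    """Sort operator: consume a record stream and return one sorted run."""
    return sorted(records)


def scan_then_sort(partition: List[int]) -> List[int]:
    # Pipeline: the sort reads directly from the scan's output stream.
    return sort_run(scan(partition))


def parallel_sort(partitions: List[List[int]]) -> List[int]:
    # Partitioned parallelism: one scan -> sort pipeline per partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_runs = list(pool.map(scan_then_sort, partitions))
    # Merge: combine the sorted runs into a single sorted output.
    return list(heapq.merge(*sorted_runs))


source_data = [[7, 3, 9], [4, 1, 8], [6, 2, 5], [12, 10, 11]]
result = parallel_sort(source_data)
print(result)
```
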
![Page 7: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/7.jpg)
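Scale-Up and Speed-Up
- Speedup: the same problem on a larger system (100GB -> 100GB), finished in less time.
- Scale-up: a larger problem on a proportionally larger system (100GB -> 1TB), finished in the same time.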
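
Both metrics are usually expressed as ratios of elapsed times. A minimal sketch under that common definition; the timings in the example are hypothetical, not measurements from the slides:

```python
def speedup(small_system_elapsed: float, big_system_elapsed: float) -> float:
    """Same problem on both systems. Linear speedup on an N-times larger system is N."""
    return small_system_elapsed / big_system_elapsed


def scaleup(small_system_small_problem: float, big_system_big_problem: float) -> float:
    """N-times larger problem on an N-times larger system. Linear scaleup is 1.0."""
    return small_system_small_problem / big_system_big_problem


# Hypothetical timings in seconds:
# 100GB on 1 node takes 1000 s, 100GB on 10 nodes takes 125 s,
# and 1TB on 10 nodes takes 1250 s.
print(speedup(1000.0, 125.0))   # below the linear value of 10
print(scaleup(1000.0, 1250.0))  # below the linear value of 1.0
```
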
![Page 8: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/8.jpg)
Barriers to Achieving Linear Speedup and Scaleup
[Figure: a bad speedup curve, plotted against processors and discs]
Three factors:
- Startup
- Interference
- Skew
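
A toy model can show how two of these factors bend the speedup curve. The model below is an assumption made for illustration, not something taken from the slides: workers are started one after another at a fixed cost, and the job finishes when the slowest partition finishes. Interference (contention for shared resources) is not modeled.

```python
from typing import List


def elapsed(partition_work: List[float], startup_per_worker: float) -> float:
    """Toy model: start every worker serially, then wait for the slowest partition."""
    return startup_per_worker * len(partition_work) + max(partition_work)


total_work = 1200.0
serial_time = elapsed([total_work], startup_per_worker=0.0)

# Even split across 10 workers, no startup cost: linear speedup.
even = elapsed([total_work / 10] * 10, startup_per_worker=0.0)

# Skew: one partition holds a quarter of the work.
skewed = elapsed([300.0] + [100.0] * 9, startup_per_worker=0.0)

# Startup: 2 time units to start each worker.
with_startup = elapsed([total_work / 10] * 10, startup_per_worker=2.0)

for label, t in [("even", even), ("skewed", skewed), ("startup", with_startup)]:
    print(f"{label:8s} speedup = {serial_time / t:.2f}")
```

In this model the skewed run is limited by its largest partition no matter how many workers are added, which is why skew is treated as a barrier in its own right.
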
![Page 9: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/9.jpg)
Architectures for Parallel DBs
- Shared memory: IBM/370, Sequent, SGI, Sun
- Shared disks: VMScluster, Sysplex
[Figure: clients connected to processors and memory]
![Page 10: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/10.jpg)
Architectures for Parallel DBs (contd.)
- Shared nothing: Tandem, Teradata, SP2
![Page 11: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/11.jpg)
Architectures (contd.)
Shared Nothing
- Teradata: 400 nodes, 80x12 nodes
- Tandem: 110 nodes
- IBM / SP2 / DB2: 128 nodes
- Informix/SP2: 100 nodes
- ATT & Sybase: 8x14 nodes
Shared Disk
- Oracle: 170 nodes
- Rdb: 24 nodes
Shared Memory
- Informix: 9 nodes
- RedBrick: ? nodes
![Page 12: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/12.jpg)
Parallel Data Flow and Relational Systems
[Figure: four Source Data partitions, each with its own Scan -> Sort, feeding one Merge]
![Page 13: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/13.jpg)
Data Partitioning
Three main techniques:
- Round robin
- Hash partitioning
- Range partitioning
![Page 14: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/14.jpg)
Round Robin Partitioning
[Figure: records dealt in turn to partitions P1, P2, ..., Pn]
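
Round robin sends the i-th record to partition i mod n, which spreads records evenly regardless of their values. A minimal sketch; the function name and the integer records are illustrative:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_robin_partition(records: Sequence[T], n: int) -> List[List[T]]:
    """Send the i-th record to partition i mod n."""
    partitions: List[List[T]] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        partitions[i % n].append(record)
    return partitions


parts = round_robin_partition(list(range(10)), 3)
print(parts)
```

Partition sizes differ by at most one record, which suits full scans; finding a specific key still means looking in every partition.
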
![Page 15: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/15.jpg)
Hash Partitioning
[Figure: records placed by a hash function into partitions P1, P2, ..., Pn]
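
Hash partitioning picks the partition from a hash of the partitioning attribute, so records with equal keys always land together. A minimal sketch; `zlib.crc32` is used here only because it is deterministic across runs (Python's built-in `hash` on strings is not), and the record layout is hypothetical:

```python
import zlib
from typing import Dict, List


def hash_partition(records: List[Dict], key: str, n: int) -> List[List[Dict]]:
    """Place each record in partition hash(record[key]) mod n."""
    partitions: List[List[Dict]] = [[] for _ in range(n)]
    for record in records:
        h = zlib.crc32(str(record[key]).encode("utf-8"))
        partitions[h % n].append(record)
    return partitions


people = [
    {"name": "ann", "amount": 10},
    {"name": "bob", "amount": 20},
    {"name": "ann", "amount": 30},
    {"name": "cid", "amount": 40},
    {"name": "bob", "amount": 50},
]
hashed = hash_partition(people, "name", 4)
for i, part in enumerate(hashed):
    print(i, part)
```

An equality lookup on the key can go straight to one partition; how evenly records spread depends on the hash function and on how the key values are distributed.
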
![Page 16: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/16.jpg)
Range Partitioning
[Figure: key ranges a...c, d...g, ..., w...z assigned to partitions P1, P2, ..., Pn]
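
Range partitioning assigns each partition a contiguous interval of key values, as with the letter ranges in the figure. A minimal sketch, assuming non-empty string keys partitioned on their first letter; the three boundaries (a to c, d to g, h to z) are a hypothetical choice for the example:

```python
import bisect
from typing import List


def range_partition(keys: List[str], upper_bounds: List[str]) -> List[List[str]]:
    """upper_bounds[i] is the largest first letter stored in partition i (ascending order)."""
    partitions: List[List[str]] = [[] for _ in upper_bounds]
    for key in keys:
        i = bisect.bisect_left(upper_bounds, key[0].lower())
        # Anything beyond the last bound falls into the last partition.
        partitions[min(i, len(upper_bounds) - 1)].append(key)
    return partitions


names = ["carol", "dave", "alice", "zed", "gina", "walt"]
ranged = range_partition(names, ["c", "g", "z"])
print(ranged)
```

Keys that sort near each other stay in the same partition, which helps range queries; poorly chosen boundaries can leave some partitions much fuller than others.
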
![Page 17: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/17.jpg)
Parallelism with Relational Operators
Two basic operations:
- Merge
- Split
![Page 18: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/18.jpg)
Merge Operation
![Page 19: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/19.jpg)
Split Operation
Used to partition or replicate the stream produced by a relational operator.
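
The two operations can be sketched as small stream helpers. This is an illustrative sketch, not a real system's interface: `split` partitions a stream using a caller-supplied routing function, `replicate` copies it to every output, and `merge` is assumed to combine several streams into one.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split(stream: Iterable[T], n: int, route: Callable[[T], int]) -> List[List[T]]:
    """Partition a stream: route(record) modulo n picks the output."""
    outputs: List[List[T]] = [[] for _ in range(n)]
    for record in stream:
        outputs[route(record) % n].append(record)
    return outputs


def replicate(stream: Iterable[T], n: int) -> List[List[T]]:
    """Replicate a stream: every output receives every record."""
    records = list(stream)
    return [list(records) for _ in range(n)]


def merge(streams: Iterable[Iterable[T]]) -> Iterator[T]:
    """Combine several input streams into one output stream."""
    return chain.from_iterable(streams)


outs = split(range(6), 2, lambda r: r)
print(outs)
print(sorted(merge(outs)))
print(replicate([1, 2], 3))
```

With these two helpers, an unchanged sequential operator can sit between a merge on its inputs and a split on its output.
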
![Page 20: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/20.jpg)
Example of Parallelizing Relational Operators
[Figure: SCAN A and SCAN B feed a JOIN, whose output is INSERTed into C]
![Page 21: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/21.jpg)
Example (contd.)
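
One way to parallelize the scan, join, and insert plan is to split both inputs on the join key so that matching rows meet in the same partition, join each partition pair independently, and merge the results into C. The sketch below assumes an equi-join on an integer key and a hypothetical two-column row layout; the helper names are illustrative, and the slides do not say which partitioning the original example used.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, str]             # (join key, payload): hypothetical schema
Joined = Tuple[int, str, str]     # (join key, payload from A, payload from B)


def split_on_key(rows: List[Row], n: int) -> List[List[Row]]:
    """Split a scanned table so equal keys land in the same partition."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts


def local_join(pair: Tuple[List[Row], List[Row]]) -> List[Joined]:
    """Join one partition of A with the matching partition of B."""
    a_part, b_part = pair
    table = defaultdict(list)
    for key, payload in a_part:
        table[key].append(payload)
    return [(key, a, b) for key, b in b_part for a in table.get(key, [])]


def parallel_join_insert(a: List[Row], b: List[Row], n: int) -> List[Joined]:
    a_parts, b_parts = split_on_key(a, n), split_on_key(b, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        joined_parts = list(pool.map(local_join, zip(a_parts, b_parts)))
    c: List[Joined] = []          # merge the partition results and insert into C
    for part in joined_parts:
        c.extend(part)
    return c


table_a = [(1, "a1"), (2, "a2"), (3, "a3")]
table_b = [(2, "b2"), (3, "b3"), (3, "b3x"), (4, "b4")]
table_c = parallel_join_insert(table_a, table_b, n=2)
print(sorted(table_c))
```
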
![Page 22: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/22.jpg)
The State of the Art
- Teradata
- Tandem NonStop SQL
- Gamma
- The Super Database Computer
- Bubba
![Page 23: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/23.jpg)
Specialized Parallel Relational Operators
Algorithms for traditional relational operators are rewritten to improve their parallel execution and to better handle data and execution skew.
Look at join:
- Sort-merge
- Hash join
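
The two join algorithms named above can be compared side by side on a single node. These are textbook sequential versions, sketched for an equi-join on the first column of hypothetical (key, payload) rows; they are not the specialized parallel variants the slide alludes to.

```python
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[int, str]
Joined = Tuple[int, str, str]


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Sort both inputs on the key, then advance two cursors in step."""
    left, right = sorted(left), sorted(right)
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that carries the same key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Build a hash table on the left input, then probe it with the right input."""
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    return [(key, lp, rp) for key, rp in right for lp in table.get(key, [])]


left_rows = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
right_rows = [(2, "x"), (2, "y"), (3, "z"), (5, "w")]
print(sort_merge_join(left_rows, right_rows))
print(sorted(hash_join(left_rows, right_rows)))
```

Both produce the same set of joined rows. The hash join skips the sort by using a hash table built on one input, and hashing on the join key also fits naturally with hash-partitioned data.
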
![Page 24: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/24.jpg)
CONCLUSION
![Page 25: Parallel Database System: The Future of High Performance Database Systems](https://reader035.vdocuments.us/reader035/viewer/2022062309/56813574550346895d9cd82d/html5/thumbnails/25.jpg)
THANK YOU
QUESTIONS?