![Page 1: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/1.jpg)
Parallel Databases
1
![Page 2: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/2.jpg)
Introduction Basic idea: use multiple disks, memory and/or
processors to speed up querying. Measures
– Throughput – how many tasks can be completed in some unit of time.
– Response time – how long does it take to complete one task?
Using parallelism to increase response time is called speedup.
Using parallelism to increase throughput is called scale up. 2
![Page 3: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/3.jpg)
Problems Optimally, we would like linear scale up/speedup.
This is not usually the case. Why?
– Start Up Costs
– Interference – different processors need the same resource.
– Communication Costs
– Some parts may not be able to be parallelized.
– Skew – Not likely to be able to break problem into equal sized parts.
3
![Page 4: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/4.jpg)
Skew Example Suppose I have 8 processors to do a query. I
should be able to do it in 1/8 the time. Now suppose data is distributed this way:
– P1: 5%
– P2: 10%
– P3: 10%
– P4: 5%
– P5: 10%
– P6: 10%
– P7: 25% -- these only allow ¼ of the time.
– P8: 25% 4
![Page 5: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/5.jpg)
What Can Be Shared? Share Memory
– Advantages:
• dynamic partitioning (any process may be allocated all/some of memory available).
• Cheaper than each processor having its own memory.
• Lower communication cost between processors
– Disadvantages:
• Memory can become a bottleneck.
• Scalability is a problem.5
![Page 6: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/6.jpg)
Sharing Continued
Share Disk– Advantages:
• Data need not be replicated – no synchronization
• Better scalability
• Fault tolerance may be built into the system
– Disadvantages:
• Single point of failure
• Communication cost is greater
6
![Page 7: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/7.jpg)
Sharing III
Share Nothing -- really a type of distributed DB– Advantages:
• Complete parallel solution
• Less bottlenecks
• Multiple points of failures
• Scalability
– Disadvantages:
• Cost for the bean counters
• Communication costs are greater
• Multiple points of failures7
![Page 8: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/8.jpg)
Sharing IV
Hierarchical– Advantages:
• Gain advantages of speed and scalability
– Disadvantages:
• How to partition?
8
![Page 9: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/9.jpg)
Disk Partitioning
wikipedia-Standard RAID levels
9
![Page 10: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/10.jpg)
Disk Partitioning for DB Usage
Round Robin Partitioning – like RAID 5 Range Partitioning – all tuples with a column
value within some range go to the same partition. Hash Partition – all tuples with a column value
that hash to the same value go to the same partition.
10
![Page 11: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/11.jpg)
Usage
Which is best for– Simple selects – unique match
– Simple selects – non-unique match
– Range queries
– Print unsorted
– Print sorted
11
![Page 12: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/12.jpg)
Skew In This Context
Attribute-Value Skew – many tuples with the same value for the partitioning column.
Partition Skew – some partitions end up with more tuples, even if they have different values.– Change the ranges – use a histogram to better predict
cut-offs.
Time-Value Skew – a good partitioning algorithm acquires skew over time.
12
![Page 13: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/13.jpg)
Parallel Joins
R ⨝(A=B) S– Range Partition R on A and S on B. Pass same ranges
off to the same partition.
– Hash Partition – would also work
R ⨝(A<B) S– Partition R and replicate S.
13
![Page 14: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/14.jpg)
Example
Emp(Fn, Minit, LN, SSN, Bdate, Addr, Sex, Salary, SuperSSN, Dno)
– r = 100,000 records
– bf = 5 records/block
– b = 20,000 blocks
Dept(D#, Dname, MGRSSN, MgrStartDate)– r = 1250 records
– bf = 10 records/block
– b = 125 blocks
14
![Page 15: Parallel Databases 77. Introduction 4 Basic idea: use multiple disks, memory and/or processors to speed up querying. 4 Measures –Throughput – how many](https://reader036.vdocuments.us/reader036/viewer/2022082711/56649eff5503460f94c14abd/html5/thumbnails/15.jpg)
Example Query
I want to perform
Emp ⨝(DNO=D#) Dept
How can I parallelize this and how much can I save?
15