Massively Distributed Database Systems
Broadcasting - Data on Air
Spring 2014
Ki-Joune Li
http://isel.cs.pusan.ac.kr/~lik
Pusan National University
2
Why Broadcasting?
• Simple
• Data Access Pattern: mostly asymmetric
• Scalability – Very adequate for massively distributed environments
• Example
  • DMB
  • TPEG
3
TPEG – Transport Protocol Experts Group
• A protocol for broadcasting traffic information
4
TPEG – Message format
5
TPEG Service Contents Example
6
TPEG Service
7
Air Update – Map Data Update
8
Basic Idea – Broadcast Disks

Disk               | Broadcast
Disk Access Time   | Frequency (Broadcasting Period)
Block              | Packet
Memory Hierarchy   | Multiple Broadcasting Disks (paper 1)
File Structure     | Message Format (paper 2)
Indexing           | Indexing Broadcasting (paper 3)
Query Processing   | Query Processing for Broadcasting Data (paper 4)
9
Key papers and documents
• S. Acharya, et al. “Broadcast Disks: Data Management for Asymmetric Communication Environments”, ACM SIGMOD 1995, pp.199-210
• T. Imielinski, S. Viswanathan, and B.R. Badrinath, “Data on Air: Organization and Access”, IEEE TKDE Vol.9 No.3, 1997, pp.353-372
• J. Xu et al. “Energy Efficient Indexing for Querying Location Dependent Data in Mobile Broadcasting Environments”, ICDE 2003, pp.239-250
• B. Zheng et al. “Spatial Queries in Wireless Broadcast Systems”, Wireless Networks, Vol.10, pp.723-736, 2004
• tisa.org, TPEG, http://www.tisa.org/assets/Uploads/Public/TISA14001TPEGWhatisitallabout2014.pdf
10
Paper #1 – Broadcast Disks, SIGMOD 1995
11
Key Ideas
• Broadcasting as a disk
  • How to organize broadcast message
  • Flat Message as a disk
  • Message with different frequencies as multiple disks
• Two Issues
  • How to organize message – Server Side
  • How to maintain cache – Client Side
12
Message Format
• Given three data items A, B, and C to broadcast with different access probabilities, three message formats are possible:
Flat format
Skewed format
Multiple disks format
13
Performance Measures
• What is the goal?
  • To minimize the average waiting time (expected delay)
• Example
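A minimal sketch of the expected-delay measure, assuming requests arrive uniformly at random over the broadcast cycle, each page takes one broadcast unit, and the delay is counted until the start of the requested page. The programs A B C (flat), A A B C (skewed), and A B A C (multiple disks) are illustrative choices for the three formats, and `expected_delay` is a hypothetical helper name.

```python
def expected_delay(program, probs):
    """Average wait (in broadcast units) until the requested page starts.

    program: list of page names forming one broadcast cycle, one unit each.
    probs:   dict mapping page name -> access probability.
    """
    cycle = len(program)
    total = 0.0
    for page, p in probs.items():
        starts = [i for i, x in enumerate(program) if x == page]
        # Gaps between consecutive broadcasts of this page, wrapping around.
        gaps = [(starts[(k + 1) % len(starts)] - s - 1) % cycle + 1
                for k, s in enumerate(starts)]
        # A request falls into a gap of length g with probability g / cycle
        # and then waits g / 2 on average.
        total += p * sum(g * g for g in gaps) / (2 * cycle)
    return total


flat, skewed, multi = list("ABC"), list("AABC"), list("ABAC")
probs = {"A": 0.5, "B": 0.25, "C": 0.25}  # assumed access probabilities

for name, program in [("flat", flat), ("skewed", skewed), ("multi", multi)]:
    print(name, expected_delay(program, probs))
# flat 1.5, skewed 1.625, multi 1.5
```

With uniform probabilities (1/3 each) the same function gives 1.5, 1.75, and about 1.67, so spreading the repeated page evenly (A B A C) beats clustering it (A A B C), and a non-flat program only pays off when access is skewed.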
14
Message Formatting Method - Server
• Algorithm
  1. Sort and classify pages by access probability
  2. Determine the relative frequency of each disk (page)
  3. Partition each disk into a set of chunks
  4. Define the message format with multiple disks
• Example
  • 4 pages/cycle
  • Relative frequencies: F(T1)=1, F(T2)=2, F(T3)=4
  • LCM=4 minor cycles
  • Length(T3)/LCM=2
  • Major Cycle=S*LCM
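The four steps can be sketched as follows. This is one reading of the chunk-interleaving scheme, not a definitive implementation: the disk contents (pages 1 to 11), the convention that a higher relative frequency means a hotter disk, and the function name `broadcast_program` are assumptions made for illustration and may be labeled differently from the T1 to T3 example on the slide.

```python
from functools import reduce
from math import gcd


def broadcast_program(disks, rel_freqs):
    """Interleave chunks of several 'disks' into minor cycles.

    disks:     list of page lists, one list per disk (assumed hot to cold).
    rel_freqs: relative broadcast frequency of each disk (assumed integers).
    Returns a list of minor cycles; their concatenation is one major cycle.
    """
    # Number of minor cycles = LCM of the relative frequencies.
    max_chunks = reduce(lambda a, b: a * b // gcd(a, b), rel_freqs)
    chunked = []
    for pages, freq in zip(disks, rel_freqs):
        num_chunks = max_chunks // freq
        size = -(-len(pages) // num_chunks)  # ceiling division
        chunked.append([pages[k * size:(k + 1) * size]
                        for k in range(num_chunks)])
    program = []
    for i in range(max_chunks):
        minor = []
        for chunks in chunked:
            # Each minor cycle carries one chunk from every disk.
            minor.extend(chunks[i % len(chunks)])
        program.append(minor)
    return program


disks = [[1], [2, 3], [4, 5, 6, 7, 8, 9, 10, 11]]  # assumed page placement
program = broadcast_program(disks, [4, 2, 1])
for minor in program:
    print(minor)
# [1, 2, 4, 5]
# [1, 3, 6, 7]
# [1, 2, 8, 9]
# [1, 3, 10, 11]
```

In this assumed example each minor cycle holds S = 4 pages, the coldest disk contributes 8 / 4 = 2 pages per chunk, and the major cycle is S * LCM = 16 pages. The sketch does not pad disks whose length is not divisible by the number of chunks.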
15
Caching Policy at Client
• Replacement Policy
  • Not LRU
• Point 1: Caching the hottest pages is problematic. If the server considers a page hot, it broadcasts that page frequently, so caching it is not really necessary.
• Point 2: The server's policy is to minimize the average delay over all clients, which is not the same as serving local demands.
16
Caching Policy at Client
• For a given item A, we need to consider
  • Broadcasting frequency (X) and
  • Local access probability (P)
• Replacement in terms of
  • PIX (P/X) instead of LRU
17
Paper #2 – Organization and Access, TKDE 9(3), 1997
18
Key Ideas
• Disk Access – Disk Access Time
• Two different measures
  • Latency and
  • Energy Consumption
• Data Access Time in Data on Air
  • Tuning Time: amount of time a client spends listening to the channel; it determines power consumption
  • Latency: time elapsed from the moment a client requests data until the download completes
  • Tuning time + Latency → Data Access Time
19
Broadcast data format
[Figure: a bcast as a sequence of buckets; each bucket carries a Bucket ID, a Bcast ptr, an idx ptr, and a Bucket type]
• Without an index, a client needs to scan the full bcast
• Issue
  • How to organize the index and where to place it
  • To reduce tuning time and latency
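The bucket fields named above could be modeled as follows. This is a sketch under one plausible interpretation of the four fields (offsets counted in buckets); the exact semantics, the class name `Bucket`, and the helper `annotate_bcast` are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    bucket_id: int    # assumed: offset of this bucket from the start of the bcast
    bucket_type: str  # "index" or "data"
    idx_ptr: int      # assumed: buckets to wait until the next index bucket (-1 if none)
    bcast_ptr: int    # assumed: buckets to wait until the next bcast begins


def annotate_bcast(types):
    """Fill in the pointer fields for a bcast given its bucket types in order."""
    n = len(types)
    index_positions = [i for i, t in enumerate(types) if t == "index"]
    buckets = []
    for i, t in enumerate(types):
        if index_positions:
            # Distance to the next index bucket strictly after i, wrapping around.
            idx_ptr = min((p - i - 1) % n + 1 for p in index_positions)
        else:
            idx_ptr = -1
        buckets.append(Bucket(i, t, idx_ptr, n - i))
    return buckets


bcast = annotate_bcast(["index", "data", "data", "data"])
print([b.idx_ptr for b in bcast])    # [4, 3, 2, 1]
print([b.bcast_ptr for b in bcast])  # [4, 3, 2, 1]
```

With such pointers a client that tunes in at any bucket can read one header, learn how long to sleep, and wake up exactly when the index arrives.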
20
Data Access
[Figure: timeline of a client accessing data on a bcast that contains an index]
1. Client joins here
2. Wait until the index arrives
3. Wait until data bucket arrives
4. Read data
21
Where to place Index
No Index
Single Index
(1,m) Index
What is the difference? (1,m) indexing, which broadcasts the index m times within one bcast, is likely to improve performance.
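The trade-off can be explored with a small exhaustive model. It is a sketch under stated assumptions, not the paper's analysis: time is counted in buckets, an index segment of `index_len` buckets precedes each of the m data segments, the client reads one bucket on arrival, dozes until the next index segment, reads `index_reads` index buckets, then dozes until its data bucket. All names and sizes are hypothetical.

```python
def build_bcast(num_data, index_len, m):
    """Lay out one bcast: m times (index segment + 1/m of the data buckets)."""
    per_segment = num_data // m
    bcast = []
    for seg in range(m):
        bcast += [("I", None)] * index_len
        bcast += [("D", seg * per_segment + j) for j in range(per_segment)]
    return bcast


def access_cost(bcast, arrival, item, index_reads):
    """Return (latency, tuning time) in buckets for one request."""
    n = len(bcast)
    t = arrival
    if all(kind == "D" for kind, _ in bcast):
        # No index: listen continuously until the item goes by.
        while bcast[t % n] != ("D", item):
            t += 1
        latency = t - arrival + 1
        return latency, latency
    # Initial probe at `arrival`, then doze until the next index segment starts.
    t = arrival + 1
    while not (bcast[t % n][0] == "I" and bcast[(t - 1) % n][0] != "I"):
        t += 1
    # After reading the index, doze until the wanted data bucket is broadcast.
    while bcast[t % n] != ("D", item):
        t += 1
    return t - arrival + 1, 1 + index_reads + 1


def average_cost(bcast, num_data, index_reads):
    """Average latency and tuning time over all arrival points and items."""
    costs = [access_cost(bcast, a, d, index_reads)
             for a in range(len(bcast)) for d in range(num_data)]
    latency = sum(c[0] for c in costs) / len(costs)
    tuning = sum(c[1] for c in costs) / len(costs)
    return latency, tuning


NUM_DATA, INDEX_LEN, INDEX_READS = 64, 4, 2  # assumed sizes
results = {"no index": average_cost(build_bcast(NUM_DATA, 0, 1), NUM_DATA, 0)}
for m in (1, 4, 16):
    bcast = build_bcast(NUM_DATA, INDEX_LEN, m)
    results[f"(1,{m})"] = average_cost(bcast, NUM_DATA, INDEX_READS)
for name, (latency, tuning) in results.items():
    print(f"{name}: latency={latency}, tuning={tuning}")
```

In this model the index cuts tuning time from about 32.5 buckets to 4, at the price of a longer bcast and therefore higher latency. Among the indexed layouts, m = 4 gives the lowest latency here (53 buckets versus 71 for m = 1 and m = 16), which matches the intuition that m should be neither too small (long wait for the index) nor too large (bcast inflated by index copies).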
22
How to organize
• Full duplication vs. Relevant Duplication
23
No replication
24
Entire Path Replication
25
Distributed Index