Efficient Data Dissemination and Survivable Data Storage
Lihao Xu, http://www.cs.wayne.edu/~lihao/
Ubiquitous Information Access
Key Building Blocks
• Storage
• Retrieval
• Dissemination
• Consumption
Error Correcting Codes
Message (k symbols): 1 2 3 … k
Codeword (n symbols): 1 2 3 … n-1 n
Any m of the n codeword symbols → Message: 1 2 3 … k
MDS (Maximum Distance Separable) Codes
m = k: any k of the n codeword symbols recover the message
(n,k) MDS Codes
• Reed-Solomon (RS) Code
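To make the m = k property concrete, here is a minimal sketch of a Reed-Solomon-style (n, k) code in evaluation form over the prime field GF(257). The field size, the evaluation points 0, 1, …, n-1 and the names `rs_encode`/`rs_decode` are assumptions for illustration; production RS codecs usually work over GF(2^8). A polynomial of degree < k is fixed by any k of its values, so any k codeword symbols recover the message.

```python
from itertools import combinations

P = 257  # assumed prime field size; allows codeword length n <= 257

def lagrange_eval(points, x):
    """Value at x of the unique polynomial of degree < len(points) through (xi, yi), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the inverse of den mod P (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def rs_encode(message, n):
    """Systematic (n, k) encoding: symbol j is the message polynomial evaluated at j."""
    k = len(message)
    if not k <= n <= P:
        raise ValueError("need k <= n <= P")
    base = list(enumerate(message))  # the polynomial takes the message values at 0..k-1
    return [lagrange_eval(base, j) for j in range(n)]

def rs_decode(received, k):
    """Recover the k message symbols from any k distinct (position, symbol) pairs."""
    pts = list(received)[:k]
    if len(pts) < k:
        raise ValueError("an MDS code needs at least k symbols")
    return [lagrange_eval(pts, j) for j in range(k)]

# Every 4-subset of an (8, 4) codeword decodes back to the message.
message = [10, 20, 30, 40]
codeword = rs_encode(message, 8)
assert all(rs_decode([(i, codeword[i]) for i in idx], 4) == message
           for idx in combinations(range(8), 4))
```

The first k codeword symbols equal the message (a systematic code), so a receiver that gets exactly those needs no arithmetic. A symbol can take the value 256 here, one reason practical codecs prefer GF(2^8).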
(4,2) B-Code (+ denotes bitwise XOR)
Column:   1     2     3     4
Data:     a     b     c     d
Parity:   d+c   d+a   a+b   b+c
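The layout can be checked mechanically: write every stored symbol as a bitmask over the four data symbols and compute the GF(2) rank of any two columns. Rank 4 means those two columns determine a, b, c and d. This verification sketch reads the figure as four columns, each holding one data symbol above one parity symbol; the bitmask encoding and the helper name `gf2_rank` are illustrative choices, not part of the B-Code definition.

```python
from itertools import combinations

# Each stored symbol is a bitmask saying which of a, b, c, d it XORs together.
A, B, C, D = 0b0001, 0b0010, 0b0100, 0b1000
COLUMNS = [(A, D ^ C), (B, D ^ A), (C, A ^ B), (D, B ^ C)]  # (data, parity) per column

def gf2_rank(vectors):
    """Rank over GF(2) of bitmask vectors, by Gaussian elimination on leading bits."""
    pivots = {}  # leading bit -> reduced vector that has that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # clears bit `top`, so the leading bit strictly drops
    return len(pivots)

# Any two columns should reach rank 4 (the MDS property); one column alone has rank 2.
for i, j in combinations(range(4), 2):
    print(f"columns {i + 1},{j + 1}: rank {gf2_rank(COLUMNS[i] + COLUMNS[j])}")
```

Because every parity is the XOR of just two data symbols, decoding from any pair of columns needs only XORs.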
Data Dissemination: Broadcast Scheduling
Data Dissemination
Broadcast in a Cell
[Figure: a wireless server broadcasting to wireless clients that want items 1, 2, and 3]
Broadcast Model
• Model clients as random processes
• Desired item is random: item i, of length $l_i$, is requested with probability $p_i$
[Figure: wireless server and wireless clients]
Scheduling Problem
S = 1 2 1 2
• 2 items, $l_1 = l_2$
• Each item consists of k packets, k large
• Challenge: choose the packet broadcast schedule that minimizes clients' waiting time
Prior Work
• Complexity of optimal schedules: Bar-Noy, Bhatia, Naor, Schieber, Foltz
• Complexity of computing optimal schedules: Kenyon, Schabanel
• Error correction/detection: Bestavros
Metric: Delivery Time
Delivery time for item 1: $T^{deliv}_{S,1}$
[Figure: schedule S = 1 2 1 2 with the client's start time $t_{init}$ marked]
Delivery Time
• $T^{deliv}_{S,i}(t_{init})$: total amount of time spent waiting for item i when starting at time $t_{init}$ in schedule S
• $t_{init}$: instant in time when the client starts waiting for the item
Expected Delivery Time (EDT)
$$EDT(S, p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} p_i \, EDT_{S,i}$$
$$EDT_{S,i} = E_{t_{init}}\left[ T^{deliv}_{S,i}(t_{init}) \right]$$
with $t_{init}$ uniformly distributed over schedule S.
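EDT Calculation
S = 1 2 1 2, with $p_1 = p_2 = 1/2$ (time in units of one item's broadcast time)
• A client waiting for item 1 that starts during a 1 slot needs DT = 2 (the rest of item 1, a full item 2, then the part of item 1 it missed); one that starts during a 2 slot needs DT = 3/2 on average (on average half a slot of waiting, then item 1)
• $DT_1 = (2 + 3/2)/2 = 7/4$, and by symmetry $DT_2 = 7/4$
• $EDT = p_1 DT_1 + p_2 DT_2 = 7/4$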
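The calculation can be reproduced numerically. This sketch assumes a packetized version of the model: each slot of the schedule broadcasts all k packets of one item in a fixed order, the schedule repeats, the client arrives at a uniformly random instant, and a packet already in progress at arrival is missed. Within one packet slot the finish time is fixed, so delivery time is linear in the arrival time and evaluating it at the slot midpoint gives that slot's exact average. The function names are hypothetical.

```python
from fractions import Fraction

def delivery_time(schedule, k, item, start):
    """Packet-times a client arriving mid-way through packet slot `start` waits
    until it holds all k packets of `item`. Each schedule entry is one slot that
    broadcasts all k packets of that item in order; the schedule repeats."""
    cycle = len(schedule) * k
    needed = set(range(k))
    t = start + 1  # the packet in progress at arrival is missed
    while needed:
        slot = t % cycle
        if schedule[slot // k] == item:
            needed.discard(slot % k)  # packet number within the item
        t += 1
    return t - Fraction(2 * start + 1, 2)  # end of last packet minus arrival time

def edt(schedule, k, probs):
    """EDT(S, p_1, ..., p_n) = sum_i p_i * EDT_{S,i}, with t_init uniform over one
    cycle; the result is in units of one item's broadcast time (k packets)."""
    cycle = len(schedule) * k
    result = Fraction(0)
    for item, p in probs.items():
        if item not in schedule:
            raise ValueError(f"item {item} is never broadcast")
        mean = sum(delivery_time(schedule, k, item, s) for s in range(cycle)) / cycle
        result += p * mean / k
    return result

print(edt([1, 2], 100, {1: Fraction(1, 2), 2: Fraction(1, 2)}))
```

For k = 100 this prints 701/400, i.e. 7/4 + 1/(4k). The extra 1/(4k) comes from the partially missed packet and vanishes as k grows, leaving the slide's 7/4; the schedule 1 2 1 2 gives the same value.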
Performance with Errors
• Data items consist of k packets
• What happens if a packet is lost?
Original:    1 2 3 4 5 … k
Transmitted: 1 2 3 4 5 … k
Received:    1 2 3 4 _ … k
The lost packet 5 arrives only in the next broadcast of the item, so the client must keep waiting for it.
EDT = 3!
Solution – Coding
• Use a k-of-n MDS code, n = 2k
• Now the client only needs to wait for 1 additional packet
Original:    1 2 3 4 5 … k
Transmitted: 1 2 3 4 5 … k, k+1, …
Received:    1 2 3 4 _ … k, k+1
EDT = 9/4
Solution – Coding
• Use a k-of-n MDS code, n = 2(k+1)
• Now the client only needs to wait for 1 additional packet
Original:    1 2 3 4 5 … k
Transmitted: 1 2 3 4 5 … k, k+1, …, n
Received:    1 2 3 4 _ … k, k+1
EDT = 7/4 + ε
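The three lossy-broadcast values (3, 9/4 and 7/4 + ε) can be reproduced with a small simulation. The assumptions are mine, not the slides': schedule 1 2 1 2 with equal-length slots; arrival time uniform over the cycle, as in the EDT definition; an item's j-th packet in the cycle carries coded index j mod n, so with n = k both copies of an item repeat the same k packets while with n = 2k or n = 2(k+1) the two copies carry disjoint halves of the codeword; any k distinct coded packets decode (the MDS property); and exactly one of the first k packets of the wanted item that the client hears is lost, at a uniformly random position. The n = 2(k+1) case is modeled with slots of k+1 packets. All names are hypothetical.

```python
from fractions import Fraction

def build_cycle(schedule, slot_len, n):
    """One period of the broadcast as (item, coded packet index) pairs.
    The j-th packet of an item within the period carries coded index j % n."""
    cycle, count = [], {}
    for item in schedule:
        for _ in range(slot_len):
            j = count.get(item, 0)
            cycle.append((item, j % n))
            count[item] = j + 1
    return cycle

def delivery_time(cycle, k, item, start, lost_rank=None):
    """Packet-times until the client holds k distinct coded packets of `item`.
    The client arrives mid-way through packet slot `start`; if lost_rank is
    given, the lost_rank-th packet of `item` it hears (counting from 0) is lost."""
    have, heard, t = set(), 0, start + 1
    while len(have) < k:
        it, idx = cycle[t % len(cycle)]
        if it == item:
            if heard != lost_rank:
                have.add(idx)
            heard += 1
        t += 1
    return t - Fraction(2 * start + 1, 2)  # end of last packet minus arrival time

def edt(schedule, slot_len, n, k, probs, one_loss=False):
    """EDT in units of k packet-times, averaged over arrival slots and loss positions."""
    cycle = build_cycle(schedule, slot_len, n)
    ranks = range(k) if one_loss else [None]
    result = Fraction(0)
    for item, p in probs.items():
        if len({idx for it, idx in cycle if it == item}) < k:
            raise ValueError(f"item {item} has fewer than k distinct packets per cycle")
        times = [delivery_time(cycle, k, item, s, r)
                 for s in range(len(cycle)) for r in ranks]
        result += p * sum(times) / len(times)
    return result / k

k, half = 20, {1: Fraction(1, 2), 2: Fraction(1, 2)}
no_code = edt([1, 2, 1, 2], k, k, k, half, one_loss=True)                  # n = k
code_2k = edt([1, 2, 1, 2], k, 2 * k, k, half, one_loss=True)              # n = 2k
code_2k2 = edt([1, 2, 1, 2], k + 1, 2 * (k + 1), k, half, one_loss=True)   # n = 2(k+1)
print(no_code, code_2k, code_2k2)
```

With k = 20 this prints 61/20 187/80 37/20, which equal 3 + 1/k, 9/4 + 7/(4k) and 7/4 + 2/k. The O(1/k) terms come from packetization, so under this model the values approach 3, 9/4 and 7/4 as k grows.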
General Solution
Given loss probability p, what is the optimal n?
[Plots: k = 100 and p = 0.1; k = 100]
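The slides ask which n is optimal for a given loss probability p. As a simpler, schedule-free proxy (a swapped-in criterion, not the slides' EDT-based method), one can ask how large n must be for a client to collect at least k of the n coded packets in a single pass. The sketch assumes each packet is lost independently with probability p, treats the target probability as a free parameter, and ignores the extra airtime that sending more packets costs; the function names are hypothetical.

```python
from math import comb

def decode_probability(n, k, p):
    """P(at least k of n packets arrive) with independent loss probability p.
    With a k-of-n MDS code, any k arrivals are enough to decode."""
    q = 1 - p
    return sum(comb(n, r) * q**r * p**(n - r) for r in range(k, n + 1))

def smallest_n(k, p, target):
    """Smallest n >= k whose single-pass decoding probability reaches target."""
    if not 0 <= p < 1:
        raise ValueError("loss probability must be in [0, 1)")
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    n = k
    while decode_probability(n, k, p) < target:
        n += 1
    return n

n = smallest_n(100, 0.1, 0.99)  # the k = 100, p = 0.1 setting from the plots
print(n, decode_probability(n, 100, 0.1))
```

Since the probability of receiving at least k packets only grows with n, the linear search stops at the first n that meets the target.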
Two-Channel Broadcasting
[Figure: two wireless servers broadcasting to wireless clients that want items 1, 2, and 3]
Coordinating Schedule Data
• Use a (2k, k) MDS code to eliminate data overlap: channel 1 sends packets 1 through k (raw data); channel 2 sends packets k+1 through 2k
• Features: each channel is self-sufficient; no overlap between channels
S1 = 1 2 1 2
S2 = 1 2 1 2 (same schedule, different data)
Two Broadcast Channels
• Scheduling for two channels: two items with equal length and demand; two synchronized channels of equal bandwidth; first channel's schedule fixed at 1 2
• What is the optimal schedule for channel 2?
S1 = 1 2
S2 = ?
Some Schedules (channel 1 fixed at 1 2; candidate schedules for channel 2)
• Repeat: EDT = 1
• Swap: EDT = 1
• Shift: EDT = 1
• Reshuffle: EDT = 1
• Unequal Portions: EDT = 63/64
• Arbitrary: EDT < 63/64?
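The EDT = 1 entries for Repeat and Swap can be checked with a small packetized two-channel model. Assumptions (mine): both channels are synchronized and have cycles with the same number of slots, each slot carries k packets, channel 1 carries coded packets 1..k of each item and channel 2 carries packets k+1..2k as in the (2k, k) scheme on the Coordinating Schedule Data slide (0-based in the code), the client listens to both channels at once, and any k distinct packets decode. A packet in progress at arrival is missed, and evaluating at packet-slot midpoints gives exact averages because delivery time is linear within a slot. Names are hypothetical.

```python
from fractions import Fraction

def channel_packets(schedule, k, channel):
    """One cycle of a channel as (item, coded index) pairs; channel c carries
    coded indices c*k .. c*k + k - 1 of every item it broadcasts."""
    seq, count = [], {}
    for item in schedule:
        for _ in range(k):
            j = count.get(item, 0)
            seq.append((item, channel * k + j % k))
            count[item] = j + 1
    return seq

def two_channel_edt(s1, s2, k, probs):
    """EDT (units of k packet-times) for a client listening to both channels."""
    if len(s1) != len(s2):
        raise ValueError("this sketch assumes both cycles have the same number of slots")
    channels = [channel_packets(s1, k, 0), channel_packets(s2, k, 1)]
    period = len(channels[0])
    result = Fraction(0)
    for item, p in probs.items():
        if item not in s1 and item not in s2:
            raise ValueError(f"item {item} is never broadcast")
        total = Fraction(0)
        for start in range(period):          # arrival mid-way through packet slot `start`
            have, t = set(), start + 1
            while len(have) < k:
                for seq in channels:         # one packet per channel per time step
                    it, idx = seq[t % period]
                    if it == item:
                        have.add(idx)
                t += 1
            total += t - Fraction(2 * start + 1, 2)
        result += p * total / period
    return result / k

k, half = 20, {1: Fraction(1, 2), 2: Fraction(1, 2)}
print(two_channel_edt([1, 2], [1, 2], k, half))  # Repeat
print(two_channel_edt([1, 2], [2, 1], k, half))  # Swap
```

With k = 20 this gives 81/80 for Repeat and 41/40 for Swap, both approaching the slide's EDT = 1 as k grows. Unequal Portions and Arbitrary schedules would need slots of different lengths, which this sketch does not support.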
Schedule Performance
• Symmetric problem: equal lengths, equal demands, equal-bandwidth channels, and a symmetric "fixed" schedule for the 1st channel
• Asymmetric solution: asymmetric schedules can beat any symmetric schedule for the 2nd channel
• How is this possible?
More to Explore …
• More servers/channels
• Differing levels of synchronization
• Transmission errors
• Streaming data
• Bounds
[Figure: several wireless servers broadcasting to wireless clients that want items 1, 2, and 3]
Hydra: A Platform for SSS
Secure and Survivable Storage
• Availability
• Recoverability
• Persistence
• Confidentiality
• Integrity
• Scalability
• Efficiency
Secure and Survivable Storage
• Yahoo
• eBay
• Amazon
• Banks
• Your Labs
• More …
Hydra
Hydra Design Goals
• Portable to various OS/FS
• Hardware independent
• Unix FS semantics maintained
• Low overhead in performance and storage
• Transport independent
• Easy to install, configure, scale, maintain, and automate
Hydra and System
[Figure: layered stacks App. → Hydra → FS → I/O, shown first for one node and then for two; a later panel adds a stack App. → FS/Hydra → I/O in which Hydra and the file system appear as a single layer]
Basics of Hydra
(4,2) B-Code (+ denotes bitwise XOR)
Column:   1     2     3     4
Data:     a     b     c     d
Parity:   d+c   d+a   a+b   b+c
Performance Test
2.4 GHz P4, 512 MB RAM, 80 GB ATA/100 7200 rpm disk, Red Hat 9.0 (kernel 2.4.2.0)
Operation                  Throughput (Mbps)
File Read                  384
File Write                 200
Memory Copy                17572
(4,2) B-Code Encoding      5522
(4,2) B-Code Decoding      22866
(4,2) RS Encoding          286
(4,2) RS Decoding          216
Hydra Components
• Meta Data (hnode)
• Operations
• Monitor
Hydra Meta Data
• Code
• Symbol Location
• Data Layout
• Security Flag
• Access Rights
• Extensions
Hydra Operations
• Distribute (Write)
• Recover (Read)
• Detect
• Repair
• Restore
• Others
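One way to picture Distribute, Recover and Repair on top of the (4,2) B-Code from the Basics of Hydra slide: split a file into four equal chunks a, b, c, d, give node i one data chunk and one XOR parity chunk following the slide's layout, and rebuild from any two nodes. This is a hypothetical sketch; the function names, the zero-padding, and the assumption that each node stores exactly one column are illustrative, not Hydra's actual interface or on-disk format.

```python
def xor(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(u ^ v for u, v in zip(x, y))

# Column i stores data chunk i and the XOR of chunks PARITY[i]:
# d+c, d+a, a+b, b+c with a, b, c, d = chunks 0, 1, 2, 3.
PARITY = [(3, 2), (3, 0), (0, 1), (1, 2)]

def distribute(data: bytes):
    """Zero-pad data, cut it into four chunks and return one (data, parity) share per node."""
    size = -(-len(data) // 4)  # ceiling division
    padded = data.ljust(4 * size, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(4)]
    return [(chunks[i], xor(chunks[x], chunks[y])) for i, (x, y) in enumerate(PARITY)]

def recover(shares: dict, length: int) -> bytes:
    """Rebuild the original bytes from shares {node_index: (data, parity)} of any two nodes."""
    chunks = {i: chunk for i, (chunk, _) in shares.items()}
    progress = True
    while len(chunks) < 4 and progress:
        progress = False
        for i, (_, parity) in shares.items():
            x, y = PARITY[i]
            if x in chunks and y not in chunks:    # parity = x XOR y, so y = parity XOR x
                chunks[y] = xor(parity, chunks[x])
                progress = True
            elif y in chunks and x not in chunks:
                chunks[x] = xor(parity, chunks[y])
                progress = True
    if len(chunks) < 4:
        raise ValueError("need shares from at least two nodes")
    return b"".join(chunks[i] for i in range(4))[:length]

def repair(shares: dict, lost: int, length: int):
    """Regenerate the share of a failed node from the surviving shares."""
    return distribute(recover(shares, length))[lost]

data = b"Hydra keeps data available when nodes fail."
shares = distribute(data)
print(recover({1: shares[1], 2: shares[2]}, len(data)))
```

Recovery and repair here use only XOR. The original length is passed in explicitly so that the zero padding can be trimmed.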
Hydra Monitor
• Connectivity
• Security
Hydra Applications
• Web Server
• CDN/P2P/Data Server
• Archiving
• Data Security: system activity logger, forensics, file integrity checker …
• Others
Acknowledgement