Peer-to-peer archival data trading
Brian Cooper and Hector Garcia-Molina
Stanford University

Problem: Fragile Data
Data: easy to create, hard to preserve
  Broken tapes
  Human deletions
  Going out of business

Replication-based preservation

Motivation
Several systems use replication
  Preserve digital collections
  SAV, others
Archival part of digital library
  Individual organizations cooperate
  Not a lot of money to spend

Goal
Reliable replication of digital collections
Given that
  Resources are limited
  Sites are autonomous
  Not all sites are equal
Traditional methods
  Central control
  Random
  Replicate popular
Metric
  Reliability
  Not necessarily “efficiency”

Our solution
Data trading
  “I’ll store a copy of your collection if you’ll store a copy of mine”
Sites make local decisions
  Who to trade with
  How many copies to make
  How much space to provide
  Etc.

Trading network
A series of binary, peer-to-peer trading links
[Figure: trading network of sites A through H]

Architecture
[Figure: architecture diagram. Labels: Users, Filesystem, InfoMonitor, SAV Archive, Archived data, Reliability layer, Internet; shown for a local archive and a remote archive]

Overview
Trading model
Trading algorithm
Simulating trading
Simulation results

Trading model
Archive site: an autonomous archiving provider
Digital collection: a set of related digital materials
Archival storage: stores locally and remotely owned digital collections
Archiving client: deposits and retrieves materials
Data reliability: probability that data is not lost

Deeds
A right to use space at another site
Bookkeeping mechanism for trades
Used, saved, split, or transferred
Trading algorithm
  Sites trade deeds
  Sites exercise deeds to replicate collections
[Figure: sample deed. Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University.]
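
The deed operations listed above (used, saved, split, transferred) can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the class name `Deed`, its fields, and the `exercise` helper are hypothetical, and reading the sample deed as space at Stanford University held by the Library of Congress is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Deed:
    """Hypothetical record: a right to use size_gb of space at issuer, held by holder."""
    issuer: str
    holder: str
    size_gb: int

    def split(self, size_gb: int) -> tuple[Deed, Deed]:
        """Split into two deeds whose sizes add up to the original size."""
        if not 0 < size_gb < self.size_gb:
            raise ValueError("split size must be strictly between 0 and the deed size")
        return (replace(self, size_gb=size_gb),
                replace(self, size_gb=self.size_gb - size_gb))

    def transfer(self, new_holder: str) -> Deed:
        """Hand the deed to another site; the issuer and size are unchanged."""
        return replace(self, holder=new_holder)


def exercise(deed: Deed, collection_gb: int) -> Optional[Deed]:
    """Use a deed to store a collection; return the saved remainder, if any."""
    if collection_gb > deed.size_gb:
        raise ValueError("collection does not fit in the deeded space")
    if collection_gb == deed.size_gb:
        return None  # the deed is fully used
    _used, remainder = deed.split(collection_gb)
    return remainder  # saved for later use or transfer


# Assumed reading of the sample deed on the slide.
deed = Deed(issuer="Stanford University", holder="Library of Congress", size_gb=623)
remainder = exercise(deed, 200)
```

Making the deed immutable means that every split or transfer yields a new record, which fits the slide's description of deeds as a bookkeeping mechanism.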

Deed trading
[Figure: sites A, B, and C with copies of Collection 1, Collection 2, and Collection 3]

The challenge
[Figure, in two steps: copies of Collection 1, Collection 2, and Collection 3 placed across sites A, B, and C]

Alternative solutions
Are there other ways besides trading?

Other solutions: central control
[Figure: copies of Collections 1, 2, and 3 placed across sites A, B, and C]

Other solutions: client-based
[Figure: copies of Collections 1, 2, and 3 placed across sites A, B, and C]

Other solutions: random
[Figure: copies of Collections 1, 2, and 3 placed across sites A, B, and C]

Why is trading good?
High reliability
  Framework for replication
Site autonomy
  Make local decisions
  No submission to external authority
Fairness
  Contribute more = more reliability
  Must contribute resources
[Figure: trading network of sites A through H]

Decisions facing an archive
Who to trade with
Providing space
Advertising space
Picking a number of copies
Joining a cluster
Coping with varying site reliabilities

How do we evaluate policies?
Trading simulator
  Generate scenario
  Simulate trading with different policies
  Evaluate reliability for each policy
  Compare each policy
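
The evaluation steps above amount to running every policy on the same scenarios and comparing an averaged reliability score. A minimal sketch of such a harness follows; `compare_policies` and the `simulate` callback are hypothetical names, and the toy values at the end stand in for a real trading simulation.

```python
from statistics import mean


def compare_policies(policies, scenarios, simulate):
    """Average the reliability score of each policy over the same scenarios.

    policies:  dict mapping a policy name to a policy object
    scenarios: list of generated scenarios, shared by all policies
    simulate:  callable (scenario, policy) -> reliability score
    """
    return {
        name: mean(simulate(scenario, policy) for scenario in scenarios)
        for name, policy in policies.items()
    }


# Toy stand-ins only: numbers replace real scenarios and policies.
scores = compare_policies({"a": 1, "b": 2}, [1, 2, 3], lambda s, p: s * p)
best = max(scores, key=scores.get)
```

Evaluating all policies on identical scenarios keeps the comparison fair, since differences in the scores then come from the policies rather than from the random inputs.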

Simulation parameters
Parameter                  Value
Number of sites            2 to 15
Site reliability           0.5 to 0.8
Collections per site       4 to 25
Data per collection        50 GB to 1000 GB
Space per site             2x data to 7x data
Replication goal           2 to 15 copies
Scenarios per simulation   200
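
As an illustration of how scenarios might be drawn from these ranges, here is a hedged sketch. The slide gives only the ranges, so uniform sampling, independent draws per site, sizes in gigabytes, and reading "2x data" as a multiple of the site's own data are all assumptions, as are the names `Site`, `Scenario`, and `generate_scenario`.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    reliability: float     # probability that the site does not fail
    collections_gb: list   # size of each locally owned collection (assumed gigabytes)
    space_gb: float        # total storage the site provides


@dataclass
class Scenario:
    sites: list
    replication_goal: int  # target number of copies per collection


def generate_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario from the parameter ranges (uniform sampling is assumed)."""
    sites = []
    for _ in range(rng.randint(2, 15)):                  # number of sites
        collections = [rng.uniform(50, 1000)             # data per collection
                       for _ in range(rng.randint(4, 25))]
        space = rng.uniform(2, 7) * sum(collections)     # 2x to 7x the site's own data
        sites.append(Site(rng.uniform(0.5, 0.8), collections, space))
    return Scenario(sites, rng.randint(2, 15))           # replication goal


# 200 scenarios per simulation, each with its own seed for reproducibility.
scenarios = [generate_scenario(random.Random(seed)) for seed in range(200)]
```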

Reliability
Site reliability
  Will a site fail?
  Example: 0.9 = 10% chance of failure
Data reliability
  How safe is the data?
  Despite site failures
  Example: 320 year MTTF
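
To make the link between site reliability and data reliability concrete, the sketch below computes the probability that at least one copy of a collection survives. It assumes that sites fail independently, which the slide does not state, and it does not reproduce the MTTF metric; `data_reliability` is a hypothetical helper name.

```python
from math import prod


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, assuming independent site failures.

    site_reliabilities: reliability of each site holding a copy (e.g. 0.9).
    """
    # Data is lost only if every site holding a copy fails.
    probability_all_fail = prod(1.0 - r for r in site_reliabilities)
    return 1.0 - probability_all_fail


one_copy = data_reliability([0.9])               # 10% chance of loss
three_copies = data_reliability([0.9, 0.9, 0.9])  # roughly 0.1% chance of loss
```

Under this assumption each extra copy multiplies the chance of loss by the failure probability of the site that holds it, which is why replication across even a few sites helps so much.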

Example: trading strategy
Who should we try to trade with?
  The most reliable sites?
  Sites with reliability close to ours?
  The sites we have traded with before?
  Some other policy (like random)?
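
The candidate strategies above can each be expressed as a function that ranks potential trading partners. The following is a sketch under assumptions: the function names, the shared signature, and the representation of candidates as a name-to-reliability dictionary are hypothetical and not taken from the authors' simulator.

```python
import random


def most_reliable(local_rel, candidates, past_partners, rng):
    """Approach the most reliable sites first."""
    return sorted(candidates, key=lambda name: candidates[name], reverse=True)


def closest_reliability(local_rel, candidates, past_partners, rng):
    """Approach sites whose reliability is closest to the local site's."""
    return sorted(candidates, key=lambda name: abs(candidates[name] - local_rel))


def previous_partners_first(local_rel, candidates, past_partners, rng):
    """Approach sites we have traded with before, then everyone else."""
    return sorted(candidates, key=lambda name: name not in past_partners)


def random_order(local_rel, candidates, past_partners, rng):
    """Approach sites in random order."""
    names = list(candidates)
    rng.shuffle(names)
    return names


# Hypothetical example: local reliability 0.6, three candidate sites.
candidates = {"B": 0.8, "C": 0.5, "D": 0.65}
order = closest_reliability(0.6, candidates, {"C"}, random.Random(0))
```

Giving every strategy the same signature lets a simulator swap policies without changing the trading loop.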

Example: trading strategy
[Figure: average local data MTTF (log scale, 1 to 10000) versus local site reliability (0.5 to 0.9) for the Clustering, MostReliable, and ClosestReliability policies; annotation: R=0.8]

Results
Clusters of sites?
  Social or political clusters
  E.g. all universities within a particular state
  Is the cluster big enough?
  What if it isn’t?
Result
  A few archives are sufficient
  E.g. 5 archives to make 3 copies
  Too many sites are counterproductive

Trading clusters

Current and future work
Bidding versus direct trading
  Local site holds an auction
  Bids = size of local site’s deed
“Deviant” sites
  Greedy sites
  Follow protocol but do not play nice
Access
  Support searching over collections
  Distribute indexes via trading

Current and future work
Security
  Will sites actually preserve data?
  Will they give it to others?
  Can I protect sensitive information?
  What if I fail and lose my keys?
  Can I authenticate myself?

Other parts of SAV project
SAV data model
  Write-once objects
  Signature-based naming
How to get objects into SAV
  InfoMonitor – filesystem
  Other inputs (Web, DBMS, etc.)
Modeling archival repositories
  Arturo Crespo
  Choose best components and design

Related work
Peer-to-peer replication
  SAV, Intermemory, LOCKSS, OceanStore…
Fault tolerant systems
  RAID, mirrored disks, replicated databases
  Caching systems (Andrew, Coda)
Barter/auction based systems
  ContractNet
Distributed resource allocation
  File Allocation Problem

Conclusion
Important, exciting area
  Preservation critical
  Difficult to accomplish
Many decisions are ad hoc today
  An effective framework is needed
  Scientific evaluation of decisions
Trading networks replicate data
  Model for trading networks
  Trading algorithm
  Simulation results
[Figure: trading network of sites A through H]