
Posted on 19-Dec-2015

Open Problems in Data-Sharing Peer-to-Peer Systems

Neil Daswani, Hector Garcia-Molina, Beverly Yang

Peer-to-Peer Systems
Autonomous, large-scale, decentralized systems
A large pool of resources: files, compute cycles
Open performance and security challenges

Research Problems
Search: efficiency, expressiveness, quality of service
Security: availability, authenticity, anonymity, access control

Search Mechanism
Users submit queries (keywords, SQL statements) and receive results
Defines the behavior of peers:
Topology: how peers are connected to each other
Data placement: how data is distributed across the peers
Message routing: how messages are propagated

System Requirements
Expressiveness: the query language should allow a detailed description of the desired data; key lookups are not expressive enough
Comprehensiveness: a single result is not sufficient for some systems; in some cases all results are required
Autonomy: nodes should control their own organization

Goals of Search Mechanism
Maximize efficiency: light overhead, higher throughput
Maximize quality of service: number of results, response time
Robustness: stability in the presence of failures

Expressiveness (1/2)
Key lookup
Keyword queries
Partial search: efficient for certain types of files, e.g., music
Ranked keyword: rank the results of keyword queries and return the “top k” results; requires global statistics, whose collection and maintenance is challenging
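Ranked keyword queries depend on statistics about the whole network, which is why they are hard in P2P systems. The sketch below is hypothetical. It uses TF-IDF as one possible ranking function, because the slides do not name one. It also computes document frequencies with a full view of every peer's documents, which is exactly what a real P2P system lacks. The peer names, documents and functions `global_document_frequency` and `top_k` are illustrative.

```python
import heapq
import math
from collections import Counter

# Hypothetical data: each peer stores a few documents, each a list of terms.
peers = {
    "peer_a": {"d1": ["jazz", "piano", "live"], "d2": ["rock", "live"]},
    "peer_b": {"d3": ["jazz", "trumpet"], "d4": ["classical", "piano"]},
    "peer_c": {"d5": ["jazz", "jazz", "piano"]},
}


def global_document_frequency(peers):
    """Number of documents, across ALL peers, that contain each term.

    This is the global statistic the ranking needs; here it is computed
    with a full view of the network, which a real P2P system lacks.
    """
    df = Counter()
    for docs in peers.values():
        for terms in docs.values():
            df.update(set(terms))
    return df


def top_k(peers, query, k):
    """Return the k best (score, peer, doc_id) triples under TF-IDF."""
    df = global_document_frequency(peers)
    n_docs = sum(len(docs) for docs in peers.values())
    scored = []
    for peer, docs in peers.items():
        for doc_id, terms in docs.items():
            tf = Counter(terms)
            # Term frequency weighted by how rare the term is network-wide.
            score = sum(tf[t] * math.log(n_docs / df[t]) for t in query if df[t])
            if score > 0:
                scored.append((score, peer, doc_id))
    return heapq.nlargest(k, scored)


for score, peer, doc_id in top_k(peers, ["jazz", "piano"], k=2):
    print(f"{doc_id} on {peer}: {score:.3f}")
```

Suppose peer_c used only its own statistics instead. Its single document contains both query terms, so it would compute an IDF of log(1/1) = 0 and score d5 at zero. Scores from different peers would then not be comparable.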

Expressiveness (2/2)
Aggregates: SUM, COUNT, MAX and MEDIAN, e.g., COUNT the nodes belonging to the forth.gr domain
SQL: the most difficult query language to support; performance “hotspots” (PIER system)
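The aggregates on this slide differ in how easily they distribute. The slides do not say how aggregates are computed. A common approach, assumed here, is for each peer to report a small partial result that is merged with others in any order. COUNT, SUM and MAX fit this pattern and MEDIAN does not. The hostnames, file counts and helper names below are hypothetical.

```python
import statistics
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Partial aggregate state for COUNT, SUM and MAX over matching nodes."""
    count: int
    total: int
    maximum: int


def local_partial(hostname, shared_files, domain):
    """What one node reports: itself, if it belongs to `domain`."""
    if hostname == domain or hostname.endswith("." + domain):
        return Partial(1, shared_files, shared_files)
    # Identity element (assumes shared_files is never negative).
    return Partial(0, 0, 0)


def merge(a, b):
    """Combine two partials; order and grouping do not matter."""
    return Partial(a.count + b.count, a.total + b.total, max(a.maximum, b.maximum))


# Hypothetical nodes and the number of files each one shares.
nodes = {
    "n1.ics.forth.gr": 120,
    "n2.ics.forth.gr": 45,
    "n3.example.org": 300,
    "n4.forth.gr": 80,
}

result = reduce(merge, (local_partial(h, f, "forth.gr") for h, f in nodes.items()))
print(result)  # Partial(count=3, total=245, maximum=120)

# MEDIAN is not decomposable: the median of per-group medians is not
# the median of the whole, so the raw values must be shipped around.
group_a, group_b = [1, 2, 3], [4, 5, 6, 7, 8]
median_of_medians = statistics.median(
    [statistics.median(group_a), statistics.median(group_b)]
)
true_median = statistics.median(group_a + group_b)
print(median_of_medians, true_median)  # 4.0 4.5
```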

Spiros Antonatos

Autonomy / Efficiency / Robustness
Autonomy and efficiency are correlated: reducing node autonomy (e.g., over data placement) improves efficiency
Locate data with bounded cost (Chord): a small set of nodes is guaranteed to hold the answer
Increased chance of finding results on a random node
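Chord bounds lookup cost by removing placement autonomy: every key has exactly one responsible node, its successor on a hashed identifier circle. The sketch below shows only that key-to-node mapping, computed with a global view of the ring. It does not implement Chord's distributed routing, which reaches the successor in O(log N) hops. The 32-bit identifier space, node names and class `Ring` are illustrative assumptions; Chord itself uses 160-bit SHA-1 identifiers.

```python
import bisect
import hashlib

M = 32  # identifier bits; Chord uses 160-bit SHA-1 IDs, 32 keeps numbers readable


def chord_id(name: str) -> int:
    """Hash a node name or data key onto the identifier circle [0, 2**M)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    def __init__(self, node_names):
        self.by_id = {chord_id(n): n for n in node_names}
        if len(self.by_id) != len(node_names):
            raise ValueError("node ID collision; use more identifier bits")
        self.ids = sorted(self.by_id)

    def successor(self, key_id: int) -> str:
        """First node clockwise from key_id (wrapping past 2**M - 1 to 0)."""
        i = bisect.bisect_left(self.ids, key_id)
        return self.by_id[self.ids[i % len(self.ids)]]

    def owner(self, key: str) -> str:
        """The single node responsible for `key`."""
        return self.successor(chord_id(key))


ring = Ring([f"node-{i}" for i in range(8)])
for key in ["song.mp3", "paper.pdf", "dataset.csv"]:
    print(key, "->", ring.owner(key))
```

Because each key has a single owner, a query needs to reach only one node rather than flood the network. A further property of this consistent-hashing scheme is that adding or removing a node moves only the keys between that node and its predecessor.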

Tuning the autonomy/efficiency trade-off for varying needs, e.g., sensitive files should remain on the intranet
Building different systems for different purposes is not always desirable

SkipNet: specify a range of peers on which a document can be stored
Single-peer range: high autonomy
All-peers range: traditional P2P system
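The range idea can be illustrated by ordering peers by name and letting the publisher choose how wide a name range a document may land in. This is a simplification and not SkipNet's actual routing or placement algorithm. Here a name prefix stands in for the range, and a hash of the document key picks a peer inside it. The reversed-DNS-style peer names, the document name and the function `place` are hypothetical.

```python
import hashlib

# Hypothetical peers, named in reversed-DNS order so that one
# organization's peers share a common name prefix.
PEERS = [
    "gr.forth.csd.gamma",
    "gr.forth.ics.alpha",
    "gr.forth.ics.beta",
    "org.example.delta",
    "org.example.epsilon",
]


def place(document_key: str, peers, name_prefix: str) -> str:
    """Choose the peer that stores `document_key`, restricted to peers
    whose names start with `name_prefix`.

    Within the allowed range the document is spread by hashing its key,
    as a DHT would; the width of the range controls how much autonomy
    the publisher keeps.
    """
    in_range = sorted(p for p in peers if p.startswith(name_prefix))
    if not in_range:
        raise ValueError(f"no peers in range {name_prefix!r}")
    h = int.from_bytes(hashlib.sha1(document_key.encode()).digest(), "big")
    return in_range[h % len(in_range)]


doc = "report.pdf"
print(place(doc, PEERS, "gr.forth.ics.alpha"))  # single-peer range: gr.forth.ics.alpha
print(place(doc, PEERS, "gr.forth."))           # the organization's own peers only
print(place(doc, PEERS, ""))                    # all peers: ordinary DHT placement
```

The middle case corresponds to the intranet example: a sensitive document can be hashed for load balancing, yet it never leaves the organization's own peers.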

Autonomy and Robustness
Viceroy network construction: a low level of autonomy reduces the cost of maintaining the structure, which increases robustness and efficiency
Distributed hash tables: logarithmic maintenance cost
Super-peer redundancy
Stricter topology => decreased autonomy => greater robustness
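DHT maintenance cost is commonly described as logarithmic because each node keeps only O(log N) routing entries up to date. The sketch below assumes Chord-style finger tables as the example mechanism. A finger table has M slots, entry i pointing to successor(n + 2^i). For random node IDs, only about log2 N of those slots point to distinct nodes. The ring sizes, seed and helper names are illustrative.

```python
import bisect
import math
import random

M = 32  # identifier bits (illustrative)


def successor(ids, x):
    """First node ID >= x on the ring, wrapping to the smallest ID."""
    i = bisect.bisect_left(ids, x % 2 ** M)
    return ids[i % len(ids)]


def finger_table(ids, n):
    """Chord-style fingers: entry i points to successor(n + 2**i)."""
    return [successor(ids, n + 2 ** i) for i in range(M)]


def distinct_fingers(n_nodes, seed=1):
    """Routing state one node actually has to keep up to date."""
    rng = random.Random(seed)
    ids = sorted(rng.sample(range(2 ** M), n_nodes))
    return len(set(finger_table(ids, ids[0])))


for n_nodes in (16, 256, 4096):
    print(n_nodes, "nodes:", distinct_fingers(n_nodes), "distinct fingers,",
          "log2(N) =", int(math.log2(n_nodes)))
```

Growing the network 256-fold adds only a handful of entries per node. This is the sense in which the structure stays cheap to maintain.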

Quality of Service: Number of Results
Trade-off between the number of results and cost
BFS technique: send messages to “productive” nodes; depends on the ad-hoc topology
Concept clustering: communicate according to “interest”
“Satisfaction”: true when a threshold number of results has been found; important for partial-search systems, where cost can be drastically reduced
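The two ideas on this slide can be combined in one search loop. The loop floods a query breadth-first up to a hop limit, stops as soon as a satisfaction threshold is reached, and can optionally forward only to the neighbors that produced the most results in the past. This is one possible realization under stated assumptions, not the authors' algorithm. Messages are counted as one per newly reached node, ignoring the duplicate messages real flooding produces. The graph, files, `past_results` and `bfs_search` are hypothetical.

```python
from collections import deque


def bfs_search(graph, files, source, query, ttl, threshold,
               first_hop=None, past_results=None):
    """Breadth-first query flooding with a satisfaction threshold.

    Stops once `threshold` results are found. If `first_hop` is set, the
    source forwards only to that many neighbors, preferring those with
    the most past_results ("productive" nodes).
    Returns (results, messages), counting one message per node reached.
    """
    past_results = past_results or {}
    results, messages = [], 0
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if query in files.get(node, ()):
            results.append(node)
            if len(results) >= threshold:
                break  # satisfied: no further messages are sent
        if depth == ttl:
            continue
        neighbors = [n for n in graph[node] if n not in visited]
        if node == source and first_hop is not None:
            neighbors.sort(key=lambda n: past_results.get(n, 0), reverse=True)
            neighbors = neighbors[:first_hop]
        for n in neighbors:
            visited.add(n)
            messages += 1
            frontier.append((n, depth + 1))
    return results, messages


graph = {"s": ["a", "b", "c"], "a": ["s", "d"], "b": ["s", "e"],
         "c": ["s", "f"], "d": ["a"], "e": ["b"], "f": ["c"]}
files = {"b": {"song"}, "d": {"song"}, "e": {"song"}}
past = {"b": 5, "a": 1, "c": 0}

print(bfs_search(graph, files, "s", "song", ttl=2, threshold=2))
print(bfs_search(graph, files, "s", "song", ttl=2, threshold=2,
                 first_hop=1, past_results=past))
```

In this toy topology, both runs satisfy the query with two results. The directed run reaches that threshold with 2 messages instead of 6, because it forwards only to the historically productive neighbor.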

Security
Availability: bandwidth, CPU and file availability
File authenticity: which responses are authentic?
Anonymity: how can we hide our identity?
Access control: restrict accessibility

Availability
Nodes should always be up
DoS attacks: flooding a node with messages
Malicious super-nodes in Gnutella: claim that the victim has all the files requested, directing requests to it
Attacking CPU availability: sending complex queries
Attacking file storage: submitting bogus documents
Attacking quality of service: serving a file slowly, or sending a different file

Countermeasures
Careful design of P2P protocols (Gnutella is loosely constrained)
Prohibit back-door communication channels
Techniques for detecting failures: high message overhead and complexity; they assume pairwise connectivity
Allocate storage in proportion to what a node contributes
Hash trees to ensure that a node is sending the correct data at a reasonable rate
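A hash tree (Merkle tree) lets a downloader check each block of a file independently against a single root hash. The sketch assumes SHA-256 and assumes the root was obtained from a trusted source. It pairs an odd node out with itself, which is one common convention. The helper names are illustrative, and checking the transfer rate is left out.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(blocks):
    """All levels of the hash tree, leaves first; levels[-1][0] is the root.

    An odd node out is paired with itself. A production design would also
    domain-separate leaf and inner-node hashes.
    """
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            right = level[i + 1] if i + 1 < len(level) else level[i]
            nxt.append(h(level[i] + right))
        level = nxt
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Sibling hashes a downloader needs to check block `index`."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, index % 2 == 0))  # True: sibling is on the right
        index //= 2
    return proof


def verify(block, proof, root):
    """Recompute the path to the root and compare with the trusted root."""
    node = h(block)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root


blocks = [f"block-{i}".encode() for i in range(5)]
levels = merkle_levels(blocks)
root = levels[-1][0]  # assumed to come from a trusted source

print(verify(blocks[3], proof_for(levels, 3), root))      # True
print(verify(b"bogus data", proof_for(levels, 3), root))  # False
```

Each block can be verified as soon as it arrives, so a peer sending bogus data is caught after the first bad block rather than after the whole download. Timing the verified blocks is one possible way to judge whether the peer is serving at a reasonable rate.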

File Authenticity
Different from file integrity (CRC, hashing, MACs, digital signatures)
Given a query, the authentic response has to be distinguished from the others
What does “authentic” mean?

Definitions of “authentic”
Oldest document: the oldest submission is considered authentic; relies on timestamping systems
Expert-based: authoritative nodes keep track of signatures; susceptible to failures; offline digital signature schemes
Voting-based: votes of many experts (experts may be humans); spoofing of votes, nodes and files
Reputation-based: votes are weighted, since some experts are more trustworthy; the weights must be maintained, updated and propagated
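The difference between voting-based and reputation-based authenticity fits in a few lines. The slides do not specify a weighting scheme. The per-voter weights, the low default weight for unknown voters and the multiplicative reward/penalty update below are all illustrative assumptions, and the voter names and hashes are hypothetical. Plain voting is defeated by a few spoofed identities, while weighting restores the right answer. The cost is the weights themselves, which must be kept up to date.

```python
from collections import defaultdict


def pick_authentic(votes, reputation=None, default_weight=1.0):
    """Return the content hash with the largest vote total.

    votes: list of (voter, content_hash) pairs.
    With reputation=None every vote counts 1 (voting-based); otherwise
    each vote counts the voter's weight (reputation-based), and unknown
    voters get default_weight.
    """
    totals = defaultdict(float)
    for voter, content_hash in votes:
        weight = 1.0 if reputation is None else reputation.get(voter, default_weight)
        totals[content_hash] += weight
    return max(totals, key=totals.get)


def update_reputation(reputation, votes, verified_hash, default_weight,
                      reward=1.5, penalty=0.5):
    """Multiplicative update once the true file is known (illustrative)."""
    for voter, content_hash in votes:
        w = reputation.get(voter, default_weight)
        reputation[voter] = w * (reward if content_hash == verified_hash else penalty)


# Hypothetical scenario: three spoofed identities outvote two experts.
votes = [
    ("expert_a", "hash-good"), ("expert_b", "hash-good"),
    ("sybil_1", "hash-bad"), ("sybil_2", "hash-bad"), ("sybil_3", "hash-bad"),
]
reputation = {"expert_a": 4.0, "expert_b": 3.0}

print(pick_authentic(votes))                                  # hash-bad
print(pick_authentic(votes, reputation, default_weight=0.1))  # hash-good
update_reputation(reputation, votes, "hash-good", default_weight=0.1)
print(reputation["expert_a"], reputation["sybil_1"])          # 6.0 0.05
```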

Anonymity (1/2)
Illegal trade of files vs. censorship resistance, freedom of speech, privacy protection
Types of anonymity:
Author: which users created which documents
Server: which nodes store a given document
Reader: which users access which documents
Document: which documents are stored at a given node
Anonymity vs. efficiency
Free Haven provides server anonymity; Freenet provides author anonymity

Anonymity (2/2)
Achieve server anonymity through intermediate nodes: forwarding proxies; servers identified by nicknames
Anonymity protocols degrade under attacks: the problem of collusion
Free Haven and Crowds use forwarding proxies
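Crowds-style forwarding shows both how proxies hide the origin of a request and why collusion weakens that protection. The sketch follows the forwarding rule described for Crowds. The initiator first forwards to a random member, possibly itself, and each member then forwards again with probability p_forward or else submits the request to the server. Free Haven's mechanism differs and is not modeled. The member names, p_forward = 0.75 and the colluder set are hypothetical.

```python
import random


def crowds_path(members, initiator, p_forward, rng):
    """Members a request visits in a Crowds-style system.

    The initiator first forwards to a random member (possibly itself);
    each member then forwards again with probability p_forward, or else
    submits the request to the server. The server sees only path[-1].
    """
    path = [initiator, rng.choice(members)]
    while rng.random() < p_forward:
        path.append(rng.choice(members))
    return path


def first_colluder_predecessor(path, colluders):
    """What colluding members learn: the member that handed them the request."""
    for prev, node in zip(path, path[1:]):
        if node in colluders:
            return prev
    return None


rng = random.Random(42)
members = [f"m{i}" for i in range(20)]
paths = [crowds_path(members, "m0", 0.75, rng) for _ in range(20000)]

# Expected path length: initiator + 1 first hop + p/(1-p) further hops = 5.
print(sum(len(p) for p in paths) / len(paths))

colluders = {"m17", "m18", "m19"}
seen = [first_colluder_predecessor(p, colluders) for p in paths]
hits = [s for s in seen if s is not None]
print("initiator exposed in", sum(s == "m0" for s in hits) / len(hits), "of observed paths")
```

The server alone never learns who initiated a request. Colluding members, however, can pool what they observe, and the member that handed them a request is the true initiator noticeably more often than chance.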

Access Control
Restrict accessibility to documents
P2P systems cannot enforce copyright laws
Violation of copyright laws by users
Lawsuits against companies that build P2P systems
Limited utilization vs. free distribution
