POND: the OceanStore Prototype

Sean Rhea, Patrick Eaton, Dennis Geels, Hakim Weatherspoon, Ben Zhao and John Kubiatowicz

UC Berkeley

File and Storage Technologies, March 2003

Presenter: Prashanth

Goals of OceanStore

Provide an Internet-scale cooperative file system
• High durability
• Universal availability
• Balance between privacy and information sharing
• Integrity

Challenges

Maintenance
• Many components, many administrative domains
• Constant change
• Must be self-organizing
• Must be self-maintaining

Security
• Must have end-to-end encryption
• Must not place too much trust in any one host

Assumptions

Infrastructure is untrusted except in aggregate.
• No more than some fraction of a given set are faulty/malicious.

Infrastructure is constantly changing.

OceanStore uses Tapestry

Tapestry performs Distributed Object Location and Routing

Locality aware

Efficient
• O(log N) location time

Self-organizing, self-maintaining
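
To make the O(log N) bound concrete, here is a rough sketch of how the hop count grows with network size. It assumes, purely for illustration, that each routing hop resolves one more digit of the destination identifier and that identifiers use base-16 digits; neither detail is stated on the slide, and `hops_to_locate` is a hypothetical helper, not part of Tapestry.

```python
def hops_to_locate(n_nodes: int, base: int = 16) -> int:
    """Digits (hence hops) needed to tell n_nodes apart, one digit per hop.

    Assumption for illustration: each hop fixes one more digit of the ID.
    """
    hops = 0
    distinguishable = 1  # how many nodes `hops` digits can tell apart
    while distinguishable < n_nodes:
        distinguishable *= base
        hops += 1
    return hops


for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"N={n:>13,}: about {hops_to_locate(n)} hops")
```

Under these assumptions, growing the network by a factor of a thousand adds only two or three hops, which is the practical meaning of logarithmic location time.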

Data Model of OceanStore

The unit of storage is called a data object.
• Analogous to a file in a file system
• An ordered sequence of read-only versions
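
A minimal sketch of this data model, assuming a simple in-memory representation: the class and method names (`Version`, `DataObject`, `update`, `latest`) are hypothetical and chosen only for illustration. The point it shows is that an update appends a new version and never modifies an old one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Version:
    """One read-only version of a data object (immutable once created)."""
    number: int
    data: bytes


@dataclass
class DataObject:
    """Hypothetical model: an append-only, ordered list of versions."""
    versions: List[Version] = field(default_factory=list)

    def update(self, new_data: bytes) -> Version:
        # An update never rewrites an old version; it appends a new one.
        version = Version(number=len(self.versions), data=new_data)
        self.versions.append(version)
        return version

    def latest(self) -> Version:
        return self.versions[-1]


obj = DataObject()
obj.update(b"draft")
obj.update(b"final")
print(obj.latest().number, obj.versions[0].data)
```

Because `Version` is a frozen dataclass, attempting to assign to `obj.versions[0].data` raises an error, which mirrors the read-only property of old versions.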

Byzantine agreement

Guarantees all non-faulty replicas agree
• Given N = 3f + 1 replicas, up to f may be faulty/corrupt

Expensive
• Requires O(N^2) communication
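
The two numbers on this slide can be worked through with a few lines of arithmetic. The sketch below assumes one all-to-all round in which every replica sends a message to every other replica, which is a simplification used to illustrate the quadratic growth rather than a description of the actual protocol; the function names are hypothetical.

```python
def replicas_needed(f: int) -> int:
    """Replicas required to tolerate f faulty ones: N = 3f + 1."""
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1."""
    return (n - 1) // 3


def all_to_all_messages(n: int) -> int:
    """Messages in one round where each replica sends to every other one."""
    return n * (n - 1)


for f in (1, 2, 3):
    n = replicas_needed(f)
    print(f"f={f}: N={n}, messages per all-to-all round={all_to_all_messages(n)}")
```

Going from f = 1 to f = 3 raises the replica count from 4 to 10 but the per-round message count from 12 to 90, which is why the slide calls the protocol expensive.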

Erasure Codes

[Figure: erasure-coding diagram with fragments labeled W, X, Y, Z and functions labeled f and f^-1]
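
The idea behind an erasure code can be sketched with the simplest possible instance: a single XOR parity fragment, which allows any one lost fragment to be rebuilt from the rest. This is an illustration only; the slide does not say which code Pond uses, and the function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(fragments: List[bytes]) -> List[bytes]:
    """Append one parity fragment: the XOR of all equal-length data fragments."""
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def decode(received: List[Optional[bytes]]) -> List[bytes]:
    """Recover the data fragments when at most one fragment (None) is lost."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one lost fragment")
    if missing:
        # XOR of all surviving fragments equals the lost fragment.
        present = [frag for frag in received if frag is not None]
        rebuilt = reduce(xor_bytes, present)
        i = missing[0]
        received = received[:i] + [rebuilt] + received[i + 1:]
    return received[:-1]  # drop the parity fragment


data = [b"ab", b"cd", b"ef"]
coded = encode(data)
damaged = list(coded)
damaged[1] = None  # simulate losing one fragment
recovered = decode(damaged)
print(recovered == data)
```

Codes used for archival storage generally tolerate many more lost fragments than this one does, but the encode/decode shape is the same.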

The Path of a Write

[Figure: the path of a write. Labels: HotOS Attendee, Primary Replicas, Other Researchers, Secondary Replicas (soft state), Archival Servers (for durability)]

The prototype: Pond

• Written in Java
• Staged Event-Driven Architecture (SEDA)
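
For readers unfamiliar with the staged event-driven style, the sketch below shows its general shape: each stage owns an input queue and a handler, and stages communicate only by passing events between queues. It is written in Python for brevity, whereas Pond itself is in Java, and the stage names and handlers are hypothetical rather than taken from Pond.

```python
import queue
import threading

STOP = object()  # sentinel event that shuts a stage down


class Stage(threading.Thread):
    """One stage: pull events from in_queue, handle them, pass results on."""

    def __init__(self, handler, out_queue=None):
        super().__init__(daemon=True)
        self.handler = handler
        self.in_queue = queue.Queue()
        self.out_queue = out_queue

    def run(self):
        while True:
            event = self.in_queue.get()
            if event is STOP:
                # Propagate shutdown to the next stage, then exit.
                if self.out_queue is not None:
                    self.out_queue.put(STOP)
                break
            result = self.handler(event)
            if self.out_queue is not None:
                self.out_queue.put(result)


results = queue.Queue()
sign = Stage(lambda event: event + ":signed", out_queue=results)
serialize = Stage(lambda event: event.upper(), out_queue=sign.in_queue)
for stage in (serialize, sign):
    stage.start()

for event in ("write-1", "write-2"):
    serialize.in_queue.put(event)
serialize.in_queue.put(STOP)

output = []
while True:
    item = results.get()
    if item is STOP:
        break
    output.append(item)
print(output)
```

Decoupling stages through queues lets each one be tuned or throttled independently, which is the usual motivation for this design.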

Performance Results: Andrew Benchmark

• Ran Andrew on Pond
– Primary replicas at UCB, UW, Stanford, Intel Berkeley
– Client at UCB

• Pond faster on reads: 4.6x
– Phases III and IV
– Only contact primary when cache older than 30 seconds

• But slower on writes: 7.3x
– Phases I, II, and V
– Only 1024-bit keys are secure
– 512-bit keys show CPU cost

Phase    NFS     OceanStore (512-bit)    OceanStore (1024-bit)
I        0.9     2.8                     6.6
II       9.4     16.8                    40.4
III      8.3     1.8                     1.9
IV       6.9     1.5                     1.5
V        21.5    32.0                    70.7
Total    47.0    54.9                    120.3

(times in seconds)
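
The 4.6x and 7.3x figures can be checked against the table. The sketch below assumes they are per-phase ratios (NFS time over Pond time for reads, Pond time over NFS time for writes); the slide does not say how they were computed, so this is one plausible reading, and the helper names are hypothetical.

```python
# Times in seconds, copied from the Andrew benchmark table.
ANDREW = {
    "I":   {"NFS": 0.9,  "512": 2.8,  "1024": 6.6},
    "II":  {"NFS": 9.4,  "512": 16.8, "1024": 40.4},
    "III": {"NFS": 8.3,  "512": 1.8,  "1024": 1.9},
    "IV":  {"NFS": 6.9,  "512": 1.5,  "1024": 1.5},
    "V":   {"NFS": 21.5, "512": 32.0, "1024": 70.7},
}
READ_PHASES = ("III", "IV")
WRITE_PHASES = ("I", "II", "V")


def read_speedup(phase: str, key_bits: str = "512") -> float:
    """How many times faster Pond is than NFS in a read phase."""
    return ANDREW[phase]["NFS"] / ANDREW[phase][key_bits]


def write_slowdown(phase: str, key_bits: str = "1024") -> float:
    """How many times slower Pond is than NFS in a write phase."""
    return ANDREW[phase][key_bits] / ANDREW[phase]["NFS"]


for phase in READ_PHASES:
    print(f"Phase {phase}: Pond {read_speedup(phase):.1f}x faster than NFS")
for phase in WRITE_PHASES:
    print(f"Phase {phase}: Pond {write_slowdown(phase):.1f}x slower than NFS")
```

Under this reading, both read phases reproduce the 4.6x figure with 512-bit keys, while the 7.3x figure corresponds to Phase I with 1024-bit keys; the other write phases show smaller slowdowns.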

A closer look at Write

Small writes
• Signature dominates
• Threshold sigs. slow!
• Takes 70+ ms to sign
• Compare to 5 ms for regular sig

Large writes
• Encoding dominates
• Archive cost per byte
• Signature cost per write

Phase          4 kB write    2 MB write
Validate       0.3           0.4
Serialize      6.1           26.6
Apply          1.5           113.0
Archive        4.5           566.9
Sign Result    77.8          75.8

(times in milliseconds)
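
The claims that the signature dominates small writes and that archiving dominates large ones follow directly from the table. The sketch below computes each phase's share of the total for both write sizes; the helper names `shares` and `dominant` are hypothetical.

```python
# Times in milliseconds, copied from the table: (4 kB write, 2 MB write).
WRITE_MS = {
    "Validate":    (0.3, 0.4),
    "Serialize":   (6.1, 26.6),
    "Apply":       (1.5, 113.0),
    "Archive":     (4.5, 566.9),
    "Sign Result": (77.8, 75.8),
}
COLUMNS = {"4 kB": 0, "2 MB": 1}


def shares(size: str) -> dict:
    """Fraction of total write latency spent in each phase."""
    col = COLUMNS[size]
    total = sum(times[col] for times in WRITE_MS.values())
    return {phase: times[col] / total for phase, times in WRITE_MS.items()}


def dominant(size: str) -> str:
    """Phase with the largest share of the total for this write size."""
    by_phase = shares(size)
    return max(by_phase, key=by_phase.get)


for size in COLUMNS:
    phase = dominant(size)
    print(f"{size} write: {phase} takes {shares(size)[phase]:.0%} of the time")
```

The signing time is nearly identical in both columns, consistent with a fixed cost per write, while the archive time grows with the amount of data, consistent with a cost per byte.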

Performance of Write

Stream Benchmark

Sources:

The OceanStore Project http://oceanstore.cs.berkeley.edu/

THANK YOU!