
Page 1: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Storage Systems
CSE 598d, Spring 2007

Lecture 15: Consistency Semantics, Introduction to Network-attached Storage

March 27, 2007

Page 2: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Agenda
• Last class
  – Consistency models: brief overview

• Next
  – More details on consistency models
  – Network storage introduction
    • NAS vs. SAN
    • DAFS
    • Some relevant technology and systems innovations: FC, smart NICs, RDMA, …
  – A variety of topics on file systems (and other storage-related software)
    • Log-structured file systems
    • Databases and file systems compared
    • Mobile/poorly connected systems, highly distributed & P2P storage
    • NFS, Google file system
    • Asynchronous I/O
    • Flash-based storage
    • Active disks, object-based storage devices (OSD)
    • Archival and secure storage
    • Storage virtualization and QoS
  – Reliability, (emerging) miniature storage devices

Page 3: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Problem Background and Definition

• Consistency issues were first studied in the context of shared-memory multi-processors, and we will start our discussion in the same context
  – Ideas generalize to any distributed system with shared storage

• The memory consistency model (MCM) of an SMP provides a formal specification of how the memory system will appear to the programmer
  – Places restrictions on the values that can be returned by a read in a shared-memory program execution
  – An MCM is a contract between the memory and the programmer

• Why different models?
  – Trade-offs between the “strictness” of consistency guarantees, implementation effort (hardware, compiler, programmer), and system performance

Page 4: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Atomic/Strict Consistency

• Most intuitive, naturally appealing
• Any read to a memory location x returns the value stored by the most recent write operation to x
• Defined w.r.t. a “global” clock
  – That is the only way “most recent” can be defined unambiguously
• Uni-processors typically observe such consistency
  – A programmer on a uni-processor naturally assumes this behavior
  – E.g., as a programmer, one would not expect the following code segment to print 1 or any other value than 2
    • A = 1; A = 2; print (A);
  – Still possible for compiler and hardware to improve throughput by re-ordering instructions
    • Atomic consistency can be achieved as long as data and control dependencies are adhered to
• Often considered a base model (for evaluating MCMs that we will see next)

Page 5: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Atomic/Strict Consistency

• What happens on a multi-processor?
  – Even on the smallest and fastest multi-processor, global time cannot be achieved!
  – Achieving atomic consistency is not possible
  – But this is not a hindrance, since programmers manage quite well with something weaker than atomic consistency
  – What behavior do we expect when we program on a multi-processor?
    • What we DO NOT expect: a global clock
    • What we expect:
      – Operations from a process will execute sequentially
        » Again: A = 1; A = 2; print (A) should not print 1
      – And then we can use critical-section/mutual-exclusion mechanisms to enforce the desired order among instructions coming from different processors
  – So we expect an MCM less strict than atomic consistency. What is this consistency model, what are its properties, and what does the hardware/software (compiler) have to do to provide it?

Page 6: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Sequential Consistency
• What we typically expect from a shared-memory multi-processor system is captured by sequential consistency
  – Lamport [1979]: A multi-processor is sequentially consistent if the result of any execution is the same as if
    • The operations of all the processors were executed in some sequential order
      – That is, memory accesses occur atomically w.r.t. other memory accesses
    • The operations of each individual processor appear in this sequence in the order specified by its program
  – Equivalently, any valid interleaving is acceptable as long as all processes see the same ordering of memory references

  – Programmer’s view: processors P1, P2, P3, …, Pn all issuing operations against a single shared Memory [figure]

Page 7: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Example: Sequential Consistency

P1: W(x)1
P2: W(y)2
P3: R(y)2 R(x)0 R(x)1

• Not atomically consistent because:
  – R(y)2 by P3 reads a value that has not been written yet
  – W(x)1 and W(y)2 appear commuted at P3
• But sequentially consistent
  – SC doesn’t have the notion of a global clock

Page 8: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Example: Sequential Consistency

P1: W(x)1
P2: W(y)2
P3: R(y)2 R(x)0 R(x)1

• Not atomically consistent because:
  – R(y)2 by P3 reads a value that has not been written yet
  – W(x)1 and W(y)2 appear commuted at P3
• But sequentially consistent
• What about?

P1: W(x)1
P2: W(y)2 R(y)2 R(x)0 R(x)1
P3: R(y)2 R(x)0 R(x)1

Page 9: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Example: Sequential Consistency

P1: W(x)1
P2: W(y)2
P3: R(y)2 R(x)0 R(x)1

• Not atomically consistent because:
  – R(y)2 by P3 reads a value that has not been written yet
  – W(x)1 and W(y)2 appear commuted at P3
• But sequentially consistent
• What about? (checked mechanically in the sketch below)

P1: W(x)1
P2: W(y)2 R(y)2 R(x)0 R(x)1
P3: R(x)1 R(y)0 R(y)2
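To make "some legal interleaving exists" concrete, here is a small brute-force Python sketch (our own illustration, not part of the lecture) that decides whether a read/write history like the ones above is sequentially consistent; the operation encoding and helper names are assumptions of this sketch.

# Operations are tuples ('W', proc, var, value) or ('R', proc, var, value).
# A history is sequentially consistent if SOME interleaving that preserves each
# process's program order makes every read return the latest preceding write
# (initial values are 0). Exponential search, fine for toy examples.

def legal(sequence):
    mem = {}
    for op, _proc, var, val in sequence:
        if op == 'W':
            mem[var] = val
        elif mem.get(var, 0) != val:
            return False
    return True

def interleavings(procs, prefix=()):
    if all(len(p) == 0 for p in procs):
        yield prefix
        return
    for i, p in enumerate(procs):
        if p:
            rest = procs[:i] + [p[1:]] + procs[i + 1:]
            yield from interleavings(rest, prefix + (p[0],))

def sequentially_consistent(procs):
    return any(legal(s) for s in interleavings(list(procs)))

# First example: P1: W(x)1   P2: W(y)2   P3: R(y)2 R(x)0 R(x)1  -> SC
h1 = [[('W', 1, 'x', 1)],
      [('W', 2, 'y', 2)],
      [('R', 3, 'y', 2), ('R', 3, 'x', 0), ('R', 3, 'x', 1)]]
# Last example: P2 sees y before x while P3 sees x before y  -> not SC
h2 = [[('W', 1, 'x', 1)],
      [('W', 2, 'y', 2), ('R', 2, 'y', 2), ('R', 2, 'x', 0), ('R', 2, 'x', 1)],
      [('R', 3, 'x', 1), ('R', 3, 'y', 0), ('R', 3, 'y', 2)]]
print(sequentially_consistent(h1))  # True
print(sequentially_consistent(h2))  # False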

Page 10: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Causal Consistency
• Hutto and Ahamad, 1990
• Each operation is either “causally related” to or “concurrent” with another
  – When a processor performs a read followed later by a write, the two operations are said to be causally related, because the value stored by the write may have been dependent upon the result of the read
  – A read operation is causally related to the earlier write that stored the data retrieved by the read
  – Transitivity applies
  – Operations that are not causally related are said to be concurrent
• A memory is causally consistent if all processors agree on the order of causally related writes
  – Weaker than SC, which requires all writes to be seen in the same order

P1: W(x)1        W(x)3
P2: R(x)1 W(x)2
P3: R(x)1        R(x)3 R(x)2
P4: R(x)1        R(x)2 R(x)3

W(x)1 and W(x)2 are causally related
W(x)2 and W(x)3 are not causally related!

Page 11: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Summary: Uniform MCMs

Atomic consistency

Sequential consistency

Processor consistency

Causal consistency

PRAM consistency
Cache consistency

Slow memory

Page 12: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

UNIX and session semantics

• UNIX file sharing semantics on a uni-processor system
  – When a read follows a write, the read returns the value just written
  – When two writes happen in quick succession, followed by a read, the value read is that stored by the last write

• Problematic for a distributed system
  – Theoretically achievable if there is a single file server and no client caching

• Session semantics (a toy sketch follows below)
  – Writes are made visible to others only upon the closing of a file
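As a rough illustration of session semantics (our own toy sketch, not from the lecture and not how any particular file system implements it): writes accumulate in a per-open copy and are published to the shared store only on close().

class Server:
    def __init__(self):
        self.files = {}                # filename -> bytes, the "shared" store

class Session:
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files.get(name, b"")       # snapshot taken at open()

    def read(self):
        return self.copy

    def write(self, data):
        self.copy += data              # visible only inside this session

    def close(self):
        self.server.files[self.name] = self.copy      # publish on close

srv = Server()
a, b = Session(srv, "f"), Session(srv, "f")
a.write(b"hello")
print(b.read())                        # b''  : A's write not yet visible
a.close()
print(Session(srv, "f").read())        # b'hello' once the session has closed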

Page 13: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Delta Consistency
• Any write will become visible within at most Delta time units
  – Barring network latency
  – Meanwhile … all bets are off!
  – Push versus pull
  – Compare with sequential, causal, etc. in terms of valid orderings of operations

• Related: mutual consistency with parameter Delta
  – A given set of “objects” are within Delta time units of each other at all times, as seen by a client
  – Note that it is OK to be stale with respect to the server by more than Delta!
  – Generally, specify two parameters (a lease-based sketch of the first follows below)
    • Delta1: freshness w.r.t. the server
    • Delta2: mutual consistency of related objects
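A minimal sketch of Delta1-style freshness using leases/TTLs (our own illustration; the fetch callback and names are hypothetical, and a real system must also account for clock skew and network latency).

import time

DELTA = 5.0                            # maximum allowed staleness, in seconds

class DeltaCache:
    def __init__(self, fetch):
        self.fetch = fetch             # hypothetical: key -> fresh value from the server
        self.data = {}                 # key -> (value, time_fetched)

    def get(self, key):
        value, fetched = self.data.get(key, (None, 0.0))
        if time.time() - fetched > DELTA:      # lease expired: revalidate
            value = self.fetch(key)
            self.data[key] = (value, time.time())
        return value

# usage (hypothetical server call):
# cache = DeltaCache(fetch=lambda k: read_from_server(k))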

Page 14: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

File Systems Consistency Semantics

• What is involved in providing these semantics?
• UNIX semantics: easy to implement on a uni-processor
• Session semantics: session state at the server
• Delta consistency: timeouts, leases
• Meta-data consistency
  – Some techniques we have seen
    • Journaling, LFS; meta-data journaling: ext3
    • Synchronous writes
    • NVRAM: expensive, unavailable
  – Disk-scheduler-enforced ordering!
    • The file system passes sequencing restrictions to the disk scheduler
    • Problem: the disk scheduler cannot enforce an ordering among requests not yet visible to it
  – Soft updates
    • Dependency information is maintained for meta-data blocks in the write-back cache at a per-field and/or per-pointer granularity

Page 15: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Network-attached Storage

• Introduction to important ideas and technologies
• Lots of slides, will cover some in class, post all on Angel
• Subsequent classes will cover some topics in depth

Page 16: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Attached Storage

• Problems/shortcomings in enterprise/commercial settings
  – Sharing of data is difficult
  – Programming and client access inconvenient
  – Wastage of storage capacity
  – More?

Page 17: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

“Remote” Storage
• Idea: Separate storage from the clients and application servers and locate it on the other side of a scalable networking infrastructure
  – Variants on this idea that we will see soon

• Advantages
  – Reduction in wasted capacity by pooling devices and consolidating unused capacity formerly spread over many directly-attached storage devices
  – Reduced time to deploy new storage
    • Client software is designed to tolerate dynamic changes in network resources, but not changes to the local storage configuration while the client is operating
  – Backup made more convenient
    • Application server involvement removed
  – Management simplified by centralizing storage under a consolidated manager interface
  – Availability improved (potentially)
    • All software and hardware is specifically developed and tested to run together

• Disadvantages
  – Complexity, more expertise needed
    • Implies more set-up and management cost

Page 18: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Network Attached Storage

File interface exported to rest of the network

Page 19: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Storage Area Network (SAN)

Block interface exported to rest of the network

Page 20: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

SAN versus NAS

Source: November 2000/Vol. 43, No. 11 COMMUNICATIONS OF THE ACM

Page 21: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Differences between NAS and SAN

• NAS
  – TCP/IP or UDP/IP protocols and Ethernet networks
  – High-level requests and responses for files
  – NAS devices translate file requests into operations on disk blocks
  – Cheaper

• SAN
  – Fibre Channel and SCSI
  – More scalable
  – Clients translate file accesses into operations on specific disk blocks
  – Data-block-level access
  – Expensive
  – Separation of storage traffic from general network traffic
    • Beneficial for security and performance

Page 22: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

NAS File Servers
• Pre-configured file servers
• Consist of one or more internal servers with pre-configured capacity
• Have a stripped-down OS; any component not associated with file services is discarded
• Connected via Ethernet to the LAN
• OS stripping makes them more efficient than a general-purpose OS
• Have plug-and-play functionality

Source: Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI and InfiniBand, by Ulf Troppens, Rainer Erkens, Wolfgang Mueller

Page 23: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

NAS Network Performance

• NAS and traditional network file systems use IP-based protocols over NIC devices.
• A consequence of this deployment is poor network performance.
• The main culprits often cited include:
  – Protocol processing in network stacks
  – Memory copying
  – Kernel overhead, including system calls and context switches.

Page 24: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

NAS Network Performance

Figure depicting sources of TCP/IP overhead

Page 25: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

NAS Network Performance

Protocol Processing
• Data transmission involves the OS services for memory and process management, the TCP/IP protocol stack, and the network device and its device driver.
• The per-packet costs include the overhead to execute the TCP/IP protocol code, allocate and release memory buffers, and handle device interrupts for packet arrival and transmit completion.
• The per-byte costs include the overhead to move data within the end-to-end system and to compute checksums to detect data corruption in the network.

Page 26: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

NAS Network Performance

Memory Copy

Current implementations of data transmission require the same data to be copied at several stages.

Page 27: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

NAS Network Performance

• An NFS client requesting data stored on a NAS server with an internal SCSI disk would involve:
  – Hard disk to RAM transfer using the SCSI, PCI and system buses
  – RAM to NIC transfer using the system and PCI buses
• For traditional NFS this would further involve a transfer from the application memory to the kernel buffer cache of the transmitting computer before forwarding to the network card.

Page 28: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Accelerating Performance

• Two starting points for accelerating network file system performance are:
  – The underlying communication protocol
    • TCP/IP was designed to provide a reliable framework for data exchange over an unreliable network. The TCP/IP stack is complex and CPU-intensive.
    • Example alternative: VIA/RDMA
  – The network file system
    • Development of new network file systems that assume a reliable network connection. Network file systems could be modified to use thinner communication protocols.
    • Example alternative: DAFS

Page 29: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Proposed Solutions
TCP/IP Offload Engines (TOEs)
• An increasing number of network adapters are able to compute the Internet checksum
• Some adapters can now perform TCP or UDP protocol processing

Copy Avoidance
• Several buffer management schemes have been proposed to either reduce or eliminate data copying (one concrete illustration follows below)
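As one concrete example of the copy-avoidance idea on the sending side (our own illustration, not how a NAS box or DAFS implements zero copy): the application can ask the kernel to move file data straight from the page cache to the socket, e.g. via sendfile(2), which Python exposes as socket.sendfile().

import socket

def serve_file_zero_copy(path, host="0.0.0.0", port=9000):
    with socket.create_server((host, port)) as srv:
        conn, _addr = srv.accept()
        with conn, open(path, "rb") as f:
            # Kernel-side transfer: the file's pages go from the page cache to
            # the NIC without being staged in a user-space buffer first.
            conn.sendfile(f)

# serve_file_zero_copy("/tmp/data.bin")   # hypothetical path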

Page 30: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Proposed Solutions
Fibre Channel
• Fibre Channel reduces communication overhead by offloading transport processing to the NIC instead of using the host processor
• Zero copy is facilitated by direct communication between host memory and the NIC device

Direct-Access Transport
• Requires NIC support for remote DMA
• User-level networking is made possible by a user-mode process interacting directly with the NIC to send or receive messages, with minimal kernel intervention
• Reliable message transport network

Page 31: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Proposed Solutions
NIC Support Mechanism
• The NIC device exposes an array of connection descriptors in the system’s physical address space
• At connection setup time, the network device driver maps a free descriptor into the user virtual address space
• This grants the user process direct and safe access to the NIC’s buffers and registers
• This facilitates user-level networking and copy avoidance

Page 32: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Proposed Solutions
User-Level File System

• Kernel policies for file system caching and prefetching do not favor some applications
• The migration of OS functions into user-level libraries allows user applications more control and specialization
• Clients would run in user mode as libraries linked directly with applications; this reduces the overhead due to system calls
• Clients may evolve independently of the operating system
• Clients could also run on any OS, with no special kernel support except the NIC device driver

Page 33: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Virtual Interface And RDMA

• The virtual interface architecture facilitates fast and efficient data exchange between applications running on different machines

• VIA reduces complexity by allowing applications (VI consumers) to communicate directly with the network card (VI NIC) via common memory areas, bypassing the operating system

• The VI provider is the NIC and its device driver

• RDMA is a communication model supported on the VIA which allows applications to read and write memory areas of processes running on different computers

Page 34: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

VI Architecture and RDMA

Source: Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI and InfiniBand, by Ulf Troppens, Rainer Erkens, Wolfgang Mueller

Page 35: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Remote DMA (RDMA)

Page 36: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

VIA Model

[Figure: VIA data transfer between two hosts. Each host has a Myrinet NIC (LANai processor); send and receive descriptors and buffers live in user address space, and the application rings a send/receive doorbell on the NIC. Numbered steps (1-10) trace the data packets through NIC memory from the send buffer on one host to the receive buffer on the other.]

Page 37: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

InfiniBand

• “Infinite Bandwidth”
• A switch-based I/O interconnect architecture
• Low pin count, serial architecture
• The InfiniBand Architecture (IBA) defines a System Area Network (SAN)
  – An IBA SAN is a communications and management infrastructure for I/O and IPC
• IBA defines a switched communications fabric
  – High bandwidth and low latency
• Backed by top companies in the industry: Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft and Sun

Page 38: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Limits of the PCI Bus

• Peripheral Component Interconnect (PCI)
  – Introduced in 1992
  – Has become the standard bus architecture for servers
  – PCI bus
    • 32-bit/33 MHz -> 64-bit/66 MHz
  – PCI-X
    • The latest versions are 64 bits wide at PCI-X 66, PCI-X 133, PCI-X 266 and PCI-X 533 [4.3 GB/s]
  – Other PCI concerns include
    • Bus sharing
    • Bus speed
    • Scalability
    • Fault tolerance

Page 39: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

PCI Express
• High-speed point-to-point architecture that is essentially a serialized, packetized version of PCI
• General-purpose serial I/O bus for chip-to-chip communication, USB 2.0 / IEEE 1394b interconnects, and high-end graphics (a viable AGP replacement)
• Bandwidth of 4 Gigabit/second full duplex per lane
  – Up to 32 separate lanes, i.e., 128 Gigabit/second
• Software-compatible with the PCI device driver model
• Expected to coexist with, and not displace, technologies like PCI-X in the foreseeable future

Page 40: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Benefits of IBA
• Bandwidth
• An open and industry-inclusive standard
• Improved connection flexibility and scalability
• Improved reliability
• Offloads communications processing from the OS and CPU
• Wide access to a variety of storage systems
• Simultaneous device communication
• Built-in security, quality of service
• Support for Internet Protocol version 6 (IPv6)
• Fewer and better-managed system interrupts
• Support for up to 64,000 addressable devices
• Support for copper cable and optical fiber

Page 41: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

InfiniBand Components
• Host Channel Adapter (HCA)
  – An interface to a host; supports all software verbs
• Target Channel Adapter (TCA)
  – Provides the connection from InfiniBand to an I/O device
• Switch
  – Fundamental component of an IB fabric
  – Allows many HCAs and TCAs to connect to it and handles network traffic
• Router
  – Forwards data packets from a local network to other external subnets
• Subnet Manager
  – An application responsible for configuring the local subnet and ensuring its continued operation

Page 42: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007
Page 43: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

An IBA SAN

Page 44: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

InfiniBand Layers
• Physical Layer

Link | Pin Count | Signaling Rate | Data Rate | Full-Duplex Data Rate
1x   | 4         | 2.5 Gb/s       | 2 Gb/s    | 4 Gb/s (500 MB/s)
4x   | 16        | 10 Gb/s        | 8 Gb/s    | 16 Gb/s (2 GB/s)
12x  | 48        | 30 Gb/s        | 24 Gb/s   | 48 Gb/s (6 GB/s)

Page 45: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

InfiniBand Layers
• Link Layer
  – Central to the IBA; includes packet layout, point-to-point link instructions, switching within a local subnet, and data integrity
  – Packets
    • Data and management packets
  – Switching
    • Data forwarding within a local subnet
  – QoS
    • Supported by virtual lanes
    • A virtual lane is a unique logical communication link that shares a single physical link
    • Up to 15 virtual lanes per physical link (VL0 - VL15)
    • Each packet is assigned a priority
  – Credit-based flow control
    • Used to manage data flow between two point-to-point links
  – Integrity check using CRC

Page 46: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

InfiniBand Layers
• Network Layer
  – Responsible for routing packets from one subnet to another
  – The global route header (GRH) located within a packet includes an IPv6 address for the source and destination of each packet
• Transport Layer
  – Handles the in-order delivery of packets as well as partitioning, multiplexing and the transport services that determine reliable connections
Page 47: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

InfiniBand Architecture
• The Queue Pair Abstraction
  – Two queues of communication metadata (send & recv)
  – Registered buffers from which to send / into which to receive
  – A toy model of this abstraction follows below

“Architectural Interactions of I/O Networks and Inter-networks”, Philip Buonadonna, Intel Research & University of California, Berkeley
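A toy model of the queue pair abstraction (entirely our own sketch; these are not IBA verbs or any real RDMA API): work requests referencing pre-registered buffers are posted to send/receive queues, and the application polls a completion queue rather than trapping into the kernel per message.

from collections import deque

class QueuePair:
    def __init__(self):
        self.send_q, self.recv_q, self.cq = deque(), deque(), deque()
        self.buffers = {}                      # buffer id -> bytearray

    def register(self, buf_id, nbytes):
        # In real RDMA this pins the memory and installs a translation on the NIC.
        self.buffers[buf_id] = bytearray(nbytes)
        return buf_id

    def post_recv(self, buf_id):
        self.recv_q.append(buf_id)

    def post_send(self, buf_id, data):
        self.buffers[buf_id][:len(data)] = data
        self.send_q.append(buf_id)

    def poll_cq(self):
        return self.cq.popleft() if self.cq else None

def nic_transfer(src_qp, dst_qp):
    # Stand-in for the NICs moving one message with no kernel involvement.
    sbuf, rbuf = src_qp.send_q.popleft(), dst_qp.recv_q.popleft()
    dst_qp.buffers[rbuf][:] = src_qp.buffers[sbuf]
    src_qp.cq.append(("send_done", sbuf))
    dst_qp.cq.append(("recv_done", rbuf))

client, server = QueuePair(), QueuePair()
server.post_recv(server.register("rbuf", 16))
client.post_send(client.register("sbuf", 16), b"read block 42")
nic_transfer(client, server)
print(server.poll_cq())                        # ('recv_done', 'rbuf')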

Page 48: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System
• A new network file system derived from NFS version 4
• Tailored to use remote DMA (RDMA), which requires the virtual interface (VI) framework
• Introduced to combine the low overhead of SAN products with the generality of NAS file servers
• Communication between a DAFS server and client is done through RDMA
• Client-side caching of locks for easier subsequent access to the same file
• Clients can be implemented as a shared library in user space or in the kernel

Page 49: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

DAFS Architecture

Source: Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI and InfiniBand, by Ulf Troppens, Rainer Erkens, Wolfgang Mueller

Page 50: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

DAFS Protocol
• Defined as a set of send and request formats and their semantics
• Defines recommended procedural APIs to access DAFS services from a client program
• Assumes a reliable network transport and offers server-directed command flow
• Each operation is a separate request, but request chaining is also supported
• Defines features for session recovery and locking primitives

Page 51: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Direct Access Data Transfer
• Supports direct variants of data transfer operations such as read, write, setattr, etc.
• Direct transfer operations move data to and from client-provided memory using RDMA read and write operations
• The client registers each memory region with the local kernel before requesting direct I/O on that region
• The API defines register and unregister primitives for memory region management; register returns a region descriptor
• Registration issues a system call to pin buffer regions in physical memory, then loads page translations for the region into a lookup table on the NIC

Page 52: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

RDMA Operations
• RDMA operations for direct I/O are initiated by the server
• A client write request to the server includes a region token for the buffer containing the data
• The server then issues an RDMA read to fetch the data from the client and responds with a write-request response after RDMA completion

Page 53: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Asynchronous I/O and Prefetching
• Supports a fully asynchronous API, which enables clients to pipeline I/O operations and overlap them with application processing
• Event notification mechanisms deliver asynchronous completions, and a client may create several completion groups
• DAFS can be implemented as a user library to be linked with applications, or within the kernel

Page 54: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Figure depicting DAFS and NFS client architectures
Source: http://www.eecs.harvard.edu/~vino/fs-perf/dafs.html

Page 55: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Server Design and Implementation

• The kernel server design is fashioned as an event-driven state-transition diagram
• The main events triggering state transitions are: recv_done, send_done and bio_done

Figure 1. An event-driven DAFS server
Source: http://www.eecs.harvard.edu/~vino/fs-perf/dafs.html

Page 56: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Event Handlers
• Each network or disk event is associated with a handler routine
• recv_done: a client-initiated transfer is complete. This signal is asserted by the NIC and initiates the processing of an incoming RPC request
• send_done: a server-initiated transfer is complete. The handler for this signal releases all the locks involved in the RDMA operation and returns an RPC response
• bio_done: a block I/O request from disk is complete. This signal is raised by the disk controller and wakes up any thread blocking on a previous disk I/O

Page 57: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Server Design and Implementation
• The server performs disk I/O using the zero-copy buffer cache interface
• This interface facilitates the locking of pages and their mappings
• Buffers involved in RDMA need to be locked for the entire duration of the transfer
• Transfers are initiated by RPC handlers, and processing is asynchronous
• The kernel buffer cache manager registers and de-registers buffer mappings with the NIC on the fly, as physical pages are returned to or removed from the buffers

Page 58: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Direct Access File System

Server Design and Implementation
• The server creates multiple kernel threads to facilitate I/O concurrency
• A single listener thread monitors for new transport connections; other worker threads handle data transfer
• Arriving messages generate a recv_done interrupt, which is processed by a single handler for the completion group
• The handler queues up incoming RPC requests and invokes a worker thread to start data processing
• A thread locks all the necessary file pages in the buffer cache, creates RDMA descriptors and issues RDMA operations
• After RDMA completion, a send_done signal is raised, which initiates the clean-up and release of all resources associated with the completed operation
• A simplified sketch of this event-driven flow appears below
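A highly simplified sketch of the event-driven structure described above (our own illustration, not the DAFS server's actual code); the event names follow the slides, while the queues and the faked RDMA completion are assumptions of this sketch.

import queue, threading, time

events = queue.Queue()      # completion queue fed by the "NIC" and the "disk"
work = queue.Queue()        # parsed RPC requests waiting for a worker thread

def worker():
    while True:
        rpc = work.get()
        # ... lock buffer-cache pages, build RDMA descriptors, issue RDMA ...
        events.put(("send_done", rpc))          # pretend the RDMA completed

def dispatcher():
    while True:
        kind, payload = events.get()
        if kind == "recv_done":                 # incoming RPC request arrived
            work.put(payload)
        elif kind == "send_done":               # release locks, return RPC response
            print("reply sent for", payload)
        elif kind == "bio_done":                # wake a thread waiting on disk I/O
            pass

threading.Thread(target=worker, daemon=True).start()
threading.Thread(target=dispatcher, daemon=True).start()
events.put(("recv_done", {"op": "read", "file": "f", "len": 4096}))
time.sleep(0.1)                                 # let the toy pipeline drain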

Page 59: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Communication Alternatives

Source: Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI and InfiniBand, by Ulf Troppens, Rainer Erkens, Wolfgang Mueller

Page 60: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Setup

Source: http://www.eecs.harvard.edu/~vino/fs-perf/dafs.html

Page 61: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Setup
System Configuration

• Pentium III 800 MHz clients and servers

• Server cache 1GB, 133MHz memory bus

• 9GB Disks, 10K RPM Seagate Cheetah, 64-bit/33MHz PCI bus

• VI over Giganet cLAN 1000 adapter (DAFS)

• UDP/IP over Gigabit Ethernet, Alteon Tigon-II adapters (NFS)

Page 62: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Setup
• NFS block I/O transfer size is set at mount time
• Packets are sent as fragmented UDP packets
• Interrupt coalescing is set to high on the Tigon-II
• Checksum offloading is enabled on the Tigon-II

• NFS-nocopy required modifying the Tigon-II firmware, IP fragmentation code, file cache code, VM system and Tigon-II driver to facilitate header splitting and page remapping

Page 63: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results
The table below shows the results for one-byte round-trip latency and bandwidth. The higher latency on the Tigon-II was due to the datapath crossing the kernel UDP/IP stack.

Page 64: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results
Bandwidth and Overhead
• Server pre-warmed with a 768 MB dataset
• Designed to stress network data transfer
• Hence client caching is not considered

Sequential Configuration
• DAFS client utilized the asynchronous I/O API
• NFS had read-ahead enabled

Random Configuration
• NFS tuned for best-case performance at each request size by selecting a matching NFS transfer size

Page 65: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results

Page 66: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results

Page 67: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results
TPIE Merge
• The sequential record merge program combines n sorted input files of x y-byte records each into a single sorted output file
• Depicts raw sequential I/O performance with varying amounts of processing
• Performance is limited by the client CPU

Page 68: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results

Page 69: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results
PostMark
• A synthetic benchmark used to measure file system performance over workloads composed of many short-lived, relatively small files
• Creates a pool of files with random sizes, followed by a sequence of file operations

Page 70: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Experimental Results
Berkeley DB
• Synthetic workload composed of read-only transactions, each processing one small record at random from a B-tree

Page 71: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Disk Storage Interfaces

• Parallel ATA (IDE, E-IDE)
• Serial ATA (SATA)
• Small Computer System Interface (SCSI)
• Serial Attached SCSI (SAS)
• Fibre Channel (FC)

"It's More Than the Interface", by Gordy Lutz of Seagate, August 2002.

Page 72: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Parallel ATA
• 16-bit bus
• Two bytes per bus transaction
• 40-pin connector
• Master/slave shared bus

• Bandwidth:
    25 MHz strobe
  x 2 for double data rate clocking
  x 16 bits per edge
  / 8 bits per byte
  = 100 MBytes/sec

Page 73: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Serial ATA (SATA)
• 7-pin connector
• Point-to-point connections for dedicated bandwidth
• Bit-by-bit
  – One signal path for data transmission
  – The other signal path for acknowledgement

• Bandwidth (worked out, together with the Parallel ATA figure, in the snippet below):
    1500 MHz embedded clock
  x 1 bit per clock
  x 80% for 8b10b encoding
  / 8 bits per byte
  = 150 MBytes/sec

• 2002 -> 150 MB/sec
• 2004 -> 300 MB/sec
• 2007 -> 600 MB/sec
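The two bandwidth derivations above, carried out explicitly (our own arithmetic; it simply reproduces the slides' numbers):

pata = 25e6 * 2 * 16 / 8        # 25 MHz strobe, double data rate, 16-bit bus
sata = 1500e6 * 1 * 0.8 / 8     # 1.5 GHz serial clock, 1 bit/clock, 8b10b = 80%
print(pata / 1e6, "MB/s")       # 100.0
print(sata / 1e6, "MB/s")       # 150.0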

Page 74: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

8b10b encoding
• IBM patent
• Used in SATA, SAS, FC and InfiniBand
• Converts 8 bits of data into a 10-bit code
• Provides better synchronization than Manchester encoding

Page 75: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Small Computer Systems Interface (SCSI)

• SCSI for high-performance storage market

• SCSI-1 proposed in 1986
• Parallel interface
• Maximum cabling distance is 12 meters
• Terminators required
• Bus width is 8 bits (narrow)
• 16 devices per bus
• The device with the highest priority gets the bus

Page 76: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

SCSI (cont’d)
• Peer-to-peer connection (channel)
• 50/68 pins

• Hot repair not provided
• Multiple buses needed beyond 16 devices
• Low bandwidth
• Distance limitation

Page 77: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

SCSI Roadmap
• Wide SCSI (16-bit bus)
• Fast SCSI (double data rate)

Page 78: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Serial Attached SCSI (SAS)

• ANSI standard in 2003
• Interoperability with SATA
• Full-duplex
• Dual-port
• 128 devices
• 10 meters

Page 79: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Dual Port
• ATA, SCSI and SATA support a single port
• The controller is a single point of failure
• SAS and FC support dual ports

Page 80: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

SAS Roadmap

http://www.scsita.org/aboutscsi/sas/SAS_roadmap2004.html

Page 81: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Fibre Channel (FC)
• Developed as a backbone technology for LANs
• The name is a misnomer
  – Runs on copper also
  – 4-wire cable or fiber optic
• 10 km or less per link
• 126 devices per loop
• No terminators
• Installed base of Fibre Channel devices*
  – $2.45 billion in FC HBAs in 2005
  – $5.4 billion in FC switches in 2005

*Source: Gartner, Dec 13, 2001

Page 82: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

FC (cont’d)
• Advantages
  – High bandwidth
  – Secure
  – Zero-copy send and receive
  – Low host CPU utilization
  – FCP (Fibre Channel Protocol)
• Disadvantages
  – Not a wide-area network
  – Separate physical network infrastructure
  – Expensive
  – Different management mechanisms
  – Interoperability issues across different vendors

Page 83: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Fibre Channel Topologies

Ulf Troppens, Rainer Erkens and Wolfgang Muller, Storage Networks Explained

Page 84: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Fibre Channel Ports
• N-Port: node port
• F-Port: fabric port
• L-Port: loop port
  – Only connects to an arbitrated loop (AL)
• E-Port: expansion port
  – Connects two switches
• G-Port: generic port
• B-Port: bridge port
  – Bridges to other networks (IP, ATM, etc.)
• NL-Port: node loop port
  – Can connect both in a fabric and in an AL
• FL-Port: fabric loop port
  – Allows a fabric to connect to a loop

Ulf Troppens, Rainer Erkens and Wolfgang Muller, Storage Networks Explained

Page 85: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Arbitrated Loop in FC

Ulf Troppens, Rainer Erkens and Wolfgang Muller, Storage Networks Explained

Page 86: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Arbitrated Loop in FC

Ulf Troppens, Rainer Erkens and Wolfgang Muller, Storage Networks Explained

Page 87: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Routing mechanisms in switch

• Store-and-forward routing

• Cut-through routing

(A back-of-the-envelope latency comparison of the two follows below.)

William James Dally and Brian Towles, Principles and practices of Interconnection networks, chapter 13
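A rough comparison of the two routing modes (our own sketch, not taken from the cited text); the packet length L, header length Lh, link bandwidth B and switch count H are made-up parameters.

def store_and_forward(L, H, B, t_route=0.0):
    # The whole packet is received and re-serialized at each of H switches.
    return (H + 1) * (L / B) + H * t_route

def cut_through(L, H, B, Lh=64, t_route=0.0):
    # Each switch forwards as soon as the Lh-byte header has been inspected.
    return L / B + H * (Lh / B + t_route)

L, H, B = 2048, 3, 100e6        # 2 KB packet, 3 switches, 100 MB/s links
print(store_and_forward(L, H, B) * 1e6, "us")   # ~81.9
print(cut_through(L, H, B) * 1e6, "us")         # ~22.4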

Page 88: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Fibre Channel Hub and Switch

• Switch
  – Thousands of connections
  – Bandwidth per device is nearly constant
  – Aggregate bandwidth increases with increased connectivity
  – Deterministic latency

• Hub
  – 126 devices
  – Bandwidth per device diminishes with increased connectivity
  – Aggregate bandwidth is constant with increased connectivity
  – Latency increases as the number of devices increases

Page 89: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Fibre Channel Structure

Page 90: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Fibre Channel Bandwidth

• Clock rate is 1.0625 GHz
• 1.0625 [Gbps] x 2048 [payload] / 2168 [payload+overhead] x 0.8 [8b10b] / 8 [bits] = 100.369 MB/s (worked out below)
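The effective rate above, computed explicitly (our own arithmetic using the slide's inputs: a 2048-byte payload per 2168-byte frame, 8b10b at 80% efficiency):

fc = 1.0625e9 * (2048 / 2168) * 0.8 / 8     # bytes per second on a 1GFC link
print(fc / 1e6, "MB/s")                     # ~100.37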

Page 91: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Cable types in FC

Page 92: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

FC Roadmap

Product Naming | Throughput (MB/s) | T11 Spec Completed (Year) | Market Availability (Year)
1GFC           | 200               | 1996                      | 1997
2GFC           | 400               | 2000                      | 2001
4GFC           | 800               | 2003                      | 2005
8GFC           | 1,600             | 2006                      | 2008
16GFC          | 3,200             | 2009                      | 2011
32GFC          | 6,400             | 2012                      | Market demand
64GFC          | 12,800            | 2016                      | Market demand
128GFC         | 25,600            | 2020                      | Market demand

http://www.fibrechannel.org/OVERVIEW/Roadmap.html

Page 93: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Interface Comparison

Page 94: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Market Segments

It’s more than interface, Seagate, 2003

Page 95: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Interface Trends - Previous

It’s more than interface, Seagate, 2003

Page 96: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Interface Trends – Today and Tomorrow

It’s more than interface, Seagate, 2003

Page 97: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

IP Storage

Page 98: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

IP Storage (cont’d)
• TCP/IP is used as a storage interconnect to transfer block-level data
• IETF working group: IP Storage (IPS)
• iSCSI, iFCP and FCIP protocols

• Cheaper
• Provides one technology for a client to connect to servers and storage devices
• Increases operating distances
• Improves availability of storage systems
• Can utilize network management tools

It’s more than interface, Seagate, 2003

Page 99: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

iSCSI (Internet SCSI)
• iSCSI is a transport for SCSI commands
  – iSCSI is an end-to-end protocol
  – iSCSI can be implemented on desktops, laptops and servers
  – iSCSI can be implemented with current TCP/IP stacks
  – iSCSI can be implemented completely in an HBA
• Overcomes the distance limitation
• Cost-effective

Page 100: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Protocol Stack - iSCSI

Page 101: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Packet and Bandwidth - iSCSI

• iSCSI overhead: 78 bytes
  – 14 (Ethernet) + 20 (IP) + 20 (TCP) + 4 (CRC) + 20 (interframe gap)
  – The iSCSI header adds 48 bytes per SCSI command
• 1.25 [Gbps] x 1460 [payload] / 1538 [payload+overhead] x 0.8 [8b10b] / 8 [bits] = 113.16 MB/s (worked out below)
• Bi-directional payload bandwidth: 220.31 MB/s
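The iSCSI payload bandwidth above, computed explicitly (our own arithmetic from the slide's inputs); note that the quoted 113.16 figure treats 1 MB as 2^20 bytes, which is about 118.7 MB/s in decimal units.

line_rate = 1.25e9                 # Gigabit Ethernet signaling rate, bits/s
payload, frame = 1460, 1538        # TCP payload vs. frame bytes incl. gap
iscsi = line_rate * (payload / frame) * 0.8 / 8
print(iscsi / 2**20, "MiB/s")      # ~113.16
print(iscsi / 1e6, "MB/s")         # ~118.66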

Page 102: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Problems with iSCSI
• Limited performance because of
  – Protocol overhead in TCP/IP
  – Interrupts generated for each network packet
  – Extra copies when sending and receiving data

Page 103: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

iSCSI Adapter Implementations

Page 104: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

• Software approach
  – Shows the best performance
  – This approach is very competitive due to fast modern CPUs

• Hardware approaches
  – Relatively slow CPU compared to the host CPU
  – Development speed is also slower than that of the host CPU
  – Performance improvement is limited without superior advances in the embedded CPU
  – Can show performance improvement in highly loaded systems

Prasenjit Sarkar, Sandeep Uttamchandani, Kaladhar Voruganti, "Storage over IP: When Does Hardware Support Help?", FAST 2003

Page 105: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

iFCP (Internet Fibre Channel Protocol)
• iFCP is a gateway-to-gateway protocol for the implementation of a Fibre Channel fabric over a TCP/IP transport
• Allows users to interconnect FC devices over a TCP/IP network at any distance
• Traffic between Fibre Channel devices is routed and switched by the TCP/IP network
• iFCP maps each FC address to an IP address and each FC session to a TCP session
• FC messaging and routing services are terminated at the gateways, so the fabrics are not merged
• Used for data backup and replication
• mFCP uses UDP/IP

Page 106: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

How does iFCP work?

Page 107: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Types of iFCP communication

Page 108: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

FCIP (Fibre Channel over IP)
• TCP/IP-based tunneling protocol to encapsulate Fibre Channel packets
• Allows users to interconnect FC devices over a TCP/IP network at any distance (same as iFCP)
• Merges the connected SANs into a single FC fabric
• Used for data backup and replication
• Gateways
  – Used to interconnect Fibre Channel SANs to the IP network
  – Set up connections between SANs, or between Fibre Channel devices and SANs

Page 109: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

FCIP (Fibre Channel over IP)

Page 110: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Comparison between FCIP and iFCP

Page 111: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

IP Storage Protocols: iSCSI, iFCP and FCIP

Page 112: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007
Page 113: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

RAS
• Reliability
  – The basic InfiniBand link connection is comprised of only four signal wires
  – IBA accommodates multiple ports for each I/O unit
  – IBA provides multiple CRCs
• Availability
  – An IBA fabric is inherently redundant, with multiple paths to sources assuring data delivery
  – IBA allows the network to heal itself if a link fails or is reporting errors
  – IBA has a many-to-many server-to-I/O relationship
• Serviceability
  – Hot-pluggable

Page 114: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Feature                | InfiniBand        | Fibre Channel  | 1Gb & 10Gb Ethernet | PCI-X
Bandwidth              | 2.5, 10, 30 Gb/s  | 1, 2.1 Gb/s    | 1, 10 Gb/s          | 8.51 Gb/s
Bandwidth, full-duplex | 5, 20, 60 Gb/s    | 2.1, 4.2 Gb/s  | 2, 20 Gb/s          | N/A
Pin count              | 4, 16, 48         | 4              | 4 / 8               | 90
Media                  | Copper/fiber      | Copper/fiber   | Copper/fiber        | PCB
Max length, copper     | 250 / 125 m       | 13 m           | 100 m               | inches
Max length, fiber      | 10 km             | km             | km                  | N/A
Partitioning           | X                 | X              | X                   | N/A
Scalable link width    | X                 | N/A            | N/A                 | N/A
Max payload            | 4 KB              | 2 KB           | 1.5 KB              | No packets

Page 115: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

A classification of storage systems

(warning - not comprehensive)

• Isolated
  – E.g., a laptop/PC with a local file system
  – We know how these work
  – File systems were first developed for centralized computer systems as an OS facility providing a convenient programming interface to (disk) storage
  – They subsequently acquired features like access control and file locking that made them useful for the sharing of data and programs
• Distributed
  – Why?
    • Sharing, scalability, mobility, fault tolerance, …
  – “Basic” distributed file system
    • Gives the illusion of local storage, when the data is actually spread across a network (usually a LAN), to clients running on multiple computers
    • Supports the sharing of information in the form of files, and of hardware resources in the form of persistent storage, throughout an intranet
  – Enhancements in various domains for “real-time” performance (multimedia), high failure resistance, high scalability (P2P), security, longevity (archival systems), mobility/disconnections, …
  – Remote objects to support distributed object-oriented programming

Page 116: Storage Systems CSE 598d, Spring 2007 Lecture 15: Consistency Semantics, Introduction to Network-attached Storage March 27, 2007

Storage systems and their properties

                          | Sharing | Persistence | Caching/replication | Consistency maintenance | Example
Main memory               | No      | No          | No                  | Strict one-copy         | RAM
File system               | No      | Yes         | No                  | Strict one-copy         | UNIX FS
Distributed file system   | Yes     | Yes         | Yes                 | Yes (approx.)           | NFS
Web                       | Yes     | Yes         | Yes                 | Very approx. / No       | Web server
Distributed shared memory | Yes     | No          | Yes                 | Yes (approx.)           | Ivy
Remote objects (RMI/ORB)  | Yes     | No          | No                  | Strict one-copy         | CORBA
Persistent object store   | Yes     | Yes         | No                  | Strict one-copy         | CORBA persistent state service
P2P storage system        | Yes     | Yes         | Yes                 | Very approx.            | OceanStore