ECE 6160: Advanced Computer Networks

SAN

Instructor: Dr. Xubin (Ben) He

Email: [email protected]

Tel: 931-372-3462

Course web: http://www.ece.tntech.edu/hexb/616f05

Prev…

• Networked storage

• NAS

Storage Architectures

Storage Area Networks

SAN connection

• FC
  – FC-SAN

• LAN (Ethernet)
  – IP-SAN
  – iSCSI

• Other networks
  – Petal (ATM)

Typical SAN

• Backup solutions (tape sharing)

• Disaster tolerance solutions (distance to remote location)

• Reliable, maintainable, scalable infrastructure

A real SAN.

NAS and SAN shortcomings

• SAN shortcomings
  – Data to desktop
  – Sharing between NT and UNIX
  – Lack of standards for file access and locking

• NAS shortcomings
  – Shared tape resources
  – Number of drives
  – Distance to tapes/disks

• NAS: focuses on applications, users, and the files and data that they share

• SAN: focuses on disks, tapes, and a scalable, reliable infrastructure to connect them

• NAS plus SAN: the complete solution, from desktop to data center to storage device

NAS plus SAN.

• NAS plus SAN: the complete solution, from desktop to data center to storage device

Petal/Frangipani

[Figure: the layers Petal, Frangipani, and NFS, with interfaces labeled “SAN” and “NAS”]

Petal/Frangipani

• NFS
  – Untrusted
  – OS-agnostic

• Frangipani
  – FS semantics
  – Sharing/coordination

• Petal
  – Disk aggregation (“bricks”)
  – Filesystem-agnostic
  – Recovery and reconfiguration
  – Load balancing
  – Chained declustering
  – Snapshots
  – Does not control sharing

Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?

Remaining Slides

The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.

For ECE6160, several issues are important:

• Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).

• Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.

• Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.

Petal: Distributed Virtual Disks

Systems Research Center

Digital Equipment Corporation

Edward K. Lee

Chandramohan A. Thekkath

Logical System View

[Figure: client file systems (AdvFS, NT FS, PC FS, UFS) use virtual disks /dev/vdisk1 through /dev/vdisk5 over a scalable network; the virtual disks are provided by Petal]

Physical System View

[Figure: a parallel database or cluster file system uses /dev/shared1 over a scalable network that connects four Petal servers]

Virtual Disks

• Each virtual disk provides a 2^64-byte address space.

• Created and destroyed on demand.

• Allocates disk storage on demand.

• Snapshots via copy-on-write.

• Online incremental reconfiguration.
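The properties above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Petal's implementation: the class name `VirtualDisk`, the 8 KB `BLOCK_SIZE`, and the whole-block write restriction are all hypothetical choices made for illustration. It shows a sparse 2^64-byte address space in which storage is consumed only when a block is written, and snapshots that share block data with the live disk until a block is overwritten.

```python
BLOCK_SIZE = 8192          # hypothetical block size, chosen only for illustration
ADDRESS_SPACE = 2 ** 64    # bytes addressable per virtual disk (from the slide)


class VirtualDisk:
    """Toy sparse virtual disk with on-demand allocation and snapshots."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes; a missing key means unallocated
        self.snapshots = []   # frozen block maps, oldest first

    def write(self, offset, data):
        if offset % BLOCK_SIZE or len(data) != BLOCK_SIZE:
            raise ValueError("this sketch only handles aligned whole-block writes")
        if not 0 <= offset < ADDRESS_SPACE:
            raise ValueError("offset outside the 2^64-byte address space")
        # Storage is allocated here, on first write, not when the disk is created.
        self.blocks[offset // BLOCK_SIZE] = bytes(data)

    def read(self, offset, snapshot=None):
        block_map = self.blocks if snapshot is None else self.snapshots[snapshot]
        # Unallocated blocks read back as zeros.
        return block_map.get(offset // BLOCK_SIZE, bytes(BLOCK_SIZE))

    def snapshot(self):
        # Only the map is copied; the immutable block data is shared with the
        # live disk until a later write replaces the live entry.
        self.snapshots.append(dict(self.blocks))
        return len(self.snapshots) - 1

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK_SIZE
```

Writing one block at offset 2^40 allocates a single block even though the disk is nominally huge, and a snapshot taken before an overwrite keeps returning the old contents.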

Virtual to Physical Translation

[Figure: a request (vdiskID, offset) is translated through the Virtual Disk Directory and the GMap to a server; that server's physical map (PMap0 to PMap3 on Server 0 to Server 3) yields (disk, diskOffset). Overall: (vdiskID, offset) → (server, disk, diskOffset)]
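The translation path named on this slide can be sketched as three table lookups. This is a simplified illustration, not Petal's data structures: the class `Translator`, the 64 KB `CHUNK` granularity, and the round-robin choice of server per chunk are assumptions introduced here. Only the map names (virtual disk directory, GMap, PMap) and the input and output tuples come from the slide.

```python
CHUNK = 64 * 1024  # hypothetical translation granularity in bytes


class Translator:
    """Toy three-step lookup: directory -> global map -> per-server physical map."""

    def __init__(self, num_servers):
        self.vdir = {}    # vdiskID -> global map id
        self.gmaps = {}   # global map id -> ordered list of servers
        # One physical map per server: (vdiskID, chunk number) -> (disk, diskOffset)
        self.pmaps = [dict() for _ in range(num_servers)]

    def create_vdisk(self, vdisk_id, servers):
        gmap_id = len(self.gmaps)
        self.gmaps[gmap_id] = list(servers)
        self.vdir[vdisk_id] = gmap_id

    def server_for(self, vdisk_id, offset):
        # Assumption: chunks are spread round-robin over the servers in the GMap.
        servers = self.gmaps[self.vdir[vdisk_id]]
        return servers[(offset // CHUNK) % len(servers)]

    def map_chunk(self, vdisk_id, offset, disk, disk_offset):
        server = self.server_for(vdisk_id, offset)
        self.pmaps[server][(vdisk_id, offset // CHUNK)] = (disk, disk_offset)

    def translate(self, vdisk_id, offset):
        """Return (server, disk, diskOffset) for a (vdiskID, offset) request."""
        server = self.server_for(vdisk_id, offset)
        disk, base = self.pmaps[server][(vdisk_id, offset // CHUNK)]
        return server, disk, base + offset % CHUNK
```

Because the directory and GMap are small and shared while each PMap is private to one server, the placement of a chunk on a server's disks can change without touching any other server's state. That is one place where the indirection asked about earlier could live.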

Global State Management

• Based on Leslie Lamport’s Paxos algorithm.

• Global state is replicated across all servers.

• Consistent in the face of server & network failures.

• A majority is needed to update global state.

• Any server can be added/removed in the presence of failed servers.
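The majority requirement can be made concrete with a few lines of arithmetic. The sketch below is not Paxos. It only shows the quorum rule that the slide states, with hypothetical function names. The reason a majority suffices is that any two majorities of the same server set share at least one member, so two conflicting updates cannot both be accepted.

```python
def majority(n_servers):
    """Smallest number of servers that forms a majority of n_servers."""
    return n_servers // 2 + 1


def can_update(reachable, n_servers):
    """True if enough servers are reachable to update the global state."""
    return reachable >= majority(n_servers)


def min_overlap(n_servers):
    """Minimum number of servers shared by any two majorities."""
    return 2 * majority(n_servers) - n_servers
```

With 8 servers, 5 must be reachable to update the global state, so up to 3 failed or partitioned servers can be tolerated.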

Fault-Tolerant Global Operations

• Create/Delete virtual disks.

• Snapshot virtual disks.

• Add/Remove servers.

• Reconfigure virtual disks.

Data Placement & Redundancy

• Supports non-redundant and chained-declustered virtual disks.

• Parity can be supported if desired.

• Chained-declustering tolerates any single component failure.

• Tolerates many common multiple failures.

• Throughput scales linearly with additional servers.

• Throughput degrades gracefully with failures.

Chained Declustering

Server0   Server1   Server2   Server3
D0        D1        D2        D3
D3        D0        D1        D2
D4        D5        D6        D7
D7        D4        D5        D6

Chained Declustering

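[Figure: the same four-server chained-declustering layout (Server0: D0, D3, D4, D7; Server2: D1, D6, D5, D2; Server3: D3, D2, D7, D6), with Server1's blocks (D1, D0, D5, D4) set apart from the server, apparently to illustrate the failure of Server1]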
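The layout in these two slides follows a simple rule: block Di has one copy on server i mod n and a second copy on the next server in the chain. The sketch below reproduces that rule and shows how a read can be redirected when a server is down. The function names and the primary-first read policy are assumptions made for illustration. A real system would also shift load along the chain to rebalance after a failure.

```python
def placement(block, n_servers):
    """Return (primary, secondary) servers for a block under chained declustering."""
    primary = block % n_servers
    secondary = (primary + 1) % n_servers
    return primary, secondary


def layout(n_blocks, n_servers):
    """List the block copies stored on each server."""
    servers = [[] for _ in range(n_servers)]
    for block in range(n_blocks):
        for server in placement(block, n_servers):
            servers[server].append(f"D{block}")
    return servers


def read_server(block, n_servers, failed=frozenset()):
    """Pick a live server holding the block, preferring the primary copy."""
    for server in placement(block, n_servers):
        if server not in failed:
            return server
    raise RuntimeError(f"both copies of D{block} are unavailable")
```

For 8 blocks on 4 servers this yields the same block sets as the figure, for example D0, D3, D4, D7 on Server0. Any single failure leaves every block readable, and so do failures of non-adjacent servers such as Server0 and Server2. Losing two neighbours in the chain makes the blocks whose two copies sit on that pair unavailable.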

The Prototype

• Digital ATM network.
  – 155 Mbit/s per link.

• 8 AlphaStation Model 600.
  – 333 MHz Alpha running Digital Unix.

• 72 RZ29 disks.
  – 4.3 GB, 3.5 inch, fast SCSI (10 MB/s).
  – 9 ms avg. seek, 6 MB/s sustained transfer rate.

• Unix kernel device driver.

• User-level Petal servers.
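A back-of-the-envelope calculation puts the prototype numbers in perspective. It uses only the figures on this slide. The variable names are illustrative, and the comparison ignores protocol overhead, redundancy, and seek time.

```python
DISKS = 72
DISK_CAPACITY_GB = 4.3      # per RZ29 disk
DISK_RATE_MB_S = 6          # sustained transfer rate per disk
LINK_MBIT_S = 155           # per ATM link

raw_capacity_gb = DISKS * DISK_CAPACITY_GB      # total raw disk space
aggregate_disk_mb_s = DISKS * DISK_RATE_MB_S    # all disks streaming at once
link_mb_s = LINK_MBIT_S / 8                     # one link, in megabytes per second

print(f"raw capacity:         {raw_capacity_gb:.1f} GB")
print(f"aggregate disk rate:  {aggregate_disk_mb_s} MB/s")
print(f"one ATM link:         {link_mb_s:.1f} MB/s")
```

The disks together can stream far more data than a single 155 Mbit/s link can carry (roughly 19 MB/s), which suggests that per-machine network bandwidth, not raw disk bandwidth, is the tighter limit in a configuration like this.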

The Prototype

[Figure: machines src-ss1, src-ss2, …, src-ss8 and petal1, petal2, …, petal8 attached to the Digital ATM Network (AN2), with /dev/vdisk1 visible from multiple machines]

Throughput Scaling

[Figure: throughput scale-up versus number of servers (0 to 8), plotted against a LINEAR reference; series: 512B Rd, 8KB Rd, 64KB Rd, 512B Wr, 8KB Wr, 64KB Wr]

Virtual Disk Reconfiguration

[Figure: throughput in MB/s (0 to 30) versus elapsed time in minutes (0 to 6), with series for 6 servers and 8 servers; virtual disk with 1 GB of allocated storage, 8 KB reads and writes]

Frangipani: A Scalable Distributed File System

C. A. Thekkath, T. Mann, and E. K. Lee

Systems Research Center

Digital Equipment Corporation

Why Not An Old File System on Petal?

• Traditional file systems (e.g., UFS, AdvFS) cannot share a block device

• The machine that runs the file system can become a bottleneck

Frangipani

• Behaves like a local file system
  – multiple machines cooperatively manage a Petal disk
  – users on any machine see a consistent view of data

• Exhibits good performance, scaling, and load balancing

• Easy to administer

Ease of Administration

• Frangipani machines are modular
  – can be added and deleted transparently

• Common free space pool
  – users don’t have to be moved

• Automatically recovers from crashes

• Consistent backup without halting the system

Components of Frangipani

• File system core
  – implements the Digital Unix vnode interface
  – uses the Digital Unix Unified Buffer Cache
  – exploits Petal’s large virtual space

• Locks with leases

• Write-ahead redo log

Locks

• Multiple reader/single writer

• Locks are moderately coarse-grained
  – each lock protects an entire file or directory

• Dirty data is written to disk before a lock is given to another machine

• Each machine aggressively caches locks
  – uses lease timeouts for lock recovery
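The locking rules above can be sketched as a small lock table. This is a toy model under stated assumptions, not Frangipani's lock service: the class `LockTable`, its method names, and the injected `clock` are hypothetical, and a conflicting request simply returns False instead of asking the current holder to release. It shows multiple-reader/single-writer locks per file or directory, a flush of dirty data before a lock is given up, and leases that let locks held by a silent machine be reclaimed.

```python
class LockTable:
    """Toy multiple-reader/single-writer lock table with per-machine leases."""

    def __init__(self, lease_seconds, clock):
        self.lease_seconds = lease_seconds
        self.clock = clock      # callable returning the current time in seconds
        self.leases = {}        # machine -> lease expiry time
        self.locks = {}         # lock name -> (mode, set of holding machines)

    def renew(self, machine):
        self.leases[machine] = self.clock() + self.lease_seconds

    def _drop_expired(self, name):
        # A real service would run recovery for an expired holder before
        # releasing its locks; this sketch only forgets the holder.
        mode, holders = self.locks.get(name, (None, set()))
        now = self.clock()
        live = {m for m in holders if self.leases.get(m, float("-inf")) > now}
        if live:
            self.locks[name] = (mode, live)
        else:
            self.locks.pop(name, None)

    def acquire(self, machine, name, mode):
        """Try to take `name` in mode "read" or "write"; return True on success."""
        self.renew(machine)
        self._drop_expired(name)
        if name not in self.locks:
            self.locks[name] = (mode, {machine})
            return True
        held_mode, holders = self.locks[name]
        if holders == {machine}:            # sole holder may change mode
            self.locks[name] = (mode, holders)
            return True
        if mode == "read" and held_mode == "read":
            holders.add(machine)            # readers share the lock
            return True
        return False                        # conflict: the holder must release first

    def release(self, machine, name, flush):
        flush()  # write dirty data back before another machine can get the lock
        _, holders = self.locks.get(name, (None, set()))
        holders.discard(machine)
        if not holders:
            self.locks.pop(name, None)
```

Passing the clock in as a parameter keeps the sketch testable: a test can advance time past the lease and observe that a write lock held by an unresponsive machine no longer blocks others.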

Logging

• Frangipani uses a write-ahead redo log for metadata
  – log records are kept on Petal

• Data is written to Petal
  – on sync, fsync, or every 30 seconds
  – on lock revocation or when the log wraps

• Each machine has a separate log
  – reduces contention
  – independent recovery

Recovery

• Recovery is initiated by the lock service

• Recovery can be carried out on any machine
  – log is distributed and available via Petal
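Logging and recovery fit together as follows: because each machine's redo log lives on shared storage, any surviving machine can replay a crashed machine's log. The sketch below is a minimal model of that idea. The names `SharedStore`, `Machine`, and `recover` are hypothetical, and the per-block version number used to keep replay idempotent is an assumption of this sketch, not something the slides state.

```python
class SharedStore:
    """Stands in for a shared virtual disk: metadata blocks plus per-machine logs."""

    def __init__(self):
        self.metadata = {}   # block id -> (version, value)
        self.logs = {}       # machine name -> list of (block id, version, value)


class Machine:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.dirty = {}      # cached updates not yet written to the store
        store.logs.setdefault(name, [])

    def update(self, block, value):
        current = self.dirty.get(block) or self.store.metadata.get(block, (0, None))
        version = current[0] + 1
        # Write-ahead: the redo record reaches the log before the block changes.
        self.store.logs[self.name].append((block, version, value))
        self.dirty[block] = (version, value)

    def flush(self):
        """Write cached metadata back, e.g. on sync or lock revocation."""
        self.store.metadata.update(self.dirty)
        self.dirty.clear()


def recover(store, failed_machine):
    """Replay a crashed machine's log; can run on any machine that sees the store."""
    for block, version, value in store.logs[failed_machine]:
        # Apply a record only if the stored block is older than the record.
        if store.metadata.get(block, (0, None))[0] < version:
            store.metadata[block] = (version, value)
    store.logs[failed_machine] = []
```

If machine "a" logs two updates and crashes before flushing, calling `recover(store, "a")` from any other machine brings the stored block up to the second update. Replaying an older record later leaves a newer block untouched.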

References

• E. Lee and C. Thekkath, “Petal: Distributed Virtual Disks,” Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1996

• P. Sarkar, S. Uttamchandani, and K. Voruganti, “Storage Over IP: When Does Hardware Support Help?” Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST), 2003

• C. Thekkath, T. Mann, and E. Lee, “Frangipani: A Scalable Distributed File System,” Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP), pp. 224-237, October 1997