
Page 1: Robustness in the Salus scalable block store

Robustness in the Salus scalable block store

Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam,

Lorenzo Alvisi, and Mike Dahlin
University of Texas at Austin

Page 2: Robustness in the Salus scalable block store

Scalable and robust storage

More hardware → more failures
More complex software → more failures

Page 3: Robustness in the Salus scalable block store

Achieving both is hard

Scalable systems (GFS/Bigtable, HDFS/HBase, WAS, Spanner, FDS, …)

Strong protections (End-to-end checks, BFT, Depot, …)

Challenge: parallelism vs consistency

Scalable systems read from 1 node; BFT must read from f+1 nodes.

Page 4: Robustness in the Salus scalable block store

Salus

• Scalability:
  – Thousands of servers
• Robustness:
  – Tolerate disk/memory corruptions, CPU errors, …
  – Do NOT hurt performance/scalability
• Usage:
  – Provide remote disks to users (Amazon EBS)

Page 5: Robustness in the Salus scalable block store

Outline

• Challenges
• Salus’ overview
• Solutions
  – Pipelined commit
  – Active storage
  – Scalable end-to-end checks

• Evaluation

Page 6: Robustness in the Salus scalable block store

Challenge: Parallelism vs Consistency

Metadata server

Storage servers

Clients

Infrequent metadata transfer

Parallel data transfer

Data is replicated for durability and availability

State-of-the-art scalable architecture (GFS/Bigtable, HDFS/HBase, WAS, …)

Page 7: Robustness in the Salus scalable block store

Challenges

• Write in parallel and in order
• Eliminate single points of failure
  – Write: prevent a single node from corrupting data
  – Read: read safely from one node

• Do not increase replication cost

Page 8: Robustness in the Salus scalable block store

Write in parallel and in order

Metadata server

Data servers

Clients

Write 1 Write 2

Write 2 is committed but write 1 is not. Not allowed for a block store.

Page 9: Robustness in the Salus scalable block store

Prevent a single node from corrupting data

Metadata server

Data servers

Clients

Single point of failure

Computation nodes:
• Data forwarding, garbage collection, etc.
• Tablet server (Bigtable), Region server (HBase), etc.

Page 10: Robustness in the Salus scalable block store

Read safely from one node

Metadata server

Data servers

Clients

Single point of failure

Page 11: Robustness in the Salus scalable block store

Do not increase replication cost

• Industrial systems:
  – Write to f+1 nodes and read from one node
• BFT systems:
  – Write to 2f+1 nodes and read from f+1 nodes

Page 12: Robustness in the Salus scalable block store

Outline

• Challenges
• Salus’ overview
• Solutions
  – Pipelined commit
  – Active storage
  – Scalable end-to-end checks

• Evaluation

Page 13: Robustness in the Salus scalable block store

Salus’ approach

Start from a scalable architecture (Bigtable/HBase)

Ensure robustness techniques do not hurt scalability

Page 14: Robustness in the Salus scalable block store

Salus’ interface and model

• Disk-like interface:
  – A fixed number of blocks
  – Single writer
  – Barrier semantics
• Failure model:
  – Byzantine but not malicious
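As a rough illustration of this model, here is a minimal sketch of a disk-like, single-writer interface with barrier semantics. VirtualDisk and its methods are hypothetical names for illustration, not Salus’ actual client API.

```python
# Illustrative only: VirtualDisk and its methods are hypothetical names,
# not Salus' actual client API.
class VirtualDisk:
    """A volume of a fixed number of fixed-size blocks, single writer."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size

    def write(self, block_id: int, data: bytes,
              barrier: bool = False) -> None:
        """Issue a write; if barrier=True, all earlier writes must be
        executed before this one (barrier semantics)."""
        raise NotImplementedError("sketch only")

    def read(self, block_id: int) -> bytes:
        """Read one block; can be served by a single replica."""
        raise NotImplementedError("sketch only")
```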

Page 15: Robustness in the Salus scalable block store

Salus’ key ideas

• Pipelined commit
  – Guarantee ordering despite parallel writes
• Active storage
  – Prevent a computation node from corrupting data
• End-to-end verification
  – Read safely from one node

Page 16: Robustness in the Salus scalable block store

Salus’ key ideas

Metadata server

Clients

Pipelined commit

Active storage

End-to-end verification

Page 17: Robustness in the Salus scalable block store

Outline

• Challenges
• Salus’ overview
• Solutions
  – Pipelined commit
  – Active storage
  – Scalable end-to-end checks

• Evaluation

Page 18: Robustness in the Salus scalable block store

Pipelined commit

• Goal: barrier semantics
  – A request can be marked as a barrier.
  – All previous requests must be executed before it.
• Naïve solution:
  – The client blocks at a barrier: loses parallelism.
• A weaker version of a distributed transaction
  – Well-known solution: two-phase commit (2PC)
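To make the ordering rule concrete, the toy model below shows the intended behavior: batches may prepare in parallel, but batch i+1 commits only after batch i. This is a simplified single-process sketch, not the actual protocol; BatchLeader and its methods are illustrative names.

```python
# Toy, single-process model; the real protocol runs 2PC across
# replicated server groups, and these names are illustrative.
class BatchLeader:
    def __init__(self, batch_id: int, prev=None):
        self.batch_id = batch_id
        self.prev = prev            # leader of the preceding batch
        self.prepared = False
        self.committed = False

    def prepare(self, writes) -> None:
        # Phase 1 can run in parallel across batches: servers durably
        # log the writes and acknowledge.
        self.prepared = True

    def commit(self) -> None:
        # Phase 2 is pipelined: a batch commits only after its
        # predecessor, so a later write never commits before an
        # earlier one (the barrier guarantee).
        assert self.prepared, "cannot commit an unprepared batch"
        if self.prev is not None and not self.prev.committed:
            self.prev.commit()
        self.committed = True

# Both batches prepare in parallel; committing batch 2 forces batch 1
# to commit first.
b1 = BatchLeader(1)
b2 = BatchLeader(2, prev=b1)
b1.prepare(["w1", "w2", "w3"])
b2.prepare(["w4", "w5"])
b2.commit()
assert b1.committed and b2.committed
```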

Page 19: Robustness in the Salus scalable block store

Pipelined commit – 2PC

[Diagram: prepare phase. The client sends writes 1, 2, 3 (batch i) and 4, 5 (batch i+1) to the servers; each batch has a leader, and the previous leader tracks batch i’s prepared/committed state.]

Page 20: Robustness in the Salus scalable block store

Pipelined commit – 2PC

[Diagram: commit phase. Once batch i-1 commits, the leader of batch i sends commit; once batch i commits, the leader of batch i+1 sends commit, pipelining commits across batches.]

Page 21: Robustness in the Salus scalable block store

Pipelined commit - challenge

• Is 2PC slow?
  – Additional network messages? Disk is the bottleneck.
  – Additional disk write? Let’s eliminate that.
  – Challenge: deciding whether to commit a prepared write after recovery

[Diagram: writes 1 and 3 are committed; write 2 is only prepared. Should 2 be committed? Both outcomes are possible.]

• Salus’ solution: ask other nodes
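A hedged sketch of that recovery rule: a server that finds a write prepared but unresolved after a crash queries its peers, and follows any commit decision a peer learned; otherwise the batch can be rolled back, since the client never received an acknowledgment. The peer.status call is a hypothetical stand-in for the real query.

```python
# Hypothetical sketch of post-recovery resolution; peer.status() stands
# in for whatever query the real protocol uses.
def resolve_prepared(batch_id: int, peers) -> str:
    """Decide whether a prepared-but-unresolved batch should commit."""
    for peer in peers:
        status = peer.status(batch_id)  # "committed" / "aborted" / "prepared"
        if status == "committed":
            return "commit"   # the decision was reached; follow it
        if status == "aborted":
            return "abort"
    # No peer learned a commit decision, so the batch was never
    # acknowledged to the client; aborting preserves the write order.
    return "abort"
```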

Page 22: Robustness in the Salus scalable block store

Active Storage

• Goal: a single node cannot corrupt data
• Well-known solution: BFT replication
  – Problem: 2f+1 replication cost
• Salus’ solution: use f+1 replicas
  – Require unanimous consent of the whole quorum
  – How about availability if one replica fails?
  – If one replica fails, replace the whole quorum
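A minimal sketch of the unanimous-consent rule, assuming each computation node can report a digest of the state an update would produce. Node, digest_after, apply, and replace_quorum are hypothetical names, not Salus’ actual interfaces.

```python
import hashlib

class Node:
    """Hypothetical stand-in for one computation node."""
    def __init__(self) -> None:
        self.state = b""

    def digest_after(self, update: bytes) -> bytes:
        # Digest of the state this update would produce.
        return hashlib.sha256(self.state + update).digest()

    def apply(self, update: bytes) -> None:
        self.state += update

def replace_quorum(nodes) -> None:
    """Stub: provision a fresh quorum, which must agree on the current
    state before serving requests."""
    raise RuntimeError("quorum replacement required")

def apply_update(update: bytes, nodes) -> bool:
    """Apply an update only if all f+1 computation nodes agree on it."""
    digests = {node.digest_after(update) for node in nodes}
    if len(digests) == 1:
        for node in nodes:
            node.apply(update)
        return True
    # The nodes disagree, so at least one is faulty; with only f+1
    # replicas we cannot tell which one, so replace the whole quorum.
    replace_quorum(nodes)
    return False
```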

Page 23: Robustness in the Salus scalable block store

Active Storage

[Diagram: a single computation node in front of replicated storage nodes.]

Page 24: Robustness in the Salus scalable block store

Active Storage

[Diagram: f+1 computation nodes collocated with the storage nodes.]

• Unanimous consent:
  – All updates must be agreed on by all f+1 computation nodes.
• Additional benefit:
  – Collocating computation and storage saves network bandwidth.

Page 25: Robustness in the Salus scalable block store

Active Storage

[Diagram: one of the f+1 computation nodes fails.]

• What if one computation node fails?
  – Problem: we may not know which one is faulty.
• Replace the whole quorum.

Page 26: Robustness in the Salus scalable block store

Active Storage

[Diagram: the whole quorum of computation nodes is replaced.]

• What if one computation node fails?
  – Problem: we may not know which one is faulty.
• Replace the whole quorum.
  – The new quorum must agree on the states.

Page 27: Robustness in the Salus scalable block store

Active Storage

• Does it provide BFT with f+1 replication?
• No: during recovery, it may accept stale states if
  – the client fails,
  – at least one storage node provides stale states, and
  – all other storage nodes are unavailable.
• 2f+1 replicas can eliminate this case:
  – Is it worth adding f replicas to eliminate it?

Page 28: Robustness in the Salus scalable block store

End-to-end verification

• Goal: read safely from one node
  – The client should be able to verify the reply.
  – If the reply is corrupted, the client retries on another node.
• Well-known solution: Merkle tree
  – Problem: scalability
• Salus’ solution:
  – Single writer
  – Distribute the tree among the servers

Page 29: Robustness in the Salus scalable block store

End-to-end verification

[Diagram: the Merkle tree is partitioned across Server 1–4; each server stores the subtree covering its own blocks.]

Client maintains the top tree.

The client does not need to store anything persistently: it can rebuild the top tree from the servers.
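A compact sketch of the scheme under illustrative assumptions: each server stores the Merkle subtree for its own blocks and exposes its subtree root; the client keeps only the small top tree over those roots, rebuilds it on demand, and uses it to verify a read served by a single node. The exact tree layout and helper names here are assumptions, not Salus’ implementation.

```python
# Illustrative Merkle sketch; the exact tree layout and fan-out in Salus
# may differ. Helper names are hypothetical.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root_of(hashes) -> bytes:
    """Fold one level of hashes up to a single root."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # pad odd levels
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Each server exposes only the root of its local subtree; the stateless
# client rebuilds the small "top tree" over those roots on demand.
def rebuild_top_root(server_roots) -> bytes:
    return root_of(server_roots)

# Verifying a read from a single node: the server returns the block plus
# the sibling hashes on the path from that block's leaf to the top root.
def verify(block: bytes, proof, top_root: bytes) -> bool:
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == top_root

# Example: four servers each report the root of their local subtree.
server_roots = [h(f"subtree{i}".encode()) for i in range(4)]
top_root = rebuild_top_root(server_roots)
```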

Page 30: Robustness in the Salus scalable block store

Recovery

• Pipelined commit
  – How to ensure write order after recovery?
• Active storage
  – How to agree on the current states?
• End-to-end verification
  – How to rebuild the Merkle tree?

Page 31: Robustness in the Salus scalable block store

Discussion – why HBase?

• It’s a popular architecture
  – Bigtable: Google
  – HBase: Facebook, Yahoo, …
  – Windows Azure Storage: Microsoft
• It’s open source.
• Why two layers?
  – Necessary if the storage layer is append-only
• Why an append-only storage layer?
  – Better random write performance
  – Easy to scale

Page 32: Robustness in the Salus scalable block store

Discussion – multiple writers?

Page 33: Robustness in the Salus scalable block store

Lessons

• Strong checking makes debugging easier.

Page 34: Robustness in the Salus scalable block store

Outline

• Challenges
• Salus’ overview
• Solutions
  – Pipelined commit
  – Active storage
  – Scalable end-to-end checks

• Evaluation

Page 35: Robustness in the Salus scalable block store

Evaluation

Page 36: Robustness in the Salus scalable block store

Evaluation

Page 37: Robustness in the Salus scalable block store

Evaluation

Page 38: Robustness in the Salus scalable block store

Read safely from one node

• Read is executed on one node:
  – Maximize parallelism
  – Minimize latency

• If that node experiences corruptions, …