eecs 491 introduction to distributed systems · 2019. 9. 26. · case study: gfs google file system...
EECS 491 Introduction to Distributed Systems
Fall 2019
Harsha V. Madhyastha
Case study: GFS
● Google File System
  ◆ Distributed storage system tailored to Google’s workload
● Workload characteristics and setting:
  ◆ Multi-GB files
  ◆ Files are mostly appended to
  ◆ Failures are extremely common
September 24, 2019 EECS 491 – Lecture 7 2
High-level Design
● Files are split into 64 MB chunks
● Every chunk is replicated on three randomly selected machines
● A central chunkmaster server picks replicas of every chunk
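The three design points above can be sketched in a few lines. This is only an illustration: the function names (`chunk_index`, `num_chunks`, `place_file`) and the machine names are hypothetical, and uniform random sampling is assumed as the simplest reading of "randomly selected machines".

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as stated on the slide
REPLICAS_PER_CHUNK = 3         # every chunk is stored on three machines


def chunk_index(byte_offset):
    """Return the index of the chunk that holds the given file offset."""
    return byte_offset // CHUNK_SIZE


def num_chunks(file_size):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def place_file(file_size, machines, rng=random):
    """Chunkmaster-side table: chunk index -> three distinct random machines."""
    return {
        i: rng.sample(machines, REPLICAS_PER_CHUNK)
        for i in range(num_chunks(file_size))
    }


# Hypothetical cluster of ten machines and a 3 GB file.
machines = [f"m{i}" for i in range(10)]
placement = place_file(3 * 1024**3, machines)
```

A client that wants to read at some byte offset would compute `chunk_index(offset)` and ask the chunkmaster which machines hold that chunk.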
GFS Overview
Replication in GFS
[Figure: Client, Primary, two Backups, Chunkmaster]
Replication in GFS
● High latency to distant primary
  ◆ In data center, bandwidth degrades with distance
Replication in GFS
● High latency to distant primary
● Submitting write to nearest replica will compromise total ordering of writes
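A small sketch of why ordering matters: if two clients each submit a write to their nearest replica, the replicas may apply the two writes in different orders and end up in different states. The write format `(offset, value)` and the helper `apply_writes` are assumptions made for this illustration only.

```python
def apply_writes(writes):
    """Apply (offset, value) writes in the given order to an empty chunk."""
    chunk = {}
    for offset, value in writes:
        chunk[offset] = value  # a later write to the same offset overwrites
    return chunk


w1 = (0, "A")  # write from Client
w2 = (0, "B")  # write from Client2, to the same offset

# Each replica sees its nearby client's write first.
replica_near_client = apply_writes([w1, w2])
replica_near_client2 = apply_writes([w2, w1])

# The two replicas now disagree on the contents of offset 0.
diverged = replica_near_client != replica_near_client2
```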
[Figure: Client, Client2, Primary, two Backups, Chunkmaster]
Replication in GFS
● High latency to distant primary
● Writing to nearest replica compromises total ordering
● Optimize performance without violating consistency?
Data flow vs. Control flow
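The slide's figure did not survive extraction. The sketch below shows the idea as described in the GFS design: the bulky data is pushed to replicas along a chain starting at the nearest one (data flow), while a small write request goes to the primary, which alone picks the order in which buffered data is applied (control flow). The class and method names here are hypothetical, and failures and acknowledgements are left out.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data_id -> payload pushed but not yet applied
        self.log = []     # payloads applied so far, in serial order

    def push_data(self, data_id, data):
        """Data flow: buffer the payload; nothing is applied yet."""
        self.buffer[data_id] = data

    def apply(self, serial, data_id):
        """Control flow: apply a buffered payload at the given serial number."""
        assert serial == len(self.log), "writes must be applied in serial order"
        self.log.append(self.buffer.pop(data_id))


class Primary(Replica):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def write(self, data_id):
        """Assign the next serial number and apply at every replica."""
        serial = len(self.log)
        self.apply(serial, data_id)
        for backup in self.backups:
            backup.apply(serial, data_id)
        return serial


def push_along_chain(chain, data_id, data):
    """Forward the payload replica to replica, starting at the nearest one."""
    for replica in chain:
        replica.push_data(data_id, data)


b1, b2 = Replica("backup1"), Replica("backup2")
primary = Primary("primary", [b1, b2])

# Each client pushes data starting at its own nearest replica.
push_along_chain([b1, primary, b2], "x", b"from Client")
push_along_chain([b2, primary, b1], "y", b"from Client2")

# Only the primary decides the order, so every replica applies y before x.
primary.write("y")
primary.write("x")
```

The data path is chosen for network proximity, yet all three logs end up identical because ordering is decided in one place.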
GFS Performance Benchmark
Implementing RSMs
● Logical clock based ordering of requests
  ◆ Cannot serve requests if any one replica is down
● Primary-backup replication
  ◆ Replace primary/backup upon failure
Availability of P/B-based RSM
● When is RSM unavailable to serve requests?
● Temporarily:
  ◆ While primary is bootstrapping new backup
  ◆ Replica is down but viewservice has yet to detect it
● Permanently:
  ◆ Link between primary and backup is down, but both can talk to viewservice
  ◆ Primary fails while bootstrapping backup
How to …
● … make RSM tolerant to network partitions?
● … ensure that operations don’t block even if some machines are unavailable?
RSM via Consensus
● Idea: Apply update if majority of replicas commit
● If 2f+1 replicas, need f+1 to commit
● Why majority? Why not fewer or more?
● Remaining replicas cannot accept some other update
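The majority argument can be checked by brute force. The sketch below, with hypothetical helper names, confirms that with 2f+1 replicas a quorum of f+1 always overlaps any other quorum of f+1, whereas a smaller quorum leaves room for a disjoint group to commit a different update.

```python
from itertools import combinations


def majority(n):
    """Smallest group size that is more than half of n replicas."""
    return n // 2 + 1


def quorums_always_intersect(n, q):
    """True if every pair of size-q subsets of n replicas shares a member."""
    replicas = range(n)
    return all(
        set(a) & set(b)
        for a in combinations(replicas, q)
        for b in combinations(replicas, q)
    )


f = 2
n = 2 * f + 1                                       # 5 replicas
ok_majority = quorums_always_intersect(n, f + 1)    # quorum of 3
ok_fewer = quorums_always_intersect(n, f)           # quorum of 2
```

Requiring more than f+1 would also guarantee overlap, but then the system could no longer make progress with f replicas down.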
Context for Today’s Lecture
● Say all replicas are in sync with each other
● First: Among several concurrent new updates, how to pick next update to apply?
● Later: How to apply all updates in a consistent order at all replicas?
Let’s plan a camping trip!
● Before going away on internships, three friends plan on going camping
● Can only coordinate via unreliable text messages
● How to decide on a camp to meet at?
[Figure: Alice, Bob, Sam]
Strawman Approaches
● Every user sends their proposal to everyone
● Every user accepts first proposal received
● Proposal accepted by majority is chosen
● Why might this not work?

● Every user tags proposal with seq number
● Every user collects proposals and accepts highest seq number proposal
● Why might this not work?
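One possible failure of the first strawman, sketched under the assumption that each friend happens to see a different proposal first (for example, their own): the vote splits three ways and no proposal reaches a majority. The camp names and the helper `chosen` are made up for this illustration.

```python
from collections import Counter


def chosen(accepted, n):
    """Return the proposal accepted by a majority of n users, or None."""
    if not accepted:
        return None
    value, votes = Counter(accepted.values()).most_common(1)[0]
    return value if votes > n // 2 else None


# Each user accepts the first proposal received; here all three differ.
split_vote = {"Alice": "camp A", "Bob": "camp B", "Sam": "camp C"}
result_split = chosen(split_vote, 3)

# Had two users seen the same proposal first, a choice would exist.
lucky_vote = {"Alice": "camp A", "Bob": "camp A", "Sam": "camp C"}
result_lucky = chosen(lucky_vote, 3)
```

The second strawman has a related problem: with unreliable messages, a user cannot tell when all proposals have been collected, so different users may settle on different "highest" proposals.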
Paxos
● Original paper submitted in 1990
  ◆ Tells mythical story of Greek island of Paxos with “legislators” and “current law” passed through parliamentary voting protocol
● Widely used in industry today
Desirable Properties
● Safety
  ◆ “No bad things happen”
  ◆ System never reaches an undesirable state
● Liveness
  ◆ “Good things eventually happen”
  ◆ System makes progress eventually
● Tradeoff between consistency and latency
Desired Properties of Solution
● Safety:
  ◆ Choose a proposal only if accepted by a majority
  ◆ Choose from proposals made
● Liveness:
  ◆ If proposals exist, one will eventually be chosen
  ◆ If a proposal is chosen, all replicas will eventually discover that it was chosen
Project 2
● View service:
  ◆ Draw state diagram
  ◆ What events can cause view to change?
  ◆ What state determines how to change view?
  ◆ Think about cases not covered by unit tests
● Primary backup service:
  ◆ Carefully think about implications of every failed RPC
  ◆ Sleep for PingInterval before retrying RPC
Roles of a Process in Paxos
● Three conceptual roles
  ◆ Proposers propose values
  ◆ Acceptors accept values; chosen if majority accept
  ◆ Learners learn the outcome (chosen value)
● In reality, a process can play any/all roles
● Roles in bank account example?
● Roles in camping trip example?
● Roles if Paxos used to pass laws?
Paxos: High-level Intuition
● May be unable to reach consensus in one round
● So, protocol runs over multiple rounds
● In each round:
  ◆ Elect a leader
  ◆ Proposal by current leader accepted by majority
● Once a value is accepted by a majority, a different value won’t be proposed subsequently
Paxos Overview
● Three phases within each round
● Prepare phase (elect a leader):
  ◆ Proposer sends unique proposal no. to all acceptors
  ◆ Waits to get commitment from majority of acceptors
● Accept phase (get majority to accept):
  ◆ Proposer sends proposed value to all acceptors
  ◆ Waits to get proposal accepted by majority
● Learn phase (disseminate chosen value):
  ◆ Learners discover value accepted by majority
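The phases above can be sketched as a single-process simulation. This is a minimal sketch, not a full implementation: messages are plain method calls that never get lost, and the class and function names are hypothetical. One detail goes beyond the slide and follows the usual description of single-value Paxos: an acceptor's commitment reports any value it has already accepted, and the proposer adopts the highest-numbered such value instead of its own.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest proposal no. committed to
        self.accepted_n = -1        # proposal no. of the accepted value, if any
        self.accepted_value = None

    def prepare(self, n):
        """Prepare phase: commit to ignoring proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        """Accept phase: accept unless committed to a higher proposal no."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1

    # Prepare phase: collect commitments from acceptors.
    replies = [a.prepare(n) for a in acceptors]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None

    # Adopt the highest-numbered value already accepted, if there is one.
    prev_n, prev_value = max(promises, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_value

    # Accept phase: the value is chosen once a majority accepts it.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "camp A")   # chosen: "camp A"
second = propose(acceptors, 2, "camp B")  # a later round re-proposes "camp A"
stale = propose(acceptors, 0, "camp C")   # lower proposal no. is rejected
```

The second call illustrates the property from the intuition slide: once "camp A" has been accepted by a majority, a later round with a different value ends up proposing "camp A" again. In this sketch the learn phase is just the return value of `propose`.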