introduction to distributed systems · distributed systems • tannenbaum: • distributed system...
TRANSCRIPT
![Page 1: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/1.jpg)
Introduction to Distributed Systems
SWE 622, Spring 2017Distributed Software Engineering
![Page 2: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/2.jpg)
J. Bell GMU SWE 622 Spring 2017
Today• Logistics + introductions • Distributed systems: high level overview and key
concepts • Homework description and introduction
• Time permitting, we’ll step into some code
• Relevant links: • HW 1: http://www.jonbell.net/swe-622-
spring-2017/homework-1/
2
![Page 3: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/3.jpg)
J. Bell GMU SWE 622 Spring 2017
Course Topics• This course will teach you how and why to build
distributed systems • This course will give you theoretical knowledge of
the tradeoffs that you’ll face when building distributed systems
• This course will give you significant hands-on experience working with real distributed systems, working with well-used systems like Redis and Zookeeper
3
![Page 4: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/4.jpg)
J. Bell GMU SWE 622 Spring 2017
Prerequisites• “SWE Foundation or equivalent”, AKA:
• INFS 501 Discrete and Logical Structures for Information Systems
• INFS 515 Computer Organization • INFS 519 Program Design and Data Structures • SWE 510 Object Oriented Programming in Java
• You need to know how to program Java • Awareness of threads and related
synchronization issues is a big plus, but not mandatory
4
![Page 5: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/5.jpg)
J. Bell GMU SWE 622 Spring 2017
Logistics• Syllabus: http://www.jonbell.net/swe-622-spring-2017/
• 50% homework (we’ll come back to this) • 20% midterm • 20% final • 10% participation
• Piazza for Q+A • Reminders
• Honor code • Late policy (10% deducted if < 24 hrs late, no credit
after 24 hrs late) • NO extra credit
5
![Page 6: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/6.jpg)
J. Bell GMU SWE 622 Spring 2017
Introductions• Prof Jonathan Bell (me)
• Office hour: ENGR 4422 Weds 3:30-4:30 pm or by appointment; can do Google Hangouts too.
• Areas of research: Software Engineering, Program Analysis, Software Systems
6
Two hobbies: cycling, ice cream
![Page 7: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/7.jpg)
J. Bell GMU SWE 622 Spring 2017
Introductions (from you!)
7
![Page 8: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/8.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems• Tannenbaum:
• Distributed System is “a collection of independent computers that appears to its users as a single coherent system”
• Takada: • “Given infinite money and infinite R&D time, we
wouldn't need distributed systems. All computation and storage could be done on a magic box - a single, incredibly fast and incredibly reliable system that you pay someone else to design for you.”
8
![Page 9: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/9.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems
9
Model: Many servers talking through cloud
![Page 10: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/10.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems
10
Model: Servers and Clients talking through cloud
![Page 11: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/11.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems
11
Model: Many clients talking through cloud
![Page 12: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/12.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems
12
Model: Two clients talking through cloud
![Page 13: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/13.jpg)
J. Bell GMU SWE 622 Spring 2017
What do we want from Distributed Systems?
• Scalability • Performance • Latency • Availability • Fault Tolerance
13
“Distributed Systems for Fun and Profit”, Takada
![Page 14: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/14.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems Goals• Scalability• Performance • Latency • Availability • Fault Tolerance
“the ability of a system, network, or process, to handle a growing
amount of work in a capable manner or its ability to be enlarged to accommodate that growth.”
14
“Distributed Systems for Fun and Profit”, Takada
![Page 15: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/15.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems Goals• Scalability • Performance• Latency • Availability • Fault Tolerance
15
“is characterized by the amount of useful work accomplished by a
computer system compared to the time and resources used.”
![Page 16: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/16.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems Goals• Scalability • Performance • Latency• Availability • Fault Tolerance
16
“The state of being latent; delay, a period between the initiation of something and the it becoming
visible.”
![Page 17: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/17.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems Goals• Scalability • Performance • Latency • Availability• Fault Tolerance
17
“the proportion of time a system is in a functioning condition. If a user
cannot access the system, it is said to be unavailable.”
Availability = uptime / (uptime + downtime).
Availability % Downtime/year90% >1 month99% < 4 days
99.9% < 9 hours99.99% <1 hour99.999% 5 minutes
99.9999% 31 seconds
Often measured in “nines”
![Page 18: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/18.jpg)
J. Bell GMU SWE 622 Spring 2017
Distributed Systems Goals• Scalability • Performance • Latency • Availability • Fault Tolerance
18
“ability of a system to behave in a well-defined manner once faults
occur”
What kind of faults?
Disks failPower supplies fail
Power goes out
Networking failsSecurity breached
Datacenter goes offline
![Page 19: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/19.jpg)
J. Bell GMU SWE 622 Spring 2017
More machines, more problems
• Say there’s a 1% chance of having some hardware failure occur to a machine (power supply burns out, hard disk crashes, etc)
• Now I have 10 machines • Probability(at least one fails) = 1 - Probability(no
machine fails) = 1-(1-.01)10 = 10% • 100 machines -> 63% • 200 machines -> 87% • So obviously just adding more machines doesn’t
solve fault tolerance
19
![Page 20: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/20.jpg)
J. Bell GMU SWE 622 Spring 2017
Constraints• Number of nodes • Distance between nodes
20
![Page 21: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/21.jpg)
J. Bell GMU SWE 622 Spring 2017
Constraints• Number of nodes • Distance between nodes
21DC
NY
LONDON
SFEven if cross-city links are fast and cheap (are they?) Still that pesky speed of light…
![Page 22: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/22.jpg)
J. Bell GMU SWE 622 Spring 2017
Recurring Solution #1: Partitioning
22
A B
All accesses go to single server
![Page 23: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/23.jpg)
J. Bell GMU SWE 622 Spring 2017
Recurring Solution #1: Partitioning
• Divide data up in some (hopefully logical) way • Makes it easier to process data concurrently (cheaper
reads)
23
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
Each server has 50% of data, limits amount of processing per server.
Even if 1 server goes down, still have 50% of the data online.
![Page 24: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/24.jpg)
J. Bell GMU SWE 622 Spring 2017
Recurring Solution #2: Replication
24
A B
All accesses go to single server
![Page 25: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/25.jpg)
J. Bell GMU SWE 622 Spring 2017
Recurring Solution #2: Replication
25
A B
Entire data set is copied
A B
![Page 26: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/26.jpg)
J. Bell GMU SWE 622 Spring 2017
Recurring Solution #2: Replication
• Improves performance: • Client load can be evenly shared between
servers • Reduces latency: can place copies of data
nearer to clients • Improves availability:
• One replica fails, still can serve all requests from other replicas
26
![Page 27: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/27.jpg)
J. Bell GMU SWE 622 Spring 2017
Partitioning + Replication
27
A B
![Page 28: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/28.jpg)
J. Bell GMU SWE 622 Spring 2017
Partitioning + Replication
28
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
![Page 29: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/29.jpg)
J. Bell GMU SWE 622 Spring 2017
Partitioning + Replication
29
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…
Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
DC NYC
LondonSF
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
A [0…100]
B [A…N]
A [101.. 200]
B [O…Z]
![Page 30: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/30.jpg)
J. Bell GMU SWE 622 Spring 2017
Recurring Problem: Replication• Replication solves some problems, but creates a
huge new one: consistency
30
A B A B
Set A=5
6 7 765
“OK”! Read A “6”!
OK, we obviously need to actually do something here to replicate the data… but what?
![Page 31: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/31.jpg)
J. Bell GMU SWE 622 Spring 2017
Replication• Was it OK for the replicas to be out of sync? • When they diverge, we say that they are not
consistent • What is consistent? • For now, let's talk about sequential consistency,
which will guarantee that the data updates exactly as if there were no replication
31
![Page 32: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/32.jpg)
J. Bell GMU SWE 622 Spring 2017
Sequential Consistency
32
A B A B
Set A=5
6 7 765
“OK”! Read A “5”!
Set A=5
“OK!”
5
![Page 33: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/33.jpg)
J. Bell GMU SWE 622 Spring 2017
Broken Sequential Consistency
33
A B A B
Set A=5
6 7 765
Read A
Set A=5
![Page 34: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/34.jpg)
J. Bell GMU SWE 622 Spring 2017
Availability• Our protocol for sequential consistency does NOT
guarantee that the system will be available!
34
A B A B
Set A=5
6 7 765
Read A
Set A=5
![Page 35: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/35.jpg)
J. Bell GMU SWE 622 Spring 2017
Consistent + Available
35
A B A B
Set A=5
6 7 765
“OK”! “6”!
Set A=5
Read A
Assume replica failed
![Page 36: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/36.jpg)
J. Bell GMU SWE 622 Spring 2017
Still broken...
36
A B A B
Set A=5
6 7 765
“OK”!
Set A=5Assume
replica failed
Read A “6”!
![Page 37: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/37.jpg)
J. Bell GMU SWE 622 Spring 2017
Network Partitions• The communication links between nodes may fail
arbitrarily • But other nodes might still be able to reach that
node
37
A B A B
Set A=5
6 7 765
“OK”!
Set A=5Assume
replica failed
Read A “6”!
![Page 38: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/38.jpg)
J. Bell GMU SWE 622 Spring 2017
CAP Theorem• Pick two of three:
• Consistency: All nodes see the same data at the same time (strong consistency)
• Availability: Individual node failures do not prevent survivors from continuing to operate
• Partition tolerance: The system continues to operate despite message loss (from network and/or node failure)
• You can not have all three, ever*• If you relax your consistency guarantee (we’ll talk about
in a few weeks), you might be able to guarantee THAT…
38
![Page 39: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/39.jpg)
J. Bell GMU SWE 622 Spring 2017
CAP Theorem• C+A: Provide strong consistency and availability,
assuming there are no network partitions • C+P: Provide strong consistency in the presence of
network partitions; minority partition is unavailable • A+P: Provide availability even in presence of
partitions; no strong consistency guarantee
39
![Page 40: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/40.jpg)
J. Bell GMU SWE 622 Spring 2017
Still broken...
40
A B A B
Set A=5
6 7 765
“OK”! Read A “6”!
Set A=5
“OK!”
![Page 41: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/41.jpg)
J. Bell GMU SWE 622 Spring 2017
Byzantine Failures• We typically assume that we can control how our
actors behave • But perhaps, some begin to behave arbitrarily • Generally, this is very very hard (computationally)
to control for, and is often ignored in real systems
41
![Page 42: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/42.jpg)
J. Bell GMU SWE 622 Spring 2017
Designing and Building Distributed Systems
To help design our algorithms and systems, we tend to leverage abstractions and models to make assumptions
42
Stre
ngth
System model
Synchronous
Asynchronous
Failure Model
Crash-fail
Partitions
Byzantine
Consistency ModelEventual
Sequential
Generally: Stronger assumptions -> worse performance Weaker assumptions -> more complicated
![Page 43: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/43.jpg)
J. Bell GMU SWE 622 Spring 2017
Review• Distributed systems can help us with:
• Scalability • Performance • Latency • Availability • Fault Tolerance
• We usually partition + replicate our data to achieve these goals
• Replication is not trivial
43
![Page 44: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/44.jpg)
J. Bell GMU SWE 622 Spring 2017
A Distributed Filesystem
44
$echo“test”>f1$lsf1$catf1test
![Page 45: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/45.jpg)
J. Bell GMU SWE 622 Spring 2017
A Distributed Filesystem
45
$echo“test”>f1 $lsf1$catf1test
![Page 46: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/46.jpg)
J. Bell GMU SWE 622 Spring 2017
A Distributed Filesystem• This semester, you will create a distributed filesystem • Motivation:
• Storing files in memory is faster than on disk (much lower latency)
• But… • Practical limits of MB/machine • Very ephemeral: machine crashes/reboots, it’s gone
• Solution: • Store data in memory of many computers, can
partition (to store more than can fit on 1), replicate (fault tolerance)
• Use a permanent backing store to keep a canonical store of files
46
![Page 47: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/47.jpg)
J. Bell GMU SWE 622 Spring 2017
High level design questions• How do we expose the system?
• Files (e.g. NFS)? • Blocks (e.g. SANs)? • Key/value (e.g. S3)? • Database?
• No right answer for a generic solution - always tradeoffs
• For practicality, we'll use a file interface, but make it look a lot like a key/value interface for simplicity
47
![Page 48: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/48.jpg)
J. Bell GMU SWE 622 Spring 2017
High level design questions• How do we maintain consistency?
• What if two clients try to move the same folder at the same time? Or create a new file at the same time?
• How do we scale? How do we partition? • How do we maintain consistency despite node
failures? • How do clients/servers communicate?
• Are we writing our own network protocol stack? • How do we handle concurrent clients?
48
![Page 49: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/49.jpg)
J. Bell GMU SWE 622 Spring 2017
CloudFS
49
Dropbox API
CFS
Homework 1: Single client, ignore failures, stores files in Dropbox and caches them
locally
![Page 50: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/50.jpg)
J. Bell GMU SWE 622 Spring 2017
CloudFS
50
Dropbox API
CFS CFS
CFS Lock Server
Homework 2: Multiple clients, ignore failures,
stores files in Dropbox and caches them locally
![Page 51: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/51.jpg)
J. Bell GMU SWE 622 Spring 2017
CloudFS
51
Dropbox API
CFS CFS
CFS Lock Server
Homework 3: Multiple clients, cache replicated and partitioned between
clients
CFS
![Page 52: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/52.jpg)
J. Bell GMU SWE 622 Spring 2017
CloudFS
52
Dropbox API
CFS CFS
Homework 4: Replicated, consensus-based lock service removes single
point of failure
CFS
![Page 53: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/53.jpg)
J. Bell GMU SWE 622 Spring 2017
CloudFS
53
Dropbox API
CFS CFS
Homework 5: Fault tolerance and recovery
CFS
![Page 54: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/54.jpg)
J. Bell GMU SWE 622 Spring 2017
CloudFS
54
Dropbox API
CFS CFS
Homework 6: Auditing + Security with Blockchains
CFS
![Page 55: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/55.jpg)
J. Bell GMU SWE 622 Spring 2017
Homework 1• Due 2/8, 4:00pm • Base code is provided to connect to Dropbox, create and mount a
filesystem • Your job: add a cache • Warning: it's tricky! Even with just 1 client, you can still have
concurrency (multiple apps on same machine accessing same files)
• Next week: java concurrency refresher • Very important HW logistics reminders:
• Late policy (>24 hrs is not accepted) • Submission is via GitHub Classroom (use invite link to get your
repository, do NOT fork mine directly), you must make a release to submit
• If it doesn’t run in the provided VM, you won’t do very well.
55
![Page 56: Introduction to Distributed Systems · Distributed Systems • Tannenbaum: • Distributed System is “a collection of independent computers that appears to its users as a single](https://reader033.vdocuments.us/reader033/viewer/2022052719/5f0729097e708231d41b9b3a/html5/thumbnails/56.jpg)
J. Bell GMU SWE 622 Spring 2017
Online activity
Go to: https://b.socrative.com/ and click on Login (student), then “SWE622” for room name
56