Distributed Systems
Principles and Paradigms
Chapter 12 (version April 7, 2008)
Maarten van Steen
Vrije Universiteit Amsterdam, Faculty of Science
Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 598 7784
E-mail: [email protected], URL: www.cs.vu.nl/~steen/
01 Introduction
02 Architectures
03 Processes
04 Communication
05 Naming
06 Synchronization
07 Consistency and Replication
08 Fault Tolerance
09 Security
10 Distributed Object-Based Systems
11 Distributed File Systems
12 Distributed Web-Based Systems
13 Distributed Coordination-Based Systems
Distributed Web-Based Systems
Essence: The WWW is a huge client-server system with millions of servers, each server hosting thousands of hyperlinked documents:
[Figure: a client machine (browser running on the OS) and a server machine (Web server). 1. Get document request (HTTP); 2. Server fetches document from local file; 3. Response.]
• Documents are generally represented in text (plain text, HTML, XML)
• Alternative types: images, audio, video, but also applications (PDF, PS)
• Documents may contain scripts that are executed by the client-side software
12 – 1 Distributed Web-Based Systems/12.1 Architecture
Multi-tiered Architectures
Observation: Very early on, Web sites were organized into three tiers:
[Figure: a Web server (HTTP request handler), a CGI process running a CGI program, and a database server. Labeled steps: 1. Get request; 3. Start process to fetch document; 4. Database interaction; 5. HTML document created; 6. Return result.]
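The three tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, a plain function call stands in for starting a CGI process, and the table `docs` and all function names are hypothetical.

```python
import sqlite3
from html import escape


def make_database():
    """Tier 3: an in-memory database standing in for the database server."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO docs VALUES (?, ?)", ("intro", "Hello & welcome"))
    db.commit()
    return db


def cgi_program(db, name):
    """Tier 2: interact with the database (step 4) and create the HTML (step 5)."""
    row = db.execute("SELECT body FROM docs WHERE name = ?", (name,)).fetchone()
    if row is None:
        return 404, "<html><body>Not found</body></html>"
    return 200, f"<html><body>{escape(row[0])}</body></html>"


def handle_request(db, path):
    """Tier 1: get the request (step 1), start the CGI program (step 3),
    and return its result to the client (step 6)."""
    name = path.strip("/")
    return cgi_program(db, name)
```

A real CGI setup would start a separate process per request; the function call here only keeps the sketch self-contained.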
Web Services
Observation: At a certain point, people started recognizing that the Web was about more than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.
[Figure: a client machine (client application, stub, communication subsystem) and a server machine (server application, stub, communication subsystem) communicate through SOAP. The server publishes its service description (WSDL) in a directory service (UDDI), where the client looks up a service. On both sides the stub is generated from the WSDL description.]
Clients: Web browsers
Observation: Browsers form the Web’s most important client-side software. They used to be simple, but that was long ago.
[Figure: logical components of a Web browser: user interface, browser engine, rendering engine, network communication, HTML/XML parser, client-side script interpreter, display back end.]
Apache Web Server
Observation: More than 70% of all Web sites are based on Apache. The server is internally organized more or less according to the steps needed to process an HTTP request:
[Figure: the Apache core takes a request through a sequence of hooks before producing the response. Modules supply functions, each function is linked to a hook, and the core calls the functions registered per hook.]
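The hook organization can be illustrated with a small sketch. It is not Apache's actual API: the class `Core`, the hook names, and the three module functions are hypothetical and only mirror the idea that a core walks through its hooks in order and calls whatever functions modules have linked to each one.

```python
from collections import defaultdict


class Core:
    """Hypothetical stand-in for the server core: an ordered list of hooks."""

    def __init__(self, hook_names):
        self.hook_names = list(hook_names)
        self.functions = defaultdict(list)  # hook name -> functions linked to it

    def link(self, hook_name, function):
        """Link a module's function to one of the core's hooks."""
        if hook_name not in self.hook_names:
            raise KeyError(f"unknown hook: {hook_name}")
        self.functions[hook_name].append(function)

    def process(self, request):
        """Call the functions registered per hook, in hook order."""
        response = {"status": 200, "headers": {}, "body": ""}
        for hook_name in self.hook_names:
            for function in self.functions[hook_name]:
                function(request, response)
        return response


# Three hypothetical modules, each contributing one function.
def translate_path(request, response):
    request["file"] = "/var/www" + request["path"]  # assumed document root


def serve_content(request, response):
    response["body"] = f"contents of {request['file']}"


def log_request(request, response):
    request.setdefault("log", []).append(f"{request['path']} -> {response['status']}")


core = Core(["translate", "content", "log"])
core.link("translate", translate_path)
core.link("content", serve_content)
core.link("log", log_request)
```

Calling `core.process({"path": "/index.html"})` runs the three functions in hook order, so adding behavior means linking another function rather than changing the core.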
Server Clusters (1/2)
Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients:
[Figure: a front end connected through a LAN to four Web servers. The front end handles all incoming requests and outgoing responses.]
Problem: The front end may easily get overloaded, so that special measures need to be taken.
Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account.
Content-aware distribution: Front end reads the content of the HTTP request and then selects the best server.
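The difference between the two policies can be made concrete with a minimal sketch. The assumptions are loud ones: the performance metric is taken to be the number of active connections, "best server" is taken to mean "the server that always handles this path" (so its cache stays warm), and the server names are made up.

```python
import zlib

SERVERS = ["web1", "web2", "web3", "web4"]  # hypothetical server names


def transport_layer_switch(active_connections):
    """Pick the least-loaded server without looking at the request at all."""
    return min(active_connections, key=active_connections.get)


def content_aware_distribute(http_request, servers=SERVERS):
    """Read the request line and map the requested path to a fixed server,
    so that requests for the same document end up at the same machine."""
    request_line = http_request.split("\r\n", 1)[0]
    _method, path, _version = request_line.split(" ")
    return servers[zlib.crc32(path.encode("utf-8")) % len(servers)]
```

The content-aware variant has to parse HTTP before it can decide, which is exactly the extra work that makes the front end a potential bottleneck.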
Server Clusters (2/2)
Question: Why can content-aware distribution be so much better?
[Figure: a client, a switch, two distributors, a dispatcher, and two Web servers. 1. Pass setup request to a distributor; 2. Dispatcher selects server; 3. Hand off TCP connection; 4. Inform switch; 5. Forward other messages; 6. Server responses.]
Communication (1/2)
Essence: Communication in the Web is generally based on HTTP, a relatively simple client-server transfer protocol having the following request messages:
Operation   Description
Head        Request to return the header of a document
Get         Request to return a document to the client
Put         Request to store a document
Post        Provide data that are to be added to a document (collection)
Delete      Request to delete a document
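To show what such a request looks like on the wire, the following sketch assembles an HTTP/1.1 request message by hand. The helper `build_request` is hypothetical and deliberately minimal: it emits only the request line, a Host header, and a Content-Length header when there is a body.

```python
ALLOWED_OPERATIONS = {"HEAD", "GET", "PUT", "POST", "DELETE"}


def build_request(method, path, host, body=b""):
    """Assemble a minimal HTTP/1.1 request message as bytes."""
    if method not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {method}")
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # An empty line separates the header section from the (optional) body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, `build_request("GET", "/index.html", "www.example.org")` yields a request line followed by a single Host header and an empty line.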
Communication (2/2)
Header               C/S   Contents
Accept               C     The type of documents the client can handle
Accept-Charset       C     The character sets that are acceptable for the client
Accept-Encoding      C     The document encodings the client can handle
Accept-Language      C     The natural language the client can handle
Authorization        C     A list of the client’s credentials
WWW-Authenticate     S     Security challenge the client should respond to
Date                 C+S   Date and time the message was sent
ETag                 S     The tags associated with the returned document
Expires              S     The time until which the response remains valid
From                 C     The client’s e-mail address
Host                 C     The host name (and optionally the port) of the document’s server
If-Match             C     The tags the document should have
If-None-Match        C     The tags the document should not have
If-Modified-Since    C     Tells the server to return a document only if it has been modified since the specified time
If-Unmodified-Since  C     Tells the server to return a document only if it has not been modified since the specified time
Last-Modified        S     The time the returned document was last modified
Location             S     A document reference to which the client should redirect its request
Referer              C     Refers to client’s most recently requested document
Upgrade              C+S   The application protocol sender wants to switch to
Warning              C+S   Information about status of the data in the message
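As an illustration of how the conditional headers are used, here is a simplified sketch of a server deciding between "200 OK" and "304 Not Modified". It assumes strong entity tags compared as plain strings and date headers that carry an explicit GMT zone; the function name and the sample values are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Sample state of a document on the server (made-up values).
CURRENT_ETAG = '"v1"'
LAST_MODIFIED = datetime(2008, 4, 7, 9, 0, tzinfo=timezone.utc)


def conditional_get(request_headers, etag=CURRENT_ETAG, last_modified=LAST_MODIFIED):
    """Return 304 if the client's copy is still current, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The document should NOT have any of the listed tags.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304  # not modified since the specified time
    return 200
```

A client holding a cached copy would send the ETag or Last-Modified value it received earlier, and a 304 answer lets it reuse that copy without transferring the document again.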
SOAP
Simple Object Access Protocol: Based on XML, this is the standard protocol for communication between Web services.
• SOAP is bound to an underlying protocol (i.e., it is not independent from its carrier)
• Conversational exchange style: Send a document one way, get a filled-in response back.
• RPC-style exchange: Used to invoke a Web service.
A Note on XML
Observation: XML has the advantage of allowing self-describing documents. Full stop (i.e., it introduces performance problems and is not meant to be read by human beings)
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <n:alertcontrol xmlns:n="http://example.org/alertcontrol">
      <n:priority>1</n:priority>
      <n:expires>2001-06-22T14:00:00-05:00</n:expires>
    </n:alertcontrol>
  </env:Header>
  <env:Body>
    <m:alert xmlns:m="http://example.org/alert">
      <m:msg>Pick up Mary at school at 2pm</m:msg>
    </m:alert>
  </env:Body>
</env:Envelope>
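What "self-describing" buys a receiver can be seen by parsing the envelope above with Python's standard `xml.etree.ElementTree`. This is a sketch only: the function name `parse_alert` is hypothetical, and the namespace prefixes in `NS` are local shorthand for the URIs used in the message.

```python
import xml.etree.ElementTree as ET

ENVELOPE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <n:alertcontrol xmlns:n="http://example.org/alertcontrol">
      <n:priority>1</n:priority>
      <n:expires>2001-06-22T14:00:00-05:00</n:expires>
    </n:alertcontrol>
  </env:Header>
  <env:Body>
    <m:alert xmlns:m="http://example.org/alert">
      <m:msg>Pick up Mary at school at 2pm</m:msg>
    </m:alert>
  </env:Body>
</env:Envelope>"""

# Prefix -> namespace URI, used to resolve the paths below.
NS = {
    "env": "http://www.w3.org/2003/05/soap-envelope",
    "n": "http://example.org/alertcontrol",
    "m": "http://example.org/alert",
}


def parse_alert(xml_text):
    """Extract priority, expiry time, and message text from the envelope."""
    root = ET.fromstring(xml_text)
    priority = int(root.find("env:Header/n:alertcontrol/n:priority", NS).text)
    expires = root.find("env:Header/n:alertcontrol/n:expires", NS).text
    message = root.find("env:Body/m:alert/m:msg", NS).text
    return priority, expires, message
```

Every element has to be located through its namespace-qualified name, which hints at the parsing overhead the observation refers to.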
Naming: URL
URL: Uniform Resource Locator tells how and where to access a resource.
(a) Scheme, host name, pathname: http://www.cs.vu.nl/home/steen/mbox
(b) Scheme, host name, port, pathname: http://www.cs.vu.nl:80/home/steen/mbox
(c) Scheme, host name, port, pathname: http://130.37.24.11:80/home/steen/mbox
Examples:
http     HTTP           http://www.cs.vu.nl:80/globe
mailto   Mail           mailto:[email protected]
ftp      FTP            ftp://ftp.cs.vu.nl/pub/minix/README
file     Local file     file:/edu/book/work/chp/11/11
data     Inline data    data:text/plain;charset=iso-8859-7,%e1%e2%e3
telnet   Remote login   telnet://flits.cs.vu.nl
tel      Telephone      tel:+31201234567
modem    Modem          modem:+31201234567;type=v32
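The decomposition into scheme, host name, port, and pathname can be reproduced with `urllib.parse.urlsplit` from the Python standard library. The helper `split_url` and the small `DEFAULT_PORTS` table are illustrative assumptions, added only to show that forms (a) and (b) name the same resource.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes (kept deliberately short).
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}


def split_url(url):
    """Return (scheme, host name, port, pathname); missing parts are None."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path
```

Schemes such as `tel` carry no host at all, so only the scheme and the remainder after the colon are filled in for them.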
Synchronization: WebDAV
Problem: There is a growing need for collaborative authoring of Web documents, but bare-bones HTTP can’t help here.
Solution: Web Distributed Authoring and Versioning.
• Supports exclusive and shared write locks, which operate on entire documents
• A lock is passed by means of a lock token; the server registers the client(s) holding the lock
• Clients modify the document locally and post it back to the server along with the lock token
Note: There is no specific support for crashed clients holding a lock.
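The locking rules above can be sketched as a small in-memory server. This is an illustration of the idea, not of the WebDAV wire protocol: the class `DocumentServer`, its method names, and the use of a random UUID as lock token are all assumptions.

```python
import uuid


class LockError(Exception):
    """Raised when a lock cannot be granted or a token is not valid."""


class DocumentServer:
    def __init__(self):
        self.documents = {}
        # document name -> {"mode": "exclusive" | "shared", "tokens": {token: client}}
        self.locks = {}

    def lock(self, name, client, mode="exclusive"):
        """Grant a write lock on an entire document and return its lock token."""
        current = self.locks.get(name)
        if current is not None and "exclusive" in (mode, current["mode"]):
            raise LockError(f"{name} is already locked")
        if current is None:
            current = self.locks[name] = {"mode": mode, "tokens": {}}
        token = str(uuid.uuid4())
        current["tokens"][token] = client  # the server registers the holder
        return token

    def put(self, name, content, token):
        """Store a modified document; the lock token must accompany it."""
        current = self.locks.get(name)
        if current is None or token not in current["tokens"]:
            raise LockError("a valid lock token is required")
        self.documents[name] = content

    def unlock(self, name, token):
        """Release one lock; the document is free once no tokens remain."""
        current = self.locks.get(name)
        if current is None or token not in current["tokens"]:
            raise LockError("unknown lock token")
        del current["tokens"][token]
        if not current["tokens"]:
            del self.locks[name]
```

Note that nothing in the sketch ever removes the token of a client that crashes, which is the gap the note above points at.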
Web Proxy Caching
Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency protocols:
• Always verify validity by contacting server
• Age-based consistency: $T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last\_modified}}) + T_{\text{cached}}$
• Cooperative caching, by which you first check your neighbors on a cache miss:
[Figure: three Web proxies, each with its own cache and a group of clients, and a Web server. On an HTTP Get request: 1. Look in local cache; 2. Ask neighboring proxy caches; 3. Forward request to Web server.]
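Age-based consistency and cooperative caching can be combined in one short sketch. Several things are assumed here rather than taken from the slides: times are plain numbers, the default α = 0.2 is an arbitrary choice, the origin server is modelled as a callable returning `(body, last_modified)`, and a document found at a neighbor keeps the neighbor's expiration time.

```python
def expiration_time(t_cached, t_last_modified, alpha=0.2):
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached


class ProxyCache:
    def __init__(self, fetch_from_origin, neighbors=(), alpha=0.2):
        self.fetch_from_origin = fetch_from_origin  # url -> (body, t_last_modified)
        self.neighbors = list(neighbors)
        self.alpha = alpha
        self.entries = {}  # url -> (body, t_last_modified, t_expire)

    def lookup_local(self, url, now):
        """Return a cached entry only if it has not expired yet."""
        entry = self.entries.get(url)
        if entry is not None and now < entry[2]:
            return entry
        return None

    def get(self, url, now):
        entry = self.lookup_local(url, now)          # 1. look in local cache
        if entry is None:
            for neighbor in self.neighbors:          # 2. ask neighboring proxy caches
                entry = neighbor.lookup_local(url, now)
                if entry is not None:
                    break
        if entry is None:                            # 3. forward request to Web server
            body, t_last_modified = self.fetch_from_origin(url)
            t_expire = expiration_time(now, t_last_modified, self.alpha)
            entry = (body, t_last_modified, t_expire)
        self.entries[url] = entry
        return entry[0]
```

The intuition behind the formula is that a document that has not changed for a long time is assumed to stay unchanged for a while longer, so it is given a later expiration time.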
Replication in Web Hosting Systems
Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is done to improve their organization. It follows the lines of self-managing systems:
[Figure: feedback control loop around a Web hosting system. The system starts from an initial configuration and is subject to uncontrollable parameters (disturbance / noise). Metric estimation turns the observed output into a measured output, which analysis compares against the reference input. The resulting adjustment triggers lead to corrections (+/-) in replica placement, consistency enforcement, and request routing.]
Handling Flash Crowds
Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem:
[Figure: four plots, panels (a) to (d), with time spans labeled 2 days, 2 days, 6 days, and 2.5 days.]
Server Replication
Content Delivery Network: CDNs act as Web hosting services to replicate documents across the Internet, providing their customers guarantees on high availability and performance (example: Akamai).
[Figure: a client, an origin server, a CDN server with a cache, a CDN DNS server, and the regular DNS system. 1. Get base document; 2. Document with refs to embedded documents; 3. DNS lookups; 4. Return IP address client-best server; 5. Get embedded documents; 6. Get embedded documents (if not already cached); 7. Embedded documents.]
Question: How would consistency be maintained in this system?
Replication of Web Apps. (1/3)
Observation: Replication becomes more difficult when dealing with databases and such. No single best solution.
[Figure: a client sends queries to a server on the edge-server side, which forwards queries to and receives responses from a server on the origin-server side holding the authoritative database and its schema. The edge-server side may hold a content-blind cache, a content-aware cache, or a database copy, fed through full/partial data replication and full schema replication / query templates.]
Assumption: Updates are carried out at the origin server and propagated to the edge servers.
Replication of Web Apps. (2/3)
• Full replication: high read/write ratio, often in combination with complex queries. Note: replication may actually slow performance down when the R/W ratio goes down.
• Partial replication: high read/write ratio, but in combination with simple queries
Replication of Web Apps. (3/3)
• Content-aware caching: Check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
• Content-blind caching: Simply cache the result of previous queries. Works well with simple queries that address unique results (e.g., no range queries).
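Content-blind caching is simple enough to sketch directly. In this illustration an in-memory SQLite database plays the authoritative database at the origin server, and the edge-side cache stores results keyed by the exact query text and parameters; the table `books`, the class name, and the crude `invalidate` method are assumptions.

```python
import sqlite3


def make_origin():
    """Stand-in for the authoritative database at the origin server."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    db.execute("INSERT INTO books VALUES (1, 'Distributed Systems')")
    db.commit()
    return db


class ContentBlindCache:
    """Edge-side cache that knows nothing about what a query means."""

    def __init__(self, origin_db):
        self.origin_db = origin_db
        self.results = {}  # (sql, params) -> rows returned earlier
        self.hits = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.results:
            self.hits += 1
            return self.results[key]
        rows = self.origin_db.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows

    def invalidate(self):
        """Drop everything; the cache cannot tell which results an update affects."""
        self.results.clear()
```

Because only identical queries hit the cache, a range query that overlaps an earlier result still goes to the origin, which is why this scheme suits simple queries addressing unique results.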
Security: TLS (SSL)
Transport Layer Security: Modern version of the Secure Socket Layer (SSL), which “sits” between the transport layer and application protocols. Relatively simple protocol that can support mutual authentication using certificates:
[Figure: five numbered messages (1 to 5) exchanged between client and server. Message labels: Possibilities; Choices; $[K_S^+]_{CA}$; $[K_C^+]_{CA}$; $K_S^+([R]_C)$.]
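Mutual authentication with certificates can be requested through Python's standard `ssl` module, as in the sketch below. The two helper functions and their parameters are hypothetical; certificate, key, and CA file paths are left as optional arguments because no concrete files are assumed to exist.

```python
import ssl


def make_server_context(certfile=None, keyfile=None, cafile=None):
    """Server side: present a certificate and insist on one from the client."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.verify_mode = ssl.CERT_REQUIRED  # the client must authenticate too
    if certfile is not None:
        context.load_cert_chain(certfile, keyfile)  # the server's own certificate
    if cafile is not None:
        context.load_verify_locations(cafile)       # CA that signed client certificates
    return context


def make_client_context(certfile=None, keyfile=None, cafile=None):
    """Client side: verify the server and optionally present a certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile is not None:
        context.load_cert_chain(certfile, keyfile)  # the client's own certificate
    return context
```

Either context would then wrap an ordinary TCP socket with `wrap_socket`, which is where TLS "sits" between the transport layer and the application protocol.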