Distributed Web-Based Systems Department of Engineering – Information Technology Reza Ghanbari 2010


Page 1: Distributed web based systems

Distributed Web-Based Systems

Department of Engineering – Information Technology

Reza Ghanbari, 2010

Page 2: Distributed web based systems

Outline

• WWW
• URL
• Web Documents
• HTTP
  – Connections
  – Methods
  – Messages
  – Caching
• Content Distribution Network
• Web Service
  – Terminology
• Architecture
  – Traditional Web Based Systems
  – Multi-tiered Web Based Systems
• Web Server Clusters
• Web Security
  – SSL
• References

Page 3: Distributed web based systems

World Wide Web

The World Wide Web is a large distributed system with millions of clients and servers for accessing linked documents.

Servers maintain collections of documents, while clients provide users with an easy-to-use interface for presenting and accessing those documents.

A document is fetched from a server, transferred to a client, and presented on the screen.

To the user, there is conceptually no difference between a document stored locally and one stored in another part of the world.

The Web has since become more than a simple document-based system.

With the emergence of Web services, it is becoming a system of distributed services, rather than just documents, offered to any user or machine.

Page 4: Distributed web based systems

Uniform Resource Locator

A reference called a Uniform Resource Locator (URL) is used to refer to a document.

A URL specifies the access protocol (for example, http), the DNS name of the server that holds the document, and a file name on that server.

Example: http://www.example.sharif.edu/notes/WebBasedDistributedSystem.ppt
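As a minimal sketch of how a client might take such a URL apart, the snippet below uses Python's standard `urllib.parse` module on the example URL from this slide. The fallback to port 80 when no port is given is an assumption that matches plain HTTP.

```python
from urllib.parse import urlsplit

# The example URL from the slide.
url = "http://www.example.sharif.edu/notes/WebBasedDistributedSystem.ppt"
parts = urlsplit(url)

scheme = parts.scheme      # access protocol
host = parts.hostname      # DNS name of the server
path = parts.path          # file name on that server
port = parts.port or 80    # no explicit port in the URL, so assume the HTTP default

print(scheme, host, port, path)
```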

Page 5: Distributed web based systems

WEB DOCUMENTS

A Web document need not contain only text; it can include all kinds of other content such as audio, video, and animations.

In many cases special helper applications (interpreters) are needed, and they are integrated into the browser.

Most Web documents are written in a markup language, such as

HyperText Markup Language (HTML) and

eXtensible Markup Language (XML).

Page 6: Distributed web based systems

WEB DOCUMENTS

HTML and XML can include tags that refer to embedded documents, that is, references to other files.

An embedded document can be a complete program that is executed on the fly as part of displaying information.

Multipurpose Internet Mail Extensions (MIME) types are used to specify the type of an embedded document.

MIME was originally developed to provide information on the content of e-mail messages.
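To illustrate the type/subtype structure of a MIME type, here is a small sketch using Python's standard `mimetypes` module, which guesses a type from a file extension. The helper name `split_mime` and the file names are hypothetical.

```python
import mimetypes

def split_mime(filename):
    """Return (top_level_type, subtype) guessed from the file extension, or None."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return None
    top, sub = mime.split("/", 1)
    return top, sub

for name in ("index.html", "photo.jpeg", "README"):
    print(name, split_mime(name))
```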

Page 7: Distributed web based systems

WEB DOCUMENTS

Six top-level Multipurpose Internet Mail Extensions (MIME) types and some common subtypes.

Page 8: Distributed web based systems

HTTP

All communication between clients and servers is based on the Hypertext Transfer Protocol (HTTP). Servers listen on port 80 by default.

HTTP is a simple protocol; a client sends a request to a server and waits for a response.

HTTP is based on TCP; whenever a client issues a request to a server, it first sets up a TCP connection and sends the message on that connection. The same connection is used for receiving the response.

One of the problems with the first versions of HTTP was its inefficient use of TCP connections.

HTTP 1.0 vs. HTTP 1.1
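The request/response exchange over a TCP connection described above can be sketched as follows. To stay self-contained, the example starts a throwaway server on the local machine with Python's `http.server` (which answers with HTTP/1.0 by default) and then speaks HTTP by hand over a raw socket. The handler class `Hello` and the helper `fetch` are hypothetical names.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

def fetch(host, port, path="/"):
    """Open a TCP connection, send one request, and read until the server closes."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:            # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), Hello)    # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

The response arrives on the same connection the request was sent on, and the server closes that connection once it has answered.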

Page 9: Distributed web based systems

HTTP CONNECTIONS

A Web document is often constructed from a collection of different files from the same server.

In HTTP version 1.0 and older, each request to a server required setting up a separate connection. When the server had responded, the connection was torn down. These connections are referred to as non-persistent.

In HTTP version 1.1, several requests and their responses can be exchanged over the same connection. These connections are referred to as persistent.

Furthermore, a client can issue several requests in a row without waiting for the response to the first request, which is referred to as pipelining.
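A rough way to observe a persistent connection is to issue several requests through one `http.client.HTTPConnection` object against a local HTTP/1.1 server and count how many distinct client sockets the server saw. The handler class `Files` and the requested paths are hypothetical. Note that `http.client` sends requests one at a time, so this sketch shows connection reuse but not pipelining.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Files(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"    # lets the server keep the connection open
    connections = set()              # client (address, port) pairs seen so far

    def do_GET(self):
        Files.connections.add(self.client_address)
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Files)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, three request/response exchanges.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
bodies = []
for path in ("/page.html", "/logo.gif", "/style.css"):
    conn.request("GET", path)
    bodies.append(conn.getresponse().read().decode("ascii"))
conn.close()
server.shutdown()
server.server_close()

print(bodies, len(Files.connections))
```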

Page 10: Distributed web based systems

HTTP CONNECTIONS

(a) Using non-persistent connections. (b) Using persistent connections.

Page 11: Distributed web based systems

HTTP Operations

Page 12: Distributed web based systems

HTTP MESSAGES (Request)

Page 13: Distributed web based systems

HTTP MESSAGES (Response)

Status code (Phrase): 200 (OK), 400 (Bad Request), 403 (Forbidden), and 404 (Not Found).
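The status codes listed above, together with their phrases, are available in Python's standard `http.HTTPStatus` enumeration. The sketch below also groups a code by its first digit; the helper name `describe` and the wording of the class labels are illustrative choices.

```python
from http import HTTPStatus

CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}

def describe(code):
    """Format a status code with its standard phrase and its class."""
    status = HTTPStatus(code)
    return f"{code} ({status.phrase}): {CLASSES[code // 100]}"

for code in (200, 400, 403, 404):
    print(describe(code))
```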

Page 14: Distributed web based systems

HTTP MESSAGES (Response)

There are also various message headers that the client can send to the server to explain what it is able to accept as a response.

Page 15: Distributed web based systems

HTTP MESSAGES (Response)

Page 16: Distributed web based systems

HTTP Caching

• Clients often cache documents
  – Challenge: update of documents
  – If-Modified-Since requests to check
    • HTTP 1.0 used just the date
    • HTTP 1.1 has an opaque “entity tag” (could be a file signature, etc.) as well
• When/how often should the original be checked for changes?
  – Check every time?
  – Check each session? Day? Etc.?
  – Use the “Expires” header
    • If there is no Expires header, often use Last-Modified as an estimate
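The freshness decision outlined above can be sketched as a small function. It assumes a common heuristic that is not stated on the slide: when no Expires header is present, trust the copy for one tenth of the time that had passed since Last-Modified when it was fetched. The function name `fresh_until` and the divisor are assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

HEURISTIC_DIVISOR = 10   # assumed: fresh for 1/10 of the document's age at fetch time

def fresh_until(headers, fetched_at):
    """Return the time until which a cached copy may be served without checking."""
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"])
    if "Last-Modified" in headers:
        age = fetched_at - parsedate_to_datetime(headers["Last-Modified"])
        return fetched_at + age / HEURISTIC_DIVISOR
    return fetched_at        # no hints at all: revalidate on every use

fetched = datetime(2001, 3, 27, 3, 50, 51, tzinfo=timezone.utc)
estimate = fresh_until({"Last-Modified": "Sat, 17 Mar 2001 03:50:51 GMT"}, fetched)
explicit = fresh_until({"Expires": "Tue, 27 Mar 2001 04:50:51 GMT"}, fetched)
print(estimate, explicit)
```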

Page 17: Distributed web based systems

Example Cache Check Request

GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
If-None-Match: "7a11f-10ed-3a75ae4a"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.intel-iris.net
Connection: Keep-Alive

Page 18: Distributed web based systems

Example Cache Check Response

HTTP/1.1 304 Not Modified

Date: Tue, 27 Mar 2001 03:50:51 GMT

Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24

Connection: Keep-Alive

Keep-Alive: timeout=15, max=100

ETag: "7a11f-10ed-3a75ae4a"
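How a server might arrive at the 304 answer in this exchange can be sketched as below. This is a simplified illustration, not the logic of any particular server: it compares a single entity tag exactly and ignores lists of tags, `*`, and weak validators. The function name `revalidate` is hypothetical; the header values are the ones from the example request.

```python
from email.utils import parsedate_to_datetime

def revalidate(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # When an entity tag is sent, it takes precedence over the date.
        return 304 if client_etag == current_etag else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        return 304 if unchanged else 200
    return 200               # unconditional request: send the full document

request = {
    "If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT",
    "If-None-Match": '"7a11f-10ed-3a75ae4a"',
}
status = revalidate(request, '"7a11f-10ed-3a75ae4a"', "Mon, 29 Jan 2001 17:54:18 GMT")
print(status)
```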


Page 19: Distributed web based systems

Problems

• Over 50% of all HTTP objects are uncacheable.
• Not easily solvable:
  – Dynamic data: stock prices, scores, web cams
  – CGI scripts: results based on passed parameters
  – SSL: encrypted data is not cacheable
    • Most web clients don’t handle mixed pages well: many generic objects transferred with SSL
  – Cookies: results may be based on passed data
  – Hit metering: owner wants to measure number of hits for revenue, etc.

Page 20: Distributed web based systems

Server Selection

• Lowest load :

– to balance load on servers

• Best performance :

– to improve client performance

• Any alive node :

– to provide fault tolerance

• How to direct clients to a specific server?

– Cluster load balancing : TCP hand-off

– As part of application : HTTP redirect

– As part of naming : DNS
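The three selection goals above can be sketched as policies over a list of candidate servers. The `Server` fields, the metric names, and the sample values are all hypothetical; a real system would have to measure load and performance somehow.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    alive: bool
    load: float      # assumed metric: fraction of capacity in use
    rtt_ms: float    # assumed metric: round-trip time to the client

def pick(servers, policy):
    """Choose a server among those that are alive, according to the policy."""
    candidates = [s for s in servers if s.alive]
    if not candidates:
        raise LookupError("no server is alive")
    if policy == "lowest_load":
        return min(candidates, key=lambda s: s.load)
    if policy == "best_performance":
        return min(candidates, key=lambda s: s.rtt_ms)
    if policy == "any_alive":
        return candidates[0]
    raise ValueError(f"unknown policy: {policy}")

servers = [
    Server("a", alive=False, load=0.1, rtt_ms=5.0),
    Server("b", alive=True, load=0.7, rtt_ms=20.0),
    Server("c", alive=True, load=0.3, rtt_ms=80.0),
]
print(pick(servers, "lowest_load").name, pick(servers, "best_performance").name)
```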

Page 21: Distributed web based systems

Application-Based Redirection

• HTTP supports a simple way to indicate that a Web page has moved (3xx responses)
• Server receives a GET request from the client
  – Decides which server is best suited for the particular client and object
  – Returns an HTTP redirect to that server
• May introduce additional overhead:
  – multiple connection setups, name lookups, etc.
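As a sketch of the redirect step, the function below builds a 302 response whose Location header points the client at a replica. The replica host names and the idea of keying the choice on a client region are assumptions made for illustration only.

```python
# Hypothetical replica sites; a real deployment would choose based on measurements.
REPLICAS = {"eu": "http://eu.example.org", "asia": "http://asia.example.org"}
DEFAULT_SITE = "http://www.example.org"

def redirect(path, client_region):
    """Build an HTTP response that sends the client to the chosen server."""
    target = REPLICAS.get(client_region, DEFAULT_SITE) + path
    return ("HTTP/1.1 302 Found\r\n"
            f"Location: {target}\r\n"
            "Content-Length: 0\r\n"
            "\r\n")

print(redirect("/notes/a.html", "eu"))
```

The client then has to open a new connection to the server named in Location, which is the extra overhead mentioned above.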

Page 22: Distributed web based systems

Naming Based

• Client does a name lookup for the service
• Name server chooses an appropriate server address
  – The A record returned is the “best” one for the client
• Name server could base its decision on
  – Server load/location (must be collected)
  – Information in the name lookup request
• Name service client:
  – typically the local name server for the client
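A name server that tailors its answer to the requester could look roughly like the sketch below. It keys the decision on the address of whoever sent the lookup, which, as noted above, is typically the client's local name server rather than the client itself. All networks and addresses are hypothetical values from the documentation address ranges.

```python
import ipaddress

# Hypothetical mapping: network of the requesting resolver -> replica address.
REPLICAS = [
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.10"),
    (ipaddress.ip_network("203.0.113.0/24"), "192.0.2.20"),
]
DEFAULT_ADDRESS = "192.0.2.1"

def resolve(resolver_ip):
    """Return the A record considered best for the resolver that asked."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, replica in REPLICAS:
        if addr in network:
            return replica
    return DEFAULT_ADDRESS

print(resolve("198.51.100.7"), resolve("203.0.113.9"), resolve("192.0.2.99"))
```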

Page 23: Distributed web based systems

Web Proxy Caches

[Figure: clients send HTTP requests to a proxy server; the proxy server forwards HTTP requests to the origin servers as needed and relays the HTTP responses back to the clients.]

• User configures browser: Web accesses via cache
• Browser sends all HTTP requests to cache
  – Object in cache: cache returns object
  – Else cache requests object from origin server, then returns object to client
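The hit/miss behaviour of a proxy cache can be sketched with an in-memory dictionary. This toy version never expires or revalidates entries, and the origin server is replaced by a stand-in function; `ProxyCache` and `fake_origin` are hypothetical names.

```python
class ProxyCache:
    """Tiny in-memory proxy cache keyed by URL (no expiry or validation)."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> bytes
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:            # object in cache: return it directly
            self.hits += 1
            return self._store[url]
        self.misses += 1
        body = self._fetch(url)           # otherwise ask the origin server
        self._store[url] = body
        return body

origin_requests = []

def fake_origin(url):
    """Stand-in for the origin server; records every request it receives."""
    origin_requests.append(url)
    return f"contents of {url}".encode("utf-8")

proxy = ProxyCache(fake_origin)
first = proxy.get("http://www.example.org/index.html")
second = proxy.get("http://www.example.org/index.html")
print(proxy.hits, proxy.misses, len(origin_requests))
```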

Page 24: Distributed web based systems

Content Distribution Networks (CDNs)

• The content providers are the CDN customers.

Content replication

• CDN company installs hundreds of CDN servers throughout the Internet
  – Close to users
• CDN replicates its customers’ content in CDN servers. When the provider updates content, the CDN updates its servers.

[Figure: an origin server in North America feeds a CDN distribution node, which pushes content to CDN servers in the U.S.A., Europe, and Asia.]

Page 25: Distributed web based systems

Content Distribution Networks

• Replicate content on many servers

The general organization of a CDN as a feedback-control system.

Page 26: Distributed web based systems

Web Service

• Web Service:
  – “software that makes services available on a network using technologies such as XML and HTTP”
• Service-Oriented Architecture (SOA):
  – “development of applications from distributed collections of smaller loosely coupled service providers”

Page 27: Distributed web based systems

Web Services Terminology

• SOAP: Simple Object Access Protocol
  – exchanging XML messages on a network
• WSDL: Web Services Description Language
  – describing interfaces of Web services
• UDDI: Universal Description, Discovery and Integration
  – managing registries of Web services
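To make the idea of "exchanging XML messages" concrete, the sketch below builds and reads back a minimal SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree`. The envelope namespace is the SOAP 1.1 one; the service namespace and the `GetQuote`/`Symbol` element names are hypothetical and do not describe any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/stock"                      # hypothetical service namespace

def build_request(symbol):
    """Wrap a hypothetical GetQuote call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")

def read_symbol(xml_text):
    """Extract the Symbol parameter from a request built by build_request."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetQuote/{{{APP_NS}}}Symbol")
    return node.text

message = build_request("IBM")
print(message)
```

In practice such a message would typically be sent as the body of an HTTP POST, with the service's WSDL document telling the client which operations and parameters exist.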

Page 28: Distributed web based systems

Web Services Framework


Page 29: Distributed web based systems

Why a New Framework?

• CORBA, DCOM, Java/RMI, ... already exist

• XML+HTTP: platform/language neutral, widely accepted and utilized

Web service interoperability


Page 30: Distributed web based systems

Servlets/CGI vs. Web Services

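[Figure: With servlets/CGI, a browser sends HTTP GET/POST requests to a Web server, which reaches the DB through JDBC. With Web services, browsers and GUI clients exchange SOAP messages with Web servers whose interfaces are described in WSDL; the Web servers again reach their DBs through JDBC.]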

Page 31: Distributed web based systems

TRADITIONAL WEB-BASED SYSTEMS

Many Web-based systems are still organized as simple client-server architectures.

The core of a Web site is a process that has access to a local file system storing documents.

A client interacts with Web servers through a special application known as a browser.

What is the key function of a browser?

It is responsible for displaying documents.

Page 32: Distributed web based systems

TRADITIONAL WEB-BASED SYSTEMS


Page 33: Distributed web based systems

MULTITIERED ARCHITECTURES

Web documents can be built in two ways:

Static

The server locates and returns the object identified in the request.

Includes predefined HTML pages and JPEG or GIF files.

Web servers do not require communication with any server-side application.

Dynamic

The request is forwarded to an application system where the resulting reply is generated dynamically (server-side program execution).

Although the Web started as a simple two-tiered client-server architecture for static Web documents, this architecture has been extended to support more advanced types of documents.

Page 34: Distributed web based systems

MULTITIERED ARCHITECTURES

One of the first enhancements was the Common Gateway Interface (CGI): user data comes from an HTML form, and the request specifies the program to run and its parameters.
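A rough sketch of what happens to a form submission under CGI: the server splits the request target into the program to run and a query string, and the program decodes the form fields from that string. `SCRIPT_NAME` and `QUERY_STRING` are standard CGI variable names; the script path and form fields are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

def cgi_environment(request_target):
    """Split a request target into the program to run and its raw parameters."""
    parts = urlsplit(request_target)
    return {"SCRIPT_NAME": parts.path, "QUERY_STRING": parts.query}

def form_fields(environ):
    """Decode the form fields as a CGI program would (first value per field)."""
    return {name: values[0]
            for name, values in parse_qs(environ["QUERY_STRING"]).items()}

# Hypothetical request produced by submitting an HTML form with method GET.
env = cgi_environment("/cgi-bin/search.py?term=web+cache&lang=en")
fields = form_fields(env)
print(env["SCRIPT_NAME"], fields)
```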

Page 35: Distributed web based systems

MULTITIERED ARCHITECTURES

Because of server-side processing, many Web sites are now organized as three-tiered architectures consisting of a Web server, an application server, and a database server.

Server-side scripting technologies are used to generate dynamic content:
• Microsoft: Active Server Pages (ASP.NET)
• Sun: JavaServer Pages (JSP)
• Netscape: JavaScript
• Open source (The PHP Group): PHP

The most popular Web server software is Apache. As of March 2007, 58% of all websites were using it.

Page 36: Distributed web based systems

WEB SERVER CLUSTERS

• Web servers are replicated and combined with a front end to improve performance.


Page 37: Distributed web based systems

WEB SERVER CLUSTERS

The front end can be designed in two ways:

Transport-layer switch

It simply passes data sent along the TCP connection to one of the servers, depending on some measurement of the server’s load.

Content-aware request distribution

It first inspects the HTTP request and then decides which server it should forward that request to.

For example, if the front end always forwards requests for the same document to the same server, that server can cache the document, resulting in better response times.
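One simple way a content-aware front end could send requests for the same document to the same server is to hash the requested path. This is only an illustrative policy, not the one used by any particular product; the server names are hypothetical. A cryptographic hash is used because Python's built-in `hash()` for strings varies between runs.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]   # hypothetical back-end servers

def choose_server(path, servers=SERVERS):
    """Map a document path to one server, always the same one for that path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

for path in ("/index.html", "/notes/web.ppt", "/index.html"):
    print(path, "->", choose_server(path))
```

A drawback of plain modulo hashing is that adding or removing a server remaps most documents, which throws away the caches the scheme was meant to exploit.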

Page 38: Distributed web based systems

WEB SERVER CLUSTERS

A scalable content-aware cluster of Web servers.

Page 39: Distributed web based systems

WEB SERVER CLUSTERS

Another way to set up a Web server cluster is to use round-robin DNS:

A single domain name is associated with multiple IP addresses.

When resolving a host name, a browser receives a list of multiple addresses, each address corresponding to a server.

Normally, browsers choose the first address on the list, but most DNS servers rotate the entries.

As a result, a simple distribution of requests over the servers in the cluster is achieved.
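The rotation described above can be sketched with a double-ended queue: every lookup returns the full address list and then moves the first entry to the back. The class name `RoundRobinName` and the addresses are hypothetical.

```python
from collections import deque

class RoundRobinName:
    """One domain name backed by several addresses, rotated on every lookup."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)    # the next lookup starts with the next address
        return answer

name = RoundRobinName(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
# A browser that always takes the first address ends up cycling over the servers.
chosen = [name.resolve()[0] for _ in range(4)]
print(chosen)
```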

Page 40: Distributed web based systems

Web Security Issues

• The Web has become the visible interface of the Internet
  – Many corporations now use the Web for advertising, marketing, and sales
• Web servers might be easy to use, but they are
  – Complicated to configure correctly and difficult to build without security flaws
  – A potential security hole by which an adversary might be able to access other data and computer systems

Category | Threats | Consequences | Countermeasures
Integrity | Modification of data; Trojan horses | Loss of information; compromise of machine | MACs and hashes
Confidentiality | Eavesdropping; theft of information | Loss of information; privacy breach | Encryption
DoS | Stopping; filling up disks and resources | Stopped transactions | (none listed)
Authentication | Impersonation; data forgery | Misrepresentation of user; acceptance of false data | Signatures, MACs
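As an illustration of the MAC countermeasure against data modification, the sketch below computes and checks an HMAC with Python's standard `hmac` module. The shared key and the message are made-up values; how the two parties agree on the key is outside the scope of this example.

```python
import hashlib
import hmac

def make_tag(key, message):
    """Compute a message authentication code over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key, message, received_tag):
    """Check the tag in constant time; False means the data was altered."""
    return hmac.compare_digest(make_tag(key, message), received_tag)

key = b"shared-secret-for-demo-only"       # hypothetical pre-shared key
message = b"transfer 100 to account 42"
tag = make_tag(key, message)

print(verify(key, message, tag))
print(verify(key, b"transfer 900 to account 42", tag))
```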

Page 41: Distributed web based systems

Secure the Web

• There are many strategies for securing the Web

1. We may attempt to secure the IP Layer of the TCP/IP Stack: This may be accomplished using IPSec, for example.

2. We may leave IP alone and secure on top of TCP: This may be accomplished using the Secure Sockets Layer (SSL) or Transport Layer Security (TLS)

3. We may seek to secure specific applications by using application-specific security solutions: For example, we may use Secure Electronic Transaction (SET)

• The first two provide generic solutions, while the third provides for more specialized services


Page 42: Distributed web based systems

Securing the TCP/IP Stack

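At the Network Level: HTTP, FTP, and SMTP run over TCP, which runs over IP/IPSec.

At the Transport Level: HTTP, FTP, and SMTP run over SSL/TLS, which runs over TCP and IP.

At the Application Level: security is built into the applications themselves (S/MIME, PGP, SET, Kerberos), which run alongside SMTP and HTTP over TCP or UDP and IP.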

Page 43: Distributed web based systems

Secure Sockets Layer (SSL)

• Originally developed (1994) by Netscape in order to secure HTTP communications
• A slight variation became Transport Layer Security (TLS)
  – backward compatible with SSL
• Relies on TCP to provide a reliable end-to-end service
• Consists of two sublayers:
  – SSL Record Protocol (where all the action takes place)
  – SSL Management (Handshake / Change Cipher Spec / Alert Protocols)
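From an application's point of view, the layering means that a TCP socket is wrapped before HTTP is spoken over it. The sketch below uses Python's standard `ssl` module; requiring at least TLS 1.2 is a choice made for this example, and the helper `open_tls` is hypothetical. The helper is only defined here, not called, so the example needs no network access.

```python
import socket
import ssl

# Client-side defaults: certificate verification and host name checking are on.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse SSL and early TLS versions

def open_tls(host, port=443):
    """Set up TCP first, then run the TLS handshake on top of that connection."""
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)

print(context.verify_mode, context.check_hostname, context.minimum_version)
```

Once `wrap_socket` returns, the handshake has completed and everything the application writes to the socket is protected by the record protocol.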

Page 44: Distributed web based systems

Protocol Structure

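[Figure: SSL sits between the application and TCP/IP. Within SSL, the Record Layer runs directly over TCP and carries the Handshake, Change Cipher Spec, Alert, and Application Data protocols.]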

Page 45: Distributed web based systems

References

Distributed Systems: Principles and Paradigms, by Maarten van Steen, VU Amsterdam

Web Service Composition - Current Solutions and Open Problems, by Biplav Srivastava (IBM India Research Laboratory) and Jana Koehler (IBM Zurich Research Laboratory)

A Reference Architecture for Web Servers, by Ahmed E. Hassan and Richard C. Holt, Software Architecture Group (SWAG), University of Waterloo

An Introduction to Web-based Support Systems, by JingTao Yao, University of Regina

Semantic Annotation for Web Services and their Relevance to Environmental Models, by Dumitru Roman, University of Innsbruck / STI Innsbruck