
Web Technologies

Typical Web Usage

1. User interacts with graphical browser

2. Browser submits HTTP requests to server

2.1. Request relayed by proxy

3. Server returns HTTP reply

3.1. Proxy caches response

4. Browser displays HTML results

Components of Web Technology

We are primarily interested in the parts that have implications for (reliable) distributed computing:
• (HTML)
• XML
• (URLs)
• HTTP
• Proxy servers

Core Web Technologies

HTML (HyperText Markup Language) defines a standard set of special textual indicators (markups) that specify how a Web page's words and images should be displayed by the web browser.

Technologies for Supporting Remote Clients

The original intent of the core Web technologies was to enable linking and sharing of documents.

It was quickly realized that, by wrapping local information systems so that they expose their presentation layer as HTML documents, one could leverage the core Web technologies to serve clients that are distributed across the Internet.

HTML

HyperText Markup Language: a text format for publishing hypertexts on the World Wide Web
• Based on Standard Generalized Markup Language (SGML; ISO 8879), as is XML
• Created in 1991; HTML 2.0 in 1994 (60 pages); HTML 4.0 in 1997 and HTML 4.01 (> 350 pages) in 1999; now work on XHTML
• Representation rather than presentation (sort of...)
• HTML is not XML. E.g., <br>: start tag required, end tag forbidden
• XHTML: HTML in XML

XML: Extensible Markup Language

Extensible
• XML is a framework for defining languages tailored to application domains

Markup
• XML documents are made up of entities
• Entity data consists of intermingled character data and markup
• No fixed set of markup tags

An example...

<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name> Klaus Marius Hansen </name>
  <status> Admitted </status>
  <medicine>
    <item> <dose>100</dose> <kind>Aspirin</kind> </item>
    <item> <dose>50</dose> <kind>Ibuprofen</kind> </item>
  </medicine>
</patient>

Parts labeled in the example: XML declaration, element name, attribute, character data, (end markup) tag, element

Reference: http://www.w3.org/TR/2004/REC-xml-20040204/
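To make the labeled parts concrete, here is a minimal sketch that reads the patient document with Python's standard `xml.etree.ElementTree` module. The function name `summarize_patient` is an illustrative assumption, and the elided `id` value is kept exactly as it appears in the example.

```python
import xml.etree.ElementTree as ET

# The patient document from the example; the id value is elided in the
# source and is reproduced here unchanged.
PATIENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name> Klaus Marius Hansen </name>
  <status> Admitted </status>
  <medicine>
    <item> <dose>100</dose> <kind>Aspirin</kind> </item>
    <item> <dose>50</dose> <kind>Ibuprofen</kind> </item>
  </medicine>
</patient>"""


def summarize_patient(xml_text):
    """Return the attribute, character data and nested items as plain Python values."""
    # Parse bytes so the parser honours the encoding named in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    medicine = [
        (int(item.findtext("dose")), item.findtext("kind").strip())
        for item in root.find("medicine").findall("item")
    ]
    return {
        "element": root.tag,                        # element name
        "id": root.get("id"),                       # attribute
        "name": root.findtext("name").strip(),      # character data
        "status": root.findtext("status").strip(),
        "medicine": medicine,
    }


summary = summarize_patient(PATIENT_XML)
```

Nothing in the parser knows about patients: the tag names are discovered from the document itself, which is what "no fixed set of markup tags" means in practice.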

XML Well-Formedness and Validity

Which patient documents are regarded as describing patients? The valid ones, which:
• Have a reference to a document describing legal documents, e.g., using XML Schemas
• Fulfil the requirements in these
• Are well-formed

Well-formed patients...
• Match the "document" production of the XML spec, including that start and end tags match and that element tags are properly nested
• Satisfy the other well-formedness constraints in the spec

<?xml version="1.0" encoding="UTF-8"?>
<p:patient id="301174-..."
    xmlns:p="http://ehr.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ehr.org patient.xsd">
  <name> Klaus Marius Hansen </name>
  <medicine>
    <item> <dose>100</dose> <kind>Aspirin</kind> </item>
    <item> <dose>50</dose> <kind>Ibuprofen</kind> </item>
  </medicine>
  <status> Admitted </status>
</p:patient>

Parts labeled in the example: namespace (xmlns:p), location of schema (xsi:schemaLocation)

(Altova XMLSpy syntax)
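Well-formedness can be checked without any schema. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser, which rejects documents whose tags do not match or nest properly. It does not validate against an XML Schema; that would need a separate schema-aware library. The helper name `is_well_formed` and the three sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any well-formedness error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Start and end tags match and are properly nested.
GOOD = "<patient><name>Klaus Marius Hansen</name></patient>"
# End tags appear in the wrong order, so the elements are not properly nested.
BAD_NESTING = "<patient><name>Klaus Marius Hansen</patient></name>"
# The root element is never closed.
BAD_UNCLOSED = "<patient><name>Klaus Marius Hansen</name>"
```

Note that `GOOD` is well-formed but would not be valid against the patient schema, since it lacks the required `id` attribute and the `medicine` and `status` elements.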

Patient XML Schema Example

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://ehr.org"
           targetNamespace="http://ehr.org">
  <xs:element name="patient" type="p:patient_type"/>
  <xs:complexType name="patient_type">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="medicine">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="dose" type="xs:int"/>
                  <xs:element name="kind" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="status"/>
    </xs:sequence>
    <xs:attribute name="id" use="required"/>
  </xs:complexType>
</xs:schema>

XML Schema Constructs

Constructs
• A complex type definition
  - attribute declarations: describe which attributes may or must appear
  - element references: describe which sub-elements may or must appear, how many, and in which order
• A simple type definition: defines a set of strings to be used as attribute values or character data
• A global element declaration: associates element names with types (in the patient example, the patient element refers to the named type patient_type, while the complex types of medicine and item are inlined in their element declarations)

Validity
• An element is valid according to a given schema if the associated element type rules are satisfied
• A document is valid if all its elements are valid

Complex Types

Attribute declarations
• E.g., <xs:attribute name="id" type="xs:string" use="required"/>

Content of one of the following content model kinds
• Empty content
• Simple content: <simpleContent>...</simpleContent>, only character data
• Regexp content: <sequence> ... </sequence>, <choice> ... </choice>, <all> ... </all>, e.g., with <element name="item" minOccurs="0" maxOccurs="unbounded"/>

Namespaces

XML languages are typically assigned to namespaces

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://ehr.org"
           targetNamespace="http://ehr.org">

XML Schema
• Uses namespaces itself to distinguish XML Schema constructs from the language being defined
• Allows namespace assignments of the language being defined
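A short sketch of what a namespace assignment means to a parser, using Python's standard `xml.etree.ElementTree`: the element name is qualified by the namespace URI, and the prefix chosen in the document is irrelevant. The helper `qualified_root_tag` and the sample documents are illustrative assumptions; only the URI `http://ehr.org` comes from the example.

```python
import xml.etree.ElementTree as ET


def qualified_root_tag(xml_bytes):
    """Return the root tag in ElementTree's '{namespace-uri}local-name' form."""
    return ET.fromstring(xml_bytes).tag


# Same namespace URI, two different prefixes.
WITH_P = b'<p:patient xmlns:p="http://ehr.org"><name>Klaus Marius Hansen</name></p:patient>'
WITH_Q = b'<q:patient xmlns:q="http://ehr.org"><name>Klaus Marius Hansen</name></q:patient>'
# No namespace at all.
NO_NS = b'<patient><name>Klaus Marius Hansen</name></patient>'

tag_p = qualified_root_tag(WITH_P)
tag_q = qualified_root_tag(WITH_Q)
tag_plain = qualified_root_tag(NO_NS)

# The unprefixed child element is not in the ehr namespace.
child_tag = ET.fromstring(WITH_P)[0].tag
```

The first two documents yield the same qualified name, while the third yields a plain `patient` that belongs to no namespace, so a program looking for EHR patients can tell them apart.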

Other XML Technologies

• Namespaces
• Linking: XLink
• Addressing parts of documents: XPath
• Transformation: XSL
• Querying: XQuery
• RPC: WSDL and SOAP

XML is Bloated

An XML encoding is large
• What to do (in particular for RPC using XML)?
• Undecided what actually to do in W3C...

Compress/decompress XML?
• Compression is expensive (often more than decompression)

Assign gateway between XML and other formats?
• E.g., using XML only for interoperable messaging
• May introduce single point-of-failure

Another approach
• Sacrifice self-description
• Create mapping to a more efficient format: more efficient to serialize and deserialize and on the wire
• Allow applications to choose between formats

Such a format could be described by ASN.1
• Formal language for describing messages exchanged in a distributed system
• Used heavily in telecommunications
• More next time...
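The size trade-off can be seen in a small sketch. It compares one `item` element from the patient example with an ad hoc binary layout built with Python's `struct` module, and also compresses a repeated XML payload with `zlib`. The binary layout (4-byte integer, 1-byte length, then the text) is a made-up illustration, not ASN.1 or any standard encoding, and the function names are assumptions.

```python
import struct
import zlib

# One medicine item as it appears in the patient example.
XML_ITEM = b"<item> <dose>100</dose> <kind>Aspirin</kind> </item>"


def pack_item(dose, kind):
    """Encode an item as: 4-byte big-endian int, 1-byte length, UTF-8 bytes of kind."""
    kind_bytes = kind.encode("utf-8")
    return struct.pack("!iB", dose, len(kind_bytes)) + kind_bytes


def unpack_item(data):
    """Decode the layout written by pack_item; field meaning is fixed by position."""
    dose, length = struct.unpack("!iB", data[:5])
    return dose, data[5:5 + length].decode("utf-8")


packed = pack_item(100, "Aspirin")

# Compression helps most when the markup repeats, e.g. many items in one message.
many_items = XML_ITEM * 100
compressed = zlib.compress(many_items)
```

The packed form is much smaller, but it is no longer self-describing: a receiver must already know the layout, which is exactly the sacrifice named above.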

HTTP: HyperText Transfer Protocol

RPC-style interface to web servers
• Messages represented as user-readable ASCII strings
• May contain encoded information (e.g., Quoted-Printable or Base64)
• MIME types in Content-Type: and Accept: headers

Typically runs on TCP/IP, port 80 as default
• But may use other reliable transports

Behavior
• HTTP/1.0 (RFC 1945) behavior: open socket, request, response, close socket
• HTTP/1.1 (RFC 2616) behavior: persistent connections; good for user, good for network


HTTP (HyperText Transfer Protocol)
• Generic, stateless protocol that governs the transfer of files across a network
• Developed at CERN (the European Organization for Nuclear Research), where the name World Wide Web (WWW) was also coined; the W3C was founded later
• Supports access, through proxies and gateways, to systems using SMTP, FTP and other protocols
• Was designed to support hypertext


• Exchanged information can be static or dynamic
• Every resource accessible over the Web has a URL (Uniform Resource Locator)
• The HTTP mechanism is based on the client/server model, typically using TCP/IP sockets


Since version 1.1, persistent connections are the default behavior in HTTP, which minimizes the overhead associated with opening and closing connections.

Typical methods on the server side are:
• OPTIONS: send information about the communication options
• GET: retrieve a document, or a document produced by a program
• POST: append or attach information
• PUT: store information
• DELETE: delete the resource indicated in the request


Another limitation: HTTP is stateless
• Does not provide storing of information between requests
• No indication of any relationship between two different requests

Cookies, small data structures that a web server asks the HTTP client to store on the local machine, are used to maintain state information, e.g., cookies store recently viewed items in a web shop.
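The web shop case can be sketched with Python's standard `http.cookies` module. The server side produces the value for a `Set-Cookie:` header, the client sends the name and value back in a `Cookie:` header on later requests, and the server reads the state from there. The cookie name `recent`, the `|` separator and the item identifiers are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def make_set_cookie_value(recent_items):
    """Server side: build the value of a Set-Cookie header holding recent items."""
    cookie = SimpleCookie()
    cookie["recent"] = "|".join(recent_items)
    cookie["recent"]["path"] = "/"
    return cookie["recent"].OutputString()


def read_recent_items(cookie_header_value):
    """Server side, next request: recover the items from the Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    if "recent" not in cookie or not cookie["recent"].value:
        return []
    return cookie["recent"].value.split("|")


items = ["book-17", "lamp-3"]                  # hypothetical product identifiers
set_cookie_value = make_set_cookie_value(items)

# A browser returns only the name=value part, without attributes such as Path.
returned_by_client = set_cookie_value.split(";")[0]
recovered = read_recent_items(returned_by_client)
```

The protocol itself stays stateless: all of the state travels inside each request, and the server reconstructs it every time.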

HTTP Messages (1)

HTTP-message = Request | Response ; HTTP/1.1 messages

generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]
start-line = Request-Line | Status-Line
message-header = field-name ":" [ field-value ]

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Method = "OPTIONS" ; Section 9.2
       | "GET" ; Section 9.3
       | "HEAD" ; Section 9.4
       | "POST" ; Section 9.5
       | "PUT" ; Section 9.6
       | "DELETE" ; Section 9.7
       | "TRACE" ; Section 9.8
       | "CONNECT" ; Section 9.9
       | extension-method

The 'Host:' header is mandatory in HTTP/1.1 requests

An Example GET Interaction
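As a rough illustration of such an interaction, the sketch below builds the text of a GET request and parses the text of a response, following the message grammar: start line, headers, an empty line, then an optional body. No network connection is made. The host `www.example.org`, the path and the response content are made-up placeholders, and the parser is deliberately simplified (for instance, it ignores repeated headers).

```python
CRLF = "\r\n"


def build_get_request(host, path):
    """Request-Line, then headers (Host is mandatory in HTTP/1.1), then an empty line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
    ]
    return CRLF.join(lines) + CRLF + CRLF


def parse_response(raw):
    """Split a response into version, status code, reason phrase, headers and body."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


request = build_get_request("www.example.org", "/index.html")

# A hand-written response of the kind a server might send back.
page = "<html><body>Hi</body></html>"
response = (
    "HTTP/1.1 200 OK" + CRLF
    + "Content-Type: text/html" + CRLF
    + f"Content-Length: {len(page)}" + CRLF
    + CRLF
    + page
)
version, code, reason, headers, body = parse_response(response)
```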

HTTP Messages (2)

HTTP-message = Request | Response ; HTTP/1.1 messages

Response = Status-Line ; Section 6.1
           *(( general-header ; Section 4.5
            | response-header ; Section 6.2
            | entity-header ) CRLF) ; Section 7.1
           CRLF
           [ message-body ] ; Section 7.2

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

Status-Code = "100" ; Section 10.1.1: Continue
            | "101" ; Section 10.1.2: Switching Protocols
            | "200" ; Section 10.2.1: OK
            | "201" ; Section 10.2.2: Created
            | "202" ; Section 10.2.3: Accepted
            ...
            | "300" ; Section 10.3.1: Multiple Choices
            | "301" ; Section 10.3.2: Moved Permanently
            ...
            | "400" ; Section 10.4.1: Bad Request
            | "401" ; Section 10.4.2: Unauthorized
            | "402" ; Section 10.4.3: Payment Required
            | "403" ; Section 10.4.4: Forbidden
            | "404" ; Section 10.4.5: Not Found
            | "405" ; Section 10.4.6: Method Not Allowed
            ...
            | "500" ; Section 10.5.1: Internal Server Error
            | "501" ; Section 10.5.2: Not Implemented
            ...
            | extension-code

Status code classes labeled in the figure: Informational (1xx), Success (2xx), Redirection (3xx), Client Error (4xx), Server Error (5xx)
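The five classes follow from the first digit of the status code, so a client can react sensibly even to an extension code it has never seen. A minimal sketch, with the function name `status_class` as an illustrative assumption:

```python
# Class names as used in the status code list above, keyed by the first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Map a three-digit status code to its class; anything else is 'Unknown'."""
    return STATUS_CLASSES.get(code // 100, "Unknown")
```

For example, `status_class(404)` gives `"Client Error"` and `status_class(301)` gives `"Redirection"`.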

HTTP Methods

• GET: Retrieves the resource identified by the request URI. May encode request parameters in the URI
• HEAD: Identical to GET except that a message-body must not be returned. E.g., for testing validity, recent modification, accessibility of links
• POST: Requests that the server accept the entity enclosed in the request as a new subordinate of the URI, e.g., to annotate resources, append to a database, ...
• PUT: Requests that a resource be stored under the URI
• DELETE: Removes the resource identified by the request URI
• OPTIONS: Returns the HTTP methods the server supports
• CONNECT: Reserved for proxies that can dynamically switch to a tunnel
• TRACE: Returns the header fields sent with the TRACE request, e.g., for testing
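Encoding request parameters in the URI of a GET can be sketched with Python's standard `urllib.parse` module. The path `/patients` and the parameter names are hypothetical; only the encoding and decoding calls are real library functions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_uri(path, params):
    """Append the parameters to the path as an encoded query string."""
    return f"{path}?{urlencode(params)}"


def read_query(uri):
    """Server side: decode the query string back into a dict of single values."""
    query = urlsplit(uri).query
    return {name: values[0] for name, values in parse_qs(query).items()}


params = {"name": "Klaus Marius Hansen", "status": "Admitted"}
uri = build_query_uri("/patients", params)
decoded = read_query(uri)
```

Spaces and other reserved characters are escaped on the way out and restored on the way in, so the parameters survive the trip inside the Request-URI.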

Proxies

An intermediary program which acts as both a client and a server

Caching
• E.g., GET cacheable
• E.g., HEAD not cacheable (well, sort of)

The Web is stateless, so stale caches are a problem
• Age: sum of time resident at caches + time on network
• Used for reliable cache expiration
• But a proxy MAY still return a stale resource with a warning
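The age rule can be sketched as a toy cache. It is a strong simplification of the HTTP/1.1 caching rules: each stored response records the age it had on arrival, the current age adds the time it has been resident, and a response counts as fresh while its age does not exceed a maximum age. When the origin cannot be reached, the stale copy is returned with a warning. The class `ProxyCache`, its `fetch` and `clock` parameters, and the tuple layout are all illustrative assumptions.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "body stored_at age_on_arrival max_age")

STALE_WARNING = "110 Response is stale"


class ProxyCache:
    def __init__(self, fetch, clock):
        # fetch(url) -> (body, age_in_seconds, max_age_in_seconds); may raise OSError
        # clock()    -> current time in seconds
        self._fetch = fetch
        self._clock = clock
        self._entries = {}

    def current_age(self, url):
        """Age on arrival plus the time the response has been resident here."""
        entry = self._entries[url]
        return entry.age_on_arrival + (self._clock() - entry.stored_at)

    def get(self, url):
        """Return (body, warning); warning is None unless a stale copy is served."""
        entry = self._entries.get(url)
        if entry is not None and self.current_age(url) <= entry.max_age:
            return entry.body, None                 # fresh hit, origin not contacted
        try:
            body, age, max_age = self._fetch(url)
        except OSError:
            if entry is None:
                raise                               # nothing cached to fall back on
            return entry.body, STALE_WARNING        # stale copy, flagged as such
        self._entries[url] = Entry(body, self._clock(), age, max_age)
        return body, None
```

Passing the clock in as a parameter keeps the sketch deterministic: a test can advance time by hand instead of sleeping.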

More on Reliability and Web Technology

Mostly focussed on security
• Authentication and confidentiality: Secure Sockets Layer (SSL)
• Privacy

Will get back to this next time on web services

Web Browsers

One of the first problems: web browsers were originally intended only to display static documents returned by HTTP calls.

This makes it difficult to build sophisticated application-specific clients for web browsers.

Applets

One answer to this problem: applets

Applets are Java programs that can be embedded in an HTML document. When the document is downloaded, the program is executed by the JVM and presented in the browser. Sending the client code as an applet thus turns the browser into an application-specific client.

• Limitation: the code has to be downloaded
• Advantage: allows complex client logic

CGI (Common Gateway Interface)

Web servers must be able to serve up content from dynamic sources. How can a Web server respond to a request by invoking an application that will automatically generate a document to be returned?

One of the first approaches to solving this problem was CGI, a standard mechanism that enables HTTP servers to interface with external applications, which can serve as "gateways" to the local information system.

CGI

How does CGI work? It assigns programs to URLs, so that when the URL is invoked, the program is executed.

CGI programs often serve as an interface between a database and a Web server, allowing users to submit complex queries to the database through predefined URLs.

When the Web server receives a request for the URL, it runs a program that acts as a client of the database: the program submits the query and packs the result into an HTML document, which is returned to the remote browser.
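The flow just described can be sketched as the core of a CGI program. Under CGI the server passes request data in environment variables such as `QUERY_STRING`, and the program writes a header block, an empty line and the document to standard output. Here that logic is a plain function so it can be run without a web server. The `lookup` callable stands in for the database query, and the parameter name `id`, the table contents and the page layout are illustrative assumptions.

```python
import html
from urllib.parse import parse_qs


def cgi_response(environ, lookup):
    """Build the full CGI output (header, empty line, HTML) for one request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    patient_id = query.get("id", [""])[0]
    status = lookup(patient_id)                      # acts as the database client
    body = (
        "<html><body><p>Patient "
        f"{html.escape(patient_id)}: {html.escape(status)}"
        "</p></body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body


# A dictionary standing in for the local information system.
FAKE_DB = {"42": "Admitted"}


def fake_lookup(patient_id):
    return FAKE_DB.get(patient_id, "unknown")


output = cgi_response({"QUERY_STRING": "id=42"}, fake_lookup)
```

In a real CGI script the same function would be called with `os.environ` and its result written to `sys.stdout`; the web server starts one such process per request.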

Servlets

Performance: CGI programs involve a certain overhead
• A separate process for each instance takes time and requires a context switch in the operating system
• Multiple requests result in multiple processes

To avoid this overhead, Java servlets can be used instead. The idea is exactly the same as in CGI programs, but the implementation differs.

Servlets

How do they work? Execution and result are the same, but servlets
• are invoked directly by embedding servlet-specific information within an HTTP request
• run as threads of the Java server process; moreover, they run as a part of the Web server
• thereby eliminate the per-request process overhead
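The threading idea can be illustrated with a loose analogy. The sketch is in Python rather than Java and does not use the Servlet API: it only shows one long-lived handler object serving many requests on threads inside a single process, instead of one new process per request as with CGI. The names `CounterHandler` and `serve_all` are illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CounterHandler:
    """Loaded once and kept in memory, so state survives between requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self.hits = 0

    def handle(self, path):
        # Threads share the handler, so shared state must be protected.
        with self._lock:
            self.hits += 1
            count = self.hits
        return f"<html><body>{path}: request {count}</body></html>"


def serve_all(handler, paths, workers=4):
    """Handle every request on a worker thread of the current process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handler.handle, paths))


handler = CounterHandler()
pages = serve_all(handler, ["/status"] * 20)
```

The price of avoiding process creation is visible in the lock: handlers that share a process must take care of concurrent access to shared data.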

Summary

Web technology underlies web services
• HTTP is the basic transport
• XML is a cornerstone in web services definition
