Post on 21-Dec-2015
Web Technologies
Typical Web Usage
1. User interacts with graphical browser
2. Browser submits HTTP requests to server
2.1. Request relayed by proxy
3. Server returns HTTP reply
3.1. Proxy caches response
4. Browser displays HTML results
Components of Web Technology
We are primarily interested in the parts that have implications for (reliable) distributed computing:
• (HTML)
• XML
• (URLs)
• HTTP
• Proxy servers
Core Web Technologies
HTML (HyperText Markup Language)
• Defines a standard set of special textual indicators (markups) specifying how a Web page's words and images should be displayed by the web browser
Technologies for Supporting Remote Clients
• Original intent of the core Web technologies: enable linking and sharing of documents
• It was quickly realized that by wrapping local information systems so that they expose their presentation layer as HTML documents, one could leverage the core Web technologies to have clients distributed across the Internet.
HTML
HyperText Markup Language
• Text format for publishing hypertexts on the World Wide Web
• Based on Standard Generalized Markup Language (SGML; ISO 8879) (as is XML)
• Created in 1991, HTML 2.0 in 1995 (60 pages), HTML 4.01 (> 350 pages) in 1999, now work on XHTML
• Representation rather than presentation, sort of...
• HTML is not XML
  • E.g., <br>: start tag required, end tag forbidden
  • XHTML: HTML in XML
XML
Extensible Markup Language
• Extensible
  • XML is a framework for defining languages tailored to application domains
• Markup
  • XML documents are made up of entities
  • Entity data contains intermingled character data and markup
  • No fixed set of markup tags
An example...
Reference: http://www.w3.org/TR/2004/REC-xml-20040204/

<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name> Klaus Marius Hansen </name>
  <status> Admitted </status>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
</patient>

Parts labelled on the slide: XML declaration, element name, attribute, character data, element, (end markup) tag
XML Well-Formedness and Validity
• Which patient documents are regarded as describing patients?
• The valid ones
  • Have a reference to a document describing legal documents
    • E.g., using XML Schemas
  • Fulfil the requirements in these
  • Are well-formed
• Well-formed patients...
  • Match the "document" production of the XML spec
    • Including that start and end tags match and that element tags are properly nested
  • + other well-formedness constraints in the spec
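Well-formedness can be checked by any conforming XML parser, with no schema involved. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the shortened patient documents (including the id value) are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical, shortened variants of the patient document.
good = "<patient id='1'><name>Klaus</name><status>Admitted</status></patient>"
# Start and end tags do not match: <name> is closed by </status>.
mismatched = "<patient><name>Klaus</status></patient>"
# Elements overlap instead of being properly nested.
overlapping = "<patient><name><status></name></status></patient>"

for label, doc in [("good", good), ("mismatched", mismatched), ("overlapping", overlapping)]:
    print(label, is_well_formed(doc))
```

Only the first document parses. Passing this check says nothing about validity: a well-formed document may still break the rules of the patient schema.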
<?xml version="1.0" encoding="UTF-8"?>
<p:patient id="301174-..."
           xmlns:p="http://ehr.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://ehr.org patient.xsd">
  <name> Klaus Marius Hansen </name>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
  <status> Admitted </status>
</p:patient>

Parts labelled on the slide: Namespace (xmlns:p), Location of schema (xsi:schemaLocation)
(Altova XMLSpy syntax)
Patient XML Schema Example
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://ehr.org"
           targetNamespace="http://ehr.org">
  <xs:element name="patient" type="p:patient_type"/>
  <xs:complexType name="patient_type">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="medicine">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="dose" type="xs:int"/>
                  <xs:element name="kind" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="status"/>
    </xs:sequence>
    <xs:attribute name="id" use="required"/>
  </xs:complexType>
</xs:schema>
XML Schema Constructs
• Constructs
  • A complex type definition
    • attribute declarations: describe which attributes may or must appear
    • element references: describe which sub-elements may or must appear, how many, and in which order
  • A simple type definition
    • defines a set of strings to be used as attribute values or character data
  • A global element declaration
    • associates element names with types (in the patient example, the patient element refers to the named complex type patient_type, while the complex types of medicine and item are inlined in their element declarations)
• Validity
  • An element is valid according to a given schema if the associated element type rules are satisfied
  • A document is valid if all its elements are valid
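To make "associated element type rules" concrete, the sketch below hand-codes a few of the rules from the patient schema: the required id attribute, the sequence order of the children, and the integer dose. It is an illustration only, not an XML Schema validator (Python's standard library does not include one), and the function name check_patient and both sample documents are invented here.

```python
import xml.etree.ElementTree as ET

def check_patient(root: ET.Element) -> list:
    """Return a list of rule violations; an empty list means the checks passed."""
    errors = []
    # <xs:attribute name="id" use="required"/>
    if "id" not in root.attrib:
        errors.append("patient: required attribute 'id' is missing")
    # xs:sequence: the children must appear exactly in this order.
    if [child.tag for child in root] != ["name", "medicine", "status"]:
        errors.append("patient: expected name, medicine, status in that order")
    # item has minOccurs="0" maxOccurs="unbounded", so any number is fine.
    for item in root.findall("medicine/item"):
        if [child.tag for child in item] != ["dose", "kind"]:
            errors.append("item: expected dose, kind in that order")
        dose = (item.findtext("dose") or "").strip()
        try:
            int(dose)  # rough stand-in for the xs:int check
        except ValueError:
            errors.append(f"dose: {dose!r} is not an integer")
    return errors

# Hypothetical instance documents (the id value is made up).
valid_doc = """<p:patient id="42" xmlns:p="http://ehr.org">
  <name>Klaus Marius Hansen</name>
  <medicine><item><dose>100</dose><kind>Aspirin</kind></item></medicine>
  <status>Admitted</status>
</p:patient>"""

invalid_doc = """<p:patient xmlns:p="http://ehr.org">
  <name>Klaus Marius Hansen</name>
  <status>Admitted</status>
  <medicine><item><dose>lots</dose><kind>Aspirin</kind></item></medicine>
</p:patient>"""

print(check_patient(ET.fromstring(valid_doc)))
print(check_patient(ET.fromstring(invalid_doc)))
```

Both documents are well-formed; only the first passes these checks. A real validator derives such checks from the schema instead of hard-coding them.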
Complex Types
• Attribute declarations
  • E.g., <xs:attribute name="id" type="xs:string" use="required"/>
• Content of one of the following content model kinds
  • Empty content
  • Simple content
    • <simpleContent>...</simpleContent>
    • Only character data
  • Regexp content
    • <sequence> ... </sequence>
    • <choice> ... </choice>
    • <all> ... </all>
    • e.g., with <element name="item" minOccurs="0" maxOccurs="unbounded"/>
Namespaces
• XML languages are typically assigned to namespaces
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:p="http://ehr.org"
             targetNamespace="http://ehr.org">
• XML Schema
  • Uses namespaces itself to distinguish XML Schema constructs from the language being defined
  • Allows namespace assignments of the language being defined
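A small sketch of how a namespace-aware parser sees such names, using Python's standard xml.etree.ElementTree. The documents are shortened, hypothetical variants of the patient example. The prefix is only a local abbreviation; what identifies the element is the namespace URI plus the local name.

```python
import xml.etree.ElementTree as ET

doc = """<p:patient id="42" xmlns:p="http://ehr.org">
  <name>Klaus Marius Hansen</name>
</p:patient>"""

root = ET.fromstring(doc)

# ElementTree reports qualified names in {namespace-uri}localname form.
print(root.tag)      # {http://ehr.org}patient

# The child has no prefix and no default namespace is declared,
# so it is in no namespace at all.
print(root[0].tag)   # name

# A different prefix bound to the same URI names the same element.
other = ET.fromstring('<q:patient xmlns:q="http://ehr.org"/>')
same = other.tag == root.tag
print(same)          # True
```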
Other XML Technologies
• Namespaces
• Linking: XLink
• Addressing parts of documents: XPath
• Transformation: XSL
• Querying: XQuery
• RPC: WSDL and SOAP
XML is Bloated
• An XML encoding is large
• What to do (in particular for RPC using XML)?
  • Undecided what actually to do in W3C...
• Compress/decompress XML?
  • Compression is expensive (often more than decompression)
• Assign gateway between XML and other formats?
  • E.g., using XML only for interoperable messaging
  • May introduce single point-of-failure
• Another approach
  • Sacrifice self-description
  • Create mapping to a more efficient format
    • More efficient to serialize and deserialize and on the wire
  • Allow applications to choose between formats
  • Such a format could be described by ASN.1
    • Formal language for describing messages exchanged in a distributed system
    • Used heavily in telecommunications
    • More next time...
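A rough feel for the size trade-off, as a sketch: the medicine list from the patient example is encoded once as XML, once as zlib-compressed XML, and once in a hand-rolled binary layout. The binary layout is an assumption invented for this illustration (it is not ASN.1 or any standard encoding), and it is not self-describing: a receiver has to know the layout in advance.

```python
import struct
import zlib

# Medicine items from the patient example: (dose, kind).
items = [(100, "Aspirin"), (50, "Ibuprofen")]

def encode_xml(entries):
    body = "".join(
        f"<item><dose>{dose}</dose><kind>{kind}</kind></item>"
        for dose, kind in entries
    )
    return f"<medicine>{body}</medicine>".encode("utf-8")

def encode_binary(entries):
    """Hypothetical layout: one byte item count, then per item a
    32-bit dose, a one-byte name length, and the UTF-8 name."""
    out = struct.pack("!B", len(entries))
    for dose, kind in entries:
        name = kind.encode("utf-8")
        out += struct.pack("!iB", dose, len(name)) + name
    return out

def decode_binary(data):
    (count,) = struct.unpack_from("!B", data, 0)
    offset, decoded = 1, []
    for _ in range(count):
        dose, length = struct.unpack_from("!iB", data, offset)
        offset += 5  # 4 bytes dose + 1 byte length
        decoded.append((dose, data[offset:offset + length].decode("utf-8")))
        offset += length
    return decoded

xml_bytes = encode_xml(items)
bin_bytes = encode_binary(items)
print("XML:", len(xml_bytes), "bytes")
print("XML + zlib:", len(zlib.compress(xml_bytes)), "bytes")
print("binary:", len(bin_bytes), "bytes")
```

The binary form is much smaller because the tag names are gone, which is exactly the self-description being sacrificed.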
HTTP
HyperText Transfer Protocol
• RPC-style interface to web servers
• Messages represented as user-readable ASCII strings
  • May contain encoded information (e.g., Quoted-Printable or Base64)
  • MIME types in Content-Type: and Accept: headers
• Typically runs on TCP/IP, port 80 as default
  • But may use other reliable transports
• Behavior
  • HTTP/1.0 (RFC 1945) behavior
    • Open socket, request, response, close socket
  • HTTP/1.1 (RFC 2616) behavior
    • Persistent connections
    • Good for user
    • Good for network
Core Web Technologies
HTTP (HyperText Transfer Protocol)
• generic, stateless protocol
• governs the transfer of files across a network
• developed at CERN (the European Organization for Nuclear Research), where the name WWW also originated; the Web effort later continued in the W3C
• supports access to SMTP, FTP and other protocols
• was designed to support hypertext
Core Web Technologies
• Exchanged information can be static or dynamic
• Every resource accessible over the Web has a URL (Uniform Resource Locator)
• The HTTP mechanism is based on the client/server model, typically using TCP/IP sockets
Core Web Technologies
• Since version 1.1, HTTP requires servers to support persistent connections, to minimize the overhead associated with opening and closing connections.
• Typical methods on the server side are:
  • OPTIONS: send information about the communication options
  • GET: retrieve a document, or a document produced by a program
  • POST: append or attach information
  • PUT: store information
  • DELETE: delete the resource indicated in the request
Core Web Technologies
• Another limitation: HTTP is stateless
  • Does not provide storing of information between requests
  • No indication of any relationship between two different requests
• Cookies, small data structures that a web server asks the HTTP client to store on the local machine, are used to maintain state information
  • e.g., cookies store recently viewed items in a web shop
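The cookie round trip can be sketched with Python's standard http.cookies module. The cookie name recent and the value item42 are made up for the web shop example; a real shop would more likely store an opaque session identifier.

```python
from http.cookies import SimpleCookie

# Server side: ask the client to remember the recently viewed item.
outgoing = SimpleCookie()
outgoing["recent"] = "item42"
outgoing["recent"]["path"] = "/"
set_cookie_header = outgoing.output()  # a "Set-Cookie: ..." response header line
print(set_cookie_header)

# Client side: on the next request the stored pair is sent back
# in a Cookie request header.
cookie_header = "recent=item42"

# Server side again: recover the state from the Cookie header.
incoming = SimpleCookie()
incoming.load(cookie_header)
recent = incoming["recent"].value
print(recent)
```

HTTP itself stays stateless: the state travels with every request, and the server has to interpret it each time.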
HTTP Messages (1)
HTTP-message    = Request | Response     ; HTTP/1.1 messages
generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]
start-line      = Request-Line | Status-Line
message-header  = field-name ":" [ field-value ]
Request-Line    = Method SP Request-URI SP HTTP-Version CRLF
Method          = "OPTIONS"              ; Section 9.2
                | "GET"                  ; Section 9.3
                | "HEAD"                 ; Section 9.4
                | "POST"                 ; Section 9.5
                | "PUT"                  ; Section 9.6
                | "DELETE"               ; Section 9.7
                | "TRACE"                ; Section 9.8
                | "CONNECT"              ; Section 9.9
                | extension-method

Note: the "Host:" header is mandatory in HTTP/1.1 requests
An Example GET Interaction
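A GET exchange can be sketched without a network by serializing a request according to the grammar and parsing a canned reply. The host name, path, and reply below are made-up illustrations, and the helper names build_get and parse_response are not from any library.

```python
def build_get(host: str, path: str = "/") -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # Request-Line: Method SP Request-URI SP HTTP-Version
        f"Host: {host}",         # mandatory in HTTP/1.1
        "Accept: text/html",
        "",                      # the empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw: bytes):
    """Split a response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

request = build_get("www.example.org", "/index.html")
print(request.decode("ascii"))

# A canned reply standing in for what a server might send back.
reply = (b"HTTP/1.1 200 OK\r\n"
         b"Content-Type: text/html\r\n"
         b"Content-Length: 13\r\n"
         b"\r\n"
         b"<p>Hello</p>\n")
version, status, reason, headers, body = parse_response(reply)
print(version, status, reason, headers["content-type"], len(body))
```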
HTTP Messages (2)
HTTP-message = Request | Response        ; HTTP/1.1 messages
Response     = Status-Line               ; Section 6.1
               *(( general-header        ; Section 4.5
                | response-header        ; Section 6.2
                | entity-header ) CRLF)  ; Section 7.1
               CRLF
               [ message-body ]          ; Section 7.2
Status-Line  = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Status-Code  = "100"   ; Section 10.1.1: Continue
             | "101"   ; Section 10.1.2: Switching Protocols
             | "200"   ; Section 10.2.1: OK
             | "201"   ; Section 10.2.2: Created
             | "202"   ; Section 10.2.3: Accepted
             ...
             | "300"   ; Section 10.3.1: Multiple Choices
             | "301"   ; Section 10.3.2: Moved Permanently
             ...
             | "400"   ; Section 10.4.1: Bad Request
             | "401"   ; Section 10.4.2: Unauthorized
             | "402"   ; Section 10.4.3: Payment Required
             | "403"   ; Section 10.4.4: Forbidden
             | "404"   ; Section 10.4.5: Not Found
             | "405"   ; Section 10.4.6: Method Not Allowed
             ...
             | "500"   ; Section 10.5.1: Internal Server Error
             | "501"   ; Section 10.5.2: Not Implemented
             ...
             | extension-code

Status code classes: 1xx Informational, 2xx Success, 3xx Redirection, 4xx Client Error, 5xx Server Error
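Since the first digit of the status code determines its class, a client can react sensibly even to codes it does not know. A minimal sketch; the function name status_class is invented here, while http.HTTPStatus is part of Python's standard library.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code: int) -> str:
    """Map a three-digit status code to its class via the first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

for code in (100, 200, 301, 404, 501):
    # HTTPStatus supplies the standard reason phrase for known codes.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))

# An extension-code the client has never seen still falls into a class.
print(499, "->", status_class(499))
```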
HTTP Methods
• GET
  • Retrieves the resource identified by the request URI. May encode request parameters in the URI
• HEAD
  • Identical to GET except that a message-body must not be returned. E.g., for testing validity, recent modification, accessibility of links
• POST
  • Requests that the server accept the entity enclosed in the request as a new subordinate of the URI, e.g., to annotate resources, append to a database, ...
• PUT
  • Requests that a resource be stored under the URI
• DELETE
  • Removes the resource identified by the request URI
• OPTIONS
  • Returns the HTTP methods the server supports
• CONNECT
  • Reserved for proxies that can dynamically switch to a tunnel
• TRACE
  • Returns the header fields sent with the TRACE request, e.g., for testing
Proxies
• An intermediary program which acts as both a client and a server
• Caching
  • E.g., GET cacheable
  • E.g., HEAD not cacheable (well, sort of)
• The Web is stateless, so stale caches are a problem
  • Age: sum of time resident at caches + time on network
  • Used for reliable cache expiration
  • But proxy MAY still return stale resource with warning
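The age rule can be sketched as follows. This is a simplification: RFC 2616 additionally corrects for clock skew and for the time the response spent in transit, which is ignored here. The class and field names are assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass
class CachedResponse:
    age_on_arrival: int  # seconds, from the Age header set by upstream caches (0 if absent)
    stored_at: float     # local clock time (seconds) when this cache stored the response
    max_age: int         # freshness lifetime in seconds, e.g. from Cache-Control: max-age

def current_age(entry: CachedResponse, now: float) -> float:
    """Age so far = age accumulated upstream + time resident in this cache."""
    return entry.age_on_arrival + (now - entry.stored_at)

def is_fresh(entry: CachedResponse, now: float) -> bool:
    """A response is fresh while its age is below its freshness lifetime."""
    return current_age(entry, now) < entry.max_age

# Arrived already 30 s old, stored at t=1000, allowed to live 60 s.
entry = CachedResponse(age_on_arrival=30, stored_at=1000.0, max_age=60)
print(is_fresh(entry, now=1010.0))  # age 40 s
print(is_fresh(entry, now=1040.0))  # age 70 s
```

When is_fresh is false, a proxy would normally revalidate with the origin server; if it serves the stale entry anyway, it has to attach a warning.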
More on Reliability and Web Technology
• Mostly focussed on security
  • Authentication and confidentiality
    • Secure Sockets Layer (SSL)
  • Privacy
• Will get back to this next time on web services
Web Browsers
• One of the first problems: web browsers were originally intended only to display static documents returned by HTTP calls
• Difficult to build sophisticated application-specific clients for web browsers
Applets
• One answer to this problem: applets
  • Java programs that can be embedded in an HTML document
  • When the document is downloaded, the program is executed by the JVM and presented in the browser; sending the client code as an applet turns the browser into a client
• Limitation: the code has to be downloaded
• Advantage: complexity (sophisticated client logic becomes possible)
CGI(Common Gateway Interface)
• Web servers must be able to serve up content from dynamic sources
  • How can a Web server respond to a request by invoking an application that will automatically generate a document to be returned?
• One of the first approaches to solving this problem was CGI, a standard mechanism that enables HTTP servers to interface with external applications, which can serve as "gateways" to the local information system
CGI
• How does CGI work?
  • It assigns programs to URLs, so that when the URL is invoked, the program is executed
• CGI programs often serve as an interface between a database and a Web server, allowing users to submit complex queries over the DB through predefined URLs
• When the Web server receives a request for the URL, it runs a program that acts as a client of the database, submits the query, and packs the result into an HTML document that is returned to the remote browser
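A CGI-style gateway can be sketched as a program that reads the request from environment variables and writes header lines, an empty line, and the document to standard output. QUERY_STRING is a standard CGI variable; the in-memory PATIENTS table stands in for the database, and the id parameter and all data are made up.

```python
import html
import os
import sys
from urllib.parse import parse_qs

# Hypothetical stand-in for the local information system behind the gateway.
PATIENTS = {"42": "Admitted"}

def handle_request(environ) -> str:
    """Produce the text a CGI program would write to standard output."""
    # The web server passes the part of the URL after "?" in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    patient_id = query.get("id", [""])[0]
    status = PATIENTS.get(patient_id, "unknown")
    body = (
        "<html><body><p>Patient "
        f"{html.escape(patient_id)}: {html.escape(status)}"
        "</p></body></html>"
    )
    # CGI output: header lines, an empty line, then the generated document.
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # When run by a web server, the real environment carries the request.
    sys.stdout.write(handle_request(os.environ))
```

For example, handle_request({"QUERY_STRING": "id=42"}) yields an HTML page reporting the patient's status. The web server starts one such process per request, which is where the overhead discussed next comes from.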
Servlets
• Performance: CGI programs involve a certain overhead
  • A separate process for each instance takes time and requires a context switch in the operating system
  • Multiple requests result in multiple processes
• To avoid this overhead, Java servlets can be used instead
• The idea is exactly the same as in CGI programs, but the implementation differs.
Servlets
• How do they work?
  • Execution and result are the same, but servlets
    • are invoked directly by embedding servlet-specific information within an HTTP request
    • run as threads of the Java server process; moreover, they run as part of the Web server
  • This eliminates the per-request process overhead
Summary
• Web technology underlies web services
  • HTTP is the basic transport
  • XML is a cornerstone in web services definition