web technologies course

1 - INTRODUCTION

1.1 communication protocols

A communication protocol is a set of rules that the end points of a telecommunication link follow when they communicate. A protocol is specified in an industry or international standard. All internet related protocols are defined within the framework of the IETF (Internet Engineering Task Force) via a mechanism called RFC (Request For Comments). Each (potential) protocol is defined by such a document. For a comprehensive list of all the RFCs, check the official site www.ietf.org (which presents the RFCs in .txt format) or www.cis.ohio-state.edu/cgi-bin/rfc (which presents the RFCs in .html format, with up to date links embedded in the documents).

1.2 the OSI model

OSI stands for Open Systems Interconnection, an ISO (International Organization for Standardization) standard for worldwide communications that defines a structured framework for implementing protocols in seven layers. Control is passed from one layer to the next, starting at the application layer at the source node, proceeding to the lower layers, over the links to the next node and back up the hierarchy until the destination node is reached. The structure of the message which is the object of this exchange is modified along the way: each step down the layer hierarchy adds a new wrapper around the existing message (usually consisting of a protocol specific header), while each step up removes the wrapper specific to the layer below. The seven layers in the OSI model are:

1. Application
Description: Supports application and end user processes. Provides application services for file transfers, e-mail and other network software services.
Protocol examples: DHCP, DNS, FTP, Gopher, HTTP, IMAP4, POP3, SMTP, SNMP, TELNET, TLS (SSL), SOAP

2. Presentation
Description: Translates data from application to network format and vice-versa. May also provide compression and encryption services.
Protocol examples: AFP, ICA, LPP, NCP, NDR, XDR, X.25 PAD

3. Session
Description: Sets up, manages and terminates connections between communication partners. It handles session and connection coordination.
Protocol examples: ASP, NetBIOS, PAP, PPTP, RPC, SMPP, SSH, SDP

4. Transport
Description: Provides data transfer between the end points of the communication partners and is responsible for error recovery and flow control.
Protocol examples: DCCP, SCTP, TCP, UDP, WTLS, WTP, XTP

5. Network
Description: Responsible for source to destination delivery of packets, including routing through intermediate nodes. Provides quality of service and error control.
Protocol examples: DDP, ICMP, IPSec, IPv4, IPv6, IPX, RIP

6. Data Link
Description: Transfers data between adjacent network nodes and handles errors that occur at the physical level.
Protocol examples: ARCnet, ATM, CDP, Ethernet, Frame Relay, HDLC, Token Ring

7. Physical
Description: Translates communication requests from the data link layer into transmissions and receptions of electronic signals at hardware level.
Protocol examples: 10BASE-T, DSL, Firewire, GSM, ISDN, SONET/SDH, V.92

The layers are numbered here from the top of the stack down. The OSI standard itself numbers them from the bottom up, so that Physical is layer 1 and Application is layer 7.

For a detailed description of these layers, check the site: http://www.geocities.com/SiliconValley/Monitor/3131/ne/osimodel.html .
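The wrapping and unwrapping that happens on the way down and up the layer hierarchy can be illustrated with a small C sketch. The function names wrap() and unwrap() and the "[NAME]" header format are made up for this illustration; real protocol headers are binary structures defined by each protocol.

```c
#include <stdio.h>
#include <string.h>

/* Wrap a payload with a layer specific header, as done on the way down the
 * stack. The "[NAME]" header format is a made-up illustration.
 * Returns 0 on success, -1 if the output buffer is too small. */
int wrap(const char *layer, const char *payload, char *out, size_t cap)
{
    int n = snprintf(out, cap, "[%s]%s", layer, payload);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Remove the header added by wrap() for the given layer, as done on the way
 * up. Returns a pointer to the inner payload, or NULL if the outermost
 * header does not belong to that layer. */
const char *unwrap(const char *layer, const char *message)
{
    size_t len = strlen(layer);
    if (message[0] != '[' || strncmp(message + 1, layer, len) != 0 ||
        message[1 + len] != ']')
        return NULL;
    return message + len + 2;
}
```

Wrapping "hello" first with "TCP" and then with "IP" gives "[IP][TCP]hello"; the receiving side has to remove the wrappers in the opposite order.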

1.3 sockets - basics

A socket is a logical entity which describes an end point of a communication link between two IP entities (entities which implement the Internet Protocol). A socket is identified by an IP address and a port number. Port numbers range from 0 to 65535 (2^16 - 1) and are split into 3 categories:

1. well known ports - ranging from 0 to 1023; these ports are under the control of IANA (Internet Assigned Numbers Authority). A selective list is shown in the table below:

Port number   Service
1             TCPMUX
5             Remote Job Entry (RJE)
7             Echo
15            NETSTAT
20            FTP - data
21            FTP - control
22            Secure Shell
23            Telnet
25            Simple Mail Transfer Protocol (SMTP)
41            Graphics
42            ARPA Host Name Server Protocol; WINS
43            WHOIS
53            Domain Name System (DNS)
57            Mail Transfer Protocol (MTP)
67            BOOTP
68            BOOTP
69            TFTP
79            Finger
80            HTTP
107           Remote Telnet
109           Post Office Protocol 2 (POP2)
110           POP3
115           Simple FTP (SFTP)
118           SQL services
123           Network Time Protocol (NTP)
137           NetBIOS Name Service
138           NetBIOS Datagram Service
139           NetBIOS Session Service
143           Internet Message Access Protocol (IMAP)
156           SQL service
161           Simple Network Management Protocol (SNMP)
162           SNMP Trap
179           Border Gateway Protocol (BGP)
194           Internet Relay Chat (IRC)
213           IPX

2. registered ports - ranging from 1024 to 49151; these ports are registered by ICANN as a convenience to the community and should be accessible to ordinary users. A selective list of some of these ports is shown below:

Port number   Service
1080          SOCKS proxy
1085          WebObjects
1098          RMI activation
1099          RMI registry
1414          IBM WebSphere MQ
1521          Oracle DB default listener
2030          Oracle services for Microsoft Transaction Server
2049          Network File System
2082          CPanel default
3306          MySQL DB system
3690          Subversion version control system
3724          World of Warcraft online gaming
4664          Google Desktop Search
5050          Yahoo Messenger
5190          ICQ and AOL IM
5432          PostgreSQL DB system
5500          VNC remote desktop protocol
5800          VNC over HTTP
6000/6001     X11
6881-6887     BitTorrent
6891-6900     Windows Live Messenger File transfer
6901          Windows Live Messenger Voice
8080          Apache Tomcat
8086/8087     Kaspersky AV Control Center
8501          Duke Nukem 3D
9043          WebSphere Application Server
14567         Battlefield 1942
24444         NetBeans IDE
27010/27015   Half-Life, Counter-Strike
28910         Nintendo Wi-Fi Connection
33434         traceroute

3. dynamic (private) ports, ranging from 49152 to 65535
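The three ranges above translate directly into a small classification function. This is only a sketch; the enum and the function name classify_port() are our own and are not part of any standard API.

```c
/* The three port categories, plus a value for numbers outside 0..65535. */
enum port_class {
    PORT_WELL_KNOWN,   /* 0 - 1023      */
    PORT_REGISTERED,   /* 1024 - 49151  */
    PORT_DYNAMIC,      /* 49152 - 65535 */
    PORT_INVALID
};

/* Classify a port number according to the ranges listed above. */
enum port_class classify_port(long port)
{
    if (port < 0 || port > 65535)
        return PORT_INVALID;
    if (port <= 1023)
        return PORT_WELL_KNOWN;
    if (port <= 49151)
        return PORT_REGISTERED;
    return PORT_DYNAMIC;
}
```

For example, classify_port(80) yields PORT_WELL_KNOWN (HTTP), while classify_port(3306) yields PORT_REGISTERED (MySQL).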

1.4 posix sockets

To create a client socket, two calls are necessary. The first one creates a file descriptor (fd), which is basically a number that identifies an I/O channel (no different from the file descriptor returned by an open() call which opens a file). The prototype of this call is the following:

int socket(int family, int type, int protocol);

The family parameter specifies the address family of the socket and may take one of the following values, the list itself depending on the implementation platform:

AF_APPLETALK
AF_INET - most used, indicates an IP version 4 address
AF_INET6 - indicates an IP version 6 address
AF_IPX
AF_KEY
AF_LOCAL
AF_NETBIOS
AF_ROUTE
AF_TELEPHONY
AF_UNSPEC

The type parameter specifies the socket type and may take the following values:

SOCK_STREAM
SOCK_RAW
SOCK_DGRAM

The value of the protocol parameter is set to 0, except for raw sockets.

The second call connects the client to the server. Here is the signature of the connect() call:

int connect(int sock_fd, const struct sockaddr * server_addr, socklen_t addr_len);

To create a server socket, four calls are necessary. Here are the prototypes of these calls:

int socket(int family, int type, int protocol);
int bind(int sock_fd, const struct sockaddr * my_addr, socklen_t addr_len);
int listen(int sock_fd, int backlog);
int accept(int sock_fd, struct sockaddr * client_addr, socklen_t * addr_len);
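The call sequences above can be put together as follows. This is a minimal sketch for IPv4 only, with very little error reporting; the helper names make_ipv4_addr(), tcp_connect() and tcp_listen() are our own, while socket(), connect(), bind(), listen(), inet_pton() and htons() are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill an IPv4 socket address from a dotted-quad string and a port number.
 * Returns 0 on success, -1 if the address string is not valid. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);              /* port in network byte order */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}

/* Client side: socket() + connect(). Returns a connected fd, or -1 on error. */
int tcp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in server;
    if (make_ipv4_addr(ip, port, &server) != 0)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&server, sizeof server) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Server side: socket() + bind() + listen(). The caller then calls accept()
 * on the returned fd. Returns a listening fd, or -1 on error. */
int tcp_listen(unsigned short port, int backlog)
{
    struct sockaddr_in local;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_ANY); /* any local interface */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&local, sizeof local) != 0 ||
        listen(fd, backlog) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would typically call tcp_listen(8080, 5) once and then loop on accept(fd, NULL, NULL), which returns a new file descriptor for each client connection.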

A few remarks. Why not bind the client socket to a particular port as well? Nothing stops us from invoking the bind() function on a client socket, but there is rarely a reason to do so. The server port has to be known in advance, because the client must know both the IP address (or the host name, if that is the case) and the port of the server. The port of the client, on the other hand, does not need to be known beforehand. The operating system assigns a port to the client socket, and this solution is quite satisfactory.


2 - HTTP

2.1 what is http

HTTP stands for HyperText Transfer Protocol, where hypertext means text containing links to other text. HTTP was created by Tim Berners-Lee in 1990 at CERN as a means to store scientific data. It quickly evolved into the preferred communication protocol over the internet. The first official version, HTTP 1.0, dates from 05/96 and is the object of RFC 1945 (www.cis.ohio-state.edu/cgi-bin/rfc/rfc1945.html). It is authored by Tim Berners-Lee, Roy Fielding and Henrik Nielsen. The second (and last, so far) version, namely HTTP 1.1, was the object of several RFCs, of which we mention RFC 2068 (01/97), RFC 2616 (06/99), RFC 2617 (06/99) and RFC 2774 (02/00). For a complete specification of the different HTTP versions, check the official HTTP site www.w3.org/Protocols . As a site for understanding how HTTP works, we recommend www.jmarshall.com/easy/http.

2.2 the structure of http transactions

HTTP follows the client server model. The client sends a request message to the server. The server answers with a response message. These messages may have different contents, but they also have some common structural elements, as follows:

1. an initial line
2. zero or more header lines, of the form
   Header1: value1
   ...
   Headern: valuen
3. a blank line (CR/LF)
4. an optional message body
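The four structural elements listed above can be assembled into a message with a few lines of C. This is a sketch only; build_message() is a hypothetical helper, and the host name and header values used in the example below are placeholders.

```c
#include <stdio.h>
#include <string.h>

/* Assemble an HTTP message: initial line, header lines, blank line and an
 * optional body (pass NULL for no body). Every line of the header section
 * ends with CR LF. Returns the message length, or -1 if the message does
 * not fit into out[cap]. */
int build_message(char *out, size_t cap, const char *initial_line,
                  const char *const headers[], size_t nheaders,
                  const char *body)
{
    size_t used;
    int n = snprintf(out, cap, "%s\r\n", initial_line);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < nheaders; i++) {
        n = snprintf(out + used, cap - used, "%s\r\n", headers[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    /* The blank line marks the end of the header section. */
    n = snprintf(out + used, cap - used, "\r\n%s", body ? body : "");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

Calling build_message() with the initial line "GET /index.html HTTP/1.0", the two headers "Host: example.org" and "User-Agent: demo/0.1" and no body produces three CR LF terminated lines followed by the empty line that ends the header section.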

2.3 the initial request line

Contains 3 elements, separated by spaces:

- a command (method) name (like GET, POST, HEAD, ...)
- a file specification (path) (the part of the URL after the host name)
- the HTTP version (usually, HTTP/1.0).

Here is an example of an initial request line:

GET /path/to/the/file/index.html HTTP/1.0
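On the server side, the three elements of the request line can be separated with sscanf(). The struct and the function name parse_request_line() below are our own, and the buffer sizes are arbitrary choices for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* The three space separated elements of an initial request line. */
struct request_line {
    char method[16];
    char path[256];
    char version[16];
};

/* Split a request line into method, path and version.
 * Returns 0 on success, -1 if the line does not contain three elements. */
int parse_request_line(const char *line, struct request_line *out)
{
    /* Field widths are one less than the buffer sizes, leaving room for
     * the terminating '\0'. */
    int fields = sscanf(line, "%15s %255s %15s",
                        out->method, out->path, out->version);
    return fields == 3 ? 0 : -1;
}
```

Applied to the example line above, the function stores "GET", "/path/to/the/file/index.html" and "HTTP/1.0" in the three fields.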

2.4 http commands (methods)

As of HTTP 1.1, there are 8 HTTP commands (methods) that are widely supported. Here is their list:

1. GET
2. HEAD
3. POST
4. CONNECT
5. DELETE
6. OPTIONS
7. PUT
8. TRACE

Three other commands are listed, as well, in the HTTP 1.1 specification, but lack of support makes them obsolete. These commands are:

- LINK
- UNLINK
- PATCH

The HEAD command is identical to the GET command in all respects but one. The only difference is that the response must not have a body. All the information requested is returned in the header section of the response.

2.5 the GET and POST methods

The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:

- Annotation of existing resources;
- Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
- Providing a block of data, such as the result of submitting a form, to a data-handling process;
- Extending a database through an append operation.

The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database.

The action performed by the POST method might not result in a resource that can be identified by a URI. In this case, either 200 (OK) or 204 (No Content) is the appropriate response status, depending on whether or not the response includes an entity that describes the result.

2.6 differences between GET and POST

1. The method GET is intended for getting (retrieving) data, while POST may involve anything, like storing or updating data, or ordering a product, or sending e-mail.
2. When used for form data submission, GET attaches this data to the URL of the request, after the ? character, as a sequence of name=value pairs, separated by the character & or ; . Form data submitted by POST, on the other hand, is sent in the message body, encoded either as name=value pairs as above (content type application/x-www-form-urlencoded) or as multipart/form-data.
3. With POST, the server has to read the message body, which follows the header section, before it can process the data, while data sent by GET via the URL is available as soon as the request line has been read.
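To make point 2 more concrete, here is a sketch of how a client might attach one name=value pair to the URL of a GET request. The function names form_encode() and build_get_target() are our own. The encoding rule used (letters, digits and - _ . ~ are kept, a space becomes +, everything else becomes %XX) is the usual application/x-www-form-urlencoded convention, simplified for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Encode one form value for use in a query string.
 * Returns 0 on success, -1 if out[cap] is too small. */
int form_encode(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t used = 0;

    if (cap == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || strchr("-_.~", c) != NULL) {
            if (used + 1 >= cap) return -1;
            out[used++] = (char)c;
        } else if (c == ' ') {
            if (used + 1 >= cap) return -1;
            out[used++] = '+';
        } else {
            if (used + 3 >= cap) return -1;
            out[used++] = '%';
            out[used++] = hex[c >> 4];
            out[used++] = hex[c & 0x0F];
        }
    }
    out[used] = '\0';
    return 0;
}

/* Build "path?name=value", the form in which GET carries form data.
 * The name is assumed to need no encoding. Returns 0 or -1. */
int build_get_target(const char *path, const char *name, const char *value,
                     char *out, size_t cap)
{
    char encoded[256];
    if (form_encode(value, encoded, sizeof encoded) != 0)
        return -1;
    int n = snprintf(out, cap, "%s?%s=%s", path, name, encoded);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

A POST request using the application/x-www-form-urlencoded content type would carry the same encoded string in its message body instead of the URL.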

2.7 the initial response (status) line

Contains 3 elements, separated by spaces (although the reason phrase may contain spaces, as well):

- the HTTP version of the response
- a response status code (a number)
- a response status reason phrase (a human readable response status)

Here is an example of an initial response line:

HTTP/1.0 404 Not Found
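A client can split the status line in a similar way, taking care that the reason phrase may itself contain spaces. The struct and the function name parse_status_line() are assumptions made for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* The three elements of an initial response (status) line. */
struct status_line {
    char version[16];
    int  code;
    char reason[64];
};

/* Split a status line into version, status code and reason phrase.
 * The reason phrase is optional here. Returns 0 on success, -1 if the
 * version or the status code is missing. */
int parse_status_line(const char *line, struct status_line *out)
{
    out->reason[0] = '\0';
    /* %63[^\r\n] reads the rest of the line, spaces included. */
    int fields = sscanf(line, "%15s %d %63[^\r\n]",
                        out->version, &out->code, out->reason);
    return fields >= 2 ? 0 : -1;
}
```

For the example line above this yields the version "HTTP/1.0", the code 404 and the reason phrase "Not Found".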

2.8 the status code

A three-digit integer, where the first digit identifies the general category of response:

- 1xx indicates an informational message only
- 2xx indicates success of some kind
- 3xx redirects the client to another URL
- 4xx indicates an error on the client's part
- 5xx indicates an error on the server's part

The most common status codes are:

- 200 OK - the request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.
- 404 Not Found - the requested resource doesn't exist.
- 301 Moved Permanently, 302 Moved Temporarily, 303 See Other (HTTP 1.1 only) - the resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.
- 500 Internal Server Error - an unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.

A complete list of status codes is in the HTTP specification (the URL was mentioned in the first section of this chapter) (section 9 for HTTP 1.0, and section 10 for HTTP 1.1).
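Since the category of a response is given by the first digit of the status code, a client can decide how to react without knowing every individual code. A minimal sketch, in which the enum and the function name status_class() are our own:

```c
/* General category of a response, taken from the first digit of the code. */
enum status_class {
    STATUS_UNKNOWN = 0,
    STATUS_INFORMATIONAL,   /* 1xx */
    STATUS_SUCCESS,         /* 2xx */
    STATUS_REDIRECT,        /* 3xx */
    STATUS_CLIENT_ERROR,    /* 4xx */
    STATUS_SERVER_ERROR     /* 5xx */
};

/* Map a three-digit status code to its category. */
enum status_class status_class(int code)
{
    if (code < 100 || code > 599)
        return STATUS_UNKNOWN;
    /* code / 100 is the first digit (1..5), which matches the enum order. */
    return (enum status_class)(code / 100);
}
```

With this function, 404 is reported as STATUS_CLIENT_ERROR and 301 as STATUS_REDIRECT.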

2.9 header lines

A header line consists of two parts, header name and header value, separated by a colon. The HTTP 1.0 version specifies 16 headers, none of them mandatory, while the HTTP 1.1 version specifies 46 of them, out of which one (Host) is mandatory. Although the header names are not case sensitive, header values are. A couple of examples of header lines:

User-agent: Mozilla/3.0Gold
Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT

Header lines which begin with spaces or tabs are parts of the previous header line.
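The rules for header lines can be sketched in C as follows. The helper names split_header() and header_name_is() are our own; strcasecmp() is the POSIX case-insensitive string comparison. Continuation lines (those starting with spaces or tabs) are not handled in this sketch.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Split "Name: value" in place at the first colon. On success, *name and
 * *value point into the (modified) line. Returns 0, or -1 if the line
 * contains no colon. */
int split_header(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    *colon = '\0';                    /* terminate the header name */
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t')   /* skip whitespace after the colon */
        v++;
    *name = line;
    *value = v;
    return 0;
}

/* Header names are not case sensitive, so compare them ignoring case. */
int header_name_is(const char *name, const char *expected)
{
    return strcasecmp(name, expected) == 0;
}
```

Because the split happens at the first colon only, a value that itself contains colons, such as the time in a Last-Modified header, is left intact.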

2.10 the message body

An HTTP message may have a body of data sent after the header lines. The most common use of the message body is in a response, where the requested resource is returned to the client, or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server. If an HTTP message includes a body, the header lines of the message are used to describe the body. In particular,

- the Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/jpeg.
- the Content-Length: header gives the number of bytes in the body.


2.11 mime types/subtypes

MIME stands for Multipurpose Internet Mail Extensions. Each extension consists of a type and a subtype. RFC 1521 (www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html) defines 7 types and several subtypes, although the list of admissible subtypes is much longer. Here is the list of the seven types, together with the subtypes defined in this particular RFC.

1. text, with subtype plain
2. multipart, with subtypes mixed, alternative, digest, parallel
3. message, with subtypes rfc822, partial, external-body
4. application, with subtypes octet-stream, postscript
5. image, with subtypes jpeg, gif
6. audio, with subtype basic
7. video, with subtype mpeg
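A MIME type is written as type/subtype, for example text/plain or image/gif, so the value of a Content-Type header can be taken apart at the slash. The function name split_mime() is our own, and the buffer sizes are arbitrary; anything after a space or a semicolon (such as parameters) is ignored in this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Split "type/subtype" into its two parts.
 * Returns 0 on success, -1 if the string is not of that form. */
int split_mime(const char *mime, char type[32], char subtype[64])
{
    /* The type ends at the slash; the subtype ends at a space or a
     * semicolon, where optional parameters would begin. */
    int fields = sscanf(mime, "%31[^/ ;]/%63[^ ;]", type, subtype);
    return fields == 2 ? 0 : -1;
}
```

For the string "image/gif" the function stores "image" as the type and "gif" as the subtype.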

2.12 an example of an http transaction

To retrieve the file at the URL http://web.info.uvt.ro/path/file.html first open a socket to the host web.info.uvt.ro, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:

GET /path/file.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
[blank line here]

The server should respond with something like the following, sent back through the same socket:

HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

Happy birthday!
(more file contents)
.
.
.

After sending the response, the server closes the socket.
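The client side of this exchange can be sketched in C on top of an already connected socket (created with socket() and connect()). The function names send_get() and read_response() are our own; the sketch sends only the request line and a User-Agent header, and relies on the HTTP 1.0 behaviour described above, where the server closes the socket after the response.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a minimal HTTP/1.0 GET request on a connected socket.
 * Returns 0 on success, -1 on error. */
int send_get(int fd, const char *path, const char *user_agent)
{
    char req[512];
    int n = snprintf(req, sizeof req,
                     "GET %s HTTP/1.0\r\nUser-Agent: %s\r\n\r\n",
                     path, user_agent);
    if (n < 0 || (size_t)n >= sizeof req)
        return -1;
    return write_all(fd, req, (size_t)n);
}

/* Read until the peer closes the connection or the buffer is full.
 * The result is '\0' terminated. Returns the number of bytes stored,
 * or -1 on error. */
long read_response(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    while (used + 1 < cap) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                /* peer closed the socket */
        used += (size_t)n;
    }
    buf[used] = '\0';
    return (long)used;
}
```

After send_get(fd, "/path/file.html", "HTTPTool/1.0"), a call to read_response() collects the status line, the headers, the blank line and the body in one buffer, which can then be split at the first empty line.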


3 - HTML

3.1 what is html?

HTML stands for HyperText Markup Language. HTML describes how text, images and other components are to be displayed in a browser, using a variety of tags and their related attributes. The first version of HTML, namely HTML 1.0, appeared in summer 1991 and was supported by the first popular web browser, Mosaic. The first official version, HTML 2.0, was approved as a standard in September 1995 (as RFC 1866, www.cis.ohio-state.edu/cgi-bin/rfc/rfc1866.html) and was widely supported. A newer standard, HTML 3.2 (3.0 was not widely accepted), appeared as a W3C recommendation in January 1997. Version 4.0, accepted in December 1997, introduced support for Cascading Style Sheets. The newest version of HTML is 4.01, a revision of 4.0 which was accepted in December 1999. However, a working draft for a new version, namely HTML 5, was published in June 2008. From 1999 on, HTML is part of a new specification, XHTML. The XHTML 1.0 draft was released in 01.99. The latest version (XHTML 2.0) dates from 08.02 and is not intended to be backwards compatible. For a complete specification of the different HTML versions, check the official HTML site www.w3c.org/Markup . As a practical reference site use www.blooberry.com/indexdot/html . Other helpful sites - www.htmlgoodies.com/tutors, www.jmarshall.com/easy/html .

3.2 language definition

HTML is a system for describing documents. It is an application of SGML (Standard Generalized Markup Language, an ISO standard (ISO 8879)). All markup languages defined in SGML are called SGML applications and are characterized by:

1. An SGML declaration - what characters and delimiters may appear. The SGML declaration of the latest version of HTML (4.01) can be found at this address: http://www.w3.org/TR/1999/PR-html40-19990824/sgml/sgmldecl.html. Since it fits in a couple of pages, we can afford to have a look at this declaration.
