LIS650 lecture 6 http and apache Thomas Krichel 2004-03-12

LIS650 lecture 6: http and apache

Thomas Krichel

2004-03-12

today

• http
• semantic web
• apache introduction

http

• Stands for the hypertext transfer protocol. This is the most important application layer protocol on the Internet today, because it provides the foundation for the world wide web.

• defined in Fielding, Roy T., James Gettys, Jeffrey C. Mogul, Paul J. Leach, Tim Berners-Lee ``Hypertext Transfer Protocol -- HTTP/1.1'' (1999), RFC 2616

history

• 1990: version 0.9 allows for transfer of raw data.
• 1996: RFC 1945 defines version 1.0 by adding attribute:value headers.
• 1999: RFC 2616
  – adds support for
    • hierarchical proxies
    • caching
    • virtual hosts
    • persistent connections
  – is more stringent.

http resource identification

• identification of resources is assumed through Uniform Resource Identifiers (URI).

• As far as http is concerned, URIs are strings.

• http can use ``absolute'' and ``relative'' URIs.

• A URL is a special case of a URI.

rfc about http

An application-level protocol for distributed, collaborative, hypermedia information systems.

HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the SMTP, NNTP, FTP, Gopher, and WAIS protocols. In this way, HTTP allows basic hypermedia access to resources available from diverse applications.

overall operation: client side

• Client sends request, required items are
  – method
  – request URI
  – protocol version

• optional items are
  – request modifiers
  – client information

overall operation server side

• Server sends response, required items are
  – status line
  – protocol version
  – success or error code

• optional items are
  – server information
  – body
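The client side of this exchange can be sketched in a few lines of Python. This is only an illustration: the function name build_request and the User-Agent value are made up for the example, and the Host header is included because HTTP/1.1 requires it.

```python
def build_request(method, request_uri, host, version="HTTP/1.1", extra_headers=None):
    """Assemble a request message as text (illustrative helper, not a real client)."""
    # Required items: method, request URI, protocol version.
    lines = [f"{method} {request_uri} {version}"]
    # Optional items: request modifiers and client information, sent as headers.
    headers = {"Host": host}
    headers.update(extra_headers or {})
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line ends the header section; this request has no body.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET", "/home/krichel", "openlib.org",
    extra_headers={"User-Agent": "lis650-sketch/0.1"},  # hypothetical client name
)
print(request)
```

The first line of the output carries all three required items; everything after it is optional from the point of view of the slide.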

http assumes transport

• http assumes that there is a reliable way to transport data from one host on the Internet to another one.

• In HTTP/1.0, each request and its response use a separate TCP connection; HTTP/1.1 allows persistent connections that carry several requests. The default is TCP port 80, but other ports can be used.

Absolute http URL

• the absolute http URL is

http://host[:port][abs_path[?query]]

• If abs_path is empty, it is /.

• The scheme name "http" and the host name are case-insensitive.

• Characters other than those in the ``reserved'' and ``unsafe'' sets of RFC 2396 are equivalent to their ``%HEX HEX'' encoding.

• optional components are in [ ]
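As a rough illustration of these rules, the standard library's urllib.parse can split an http URL into its components. The helper name describe_http_url and the example.org URLs are assumptions made for this sketch.

```python
from urllib.parse import urlsplit, unquote


def describe_http_url(url):
    """Split an absolute http URL into the components named on the slide."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,           # urlsplit lowercases the scheme
        "host": parts.hostname,           # .hostname is lowercased as well
        "port": parts.port or 80,         # port is optional, 80 is the default
        "abs_path": parts.path or "/",    # an empty abs_path means /
        "query": parts.query,             # empty string when there is no query
    }


print(describe_http_url("HTTP://WWW.Example.ORG"))
print(describe_http_url("http://example.org:8080/a/b?x=1"))
# %7E is the %HEX HEX encoding of the tilde character.
print(unquote("/%7Ekrichel"))
```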

character sets

• A character set is a method used with one or more tables to convert a sequence of binary digits into a sequence of characters.

• http shares the same registry as the MIME multimedia email extensions. It is based at the IANA, at

http://www.isi.edu/innotes/iana/assignments/media-types/media-types

• The default character set is ISO-8859-1.

http messages

• There are two types of messages.
  – Requests are sent from the client to the server.
  – Responses are sent from the server to the client.

• The generic format is the same as for email messages:
  – start line
  – message headers
  – empty line
  – body

• Empty lines before the start line are ignored.

• The request's start line is called the request-line.

• The response start line is called the status-line.
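The generic format can be taken apart with a few string operations. The sketch below is a simplification: the function name split_message is made up, and it ignores details such as repeated headers and header lines continued over several lines.

```python
def split_message(raw):
    """Split an http message into start line, headers and body (simplified)."""
    # Empty lines before the start line are ignored.
    raw = raw.lstrip("\r\n")
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


message = "\r\nHTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
start_line, headers, body = split_message(message)
print(start_line)   # the status-line, since this is a response
print(headers)
print(body)
```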

The request headers

• Accept:
• Accept-Charset:
• Accept-Encoding:
• Accept-Language:
• Authorization:
• Expect:
• From:
• Host:
• If-Match:
• If-Modified-Since:
• If-None-Match:
• If-Range:
• If-Unmodified-Since:
• Max-Forwards:
• Proxy-Authorization:
• Range:
• Referer:
• TE:
• User-Agent:

The status line

• The status line is a single line of the form

HTTP-Version Status-Code Reason-Phrase

• The status code is a 3-digit number used by the computer.
• The reason phrase is a friendly note for a human to read.
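A status line can be split into its three parts as follows. This is a minimal sketch; parse_status_line is a name chosen for the example. Only the first two spaces are used as separators, because the reason phrase may itself contain spaces.

```python
def parse_status_line(line):
    """Return (HTTP-Version, Status-Code, Reason-Phrase) from a status line."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 404 Not Found"))
```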

Status code classes

• 1 Informational: Request received, continuing process

• 2 Success: The action was successfully received, understood, and accepted

• 3 Redirection: Further action must be taken in order to complete the request

• 4 Client Error: The request contains bad syntax or cannot be understood

• 5 Server error: The request is valid but can not be executed by the server
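Since the class is given by the first digit, a program can classify a status code with an integer division. The names STATUS_CLASSES and status_class are assumptions made for this sketch.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Map a 3-digit status code to its class via the first digit."""
    return STATUS_CLASSES[code // 100]


for code in (100, 206, 301, 404, 503):
    print(code, status_class(code))
```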

Error codes

• 100 Continue
• 101 Switching Protocols
• 200 OK
• 201 Created
• 202 Accepted
• 203 Non-Authoritative Information
• 204 No Content
• 205 Reset Content
• 206 Partial Content

Error codes II

• 300 Multiple Choices
• 301 Moved Permanently
• 302 Found
• 303 See Other
• 304 Not Modified
• 305 Use Proxy
• 307 Temporary Redirect

Error codes III

• 400 Bad Request
• 401 Unauthorized
• 402 Payment Required
• 403 Forbidden
• 404 Not Found
• 405 Method Not Allowed
• 406 Not Acceptable
• 407 Proxy Authentication Required
• 408 Request Time-out

Error codes IV

• 409 Conflict
• 410 Gone
• 411 Length Required
• 412 Precondition Failed
• 413 Request Entity Too Large
• 414 Request-URI Too Large
• 415 Unsupported Media Type
• 416 Requested range not satisfiable
• 417 Expectation failed

Error codes V

• 500 Internal Server Error
• 501 Not Implemented
• 502 Bad Gateway
• 503 Service Unavailable
• 504 Gateway Time-out
• 505 HTTP Version not supported

Response headers

• Accept-Ranges:

• Age:

• ETag:

• Location:

• Proxy-Authenticate:

• Retry-After:

• Server:

• Vary:

• WWW-Authenticate:

Entity headers, common to response and request

• Allow:
• Content-Encoding:
• Content-Language:
• Content-Length:
• Content-Location:
• Content-MD5:
• Content-Range:
• Content-Type:
• Expires:
• Last-Modified:

The body

• The entity-body (if any) sent with an HTTP request or response is in a format and encoding defined by the entity-header fields.

• When an entity-body is included with a message, the data type of that body is determined via the header fields Content-Type and Content-Encoding

GET and HEAD method

• The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process.

• The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.

Conditional & partial GET

• The semantics of the GET method change to a ``conditional GET'' if the request message includes an
  – If-Modified-Since
  – If-Unmodified-Since
  – If-Match
  – If-None-Match
  – If-Range
  header.

• The semantics of the GET method change to a ``partial GET'' if the request message includes a Range header field. A partial GET requests that only part of the entity be transferred.
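The headers that turn a plain GET into a conditional or partial GET can be produced as below. This is a sketch under assumptions: the two function names are invented, and only the If-Modified-Since and Range cases are shown. The date is written in the GMT format http expects, using email.utils.formatdate from the standard library.

```python
from email.utils import formatdate


def conditional_get_headers(last_seen_epoch):
    """Header for 'send the entity only if it changed since this time'."""
    return {"If-Modified-Since": formatdate(last_seen_epoch, usegmt=True)}


def partial_get_headers(first_byte, last_byte):
    """Header for 'send only this byte range of the entity'."""
    return {"Range": f"bytes={first_byte}-{last_byte}"}


print(conditional_get_headers(0))
print(partial_get_headers(0, 499))
```

A server that has nothing newer would typically answer the first request with 304 Not Modified, and a server that honours the second with 206 Partial Content.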

The POST method

• The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:
  – Annotation of existing resources;
  – Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
  – Providing a block of data, such as the result of submitting a form, to a data-handling process;
  – Extending a database through an append operation.
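For the form-submission case, a POST message might be put together like this. It is a sketch only: the function name build_form_post, the path /cgi-bin/guestbook, the host example.org and the form fields are all hypothetical.

```python
from urllib.parse import urlencode


def build_form_post(request_uri, host, fields):
    """Build a POST request carrying form fields as its entity body."""
    body = urlencode(fields)   # e.g. name=Thomas&course=LIS650
    head = [
        f"POST {request_uri} HTTP/1.1",
        f"Host: {host}",
        # The entity headers describe the body that follows.
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    return "\r\n".join(head) + "\r\n\r\n" + body


post = build_form_post(
    "/cgi-bin/guestbook", "example.org",
    {"name": "Thomas", "course": "LIS650"},
)
print(post)
```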

PUT and DELETE methods

• The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity should be considered as a modified version of the one residing on the origin server.

• The DELETE method requests that the origin server delete the resource identified by the Request-URI.

example status: redirect

• If you use Apache, you can create a file .htaccess (note the dot!) with a line

redirect 301 old_url new_url

• old_url must be a relative path from the top of your site
• new_url can be any URL, even outside your site
• This works on wotan by virtue of configuration set for apache for your home directory. Examples
  – redirect 301 /~krichel http://openlib.org/home/krichel
  – redirect 301 Cantcook.jpg http://www.foodtv.com
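What the server does with such a line can be imitated in Python. This is a simplified model, not Apache's code: it assumes that a rule applies to the old path and everything below it, and that the remainder of the requested path is appended to the new URL. The function name redirect_for is made up.

```python
def redirect_for(path, rules):
    """Return (status, Location) for the first matching rule, or None."""
    for status, old_path, new_url in rules:
        prefix = old_path.rstrip("/")
        # Match the old path itself or anything underneath it.
        if path == prefix or path.startswith(prefix + "/"):
            return status, new_url + path[len(prefix):]
    return None


rules = [(301, "/~krichel", "http://openlib.org/home/krichel")]
print(redirect_for("/~krichel", rules))
print(redirect_for("/~krichel/cv.html", rules))
print(redirect_for("/~someone_else", rules))
```

The client then sees a 301 Moved Permanently status line with the new address in the Location response header.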

The Semantic Web

• The W3C has been developing a new architecture that applies knowledge representation technology to the WWW.

• Using the Resource Description Framework (RDF), Statements are made using a Subject, Predicate and Object (very similar to Lisp and other predicate based languages).

• Each Subject, Predicate or Object is a Resource in the URI sense and is identified by a URI within an RDF Statement using XML Namespaces.

The Semantic Web

• The combination of Web Services and the Semantic Web should give the Web the ability to turn any existing Web Resource into a full node in a purposefully built knowledge representation system with a functional component that allows that knowledge to be acted on.

• And both are based on the simple Uniform Resource Identifier.

example

• This statement says that the Resource identified by the URI ‘http://openlib.org/home/krichel’ was created by the person ‘Thomas Krichel’:

<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>
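To see the Subject, Predicate and Object in this statement, the XML can be read with Python's standard XML parser. This is a sketch that only handles the simple shape used in the example, not RDF/XML in general; the function name triples is made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>"""


def triples(xml_text):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get("about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; joining the two
            # parts gives the URI that identifies the predicate.
            predicate = prop.tag[1:].replace("}", "")
            result.append((subject, predicate, prop.text))
    return result


print(triples(RDF_XML))
```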

Apache

• Is a free, open-source web server that is produced by the Apache Software Foundation, see http://www.apache.org

• It has over 50% of the market share.

• It runs best on UN*X systems but can run on a Mickeysoft OS as well.

• I will cover it here because it is freely available.

• I am covering version 1.3

Apache in debian

• /etc/apache/httpd.conf is the main configuration file.

• /etc/init.d/apache action, where action is one of
  – start
  – stop
  – restart
  is used to fire the daemon up or down.

• The daemon runs as user www-data.

Virtual host

• On a single installation of apache several web servers can be supported.

• That means the server can behave in a different way according to how it is being addressed.

• The easiest way to implement addressing a server in different ways is through DNS host names.
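The idea of one server answering differently depending on the host name can be sketched as a lookup on the Host request header. This is an illustration of the principle, not of how apache implements it. The host connections2003.liu.edu and its directory come from the wotan example in this lecture; the default directory /var/www and the names in the code are assumptions.

```python
VIRTUAL_HOSTS = {
    "connections2003.liu.edu": "/home/connect/public_html",
}
DEFAULT_ROOT = "/var/www"   # hypothetical document root of the main server


def document_root_for(host_header):
    """Pick the document root from the Host header sent by the client."""
    # Host may carry a port ("name:80"), and host names are case-insensitive.
    name = host_header.split(":")[0].strip().lower()
    return VIRTUAL_HOSTS.get(name, DEFAULT_ROOT)


print(document_root_for("Connections2003.LIU.edu:80"))
print(document_root_for("wotan.liu.edu"))
```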

Directives in httpd.conf

• The configuration directives are grouped into three basic sections:
  – Directives that control the operation of the Apache server process as a whole (the 'global environment').
  – Directives that define the parameters of the 'main' or 'default' server, which responds to requests that aren't handled by a virtual host. These directives also provide default values for the settings of all virtual hosts.
  – Settings for virtual hosts, which allow Web requests to be sent to different IP addresses or hostnames and have them handled by the same Apache server process.

Server type

• On a UN*X machine, the server can either be fired up on its own, or it can be run as part of the overall Internet daemon inetd.

• Usually “standalone” is used.

Server root

• Sets the directory where apache finds its own configuration files.

• If log file names are not given as absolute paths, they will be placed in the server root directory.

Timeout

• This sets the number of seconds that the server waits for the result of a request to be computed before sending a timeout.

• On wotan this is set to 300 seconds. This is rather a long time; the user will have gone for coffee by then.

Listen

• Tells the server which port and IP address to listen on. This can be used to have the server respond only to requests to a certain IP address, or to listen on a non-standard port, i.e. not port 80.

LoadModule

• To extend apache, modules have been written. They have to be loaded explicitly:

• LoadModule module file

• Where module is the name of the module and file is the name of the file that contains the module

• Looking at this gives you vital information about what the server can do.

Server directives

• User
  – Gives the user name apache runs under
• Group
  – Gives the group name the server runs under
• ServerAdmin
  – Email of a human who runs the default server
• ServerName
  – The name of the default server
• DocumentRoot
  – The top level directory of the default server

Directory options

• Many options for a directory can be set with

<directory name> instructions </directory>

• name is the name of a directory.
• instructions can be a whole lot of stuff.

Directory instructions

• Options sets global options for the directory, it can be
  – None
  – All
  – Or any of
    • Indexes (form directory indexes?)
    • Includes (allow server-side includes?)
    • FollowSymLinks (allow to follow symbolic links?)
    • ExecCGI (allow cgi-scripts?)
    • MultiViews

Access control

• Can be part of <directory> to set directory level access control

• Example
  – Allow from friendly.com
  – Deny from evil.com

• Sometimes you have to set the order, example
  – Order allow,deny
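Why the order matters can be seen in a small model of the decision. This is a sketch of the documented behaviour, not apache's own code: with allow,deny the default is to refuse and a Deny match wins, with deny,allow the default is to admit and an Allow match wins. Host matching is reduced here to whole names and domain suffixes, and all function names are made up.

```python
def host_matches(host, pattern):
    """True if pattern is 'all', the host itself, or a domain the host is in."""
    return pattern == "all" or host == pattern or host.endswith("." + pattern)


def is_allowed(host, order, allow, deny):
    """Decide access for host given Order, Allow from and Deny from lists."""
    allowed = any(host_matches(host, p) for p in allow)
    denied = any(host_matches(host, p) for p in deny)
    if order == "allow,deny":
        # Default is deny; Deny directives override Allow directives.
        return allowed and not denied
    if order == "deny,allow":
        # Default is allow; Allow directives override Deny directives.
        return allowed or not denied
    raise ValueError("order must be 'allow,deny' or 'deny,allow'")


allow, deny = ["friendly.com"], ["evil.com"]
for host in ("www.friendly.com", "www.evil.com", "other.org"):
    print(host, is_allowed(host, "allow,deny", allow, deny),
          is_allowed(host, "deny,allow", allow, deny))
```

Note that a host named in neither list is refused under allow,deny but admitted under deny,allow.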

Authentication

• This is used to enable password access. In that case the authentication is handled by a file .htaccess in the directory.

• The AllowOverride instruction is used to state what the user can do within the .htaccess file. Depending on its values, you can password protect a web site.

• We will not discuss this further here.

Userdir

• This sets the directory that is created by the user in her home directory to be accessed by requests to ~user.

• On wotan, we have

UserDir public_html

• That is the default, actually.

Set up permission for user home directories

<Directory /home/*/public_html>
  AllowOverride FileInfo AuthConfig Limit
  Options +Includes
  Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
  <Limit GET POST OPTIONS PROPFIND>
    Order allow,deny
    Allow from all
  </Limit>
  <Limit PUT DELETE PATCH PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
    Order deny,allow
    Deny from all
  </Limit>
</Directory>

Logs

• The web server logs every transaction.
• There are several types of logs that used to be kept separately in the early days.
• 209.73.164.50 - - [26/Jan/2003:09:19:51 -0500] "GET /~ramon/videos/ntsc175.html HTTP/1.1" 206 808
• Additional information may be kept in the referer and user agent log.
• The referer log may have some interesting information on who links to your pages.
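A log entry like the one shown above can be taken apart with a regular expression. This is a sketch that assumes the access log is in the common log format, as in the example entry; the names LOG_PATTERN and parse_log_line are made up.

```python
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of one common-log-format entry, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # The quoted part is the request-line: method, URI, protocol version.
    entry["method"], entry["uri"], entry["version"] = entry["request"].split(" ", 2)
    return entry


line = ('209.73.164.50 - - [26/Jan/2003:09:19:51 -0500] '
        '"GET /~ramon/videos/ntsc175.html HTTP/1.1" 206 808')
entry = parse_log_line(line)
print(entry["host"], entry["method"], entry["uri"], entry["status"], entry["size"])
```

Here the status 206 shows that the client made a partial GET and received 808 bytes.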

Virtual hosts

• Most apache directives can be wrapped in a <virtualhost> </virtualhost> grouping.

• This implies that they only hold for the virtual host. Example, from wotan

<VirtualHost *>
  ServerAdmin [email protected]
  DocumentRoot /home/connect/public_html
  ServerName connections2003.liu.edu
  ErrorLog /var/log/apache/connections2003-error.log
  CustomLog /var/log/apache/connectios2003-access.log common
</VirtualHost>

http://openlib.org/home/krichel

Thank you for your attention!