Post on 26-Dec-2015

1

Web Servers

Herng-Yow Chen

2

Outline

Survey many different types of software and hardware web servers.
Describe how to write a simple diagnostic web server in Perl.
Explain how web servers process HTTP transactions, step by step.

3

Different types of web servers

General-purpose software web servers
Web server appliances
Embedded web servers

4

Jobs of web servers

Implement HTTP and the related TCP connection handling.
Manage the server-side resources and provide administrative features to configure, control, and enhance the web service.

5

Jobs of the operating system

Manage the hardware details of the underlying computer system.
Provide TCP/IP network support.
Provide filesystems to hold web resources.
Provide process management to control computing activities.

6

General-purpose software web server

General-purpose software web servers run on standard, network-enabled computer systems.

Open source software (such as Apache or W3C’s Jigsaw).

Commercial software (such as Microsoft’s and iPlanet’s web servers).

Web server software is available for just about every computer and operating system.

7

General-Purpose Software Web Servers

In September 2004, the Netcraft survey (http://news.netcraft.com/archives/web_server_survey.html)

8

Web server appliances

Web server appliances are prepackaged software/hardware solutions. The vendor preinstalls a software server onto a vendor-chosen computer platform and preconfigures the software.

Sun/Cobalt RaQ web appliance (http://www.cobalt.com)
Toshiba Magnia SG10 (http://www.toshiba.com)
IBM Whistle web server appliance (http://www.whistle.com)

Appliance solutions remove the need to install and configure software and often greatly simplify administration. However, the web server is often less flexible and less feature-rich, and the server hardware is not easily upgradable.

9

Embedded web servers

Embedded servers are tiny web servers intended to be embedded into consumer products (e.g., printers or home appliances).

They allow users to administer their consumer devices using a convenient web browser interface.

IPic match-head sized web server (http://www-ccs.cs.umass.edu/~shri/iPic.html)
NetMedia SitePlayer SP1 Ethernet web server (http://www.siteplayer.com)

10

A Minimal Perl Web server

Type-o-serve: a minimal Perl web server used for HTTP debugging

http://www.http-guide.com/tools/type-o-serve.pl

11

A Minimal Perl Web Server

HTTP request message (sent by the client):

GET /blah.txt HTTP/1.1
Accept: */*
Accept-language: en-us
Accept-encoding: gzip, deflate
User-agent: Mozilla/4.0
Host: www.csie.ncnu.edu.tw:8080
Connection: Keep-alive

Type-o-serve dialog (on the server console):

% ./type-o-serve.pl 8080
<<Request From 'www.csie.ncnu.edu.tw'>>
GET /blah.txt HTTP/1.1
Accept: */*
Accept-language: en-us
Accept-encoding: gzip, deflate
User-agent: Mozilla/4.0
Host: www.csie.ncnu.edu.tw:8080
Connection: Keep-alive

<<Type Response followed by '.'>>
HTTP/1.0 200 OK
Connection: close
Content-type: text/plain

Hi there!

HTTP response message (returned to the client):

HTTP/1.0 200 OK
Connection: close
Content-type: text/plain

Hi there!
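The same diagnostic idea can be sketched in Python. This is not the type-o-serve script itself, only a rough analogue under stated assumptions: the names serve_once and CANNED_RESPONSE are made up for illustration, and the response is canned rather than typed in by the administrator.

```python
import socket

# Canned reply, mirroring the response shown on the slide (an assumption:
# the real type-o-serve lets the administrator type the response by hand).
CANNED_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-type: text/plain\r\n"
    b"\r\n"
    b"Hi there!\n"
)


def serve_once(listener, response=CANNED_RESPONSE):
    """Accept one connection, print the request, send `response`, and close."""
    conn, addr = listener.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        print("<<Request From %r>>" % (addr[0],))
        print(data.decode("iso-8859-1"))
        conn.sendall(response)
    return data
```

To try it, create a listening socket with socket.create_server(("127.0.0.1", 8080)), call serve_once(listener), and point a browser at port 8080.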

12

What do web servers do?

1. Set up connection

2. Receive request

3. Process request

4. Access resource

5. Construct response

6. Send response

7. Log transaction

13

What Real Web Servers Do

[Figure: the client reaches the server through the network interface and the operating system's TCP/IP network stack. In user space, the HTTP server software process (1) sets up the connection, (2) receives the request, (3) processes the request, (4) accesses the resource in object storage, (5) creates the response, (6) sends the response, and (7) logs the transaction.]

14

Step 1: Accepting client connections

Handling new connections: extracting the client IP address from a new TCP connection
Client hostname identification: using “reverse DNS”
Determining the client user through ident: some web servers support the IETF ident protocol

15

Handling new connections

When a client requests a TCP connection to the web server, the web server establishes the connection and determines which client is on the other side by extracting the IP address from the TCP connection (e.g., using the getpeername call on a UNIX socket).

The server is free to reject and immediately close any connection, for example because the client IP address is unauthorized or belongs to a known malicious client.

Once a new connection is established and accepted, the server adds the new connection to its list of existing connections and prepares to watch for data on the connection.
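A minimal sketch of this step in Python, assuming a blocklist held in a plain set. The function name accept_client and the blocked_ips parameter are illustrative, not part of any server's API; getpeername is the standard socket call mentioned above.

```python
import socket


def accept_client(listener, blocked_ips=frozenset()):
    """Accept one TCP connection; return (conn, client_ip), or None if rejected."""
    conn, _addr = listener.accept()
    # getpeername() reports the address of the other end of the connection.
    client_ip = conn.getpeername()[0]
    if client_ip in blocked_ips:
        # The server may close unwanted connections immediately.
        conn.close()
        return None
    return conn, client_ip
```

A real server would then add the accepted connection to its list of open connections and start watching it for request data.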

16

Client host identification

Most web servers can be configured to convert client IP addresses into client hostnames, using “reverse DNS.”

The hostname information is used for detailed access control and logging.

Note that hostname lookups can take a long time, slowing down web transactions. Many high-performance web servers either disable hostname resolution or enable it only for particular content.

Ex: Configuring Apache to look up hostnames only for HTML and CGI resources

HostnameLookups off
<Files ~ "\.(html|htm|cgi)$">
HostnameLookups on
</Files>
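The lookup itself can be sketched with Python's standard socket.gethostbyaddr. The function name lookup_hostname and the enabled flag are assumptions made for illustration; the flag stands in for a setting such as HostnameLookups.

```python
import socket


def lookup_hostname(client_ip, enabled=True):
    """Return the client's hostname via reverse DNS, or the IP if unavailable."""
    if not enabled:
        # High-performance servers often skip the lookup entirely.
        return client_ip
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(client_ip)
        return hostname
    except OSError:
        # No reverse record (or the resolver failed): fall back to the IP.
        return client_ip
```

Because the lookup blocks until the resolver answers, it is exactly the kind of delay the slide warns about.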

17

Determining the client user through ident

The ident protocol lets servers find out what username initiated an HTTP connection.

The username information is particularly useful for logging: the second field of the popular Common Log Format contains the ident username of each HTTP request. (The protocol was first specified in RFC 931; the updated ident specification is RFC 1413.)

If a client supports the ident protocol, the client listens on TCP port 113 for ident requests.

18

Determining the Client User Through ident

[Figure: ident exchange between Mary's client and the web server.]

(a) Mary establishes a new HTTP connection from client port 4236 to server port 80.
(b) The server establishes an ident connection to port 113 on the client.
(c) The server sends the request: 4236, 80
(d) The client returns the ident response: 4236, 80:USERID:UNIX:MARY
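The request and response strings in this exchange are simple enough to sketch. The helper names below are hypothetical, and the parsing is a simplified reading of the RFC 1413 reply format (it ignores the optional character-set field), not a complete implementation.

```python
def build_ident_query(client_port, server_port):
    """Build the query the web server sends to port 113 on the client."""
    # First the port on the client machine, then the port on the web server.
    return ("%d, %d\r\n" % (client_port, server_port)).encode("ascii")


def parse_ident_reply(line):
    """Return the username from a USERID reply, or None for any other reply."""
    fields = line.rstrip("\r\n").split(":", 3)
    if len(fields) != 4 or fields[1].strip() != "USERID":
        # ERROR replies have only three fields, e.g. "4236, 80:ERROR:NO-USER".
        return None
    return fields[3]
```

With the values from the figure, build_ident_query(4236, 80) produces the text "4236, 80", and the reply "4236, 80:USERID:UNIX:MARY" parses to the username MARY.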

19

Ident protocol (cont.)

Ident can work inside organizations, but it does not work well across the public Internet, for the following reasons:

Many client PCs don’t run the identd identification protocol daemon software.
The ident protocol significantly delays HTTP transactions.
Many firewalls won’t permit incoming ident traffic.
The ident protocol is insecure and easy to fabricate.
The ident protocol doesn’t support virtual IP addresses well.
There are privacy concerns about exposing client usernames.

Ident lookups can be enabled in Apache with the directive:

IdentityCheck on

Common Log Format log files typically contain hyphens (-) in the second field if no ident information is available.

20

Step 2: Receiving request messages

As data arrives on connections, the server reads the data and starts parsing the request message:

Parse the request line, looking for the request method, the specified URI, and the version number.
Read the message headers, each ending in CRLF.
Detect the end-of-headers blank line, ending in CRLF.
Read the request body, if any (its length is specified by the Content-Length header).

Internal Representations of Messages

Some web servers also store the request message in internal data structures that make the message easy to manipulate.

21

Receiving Request Messages

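[Figure: a request message being read from the network by the server as it arrives from the client across the Internet. The bytes received so far are shown below; the rest of the Host header ("oes-hardware.com") and the final CRLF CRLF are still in transit.]

GET /specials/hychen.gif HTTP/1.0CRLF
Accept: image/gifCRLF
Host: www.j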

22

Internal Representations of Message

Request message:

GET /specials/saw-blade.gif HTTP/1.0CRLF
Accept: image/gifCRLF
Host: www.joes-hardware.comCRLF
CRLF

Parsed into an internal structure:

method: 1
version: 1.0
uri: -> specials/saw-blade.gif
header count: 2
headers: ->
  Name: Accept, Value: -> image/gif
  Name: Host, Value: -> www.joes-hardware.com
body: -
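The parsing steps of Step 2 can be sketched in Python as follows. The ParsedRequest class and parse_request function are illustrative names, and the sketch assumes the whole message is already in memory, which a real server reading from the network cannot assume.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedRequest:
    method: str
    uri: str
    version: str
    headers: list = field(default_factory=list)
    body: bytes = b""


def parse_request(raw):
    """Parse a complete HTTP request message held in a bytes object."""
    # The blank line (CRLF CRLF) separates the headers from the body.
    head, _sep, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, URI, and protocol version.
    method, uri, protocol = lines[0].split(" ", 2)
    headers = []
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    # The body length, if any, comes from the Content-Length header.
    length = 0
    for name, value in headers:
        if name.lower() == "content-length":
            length = int(value)
    version = protocol.partition("/")[2]
    return ParsedRequest(method, uri, version, headers, rest[:length])
```

Applied to the saw-blade request above, this yields the method GET, the version 1.0, two headers, and an empty body, which matches the structure in the figure apart from the numeric method code.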

23

Different web server architectures

Single-threaded web servers
Multi-process and multi-threaded web servers
Multiplexed I/O web servers
Non-blocking network accessing
Multiplexed multi-threaded web servers

24

Connection Input/Output Processing Architectures

25

Step 3: Processing requests

Once the web server has received a request, it can process the request using the method, resource, headers, and optional body.

Some methods (e.g., POST) require entity body data in the request message. A few methods (e.g., GET) forbid entity body data in the request message.

26

Step 4: Mapping and Accessing resources

Docroots
Virtually hosted docroots
User home directory docroots
Directory listings
Dynamic content resource mapping
Server-Side Includes (SSI)
Access control

27

Docroots

Web servers support different kinds of resource mapping, but the simplest form of mapping uses the request URI to name a file in the web server’s filesystem.

Typically, a special folder in the web server filesystem is reserved for web content. The folder is called the document root, or docroot.

The web server takes the URI from the request message and appends it to the document root. In Apache, the docroot is set with the DocumentRoot directive:

DocumentRoot /usr/local/httpd/files

Servers must be careful not to let relative URLs back up out of a document root and expose other parts of the filesystem. E.g., http://www.csie.ncnu.edu.tw/../
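One way to sketch the mapping and the "../" check in Python is shown below. The function name map_uri_to_path is made up for illustration, and the check is purely textual: it does not follow symbolic links, which a production server would also have to consider.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCROOT = "/usr/local/httpd/files"


def map_uri_to_path(request_uri, docroot=DOCROOT):
    """Map a request URI to a path under docroot, or None if it escapes it."""
    # Keep only the path component and decode %-escapes such as %2e%2e.
    path = unquote(urlsplit(request_uri).path)
    # Append the URI path to the docroot, then collapse "." and ".." segments.
    candidate = posixpath.normpath(posixpath.join(docroot, path.lstrip("/")))
    if candidate != docroot and not candidate.startswith(docroot + "/"):
        # The URI tried to back up out of the document root.
        return None
    return candidate
```

For the request URI /specials/hychen.gif this returns /usr/local/httpd/files/specials/hychen.gif, while a URI such as /../etc/passwd is refused.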

28

Docroots

[Figure: the client sends a request message across the Internet; the web server maps the request URI to a file under the docroot in object storage.]

Request message:

GET /specials/hychen.gif HTTP/1.0
Host: www.csie.ncnu.edu.tw

Request URI: /specials/hychen.gif
Docroot: /usr/local/httpd/files
Server resource: /usr/local/httpd/files/specials/hychen.gif

29

Virtually hosted docroots

Virtually hosted web servers host multiple web sites on the same web server, giving each site its own distinct document root on the server.

A virtually hosted web server identifies the correct document root to use from the IP address or from the hostname in the Host header.

30

Apache’s virtual host configuration

<VirtualHost www.joes-hardware.com>
ServerName www.joes-hardware.com
DocumentRoot /docs/joe
TransferLog /logs/joe.access_log
ErrorLog /logs/joe.error_log
</VirtualHost>

<VirtualHost www.marys-antiques.com>
ServerName www.marys-antiques.com
DocumentRoot /docs/mary
TransferLog /logs/mary.access_log
ErrorLog /logs/mary.error_log
</VirtualHost>
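The docroot selection can be sketched as a table lookup keyed on the Host header. The names VIRTUAL_HOSTS and docroot_for, and the fallback docroot /docs/default, are assumptions for illustration; Apache's own matching rules are more involved.

```python
# Hostname-to-docroot table, using the two sites from the slides.
VIRTUAL_HOSTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}


def docroot_for(headers, default="/docs/default"):
    """Pick the document root for a request from its Host header."""
    host = headers.get("Host", "")
    # Drop an optional ":port" suffix; hostnames compare case-insensitively.
    hostname = host.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, default)
```

Two requests for /index.html that differ only in their Host header therefore resolve to different files.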

31

Virtually hosted docroots

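[Figure: one web server hosts www.joes-hardware.com (docroot /docs/joe) and www.marys-antiques.com (docroot /docs/mary). Two clients send requests across the Internet; the Host header selects the docroot.]

Request message A (served from /docs/joe):

GET /index.html HTTP/1.0
Host: www.joes-hardware.com

Request message B (served from /docs/mary):

GET /index.html HTTP/1.0
Host: www.marys-antiques.com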

32

User home directory docroots

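[Figure: clients send requests across the Internet to a web server; the figure labels the sites www.joes-hardware.com and www.marys-antiques.com. Each /~user/ URI is mapped to that user's private docroot.]

Request message A (served from /home/bob/public_html):

GET /~bob/index.html HTTP/1.0

Request message B (served from /home/betty/public_html):

GET /~betty/index.html HTTP/1.0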

33

User home directory docroots

Another common use of docroots gives people private web sites on a web server.

A typical convention maps URIs whose paths begin with a slash and tilde (/~) followed by a username to a private document root for that user.

The private docroot is often the folder called public_html inside that user’s home directory, but it can be configured differently (e.g., the NCNU web server uses WWW as the user’s private document root).

In Apache’s configuration:

UserDir public_html
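The /~username convention can be sketched as below. The function name map_user_uri is hypothetical, and the sketch assumes home directories live under /home, which is common but not universal; a real server would ask the operating system for each user's home directory.

```python
import posixpath
import re

# "/~" followed by a username, then an optional path within the user's site.
USER_URI = re.compile(r"^/~([A-Za-z0-9_][A-Za-z0-9_-]*)(/.*)?$")


def map_user_uri(uri_path, home_base="/home", user_dir="public_html"):
    """Map /~user/... to a file under that user's private docroot, else None."""
    match = USER_URI.match(uri_path)
    if match is None:
        return None
    user, rest = match.group(1), match.group(2) or "/"
    root = posixpath.join(home_base, user, user_dir)
    candidate = posixpath.normpath(posixpath.join(root, rest.lstrip("/")))
    if candidate != root and not candidate.startswith(root + "/"):
        # Refuse paths that climb out of the private docroot.
        return None
    return candidate
```

Passing user_dir="WWW" would reproduce the NCNU convention mentioned above.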

34

Directory listings

A web server can receive requests for directory URLs, where the path resolves to a directory, not a file.

Most web servers can be configured to take a few different actions when a client requests a directory URL:

Return an error.
Return a special, default “index file” instead of the directory.
Scan the directory, and return an HTML page containing the contents.

35

Directory Listings (continued)

Most web servers look for a file named index.html or index.htm inside a directory to represent that directory.

In Apache’s configuration:

DirectoryIndex index.html index.htm home.html home.htm index.cgi

Disable the automatic generation of directory index files with the Apache directive:

Options -Indexes
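The three possible actions for a directory URL can be sketched as one function. The name resolve_directory, its return values, and the choice of a 403 status for the error case are assumptions for illustration, not a description of any server's internals.

```python
import os

# Index file names to try, in order (the two named on the slide).
INDEX_NAMES = ("index.html", "index.htm")


def resolve_directory(dir_path, index_names=INDEX_NAMES, allow_listing=False):
    """Decide what to return when a request resolves to a directory."""
    # Prefer the first index file that exists in the directory.
    for name in index_names:
        candidate = os.path.join(dir_path, name)
        if os.path.isfile(candidate):
            return ("file", candidate)
    # Otherwise generate a listing, if the configuration allows it.
    if allow_listing:
        return ("listing", sorted(os.listdir(dir_path)))
    # Listings disabled and no index file: return an error status.
    return ("error", 403)
```

Setting allow_listing to False plays the role of disabling automatic index generation.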

36

Dynamic content resource mapping

Web servers can also map URIs to dynamic resources, that is, to programs that generate content on demand.

In fact, a whole class of web servers called application servers connects web servers to sophisticated backend applications.

The web server needs to be able to tell when a resource is a dynamic resource, where the dynamic content generator program is located, and how to run the program.

37

Dynamic content resource mapping (continued)

In Apache’s configuration:

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-programs/
AddHandler cgi-script .cgi

CGI is an early, simple, and popular interface for executing server-side applications. Modern application servers have more powerful server-side dynamic content support, including Active Server Pages, Java servlets, and PHP.
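How a server might tell dynamic resources from static ones can be sketched with the two rules above: a URI prefix that maps to a program directory, and a file extension handled as CGI. The function name classify_resource and the tuple it returns are made up for illustration, and the docroot value is borrowed from the earlier docroot slide.

```python
DOCROOT = "/usr/local/httpd/files"
# URI prefix and the program directory it maps to (as in the ScriptAlias line).
SCRIPT_ALIAS = ("/cgi-bin/", "/usr/local/etc/httpd/cgi-programs/")
# Extensions handled as CGI scripts (as in the AddHandler line).
CGI_EXTENSIONS = (".cgi",)


def classify_resource(uri_path):
    """Return ("cgi", program_path) or ("static", file_path) for a URI path."""
    prefix, target = SCRIPT_ALIAS
    if uri_path.startswith(prefix):
        # Everything under /cgi-bin/ is a program in the aliased directory.
        return ("cgi", target + uri_path[len(prefix):])
    if uri_path.endswith(CGI_EXTENSIONS):
        # A .cgi file anywhere in the docroot is also run, not served.
        return ("cgi", DOCROOT + uri_path)
    return ("static", DOCROOT + uri_path)
```

The sketch only locates the program; actually running it, and passing the request to it, is the job of the CGI interface.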

38

Dynamic Content Resource Mapping

[Figure: client, Internet, and server; the remaining figure content is not recoverable.]

39

Server-Side Includes (SSI)

Many web servers also provide support for server-side includes.

If a resource is flagged as containing server-side includes, the server processes the resource contents before sending them to the client.

The contents are scanned for certain special patterns, which can be variable names or embedded scripts. The special patterns are replaced with the values of variables or the output of executable scripts.

This is an easy way to create dynamic content.
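The scan-and-replace step can be sketched for a single SSI pattern, the echo directive, written in Apache's comment-style syntax. The function name expand_ssi is hypothetical, and the "(none)" fallback for undefined variables is an assumption modeled on Apache's default behavior; real SSI supports many more directives.

```python
import re

# Matches directives of the form <!--#echo var="NAME" -->
SSI_ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_ssi(content, variables):
    """Replace each echo directive with the value of the named variable."""
    def substitute(match):
        return variables.get(match.group(1), "(none)")
    return SSI_ECHO.sub(substitute, content)
```

The client never sees the directive, only the substituted text.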

40

Access controls

Web servers can also assign access controls to particular resources.

When a request arrives for an access-controlled resource, the web server can control access based on the IP address of the client, or it can issue a password challenge before granting access to the resource.

We will see more details in a later lecture, Chapter 12 (HTTP authentication).
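The IP-based form of access control can be sketched with Python's standard ipaddress module. The function name ip_allowed and the example networks are assumptions for illustration; password challenges are left to the authentication lecture.

```python
import ipaddress


def ip_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network)
        for network in allowed_networks
    )
```

A server would typically answer a disallowed client with a 403 Forbidden response instead of the resource.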

41

Step 5: Building Responses

Once the web server has identified the resource, it performs the action described in the request method and returns the response message, which contains a status code, response headers, and a response body.

Response entities
MIME typing
Redirection

42

Response entities

If the transaction generated a response body, the content is sent back with the response message, which usually contains:

A Content-Type header, i.e., the MIME type of the body
A Content-Length header, describing the body size
The actual message body content
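Assembling these three parts can be sketched as follows. The function name build_response is made up for illustration, and real servers add further headers (Date, Server, and so on) that are omitted here.

```python
def build_response(body, content_type, status=200, reason="OK"):
    """Build a response message with Content-Type and Content-Length headers."""
    head_lines = [
        "HTTP/1.1 %d %s" % (status, reason),
        "Content-Type: %s" % content_type,
        # The length is counted in bytes of the body, not in characters.
        "Content-Length: %d" % len(body),
    ]
    head = "\r\n".join(head_lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Computing Content-Length from the encoded body matters for persistent connections, where the client relies on it to find the end of the response.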

43

MIME typing

The web server is responsible for determining the MIME type of the response body.

There are many ways to configure servers to associate MIME types with resources:

mime.types: extension-based type association.
Magic typing: content-based association, scanning the content for known patterns.
Explicit typing: force particular files or directory contents to have a MIME type, regardless of the file extension or contents.
Type negotiation: the server is configured to store a resource in multiple document formats. In a client-server negotiation process, the server can determine the “best” format to use (Chapter 17).
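The first two approaches can be sketched in Python. Extension-based typing uses the standard mimetypes module; the magic-typing table below is a tiny hand-picked set of well-known file signatures, chosen for illustration and not taken from any server's magic file.

```python
import mimetypes

DEFAULT_TYPE = "application/octet-stream"

# A few well-known leading byte patterns and the types they indicate.
MAGIC_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]


def type_by_extension(filename):
    """Extension-based typing, as a mime.types file would do."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or DEFAULT_TYPE


def type_by_magic(first_bytes):
    """Content-based typing: compare the start of the file to known patterns."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if first_bytes.startswith(signature):
            return mime_type
    return DEFAULT_TYPE
```

Extension-based typing is cheap because it never opens the file; magic typing has to read the first few bytes.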

44

MIME Typing

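[Figure: the client requests hychen.gif from www.csie.ncnu.edu.tw; the server reads the hychen.gif file and labels the response with its MIME type.]

HTTP request message (contains the command and the URI):

GET /specials/hychen.gif HTTP/1.1
Host: www.csie.ncnu.edu.tw

HTTP response message:

HTTP/1.1 200 OK
Content-type: image/gif
Content-length: 8572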

45

Redirection

Web servers sometimes return redirection responses (indicated by a 3XX return code) instead of success messages. The Location response header contains a URI for the new or preferred location of the content.

Redirections are useful for:

Permanently moved resources
Temporarily moved resources
URL augmentation
Load balancing
Server affinity
Canonicalizing directory names

46

300-399: Redirection Status Codes

Status code   Reason phrase
300           Multiple Choices
301           Moved Permanently
302           Found
303           See Other
304           Not Modified
305           Use Proxy
306           (Unused)
307           Temporary Redirect
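One of the uses of redirection, canonicalizing directory names, can be sketched as below. The function names are hypothetical, the choice of 301 for this case is an assumption, and the Location value is written as a path for brevity where older clients may expect an absolute URI.

```python
# Reason phrases for the redirect codes that carry a Location header here.
REASONS = {
    301: "Moved Permanently",
    302: "Found",
    303: "See Other",
    307: "Temporary Redirect",
}


def redirect_response(location, status=301):
    """Build a bodyless redirection response pointing at `location`."""
    head = (
        "HTTP/1.1 %d %s\r\n" % (status, REASONS[status])
        + "Location: %s\r\n" % location
        + "Content-Length: 0\r\n"
        + "\r\n"
    )
    return head.encode("ascii")


def canonicalize_directory(uri_path, is_directory):
    """Redirect a directory URI that lacks its trailing slash, else None."""
    if is_directory and not uri_path.endswith("/"):
        return redirect_response(uri_path + "/")
    return None
```

Adding the trailing slash lets relative links inside the directory's pages resolve correctly.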

47

Step 6: Sending Responses

The server may have many connections to many clients, some idle, some sending data to the server, and some carrying response data back to the clients.

The server needs to keep track of connection state and handle persistent connections with special care.

For non-persistent connections, the server is expected to close its side of the connection when the entire message is sent.

For persistent connections, the connection may stay open, in which case the server needs to be extra cautious to compute the Content-Length header correctly, or the client will have no way of knowing when a response ends (cf. Chapter 4).

48

Step 7: Logging

Finally, when a transaction is complete, the web server writes an entry to a log file, describing the transaction performed.

Most web servers provide several configurable forms of logging. (See the later lecture on Chapter 21 for details.)
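As a concrete case, a Common Log Format entry can be sketched like this. The function name common_log_line and its parameters are made up for illustration; the field order (host, ident username, authenticated user, timestamp, request line, status, size) is the standard one, with a hyphen for any missing value.

```python
from datetime import datetime, timezone


def common_log_line(host, ident, user, when, request_line, status, size):
    """Format one transaction as a Common Log Format entry."""
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return '%s %s %s [%s] "%s" %d %s' % (
        host,
        ident or "-",      # second field: ident username, if any
        user or "-",       # third field: authenticated username, if any
        timestamp,
        request_line,
        status,
        size if size else "-",
    )
```

The `when` argument should be a timezone-aware datetime so that the UTC offset appears in the timestamp.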

49

References: Web servers

http://www.apache.org
The Apache web site

http://www.w3c.org/Jigsaw
Jigsaw: W3C’s server

http://www.ietf.org/rfc/rfc1413.txt
RFC 1413, “Identification Protocol,” by M. St. Johns.