Lecture 10, 20-755: The Internet, Summer 1999 1

20-755: The Internet

Lecture 10: Web Services III

David O’Hallaron

School of Computer Science and

Department of Electrical and Computer Engineering

Carnegie Mellon University

Institute for eCommerce, Summer 1999

Today’s lecture

• Anatomy of a simple Web server (40 min)

• Break (10 min)

• Advanced server features (45 min)

Anatomy of Tiny: A simple Web server

#!/usr/local/bin/perl5 -w
use IO::Socket;
#
# tiny.pl - The Tiny HTTP server
#

Tiny: configuration

#
# Configuration
#
$port    = 8000;                  # the port we listen on
$htmldir = "./html/";             # the base html directory
$cgidir  = "./cgi-bin/";          # the base cgi directory
$server  = "Tiny Web server 1.0"; # server info

Tiny: error messages

#
# Error messages
#

# Terse error messages go in the response header
%terse_errors = (
    "403", "Forbidden",
    "404", "Not Found",
    "501", "Not Implemented",
);

# Verbose error messages go in the response message body
%verbose_errors = (
    "403", "You are not allowed to access this item",
    "404", "Tiny couldn't find the requested item on the server",
    "501", "Tiny does not support the given request type",
);

Tiny: create a listening socket

#
# Create a TCP listening socket file descriptor
#
# LocalPort: listen on port $port
# Type     : use TCP
# Reuse    : reuse address right away
# Listen   : buffer at most 10 requests
#
$listenfd = IO::Socket::INET->new(LocalPort => $port,
                                  Type      => SOCK_STREAM,
                                  Reuse     => 1,
                                  Listen    => 10)
    or die "Couldn't listen on port $port: $@\n";

Tiny: main loop structure

#
# Loop forever waiting for HTTP requests
#
while (1) {
    # Wait for a connection request from a client
    $connfd = $listenfd->accept();

    # Determine the domain name and IP address of this client
    # Parse the request line (after stripping the newline)
    # Parse the URI
    # Parse the request headers
    # OPTIONS method
    # HEAD method
    # GET method
    # misc: POST, PUT, DELETE, and TRACE methods
}

Tiny: error procedure

#
# error - send an error message back to the client
# $_[0]: the error number
# $_[1]: the method or URI that caused the error
#
sub error {
    local($errno) = $_[0];
    local($errmsg) = "$errno $terse_errors{$errno}";
    print $connfd <<EndOfMessage;
HTTP/1.1 $errmsg
Content-type: text/html

<HTML>
<HEAD><TITLE>$errmsg</TITLE></HEAD>
<BODY bgcolor="#ffffff">
<H1>$errmsg</H1>
$verbose_errors{$errno}:
<PRE>
$_[1]
</PRE>
<HR>
The Tiny Web Server
</BODY>
</HTML>
EndOfMessage
}

Tiny: get client’s name and address

# Determine the domain name and IP address of this client
$client_sockaddr = getpeername($connfd);
($client_port, $client_iaddr) = unpack_sockaddr_in($client_sockaddr);
$client_port = $client_port;  # so -w won't complain
$client_name = gethostbyaddr($client_iaddr, AF_INET);
($a1, $a2, $a3, $a4) = unpack('C4', $client_iaddr);
print "Opened connection with $client_name ($a1.$a2.$a3.$a4)\n";

Tiny: parsing the request line

# Parse the request line (after stripping the newline)
chomp($line = <$connfd>);
($method, $uri, $version) = split(/\s+/, $line);
print "received $line\n";

Tiny: parsing the URI

#
# Parse the URI
#
# Either the URI refers to a CGI program...
if ($uri =~ m:^/cgi-bin/:) {
    $is_static = 0;

    # extract the program name and its arguments
    ($filename, $cgiargs) = split(/\?/, $uri);
    if (!defined($cgiargs)) {
        $cgiargs = "";
    }

    # replace /cgi-bin with the default cgi directory
    $filename =~ s:^/cgi-bin/:$cgidir:o;
}

Tiny: parsing the URI (cont)

# ... or the URI refers to a file
else {
    $is_static = 1;  # static content
    $cgiargs = "";

    # replace the first / with the default html directory
    $filename = $uri;
    $filename =~ s:^/:$htmldir:o;

    # use index.html for the default file
    $filename =~ s:/$:/index.html:;
}

# debug statements like this will help you a lot
print "parsed URI: is_static=$is_static, filename=$filename, cgiargs=$cgiargs\n";

Tiny: parsing the request headers

#
# Parse the request headers
#
$content_length = 0;
$content_type = "text/html";
while (<$connfd>) {  # read request header into $_

    # Delete CR and NL chars
    s/\n|\r//g;  # delete all CR and LF chars from $_

    # Determine the length of the message body
    # search for "Content-Length:" at beginning of string $_
    # ignore the case
    if (/^Content-Length: (\S*)/i) {
        $content_length = $1;
    }

Tiny: parsing the request headers (cont)

    # determine the type of content (if any) in msg body
    # search for "Content-Type:" at beginning of string $_
    # ignore the case
    if (/^Content-Type: (\S*)/i) {
        $content_type = $1;
    }

    # If $_ was a blank line, exit the loop
    if (length == 0) {
        last;
    }
}

Tiny: OPTIONS

#
# OPTIONS method
#
if ($method eq "OPTIONS") {
    $today = gmtime()." GMT";
    $connfd->print("$version 200 OK\n");
    $connfd->print("Date: $today\n");
    $connfd->print("Server: $server\n");
    $connfd->print("Content-length: 0\n");
    $connfd->print("Allow: OPTIONS, HEAD, GET\n");  # HTTP lists methods comma-separated
    $connfd->print("\n");
}

Tiny: HEAD

#
# HEAD method
#
elsif ($method eq "HEAD") {
    # we're disallowing HEAD methods on scripts
    if (!$is_static) {
        error(403, $filename);
    }
    else {
        $today = gmtime()." GMT";
        head_method($filename, $uri, $today, $server);
    }
}

Tiny: HEAD (cont)

#
# process the HEAD method on static content
# $_[0] : the file to be processed
# $_[1] : the uri
# $_[2] : today's date
# $_[3] : server name
#
sub head_method {
    local ($filename) = $_[0];
    local ($uri) = $_[1];
    local ($today) = $_[2];
    local ($server) = $_[3];
    local $modified;
    local $filesize;
    local $filetype;

Tiny: HEAD (cont)

    # make sure the requested file exists
    if (!(-e $filename)) {
        error(404, $uri);
    }
    # make sure the requested file is readable
    elsif (!(-r $filename)) {
        error(403, $uri);
    }

Tiny: HEAD (cont)

    # serve the response header but not the file
    else {
        # determine file modification date
        $modified = gmtime((stat($filename))[9])." GMT";

        # determine filesize in bytes
        $filesize = (stat($filename))[7];

        # determine filetype (default is text)
        if ($filename =~ /\.html$/) {
            $filetype = "text/html";
        }
        elsif ($filename =~ /\.gif$/) {
            $filetype = "image/gif";
        }
        elsif ($filename =~ /\.jpg$/) {
            $filetype = "image/jpeg";
        }
        else {
            $filetype = "text/plain";
        }

Tiny: HEAD (cont)

        # print the response header
        $connfd->print("HTTP/1.1 200 OK\n");
        $connfd->print("Date: $today\n");
        $connfd->print("Server: $server\n");
        $connfd->print("Last-modified: $modified\n");
        $connfd->print("Content-length: $filesize\n");
        $connfd->print("Content-type: $filetype\n");
        # blank line ends the header; it must go to the client, not STDOUT
        # (strictly, the HTTP standard requires CRLF line endings)
        $connfd->print("\n");
    } # end of else
} # end of procedure
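
For static content, GET differs from HEAD mainly in that the file itself follows the response header. The following is a minimal sketch of that idea, not code from tiny.pl: the name get_static and its arguments are assumptions, the existence and permission checks from head_method are omitted, and it writes CRLF line endings where the slides use "\n". The demo writes into an in-memory buffer so it runs without a socket.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of tiny.pl): send a response header
# followed by the file contents on $out, which can be any writable
# filehandle such as Tiny's $connfd. Returns 0 if the file can't be opened.
sub get_static {
    my ($out, $filename, $filetype) = @_;
    open(my $fh, '<', $filename) or return 0;
    binmode($fh);
    my $filesize = -s $fh;

    print $out "HTTP/1.1 200 OK\r\n";
    print $out "Content-length: $filesize\r\n";
    print $out "Content-type: $filetype\r\n";
    print $out "\r\n";                      # blank line ends the header

    my $buf;
    while (read($fh, $buf, 8192)) {         # copy the body in chunks
        print $out $buf;
    }
    close($fh);
    return 1;
}

# Demo: serve a temporary file into an in-memory buffer.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp "<HTML>hello</HTML>\n";
close($tmp);

my $response = "";
open(my $out, '>', \$response) or die "in-memory open failed: $!";
get_static($out, $tmpname, "text/html") or die "could not open $tmpname";
close($out);
print $response;
```

In Tiny itself, a caller would pass $connfd as the first argument and reuse the file type chosen by the extension test shown above.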

Some Tiny issues

• How would you serve static and dynamic content with GET?

• How would you serve dynamic content with POST?

• How safe are your CGI scripts?

– hint: consider the impact of allowing “..” in URIs.
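
One possible response to the hint about “..” is to refuse such URIs before they are mapped to filenames. This is a minimal sketch under stated assumptions: uri_is_safe is a hypothetical helper that tiny.pl does not define, and it checks only for ".." path components (including the percent-encoded form %2e), which is not a complete security review.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tiny.pl): return 1 if the URI path
# starts with "/" and has no ".." component, 0 otherwise.
sub uri_is_safe {
    my ($uri) = @_;
    my ($path) = split(/\?/, $uri, 2);      # ignore any CGI arguments
    return 0 unless defined $path && $path =~ m{^/};
    $path =~ s/%2e/./gi;                    # treat an encoded "." like a literal one
    foreach my $part (split(m{/}, $path)) {
        return 0 if $part eq "..";
    }
    return 1;
}

foreach my $uri ("/index.html", "/cgi-bin/../../etc/passwd", "/a/%2e%2e/b") {
    printf "%-28s %s\n", $uri, uri_is_safe($uri) ? "ok" : "rejected";
}
```

A server using this check could call error(403, $uri) for rejected URIs, so that a request such as /cgi-bin/../../etc/passwd never reaches the filesystem.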

Break time!

Today’s lecture

• Anatomy of a simple Web server (40 min)

• Break (10 min)

• Advanced server features (45 min)

Cookies

• An HTTP session is a sequence of request and response messages between a client and a server.

• Regular HTTP sessions are stateless

– Each request/response pair is independent of the others

• Cookies are a mechanism for creating stateful sessions (RFC 2109)

– Allows servers and CGI scripts to maintain state information (e.g., which items are in a shopping cart) during a session.

• Based on HTTP Set-Cookie (server->client) and Cookie (client->server) headers.
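
The two headers can be handled on the server side with a few lines of Perl. This is a minimal sketch, not part of tiny.pl: format_set_cookie and parse_cookie_header are hypothetical names, and only the name, value, and path attribute are handled, in the quoting style of the RFC 2109 examples.

```perl
use strict;
use warnings;

# Hypothetical helper: build the value of a Set-Cookie response header.
sub format_set_cookie {
    my ($name, $value, $path) = @_;
    return qq{$name="$value"; path="$path"};
}

# Hypothetical helper: parse the value of a Cookie request header into
# a hash of name => value, skipping attributes such as $Path whose
# names start with "$".
sub parse_cookie_header {
    my ($header) = @_;
    $header =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
    my %cookies;
    foreach my $pair (split(/\s*;\s*/, $header)) {
        my ($name, $value) = split(/\s*=\s*/, $pair, 2);
        next unless defined $name && defined $value;
        next if $name =~ /^\$/;
        $value =~ s/^"(.*)"$/$1/;           # strip surrounding quotes
        $cookies{$name} = $value;
    }
    return %cookies;
}

print "Set-Cookie: ", format_set_cookie("Customer", "WILY_COYOTE", "/acme"), "\n";
my %c = parse_cookie_header('Customer="WILY_COYOTE"; $Path="/acme"');
print "customer is $c{Customer}\n";
```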

Cookies

• Request 1 (client -> server): Client initiates request to server.

• Response 1 (server -> client, with Set-Cookie): Server includes a Set-Cookie header in the HTTP response that contains info (the cookie) that identifies the user.

• The client stores the cookie on disk.

Cookies

• Request 2 (client -> server, with Cookie): Next time the client sends a request to the server, it includes the cookie as a Cookie header in the HTTP request message.

• Response 2 (server -> client, with Set-Cookie): The server incorporates any relevant new info from request 2 into the Set-Cookie header in response 2.

Cookie example (from RFC 2109)

• Initially the client has no stored cookies.

• Client -> server

– POST /acme/login HTTP/1.1

– [form data]

– user identifies self in form data

• Server -> client

– HTTP/1.1 200 OK

– Set-Cookie: Customer="WILY_COYOTE"; path="/acme"

– cookie identifies user

– client stores cookie for the next request to this server

Cookie example (cont)

• Client -> server

– POST /acme/pickitem HTTP/1.1

– Cookie: Customer="WILY_COYOTE"; $Path="/acme"

– [form data]

– User selects an item for a “shopping basket”

• Server -> client

– HTTP/1.1 200 OK

– Set-Cookie: Part_Number="Rocket_Launcher_0001"; path="/acme"

– Server remembers that shopping basket contains an item

Cookie example (cont)

• Client -> server

– POST /acme/shipping HTTP/1.1

– Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"

– [form data]

– user selects a shipping method from form

• Server -> client

– HTTP/1.1 200 OK

– Set-Cookie: Shipping="FedEx"; path="/acme"

Cookie example (cont)

• Client -> server

– POST /acme/process HTTP/1.1

– Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"; Shipping="FedEx"; $Path="/acme"

– [form data]

– user chooses to process order

• Server -> client

– HTTP/1.1 200 OK

– transaction complete
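
The client's side of this exchange can be modeled as a small cookie jar. The sketch below is illustrative only and makes several assumptions: store_set_cookie and cookie_header_for are hypothetical names, the jar keeps just each cookie's value and path, and a cookie is sent whenever its path is a prefix of the request URI. Real browsers apply more rules than this.

```perl
use strict;
use warnings;

my %jar;      # cookie name => [value, path]
my @order;    # names in the order the cookies first arrived

# Record the value of one Set-Cookie response header.
sub store_set_cookie {
    my ($header) = @_;
    my ($pair, @attrs) = split(/\s*;\s*/, $header);
    my ($name, $value) = split(/=/, $pair, 2);
    my $path = "/";
    foreach my $attr (@attrs) {
        $path = $1 if $attr =~ /^path\s*=\s*"?([^"]*)"?$/i;
    }
    push @order, $name unless exists $jar{$name};
    $jar{$name} = [$value, $path];
}

# Build the value of the Cookie request header for a given URI.
sub cookie_header_for {
    my ($uri) = @_;
    my @parts;
    foreach my $name (@order) {
        my ($value, $path) = @{ $jar{$name} };
        next unless index($uri, $path) == 0;    # path must prefix the URI
        push @parts, qq{$name=$value; \$Path="$path"};
    }
    return join("; ", @parts);
}

# Replay the three Set-Cookie headers from the example.
store_set_cookie('Customer="WILY_COYOTE"; path="/acme"');
store_set_cookie('Part_Number="Rocket_Launcher_0001"; path="/acme"');
store_set_cookie('Shipping="FedEx"; path="/acme"');
print "Cookie: ", cookie_header_for("/acme/process"), "\n";
```

After the three responses are replayed, the header built for /acme/process carries all three cookies, as in the final request of the example.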

Cookies

• Cookies are grouped by the URI pathname in the request headers (in this case /acme)

• The server adds cookies to the client in the response headers.

• The server can implicitly delete cookies by setting an expiration date in the Set-Cookie header (not shown in previous example)
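
Expiration can be sketched as follows. The helper names are hypothetical and the code is not from the lecture. RFC 2109 expresses lifetime with a Max-Age attribute, where a value of zero asks the client to discard the cookie at once; the older Netscape-style expires attribute carries a date instead, formatted here by cookie_date.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as a Netscape-style cookie
# date, e.g. for an "expires=" attribute.
sub cookie_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf("%s, %02d-%s-%04d %02d:%02d:%02d GMT",
                   $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
                   $hour, $min, $sec);
}

# Hypothetical helper: a Set-Cookie value that asks the client to
# discard the named cookie (Max-Age=0, per RFC 2109).
sub expire_cookie {
    my ($name, $path) = @_;
    return qq{$name=""; path="$path"; Max-Age=0};
}

print "Set-Cookie: ", expire_cookie("Shipping", "/acme"), "\n";
print "expires=", cookie_date(0), "\n";
```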

Applications and implications of cookies

• Click tracking

– can be used to correlate a user’s activity at many different sites.

– Doubleclick.com pays a web site to place an <img src=> tag on the site’s page.

– Causes an advertising banner and a cookie from Doubleclick.com to be loaded into the client when the site’s page is referenced.

– Firms like Doubleclick maintain a unique id per client machine, but have no way to determine the user’s name or other info unless the user supplies it.

Applications of cookies

• Content customization

– Cookies can be used to remember user preferences and customize content to suit those preferences.

– Firms like Doubleclick can record past browsing patterns and target advertising based on the reference pattern and where they are currently browsing.

Refer links

• User looking at page www.cs.cmu.edu/~droh/755/foo.html clicks a link to kittyhawk.cmcl.cs.cmu.edu/bar.html

• Browser sends a referer (sic) header to identify the source page of the request

GET /bar.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Referer: http://www.cs.cmu.edu/~droh/755/foo.html
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Host: kittyhawk.cmcl.cs.cmu.edu:8000
Connection: Keep-Alive
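
A server can pick the Referer value out of a request like the one above while it reads the headers. This is a minimal sketch with a hypothetical helper, parse_headers, that is not part of tiny.pl. It keys the headers by lowercased name, since header names are case-insensitive, and stops at the blank line that ends the header section.

```perl
use strict;
use warnings;

# Hypothetical helper: collect request header lines into a hash keyed
# by lowercased header name.
sub parse_headers {
    my (@lines) = @_;
    my %headers;
    foreach my $line (@lines) {
        $line =~ s/\r|\n//g;                # drop CR and LF
        last if length($line) == 0;         # blank line ends the headers
        if ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $headers{lc($1)} = $2;
        }
    }
    return %headers;
}

my %h = parse_headers(
    "Referer: http://www.cs.cmu.edu/~droh/755/foo.html\r\n",
    "Host: kittyhawk.cmcl.cs.cmu.edu:8000\r\n",
    "\r\n",
);
my $source = exists $h{referer} ? $h{referer} : "(none)";
print "request for $h{host} came from $source\n";
```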

Applications of refer links

• Allows advertisers to gauge the effectiveness of ads they place on other sites.

• Enables third-party referral businesses such as BeFree.com.

Log files

extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"
inet-fw1-o.foo.com - - [15/Jul/1999:02:58:10 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (WinNT; U)"
internet5.foo.com - - [15/Jul/1999:16:35:59 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.04 [en]C-c32f404p (Win95; I)"
tmpce001.foo.com - - [16/Jul/1999:16:04:18 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (Win95; I)"
hqinbh2.foo.com - - [22/Jul/1999:16:03:51 -0400] "GET /people/faculty/dohallaron/droh.quake.gif HTTP 1.0" 200 14336 "http://www.ecom.cmu.edu/people/faculty/dohallaron/" "Mozilla/4.6C-CCK-MCD [en] (X\
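
Entries in this layout (client host, two identity fields, timestamp, request line, status, bytes sent, referring page, browser) can be split with one regular expression. The sketch below is an assumption-laden illustration: parse_log_line and the field names are hypothetical, and the pattern expects exactly the nine fields seen in the complete entries above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one log entry into named fields.
# Returns an empty list if the line does not match the expected layout.
sub parse_log_line {
    my ($line) = @_;
    my @f = $line =~ m/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/;
    return () unless @f;
    my %entry;
    @entry{qw(host ident user time request status bytes referer agent)} = @f;
    return %entry;
}

my $line = 'extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] '
         . '"GET /people/faculty/dohallaron HTTP/1.0" 301 375 '
         . '"http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"';
my %e = parse_log_line($line);
print "$e{host} requested '$e{request}' (status $e{status}) via $e{referer}\n";
```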

Implications of logs

• Contain a great deal of personal information about the browsing patterns of people inside and outside a site.

• Important issues

– Who has access to logs?

– How is the log information being used?

Virtual hosting

• Virtual hosting allows one web server to serve requests for multiple domains.

• Allows ISPs to provide customers with their own “vanity” sites.

– Each eCommerce student has their own virtual Web server running at <andrewid>.student.ecom.cmu.edu.

– e.g., http://zak.student.ecom.cmu.edu

– equivalent to http://euro.ecom.cmu.edu/~zak

Virtual hosting: how it works

• Configure DNS so that all virtual hosts have the same IP address

» e.g., each eCommerce student site has the IP address 128.2.218.2 (same as euro.ecom)

» verify this yourself with nslookup

• Server maintains a list of (domain name, directory tree) pairs in a hash.

• Server sets base html and cgi directories according to the target domain name.
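
The last two steps can be sketched with a Perl hash keyed by domain name. Everything specific here is an assumption made for illustration: the directory paths are invented, dirs_for_host is a hypothetical helper, and the target domain is taken from the request's Host header.

```perl
use strict;
use warnings;

# Hypothetical (domain name => directory tree) table; the paths are
# made up for illustration and are not taken from the lecture.
my %vhosts = (
    "zak.student.ecom.cmu.edu"    => "/home/zak/www/",
    "elenak.student.ecom.cmu.edu" => "/home/elenak/www/",
);

# Return the base html and cgi directories for the domain named in the
# request's Host header, falling back to a default tree.
sub dirs_for_host {
    my ($host_header, $default_tree) = @_;
    my $host = lc($host_header);            # domain names are case-insensitive
    $host =~ s/:\d+$//;                     # drop an optional ":port"
    my $tree = exists $vhosts{$host} ? $vhosts{$host} : $default_tree;
    return ($tree . "html/", $tree . "cgi-bin/");
}

my ($htmldir, $cgidir) = dirs_for_host("zak.student.ecom.cmu.edu:80", "./");
print "htmldir=$htmldir cgidir=$cgidir\n";
```

In Tiny, the returned pair would take the place of the fixed $htmldir and $cgidir settings before the URI is parsed.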

Virtual hosting

[Figure: one server receives all requests to 128.2.218.2, including those for zak.student.ecom.cmu.edu and elenak.student.ecom.cmu.edu. Each student tree (~zak, ~elenak, ~mansoo) has a www directory containing cgi-bin and html.]

Server-side includes

• Server mechanism that inserts dynamic or static content directly into an HTML document.

some html
<!--#INCLUDE VIRTUAL="message.txt"-->
some more html

some html
<!--#INCLUDE VIRTUAL="cgi-bin/printenv.pl"-->
some more html
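
The expansion step can be sketched as a single substitution over the document text. This is an illustration under stated assumptions, not how any particular server implements it: expand_includes is a hypothetical helper, and the caller supplies a code reference that returns the content for each named resource (a file's text, or the output of a CGI program). The demo uses canned text so it runs without any files.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace each include directive in $html with
# whatever $fetch returns for the named resource.
sub expand_includes {
    my ($html, $fetch) = @_;
    $html =~ s/<!--#INCLUDE\s+VIRTUAL="([^"]*)"\s*-->/$fetch->($1)/gie;
    return $html;
}

# Canned content standing in for real files or script output.
my %canned = ("message.txt" => "Hello from message.txt");

my $page = expand_includes(
    'some html<!--#INCLUDE VIRTUAL="message.txt"-->some more html',
    sub { my ($name) = @_; return exists $canned{$name} ? $canned{$name} : ""; },
);
print "$page\n";
```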