Lecture 10, 20-755: The Internet, Summer 1999 1
20-755: The Internet
Lecture 10: Web Services III
David O’Hallaron
School of Computer Science and
Department of Electrical and Computer Engineering
Carnegie Mellon University
Institute for eCommerce, Summer 1999
Today’s lecture
• Anatomy of a simple Web server (40 min)
• Break (10 min)
• Advanced server features (45 min)
Anatomy of Tiny: A simple Web server
#!/usr/local/bin/perl5 -w
use IO::Socket;

#
# tiny.pl - The Tiny HTTP server
#
Tiny: configuration
#
# Configuration
#
$port    = 8000;                   # the port we listen on
$htmldir = "./html/";              # the base html directory
$cgidir  = "./cgi-bin/";           # the base cgi directory
$server  = "Tiny Web server 1.0";  # server info
Tiny: error messages
#
# Error messages
#

# Terse error messages go in the response header
%terse_errors = (
    "403", "Forbidden",
    "404", "Not Found",
    "501", "Not Implemented",
);

# Verbose error messages go in the response message body
%verbose_errors = (
    "403", "You are not allowed to access this item",
    "404", "Tiny couldn't find the requested item on the server",
    "501", "Tiny does not support the given request type",
);
Tiny: create a listening socket
#
# Create a TCP listening socket file descriptor
#
# LocalPort: listen on port $port
# Type     : use TCP
# Reuse    : reuse address right away
# Listen   : buffer at most 10 requests
#
$listenfd = IO::Socket::INET->new(LocalPort => $port,
                                  Type      => SOCK_STREAM,
                                  Reuse     => 1,
                                  Listen    => 10)
    or die "Couldn't listen on port $port: $@\n";
Tiny: main loop structure
#
# Loop forever waiting for HTTP requests
#
while(1) {

    # Wait for a connection request from a client
    $connfd = $listenfd->accept();

    # Determine the domain name and IP address of this client

    # Parse the request line (after stripping the newline)

    # Parse the URI

    # Parse the request headers

    # OPTIONS method

    # HEAD method

    # GET method

    # misc: POST, PUT, DELETE, and TRACE methods
}
Tiny: error procedure
#
# error - send an error message back to the client
#   $_[0]: the error number
#   $_[1]: the method or URI that caused the error
#
sub error {
    local($errno) = $_[0];
    local($errmsg) = "$errno $terse_errors{$errno}";

    print $connfd <<EndOfMessage;
HTTP/1.1 $errmsg
Content-type: text/html

<HTML>
<HEAD><TITLE>$errmsg</TITLE></HEAD>
<BODY bgcolor="#ffffff">
<H1>$errmsg</H1>
$verbose_errors{$errno}:
<PRE>
$_[1]
</PRE>
<HR>
The Tiny Web Server
</BODY>
</HTML>
EndOfMessage
}
Tiny: get client’s name and address
# Determine the domain name and IP address of this client
$client_sockaddr = getpeername($connfd);
($client_port, $client_iaddr) = unpack_sockaddr_in($client_sockaddr);
$client_port = $client_port;    # so -w won't complain
$client_name = gethostbyaddr($client_iaddr, AF_INET);
($a1, $a2, $a3, $a4) = unpack('C4', $client_iaddr);
print "Opened connection with $client_name ($a1.$a2.$a3.$a4)\n";
Tiny: parsing the request line
# Parse the request line (after stripping the newline)
chomp($line = <$connfd>);
($method, $uri, $version) = split(/\s+/, $line);
print "received $line\n";
Tiny: parsing the URI
#
# Parse the URI
#

# Either the URI refers to a CGI program...
if ($uri =~ m:^/cgi-bin/:) {
    $is_static = 0;

    # extract the program name and its arguments
    ($filename, $cgiargs) = split(/\?/, $uri);
    if (!defined($cgiargs)) {
        $cgiargs = "";
    }

    # replace /cgi-bin with the default cgi directory
    $filename =~ s:^/cgi-bin/:$cgidir:o;
}
Tiny: parsing the URI (cont)
# ... or the URI refers to a file
else {
    $is_static = 1;    # static content
    $cgiargs = "";

    # replace the first / with the default html directory
    $filename = $uri;
    $filename =~ s:^/:$htmldir:o;

    # use index.html for the default file
    $filename =~ s:/$:/index.html:;
}

# debug statements like this will help you a lot
print "parsed URI: is_static=$is_static, filename=$filename, cgiargs=$cgiargs\n";
Tiny: parsing the request headers
#
# Parse the request headers
#
$content_length = 0;
$content_type = "text/html";
while (<$connfd>) {    # read request header into $_

    # Delete CR and NL chars from $_
    s/\n|\r//g;

    # Determine the length of the message body
    # search for "Content-Length:" at beginning of string $_
    # ignore the case
    if (/^Content-Length: (\S*)/i) {
        $content_length = $1;
    }
Tiny: parsing the request headers (cont)
    # determine the type of content (if any) in msg body
    # search for "Content-Type:" at beginning of string $_
    # ignore the case
    if (/^Content-Type: (\S*)/i) {
        $content_type = $1;
    }

    # If $_ was a blank line, exit the loop
    if (length == 0) {
        last;
    }
}
Tiny: OPTIONS
#
# OPTIONS method
#
if ($method eq "OPTIONS") {
    $today = gmtime()." GMT";
    $connfd->print("$version 200 OK\n");
    $connfd->print("Date: $today\n");
    $connfd->print("Server: $server\n");
    $connfd->print("Content-length: 0\n");
    $connfd->print("Allow: OPTIONS HEAD GET\n");
    $connfd->print("\n");
}
Tiny: HEAD
#
# HEAD method
#
elsif ($method eq "HEAD") {

    # we're disallowing HEAD methods on scripts
    if (!$is_static) {
        error(403, $filename);
    }
    else {
        $today = gmtime()." GMT";
        head_method($filename, $uri, $today, $server);
    }
}
Tiny: HEAD (cont)
#
# process the HEAD method on static content
#   $_[0] : the file to be processed
#   $_[1] : the uri
#   $_[2] : today's date
#   $_[3] : server name
#
sub head_method {
    local ($filename) = $_[0];
    local ($uri) = $_[1];
    local ($today) = $_[2];
    local ($server) = $_[3];
    local $modified;
    local $filesize;
    local $filetype;
Tiny: HEAD (cont)
    # make sure the requested file exists
    if (!(-e $filename)) {
        error(404, $uri);
    }

    # make sure the requested file is readable
    elsif (!(-r $filename)) {
        error(403, $uri);
    }
Tiny: HEAD (cont)
    # serve the response header but not the file
    else {
        # determine file modification date
        $modified = gmtime((stat($filename))[9])." GMT";

        # determine filesize in bytes
        $filesize = (stat($filename))[7];

        # determine filetype (default is text)
        if ($filename =~ /\.html$/) {
            $filetype = "text/html";
        }
        elsif ($filename =~ /\.gif$/) {
            $filetype = "image/gif";
        }
        elsif ($filename =~ /\.jpg$/) {
            $filetype = "image/jpeg";
        }
        else {
            $filetype = "text/plain";
        }
Tiny: HEAD (cont)
        # print the response header
        $connfd->print("HTTP/1.1 200 OK\n");
        $connfd->print("Date: $today\n");
        $connfd->print("Server: $server\n");
        $connfd->print("Last-modified: $modified\n");
        $connfd->print("Content-length: $filesize\n");
        $connfd->print("Content-type: $filetype\n");
        $connfd->print("\n");    # blank line ends the header (HTTP specifies CRLF)
    } # end of else
} # end of procedure
Some Tiny issues
• How would you serve static and dynamic content with GET?
• How would you serve dynamic content with POST?
• How safe are your CGI scripts?
– hint: consider the impact of allowing “..” in URIs.
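One possible way to act on the hint about “..” is to reject such URIs before they are mapped to a filename. The sketch below is not part of Tiny; the helper name uri_is_safe is hypothetical. It only checks for literal ".." path segments and assumes the server does not decode percent-escapes in the URI.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Tiny): returns 1 if the path portion of
# the URI contains no ".." segment, 0 otherwise.
sub uri_is_safe {
    my ($uri) = @_;
    my ($path) = split(/\?/, $uri);      # ignore any CGI arguments
    return 0 unless defined $path;       # empty URI: treat as unsafe
    foreach my $segment (split(m{/}, $path)) {
        return 0 if $segment eq "..";
    }
    return 1;
}

foreach my $uri ("/index.html", "/cgi-bin/../../etc/passwd", "/a..b/c.html") {
    printf "%-28s %s\n", $uri, uri_is_safe($uri) ? "ok" : "rejected";
}
```

A server could call such a check right after parsing the request line and answer with error(403, $uri) when it fails.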
Today’s lecture
• Anatomy of a simple Web server (40 min)
• Break (10 min)
• Advanced server features (45 min)
Cookies
• An HTTP session is a sequence of request and response messages between a client and a server.
• Regular HTTP sessions are stateless
– Each request/response pair is independent of the others
• Cookies are a mechanism for creating stateful sessions (RFC 2109)
– Allows servers and CGI scripts to maintain state information (e.g., which items are in a shopping cart) during a session.
• Based on HTTP Set-Cookie (server->client) and Cookie (client->server) headers.
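As a rough illustration of the client->server direction, a server or CGI script might split the value of a Cookie header into name/value pairs as sketched below. The helper name parse_cookie_header is hypothetical, and the parsing is simplified: it assumes no semicolons appear inside quoted values.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Cookie header value into name => value pairs.
# Attributes whose names start with "$" (such as $Path) are skipped.
sub parse_cookie_header {
    my ($value) = @_;
    $value =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
    my %cookies;
    foreach my $pair (split(/\s*;\s*/, $value)) {
        my ($name, $val) = split(/\s*=\s*/, $pair, 2);
        next unless defined $val;
        next if $name =~ /^\$/;
        $val =~ s/^"(.*)"$/$1/;          # strip surrounding double quotes
        $cookies{$name} = $val;
    }
    return %cookies;
}

my %c = parse_cookie_header('Customer="WILY_COYOTE"; $Path="/acme"');
print "Customer = $c{Customer}\n";       # prints "Customer = WILY_COYOTE"
```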
Cookies
• Client -> server: request 1
– Client initiates a request to the server.
• Server -> client: response 1 (Set-Cookie)
– Server includes a Set-Cookie header in the HTTP response that contains info (the cookie) that identifies the user.
– The client stores the cookie on disk.
Cookies
• Client -> server: request 2 (Cookie)
– Next time the client sends a request to the server, it includes the cookie as a Cookie header in the HTTP request message.
• Server -> client: response 2 (Set-Cookie)
– The server incorporates any relevant new info from request 2 into the Set-Cookie header in response 2.
Cookie example (from RFC 2109)
• Initially the client has no stored cookies.
• Client -> server
– POST /acme/login HTTP/1.1
– [form data]
– user identifies self in form data
• Server -> client
– HTTP/1.1 200 OK
– Set-Cookie: Customer="WILY_COYOTE"; path="/acme"
– cookie identifies user
– client stores cookie for the next request to this server
Cookie example (cont)
• Client -> server
– POST /acme/pickitem HTTP/1.1
– Cookie: Customer="WILY_COYOTE"; $Path="/acme"
– [form data]
– User selects an item for a “shopping basket”
• Server -> client
– HTTP/1.1 200 OK
– Set-Cookie: Part_Number="Rocket_Launcher_0001"; path="/acme"
– Server remembers that shopping basket contains an item
Cookie example (cont)
• Client -> server
– POST /acme/shipping HTTP/1.1
– Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"
– [form data]
– user selects a shipping method from form
• Server -> client
– HTTP/1.1 200 OK
– Set-Cookie: Shipping="FedEx"; path="/acme"
Cookie example (cont)
• Client -> server
– POST /acme/process HTTP/1.1
– Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"; Shipping="FedEx"; $Path="/acme"
– [form data]
– user chooses to process order
• Server -> client
– HTTP/1.1 200 OK
– transaction complete
Cookies
• Cookies are grouped by the URI pathname in the request headers (in this case /acme)
• The server adds cookies to the client in the response headers.
• The server can implicitly delete cookies by setting an expiration date in the Set-Cookie header (not shown in previous example)
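To make the server->client direction concrete, here is a minimal sketch of formatting a Set-Cookie header line. The helper name set_cookie_header is hypothetical. It uses the Max-Age attribute that RFC 2109 defines for cookie lifetime, where Max-Age=0 tells the client to discard the cookie; servers following the older Netscape convention would instead send an expires date in the past.

```perl
use strict;
use warnings;

# Hypothetical helper: format a Set-Cookie header line in RFC 2109 style.
# $max_age is optional; Max-Age=0 asks the client to discard the cookie.
sub set_cookie_header {
    my ($name, $value, $path, $max_age) = @_;
    my $header = qq{Set-Cookie: $name="$value"; Version="1"; Path="$path"};
    $header .= "; Max-Age=$max_age" if defined $max_age;
    return $header;
}

# Add a cookie, then delete it again.
print set_cookie_header("Shipping", "FedEx", "/acme"), "\n";
print set_cookie_header("Shipping", "FedEx", "/acme", 0), "\n";
```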
Applications and implications of cookies
• Click tracking
– can be used to correlate a user’s activity at many different sites.
– Doubleclick.com pays a web site to place an <img src=> tag on the site’s page.
– Causes an advertising banner and a cookie from Doubleclick.com to be loaded into the client when the site’s page is referenced.
– Firms like Doubleclick maintain a unique id per client machine, but have no way to determine the user’s name or other info unless the user supplies it.
Applications of cookies
• Content customization
– Cookies can be used to remember user preferences and customize content to suit those preferences.
– Firms like Doubleclick can record past browsing patterns and target advertising based on the reference pattern and where they are currently browsing.
Refer links
• User looking at page www.cs.cmu.edu/~droh/755/foo.html clicks a link to kittyhawk.cmcl.cs.cmu.edu/bar.html
• Browser sends a referer (sic) header to identify the source page of the request

GET /bar.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Referer: http://www.cs.cmu.edu/~droh/755/foo.html
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Host: kittyhawk.cmcl.cs.cmu.edu:8000
Connection: Keep-Alive
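A server that wants to use the Referer header has to pick it out of the request headers. The sketch below shows one way to do that by collecting every header into a hash keyed by lowercased name. The helper name parse_headers is hypothetical, and the input is a hard-coded list of lines rather than a live socket.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "Name: value" header lines into a hash keyed
# by lowercased header name. Stops at the blank line that ends the headers.
sub parse_headers {
    my (@lines) = @_;
    my %headers;
    foreach my $line (@lines) {
        $line =~ s/[\r\n]+$//;           # strip the line terminator
        last if $line eq "";
        if ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $headers{lc $1} = $2;
        }
    }
    return %headers;
}

my %h = parse_headers(
    "Referer: http://www.cs.cmu.edu/~droh/755/foo.html\r\n",
    "Host: kittyhawk.cmcl.cs.cmu.edu:8000\r\n",
    "\r\n",
);
print "came from: $h{referer}\n";
```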
Applications of refer links
• Allows advertisers to gauge the effectiveness of ads they place on other sites.
• Enables third-party referral businesses such as BeFree.com.
Log files
extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"
inet-fw1-o.foo.com - - [15/Jul/1999:02:58:10 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (WinNT; U)"
internet5.foo.com - - [15/Jul/1999:16:35:59 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.04 [en]C-c32f404p (Win95; I)"
tmpce001.foo.com - - [16/Jul/1999:16:04:18 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (Win95; I)"
hqinbh2.foo.com - - [22/Jul/1999:16:03:51 -0400] "GET /people/faculty/dohallaron/droh.quake.gif HTTP 1.0" 200 14336 "http://www.ecom.cmu.edu/people/faculty/dohallaron/" "Mozilla/4.6C-CCK-MCD [en] (X\
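Entries like these can be broken into fields with a regular expression. The following is a minimal sketch, assuming each entry has the layout shown above (host, two unused fields, bracketed timestamp, quoted request, status, byte count, quoted referer, quoted user agent). The helper name parse_log_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one log entry into named fields.
# Returns an empty list if the line does not match the expected layout.
sub parse_log_line {
    my ($line) = @_;
    if ($line =~ m/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/) {
        return (host => $1, time => $4, request => $5, status => $6,
                bytes => $7, referer => $8, agent => $9);
    }
    return ();
}

my $entry = 'extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] '
          . '"GET /people/faculty/dohallaron HTTP/1.0" 301 375 '
          . '"http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"';
my %f = parse_log_line($entry);
print "$f{host} $f{status} $f{referer}\n";
```

Even this small example shows how much a log reveals: who connected, when, what they asked for, and which page sent them.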
Implications of logs
• Contain a great deal of personal information about the browsing patterns of people inside and outside a site.
• Important issues:
– Who has access to logs?
– How is the log information being used?
Virtual hosting
• Virtual hosting allows one web server to serve requests for multiple domains.
• Allows ISPs to provide customers with their own “vanity” sites.
– Each eCommerce student has their own virtual Web server running at <andrewid>.student.ecom.cmu.edu.
– e.g., http://zak.student.ecom.cmu.edu
– equivalent to http://euro.ecom.cmu.edu/~zak
Virtual hosting: How it works
• Configure DNS so that all virtual hosts have the same IP address
» e.g., each eCommerce student site has the IP address 128.2.218.2 (same as euro.ecom)
» verify this yourself with nslookup
• Server maintains a list of (domain name, directory tree) pairs in a hash.
• Server sets base html and cgi directories according to the target domain name.
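The lookup described above can be sketched as follows. This is an illustration only: the directory paths and the helper name dirs_for_host are made up, and it assumes the target domain name is taken from the Host request header, since all virtual hosts share one IP address.

```perl
use strict;
use warnings;

# Hypothetical table of (domain name => directory tree) pairs.
# The paths are invented for illustration.
my %vhosts = (
    "zak.student.ecom.cmu.edu"    => "/home/zak/www",
    "elenak.student.ecom.cmu.edu" => "/home/elenak/www",
);
my $default_root = "/usr/local/www";     # used when the host is unknown

# Return the base html and cgi directories for a Host header value.
sub dirs_for_host {
    my ($host) = @_;
    $host = "" unless defined $host;
    $host = lc $host;                    # domain names are case-insensitive
    $host =~ s/:\d+$//;                  # drop an optional :port suffix
    my $root = exists $vhosts{$host} ? $vhosts{$host} : $default_root;
    return ("$root/html/", "$root/cgi-bin/");
}

my ($htmldir, $cgidir) = dirs_for_host("zak.student.ecom.cmu.edu:8000");
print "htmldir=$htmldir cgidir=$cgidir\n";
```

In a server like Tiny, the two returned values would take the place of the fixed $htmldir and $cgidir settings for the duration of one request.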
Virtual hosting
[Figure: requests to 128.2.218.2 for zak.student.ecom.cmu.edu and elenak.student.ecom.cmu.edu arrive at a single server, which maps each domain name to a per-user directory tree (~zak, ~elenak, ~mansoo), each showing www, cgi-bin, and html directories.]