Web Server Management: Running Apache on Red Hat Linux

Table of Contents

Prerequisites
Other programs
Installation
Linux system configuration
General configuration
Web pages’ MIME types
Access logs
Log rotation
Aliases
Handling directories
    Automatic indexing
    Default directory index files
Writing HTTP rather than HTML
Users’ own web pages
Delegating the controls for certain pages
Access control by client IP address
Access control by user authentication
Virtual hosts

Prerequisites

Figure 1. Prerequisites

• Network

• Hardware

• Software

• Wetware (people!)

Figure 2. Prerequisites: Network

• Permanent and direct IP access

• Vulnerable periods?

• Support?

• 24hrs/day, 365 days/year?

• Holiday/Illness cover?

Figure 3. Prerequisites: Hardware

• Macs, PCs, Suns, ...

• Hardware support? (24x7?)

• Backups?

• Disc space

• Network speed

• Memory

• Processor power

Figure 4. Prerequisites: Software

• Permanently running daemon

• Software support?

• Service rates?

• DNS lookup rates?

• CGI?

Figure 5. Prerequisites: Wetware

• Checking logfiles

• Changing configuration files

• Software updates & patches

• Data files

• Backups

• Holiday/Illness cover

To run a web service you need four things: a connection to the outside world (network), a machine to run the service from (hardware), a program to run it with (software) and people to maintain both the server and the data it serves (‘wetware’).

Your network needs to be a permanent connection and your server needs to have a constant IP address. You need to know what the support is for your network, who to contact in case of problems, when the vulnerable periods are, etc. (The CUDN has Monday to Saturday 0800–0930, 1700–1900 and Sunday 0000 to Monday 0800 for its vulnerable periods.)

Your machine must have the power to support the number of hits the server will get. Note that it must be powerful enough to cope with the peak demand and not just the mean or modal demand. The most important element of your hardware for a web server is the network card; buy a good one. Next most important is the amount of RAM. The CPU comes last in the list. Unless you are planning to run very computationally expensive CGI programs you don’t need the latest, greatest, fastest chip in the world.

Clearly you need a good web server program. The program described in this course is Apache. It has all the facilities you could want (and then some) and is free (and that’s ‘free’ as in speech as well as in beer). It is also the most widely used web server on the Internet, being used by >65% of active sites. (Source: Netcraft Web Server Survey [http://www.netcraft.com/survey/], February 2002.)

Finally, it is important to realise that to provide a web service rather than just a server you need people. Pages get dated, users’ needs change, links pointing out of your site go stale, dump tapes need to be changed and error reports need to be addressed.

Other programs

Figure 6. Support tools

• Editors

• HTML checkers

• Graphics manipulators

• Scanners etc.

• Log file analyser

• CGI programs

Figure 7. Support tools: Text editors

• Plain text editor

• Configuration files

• HTML data files

• emacs, vi, pico

Figure 8. Deprecated support tools: HTML editors

• There exist specialist HTML editors

• Inflexible & incomplete

• Poor quality HTML

• Plain text editors still pretty good

• Avoid MS Word like the plague

Figure 9. Support tools: HTML checkers

• Check HTML syntax

• Check HTML quality

• Check links still work

• weblint

• cron job

Figure 10. Support tools: Graphics manipulators

• Best all-rounder is gimp, the GNU Image Manipulation Program

• Also ee—Electric Eyes

• Both available as Red Hat packages.

Figure 11. Support tools: Scanners etc.

• Flat bed scanners

• Digital cameras

A web server serves out web pages. However, to populate the web site the pages need to be written and checked and log files may need to be analysed.

Firstly you will need to write the web pages, or possibly edit those submitted by others. The author still regards a plain text editor (emacs, vi or perhaps even pico -w) as the best tool for editing web pages. Contrary to popular belief, the dedicated web authoring tools are still not very good. Of the various authoring packages by far the worst is Microsoft Word’s ‘save as HTML’ feature. The quality of HTML generated by this is appalling and it should be avoided like the plague.

The HTML in the page still needs to be checked for syntax, link integrity and accessibility. This is true whether or not a dedicated HTML authoring package was used; indeed, if one was used then a ‘second opinion’ is all the more important. The text itself should also be checked for spelling and grammar, but beware the rather over-simplistic grammar rules in some word processors.

In addition to the text there are all the other media formats, with static graphics the most common. The GNU Image Manipulation Program (the GIMP), gimp, is a massively powerful GUI image manipulator that starts at the level of Adobe Photoshop but takes things much further, including having a scripting language. For simple viewing, cropping, rescaling and format conversion the Electric Eyes program, ee, is considerably simpler to learn and use. Images can be initially created interactively (e.g. with the GIMP), with a digital camera or by scanning in photographs.

Figure 12. Support tools: CGI programs

• Common Gateway Interface

• Not covered in this course

• SSI

• SSIexec

• PHP

• perl CGI module

• python CGI module

The other support you may need is for CGI programs. First you need to make a decision: are you going to permit the running of programs on the server? While this course is not going to review the technologies in any depth it should give some idea of the spectrum available and the dangers associated with them.

The author is aware of only one vulnerability of a system through the web server itself. He is, however, aware of many break-ins via the CGI programs run by web servers. Static pages are vastly more secure.

The simplest of this style of program is the ‘server side include’ facility. This allows you to add certain tags to a web page which are not valid HTML but which are transformed by the server into valid HTML with dynamic content. A common example is the SSI tag that says when the page was last modified. However, consider whether you want the ‘last updated’ tag to be the last time you fixed your spelling (automatic version) or the last time you changed the content (manual version). Slightly beyond this is the ‘server side include executable’ where the tag runs an external program to generate the content. It is at this point, where an extra program is run by the web server, that you need to start being very careful about security. (It is possible to turn off the SSI executable feature while retaining the weaker SSI functionality.)

PHP takes this one stage further, offering a scripting language embedded in the HTML to provide powerful functionality and logic. The Perl and Python CGI modules take the page author away from HTML altogether. The CGI modules are presented with a URL (and some input data for POST queries) and have to write their own HTTP as well as HTML, in the format described for the ‘as is’ pages in the section called “Writing HTTP rather than HTML”. The modules provide simple function calls for most of this though.
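To make "writing HTTP as well as HTML" concrete, here is a minimal sketch of a CGI-style program in Python. It deliberately uses no CGI helper module; the function name `build_response` and the page content are illustrative assumptions, not taken from the course. The program emits a header block, then an empty line, then the HTML body.

```python
#!/usr/bin/env python3
"""Minimal CGI-style sketch: the program writes the header block itself."""
import html
import os
import sys


def build_response(query_string: str) -> str:
    """Return the header block, an empty line, then the HTML body."""
    # Escape the query so that client input cannot inject markup into the page.
    safe_query = html.escape(query_string)
    body = (
        "<html><head><title>CGI sketch</title></head>\n"
        f"<body><p>Query string: {safe_query}</p></body></html>\n"
    )
    headers = "Content-Type: text/html\n"
    # The empty line separates the headers from the body.
    return headers + "\n" + body


if __name__ == "__main__":
    # A web server running a CGI program passes the query in QUERY_STRING.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Escaping the input is the smallest of the security precautions such a program needs; it is shown here because, as noted above, CGI programs are where most break-ins happen.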

Figure 13. Support tools: Secure access

• ssh: Replacement for rsh, rlogin, rcp

• Matching daemon: sshd

• Red Hat package

• Unix Support’s CD

Finally, unless you plan to work exclusively at the console (you don’t) you will need secure network access to your server. Don’t use telnet or the ‘r-commands’ (rlogin, rsh, rcp and rsync) but their secure analogues provided by the ‘ssh’ suite of programs.

Red Hat Linux version 7.0 and above ship with an SSH system. Also, Unix Support provides a CD [http://www-uxsup.csx.cam.ac.uk/CD/] with ssh clients for most platforms including a Red Hat Linux packaging of the software suite for the Intel platform. The CD is free from the CS Reception.

Installation

Figure 14. Example server

• 3Com 3c905B, 700MHz Athlon, 256MB RAM, 20GB disc

• Red Hat Linux 7.3

• Apache v1.3.23

The example server we are going to use for this course is a 700MHz Athlon with 256MB of RAM, a 1GB disc and a 3Com 3c905B card. This is adequate for a production server. If it were very heavily used I would increase the disc size. The RAM and the CPU are perfectly adequate.

We will be running Red Hat Linux 7.3. Typically we would not be running X on the web server but we will for this example because we will be our own client too. We will run with Apache 1.3.23 which is the version shipped with Red Hat Linux 7.3.

Figure 15. Apache installation

• As root

• Unix Support’s NFS server

• Mount Red Hat mirror

• Locate Apache package

• Install Apache package

• Unmount Red Hat mirror

Figure 16. Apache installation: Mounting the mirror

• Unix Support mirror: nfs-uxsup.csx.cam.ac.uk

• Red Hat mirror: /linux/redhat

# mount -o ro nfs-uxsup.csx.cam.ac.uk:/linux/redhat /mnt
# cd /mnt/updates/7.3/en/os/i386/
# ls -l apache-*
-rw-r--r-- ... apache-1.3.23-14.i386.rpm
-rw-r--r-- ... apache-devel-1.3.23-14.i386.rpm
-rw-r--r-- ... apache-manual-1.3.23-14.i386.rpm

Figure 17. Apache installation: Examining the package

# rpm --query --info --package apache-1.3.23-14.i386.rpm
Name        : apache                       Relocations: (not relocateable)
Version     : 1.3.23                       Vendor: Red Hat, Inc.
Release     : 14                           Build Date: Wed 19 Jun 2002 16:55:48
Install date: (not installed)              Build Host: daffy.perf.redhat.com
Group       : System Environment/Daemons   Source RPM: apache-1.3.23-14.src.rpm
Size        : 1248999                      License: Apache Software License
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : The most widely used Web server on the Internet.
Description :
Apache is a powerful, full-featured, efficient, and freely-available
Web server. Apache is also the most popular Web server on the
Internet.

Figure 18. Apache installation: Examining the package

# rpm --query --list --package apache-1.3.23-14.i386.rpm
/etc/httpd/conf
/etc/httpd/conf/httpd.conf
...
/etc/rc.d/init.d/httpd.init
...
/var/www
/var/www/html
/var/www/html/index.html
/var/www/icons
/var/www/icons/a.gif
...
/usr/man/man8/httpd.8
...
/usr/sbin/httpd
...

Figure 19. Apache installation: Installing the package

# rpm --install apache-1.3.23-14.i386.rpm
# cd
# umount /mnt

• This has not started the server.

• Please remember to unmount the mirror.

We install Apache as root and then configure it so that root will not be needed subsequently for the configuration or administration of the server except to shut it down or restart it.

We use the network file system (NFS) to mount Unix Support’s mirror of the Red Hat distribution. Within it (/mnt/updates/7.3/en/os/i386/) are all the software packages, including Apache (apache-1.3.23-14.i386.rpm).

We examine the Apache package for information and a listing of its contents and finally we install it. Once we’ve done the installation we unmount the file server.

This installation has not started the server but has arranged that it will be started on the next reboot. (Though we don’t need to and won’t reboot just to start it.)

Figure 20. Apache installation: Configuration file layout

               +--- conf/ ---+--- *.conf
               |                               +--- access.log
/etc/httpd/ ---+--- logs -> /var/log/httpd/ ---+
               |                               +--- error.log
               +--- modules -> /usr/lib/apache

Figure 21. Apache installation: Data file layout

             +--- cgi-bin/                empty
             |
/var/www/ ---+--- icons/ --- *.gif
             |
             +--- html/ --- index.html    default

Figure 22. Apache installation: System file layout

• /usr/sbin: Binaries

• /usr/man: Manual pages

• /etc/rc.d: Startup/Shutdown scripts

• /etc/logrotate.d: Log rotation

The files installed come in three classes: the configuration files (/etc/httpd/, /etc/logrotate.d/apache) that the server managers need access to, the data files (/var/www/) that the web page authors and editors need access to and the system files (everything else) that we aren’t going to touch. We will define groups to keep these categories apart.

If you were prepared to do all the updates to the web pages as root and had no special requirements such as access controls then you could just run the program now. The website exists under /var/www/html/ and I wish you much happiness together. However a small amount of work (typically about 15 minutes) will make everything a lot easier and safer.

Linux system configuration

Figure 23. Configuring the operating system

• Package provides a user and group for the daemon

• We need to add a group for the apache administrators

• And at least one group for the web authors

• Avoid use of root

• Log rotation

Figure 24. Configuring the O/S: User & groups

# groupadd -r webadmins
# groupadd -r webeditor
# vi /etc/group

Figure 25. Configuring the O/S: File permissions as installed

# ls -ld /var/www /etc/httpd /var/log/httpd
drwxr-xr-x 3 root root 1024 Jun 27 12:09 /etc/httpd
drwxr-xr-x 5 root root 1024 Jun 27 12:09 /var/www
drwxr-xr-x 2 root root 1024 Jun 27 16:36 /var/log/httpd

• Only root can make modifications.

Figure 26. Configuring the O/S: File permissions

• Change the group to webadmins, or to webeditor for the data files:

# chgrp -R webadmins /etc/httpd /var/log/httpd /etc/logrotate.d/apache
# chgrp -R webeditor /var/www

• Let the group write to the directories:

# chmod -R g+w /var/www /etc/httpd /var/log/httpd /etc/logrotate.d/apache

• Make the group ownership ‘setgid’:

# find /var/www /etc/httpd /var/log/httpd -type d -exec chmod g+s {} \;

Figure 27. Configuring the O/S: File permissions—as changed

# ls -ld /var/www /etc/httpd /var/log/httpd /etc/logrotate.d/apache
drwxrwsr-x 3 root webadmins 1024 Jun 27 12:09 /etc/httpd
-rw-rw-r-- 1 root webadmins 172 Jun 27 12:09 /etc/logrotate.d/apache
drwxrwsr-x 5 root webeditor 1024 Jun 27 12:09 /var/www
drwxrwsr-x 2 root webadmins 1024 Jun 27 12:09 /var/log/httpd

• The daemon will run as user apache.

• How can the daemon write its log files?

• It starts life and opens the log files as user root.

Figure 28. Being a webadmin

• A fresh login will pick up membership of group webadmins.

• This gives access to existing webadmins-writable files.

• Files created in setgid directories will be owned by group webadmins

• Check your permissions mask

We create system groups for the administration of the server and the management of the web pages. On a small server these can be combined. On large servers you may well want multiple groups to manage various subsets of the web pages. We specifically want to avoid requiring root access to reconfigure the web server. root will be used to start or stop the server and nothing else.

We set the permissions on the data and configuration directories so that members of the relevant group can make changes (g+w on files and directories) and any files or subdirectories created will have matching group ownership (g+s on directories).
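The reminder in Figure 28 to check your permissions mask matters because g+w on existing files does nothing for files created later: their mode depends on the creator's umask. The sketch below only does the arithmetic and touches no files; the function name `new_file_mode` is an illustrative assumption.

```python
import stat


def new_file_mode(umask: int, is_directory: bool = False) -> str:
    """Return the ls-style mode string a newly created object would get."""
    # Programs conventionally request 0666 for files and 0777 for directories;
    # every permission bit that is set in the umask is then cleared.
    requested = 0o777 if is_directory else 0o666
    file_type = stat.S_IFDIR if is_directory else stat.S_IFREG
    return stat.filemode(file_type | (requested & ~umask))


print(new_file_mode(0o022))  # group members cannot write to the new file
print(new_file_mode(0o002))  # group members can write to the new file
```

With a mask of 022 a colleague in the same group could read a new page but not edit it, so a mask of 002 is the more convenient choice for shared directories of this kind.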

The chmod (change the mode (permissions) of a file system object) and chgrp (change the group ownership of a file system object) commands (and the chown command which changes the user ownership of a file system object, though we aren’t using that command here) have a -R option to make them behave recursively. Every file system object beneath the named directories will have its mode or group modified.

The find command is slightly trickier. We want to apply the g+s mode change to every directory beneath the named directories but we don’t want to apply it to the files. The find command shown starts at each of the three directories listed and checks each file system element beneath them, testing to see if the element in question is a directory (‘-type d’). If it is then it executes a command (‘-exec ... \;’) and that command is ‘chmod g+s dir’. (‘{}’ is replaced by the name of the file system element being considered.)
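The selection logic of that find command can be sketched in Python as a dry run. This is an illustration under assumptions, not a replacement for the command: the function name `setgid_plan` is made up, and the code only reports the mode each directory would receive, changing nothing on disc.

```python
import os
import stat
import tempfile


def setgid_plan(top: str) -> dict:
    """Map each directory at or below `top` to the mode that g+s would give it."""
    plan = {}
    # os.walk visits `top` and every directory beneath it, like `find top -type d`.
    for dirpath, _dirnames, _filenames in os.walk(top):
        current = stat.S_IMODE(os.lstat(dirpath).st_mode)
        plan[dirpath] = current | stat.S_ISGID  # plain files never enter the plan
    return plan


# Demonstration on a throwaway tree with one subdirectory and one file.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "html"))
    with open(os.path.join(top, "html", "index.html"), "w") as page:
        page.write("<html></html>\n")
    for path, mode in sorted(setgid_plan(top).items()):
        print(oct(mode), os.path.relpath(path, top))
```

The file index.html does not appear in the output, which is exactly the distinction ‘-type d’ makes.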

Figure 29. Starting the server

# /etc/rc.d/init.d/httpd start
Starting httpd: [ OK ]

While we’re here, we shall describe the manual stopping of the server, which we will hardly ever need, and the manual restarting of the server which we will use frequently in this course to bring in a new configuration file. Restarting is just stopping and starting wrapped into a single command.

Figure 30. Restarting or stopping the server

# /etc/rc.d/init.d/httpd restart
Shutting down http: [ OK ]
Starting httpd: [ OK ]

# /etc/rc.d/init.d/httpd stop
Shutting down http: [ OK ]

General configuration

Figure 31. Configuring the service

• As a webadmin, not as root!

• Directory: /etc/httpd/conf/

• Directory and contents are group-writable by webadmins

• httpd.conf: Configuration file

• srm.conf & access.conf: Obsolete & empty

• Directory: /etc/logrotate.d/

• apache: Controls the rotation of the log files.

• File is writable by members of group webadmins.

Red Hat’s packaging of Apache’s configuration files echoes an obsolete format of having three distinct configuration files in the /etc/httpd/conf/ directory. In this course we will put all our configuration in the single file /etc/httpd/conf/httpd.conf and we will write this file from scratch to better learn what it all means.

The only other configuration file we will need is the log rotation file in /etc/logrotate.d/apache. We need to change this only if we change either the log files being kept or the duration they are kept for. These two reasons take on extra significance given the lunacy of the 1998 Data Protection Act. The client machine names or addresses that appear in logs and your record of what they have fetched may constitute personal data. We will return to this file in the section called “Log rotation”.

Figure 32. httpd.conf: Running the daemon

ServerType standalone
ServerRoot /etc/httpd
DocumentRoot /var/www/html
Port 80
User apache
Group apache
ServerAdmin [email protected]
ServerName www.inst.cam.ac.uk
ErrorLog /var/log/httpd/error_log
LogLevel info
Options None

Figure 33. Syntax: Running the daemon

• ServerType standalone
The daemon will not rely on inetd to launch it on demand but will run permanently.

• ServerRoot /etc/httpd
Any files referred to in this configuration file will either be fully qualified or resolved relative to this directory.

• DocumentRoot /var/www/html
The documents to be served are found in this directory.

• Port 80
This is the standard port for WWW services. It is privileged on a Unix system so must be opened by root. Once opened, the port can be passed to unprivileged services (e.g. running as user apache). Ports 8000 and 8080 are commonly used ports for completely unprivileged servers.

• User apache
Group apache
A user and group were created specifically for the web server. These two lines tell the server to use them. The server can change its user and group ids only if it is started as root.

• ServerAdmin [email protected]
Error messages displayed to the client can contain a contact email address. This is where it is defined.

• ServerName www.inst.cam.ac.uk
You may not need this line. If your machine’s real name is boring.inst.cam.ac.uk but there is a DNS record pointing www.inst.cam.ac.uk to it as well then you want the server to identify itself as www.inst.cam.ac.uk. This is how you override the machine’s host name.

• ErrorLog /var/log/httpd/error_log
Any error messages will be logged to the file /var/log/httpd/error_log.

• LogLevel info
An error in Apache comes with a severity rating. This directive specifies what the minimum level to log is.

• Options None
Apache has various options, almost all of which default to ‘on’. We will turn them off so we are forced to meet them explicitly in this course.
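The Port entry above distinguishes privileged from unprivileged ports. On traditional Unix systems the boundary is port 1024: anything below it needs root to open. A minimal sketch of that rule follows; the function name `needs_root` is an illustrative assumption.

```python
PRIVILEGED_PORT_LIMIT = 1024  # ports below this traditionally need root to open


def needs_root(port: int) -> bool:
    """Return True if opening `port` traditionally requires root privileges."""
    if not 0 < port < 65536:
        raise ValueError(f"not a valid TCP port: {port}")
    return port < PRIVILEGED_PORT_LIMIT


for port in (80, 8000, 8080):
    print(port, "privileged" if needs_root(port) else "unprivileged")
```

This is why a server meant to listen on port 80 has to start life as root, while one listening on 8000 or 8080 need never hold root privileges at all.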

Figure 34. Syntax: Suboptions to LogLevel

• emerg
Emergencies: system is unusable, e.g. ‘Child cannot open lock file. Exiting.’

• alert
Alert: action must be taken immediately, e.g. ‘getpwuid: couldn’t determine user name from uid.’

• crit
Critical condition. Any different from alert? e.g. ‘socket: Failed to get a socket, exiting child’

• error
Error condition: affects a single transfer, not the system as a whole, e.g. ‘Premature end of script headers’

• warn
Warning, e.g. ‘child process 1234 did not exit, sending another SIGHUP’

• notice
Notice: normal but significant condition, e.g. ‘caught SIGTERM, shutting down’

• info
Informational messages, e.g. ‘Server seems busy, (you may need to increase StartServers, or Min/Max SpareServers).’

• debug
Debugging messages, e.g. ‘Opening config file /etc/httpd/conf/httpd.conf’
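The "minimum level to log" idea can be sketched as a threshold over the severity names listed above. This is a simplified model written for illustration; the function name `is_logged` is an assumption, and a real server may treat individual levels specially.

```python
# Severity names from most to least severe, in the order of the list above.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]


def is_logged(message_level: str, configured_level: str) -> bool:
    """Return True if a message of `message_level` passes `LogLevel configured_level`."""
    # A lower index means a more severe message.
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


for level in LEVELS:
    print(f"{level:7} {'logged' if is_logged(level, 'info') else 'dropped'}")
```

Under this model the course's choice of LogLevel info keeps everything except the debug messages.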

Figure 35. Pool of daemons

• Single initially launched daemon.

• Runs as root

• Answers no requests

• Maintains a “pool” of child daemons

• Pool of child daemons that do the real work.

• Run as user apache

• Answer a certain number of requests and then die

• Parameters for experts only!

Figure 36. httpd.conf: Parameters for daemon pool

PidFile /var/run/httpd.pid
LockFile /var/lock/httpd.lock
ScoreBoardFile /var/run/httpd.scoreboard
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 5
MaxSpareServers 20
StartServers 8
MaxClients 150
MaxRequestsPerChild 100

Figure 37. Apache’s functionality

• Our server has very little functionality.

• It serves all documents as ‘text/plain’.

• It can only log errors.

• We can add functionality as we need it.

• ‘Modules’

We can run a web server with just the configuration lines we have met so far. It will not be very good, to say the least. Its principal failing is that it has no concept of the MIME content types of the objects it serves and dishes everything up as MIME content type ‘text/plain’. If we look at http://localhost/index.html we see the HTML source (because the browser has been told that the document is of type text/plain). We need to add some functionality to Apache: the ability to determine what MIME content type a document is.

Apache’s functionality comes in a set of files called ‘modules’. We start by clearing the list of modules built into the system. Without the ClearModuleList line many modules would be available by default. Partly because this is a lesson and partly because all good system administrators are control freaks (regarding the systems, not the users!) the only modules used here will be the ones we explicitly add. The mod_so.c module is built in to the Apache binary. But because we have cleared the module list it is not turned on by default. This is the module that allows us to load extra modules that are not built into the binary.

Figure 38. httpd.conf: Initialising the modules

# Start with an empty module list

ClearModuleList
AddModule mod_so.c

Figure 39. Syntax: Starting up the module system

• ClearModuleList
Lose all information about modules in use.

• AddModule mod_so.c
Use the mod_so.c module. Because it is built in to the binary we don’t need to specify the external file the module lives in.

Figure 40. httpd.conf: Following symbolic links

Options +FollowSymLinks

The server at the moment also doesn’t respect symbolic links, refusing to follow them either for pages or directories. Following symbolic links is an option under Apache and, as you will recall, we turned off all options so that we would notice them. There are two options relevant to symbolic links: FollowSymLinks and SymLinksIfOwnerMatch.

Figure 41. Syntax: Option suboptions for symbolic links

• Options +FollowSymLinks
The web server will follow symbolic links.

• Options +SymLinksIfOwnerMatch
The web server will follow symbolic links if the owner of the link (typically its creator) and the owner of the target of the link are the same.

The Options directive has a catch. If we give the line

Options FollowSymLinks

then this completely overrides any previous Options lines and FollowSymLinks becomes the only option in force. For this reason, we use the modifier syntax

Options +FollowSymLinks

which adds the option to the set of options in force.
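The difference between the two forms can be modelled as operations on a set. The sketch below is a toy model written for illustration, not Apache's own parser; the function name `apply_options` is an assumption, and lines that mix modified and unmodified words are deliberately rejected rather than modelled.

```python
def apply_options(current: frozenset, line: str) -> frozenset:
    """Apply one `Options ...` line to the set of options currently in force."""
    words = line.split()[1:]  # drop the leading "Options" keyword
    modified = [word for word in words if word[0] in "+-"]
    if modified and len(modified) != len(words):
        raise ValueError("this sketch does not model mixed lines")
    if modified:
        result = set(current)
        for word in words:
            if word[0] == "+":
                result.add(word[1:])      # modifier syntax: add to the set
            else:
                result.discard(word[1:])  # modifier syntax: remove from the set
        return frozenset(result)
    # No modifiers: the line replaces whatever was in force before.
    return frozenset(word for word in words if word != "None")


in_force = apply_options(frozenset(), "Options SymLinksIfOwnerMatch")
print(sorted(apply_options(in_force, "Options FollowSymLinks")))
print(sorted(apply_options(in_force, "Options +FollowSymLinks")))
```

The first print shows FollowSymLinks alone, the earlier option having been lost; the second shows both options in force.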

Web pages’ MIME types

Figure 42. httpd.conf: Adding support for MIME types

LoadModule mime_module modules/mod_mime.so
AddModule mod_mime.c

TypesConfig /etc/mime.types
DefaultType text/plain

AddEncoding x-compress Z
AddEncoding x-gzip gz tgz

Now we see our first use of an external module. The syntax for the process is rather obscure. This is unfortunate but nothing we can’t handle.

Figure 43. Syntax: Loading an external module

• LoadModule mime_module modules/mod_mime.so
This line says that the file modules/mod_mime.so (resolved relative to the ServerRoot definition at the start of the configuration file) contains a module called mime_module. This module is added to the list of modules that the server knows about. As yet the server won’t use the module; it just knows where to get it should it be called upon to use it.

• AddModule mod_mime.c
This line tells the server to look through all the modules it knows about (either built-in or located with LoadModule directives) looking for a module whose original source file was called mod_mime.c (stupid, but that’s how they chose to do it) and activate it.

When a module is activated some commands are added to the set permitted in the configuration file. The three directives used here (‘TypesConfig’, ‘DefaultType’ and ‘AddEncoding’) are all provided by the mime_module module and would be invalid without the preceding LoadModule and AddModule lines.

Next we will consider those extra commands that the mod_mime module adds. Unless the module is loaded and added before these commands are used they will result in a syntax error.
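The two-step process (know about a module, then activate it) can be pictured with a toy registry. Everything in this sketch is an illustrative assumption: the class, its method names, the `SHARED_OBJECTS` table standing in for files on disc, and the name `so_module` for the built-in module. It models only the bookkeeping described above.

```python
# Stand-in for shared object files: path -> (module name, original source file).
SHARED_OBJECTS = {"modules/mod_mime.so": ("mime_module", "mod_mime.c")}


class ModuleRegistry:
    """Toy model: modules the server knows about versus modules it uses."""

    def __init__(self, built_in):
        self.known = dict(built_in)     # source file name -> module name
        self.active = list(self.known)  # built-in modules start out active

    def clear_module_list(self):
        """ClearModuleList: stop using every module, but keep knowing about them."""
        self.active = []

    def load_module(self, module_name, path):
        """LoadModule: make an external module known without activating it."""
        found_name, source_file = SHARED_OBJECTS[path]
        if found_name != module_name:
            raise LookupError(f"{path} does not contain {module_name}")
        self.known[source_file] = module_name

    def add_module(self, source_file):
        """AddModule: activate a known module, found by its source file name."""
        if source_file not in self.known:
            raise LookupError(f"no known module was built from {source_file}")
        if source_file not in self.active:
            self.active.append(source_file)


registry = ModuleRegistry({"mod_so.c": "so_module"})
registry.clear_module_list()
registry.add_module("mod_so.c")
registry.load_module("mime_module", "modules/mod_mime.so")
registry.add_module("mod_mime.c")
print(registry.active)
```

Calling add_module("mod_mime.c") before the corresponding load_module raises an error in this model, which mirrors the ordering requirement in the configuration file.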

Figure 44. mod_mime: Directives

• TypesConfig /etc/mime.types
Red Hat ships with a file called /etc/mime.types (part of the mailcap package) which identifies the file name extensions used for various MIME content types on the system. This line instructs the web server to use that file to identify MIME content types of files.

• DefaultType text/plain
This says that if the server cannot determine the MIME content type of the file it is about to send then it should presume text/plain.

• AddEncoding x-compress Z
This declares that any file whose name ends in ‘.Z’ should be declared as having MIME encoding type ‘x-compress’ (i.e. it is compressed) and the file name without the .Z suffix should be used to determine the underlying MIME content type.

Figure 45. Some lines from /etc/mime.types

# MIME type                  Extension
application/activemessage
application/andrew-inset     ez
application/applefile
application/mac-binhex40     hqx
application/octet-stream     bin dms lha lzh exe class
application/postscript       ai eps ps
application/x-dvi            dvi
application/x-javascript     js
image/gif                    gif
image/jpeg                   jpeg jpg
image/x-xwindowdump          xwd
message/partial
message/rfc822
model/vrml                   wrl vrml
text/plain                   asc txt
text/html                    html htm
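How such a file combines with the DefaultType and AddEncoding directives can be sketched as a table lookup. This is a simplified illustration, not the module's real algorithm: the function names are assumptions, only a few of the sample lines are reproduced, and only a single trailing encoding suffix is handled.

```python
MIME_TYPES_SAMPLE = """\
# MIME type Extension
application/postscript ai eps ps
image/gif gif
text/plain asc txt
text/html html htm
"""
# Mirrors the two AddEncoding lines of Figure 42.
ENCODINGS = {"Z": "x-compress", "gz": "x-gzip", "tgz": "x-gzip"}


def parse_mime_types(text):
    """Build an extension -> MIME content type table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        for extension in fields[1:]:
            table[extension] = fields[0]
    return table


def classify(filename, table, default="text/plain"):
    """Return (content type, encoding or None) for a file name."""
    parts = filename.split(".")
    encoding = None
    if len(parts) > 1 and parts[-1] in ENCODINGS:
        encoding = ENCODINGS[parts.pop()]  # strip the suffix, as for '.Z'
    extension = parts[-1] if len(parts) > 1 else None
    return table.get(extension, default), encoding


table = parse_mime_types(MIME_TYPES_SAMPLE)
for name in ("index.html", "paper.ps.Z", "README"):
    print(name, classify(name, table))
```

The file README has no extension at all, so it falls through to the default type, which is the job DefaultType does in the configuration.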

Access logs

At the moment we are only logging errors. There is an independent mechanism to log transfers and it comes as a module. Furthermore, we have no means to deal with the log files generated. This section will address the first issue and the following will address dealing with log files once we’ve got them.

Figure 46. httpd.conf: Logging transfers

LoadModule config_log_module modules/mod_log_config.so
AddModule mod_log_config.c

HostnameLookups On
IdentityCheck Off

CustomLog /var/log/httpd/access_log "%t %h \"%r\" %>s %B"

Figure 47. mod_log_config: Directives

• CustomLog filename "format"
Log to the file with the given format. Multiple log files may be defined.

• HostnameLookups On
Convert IP addresses to hostnames.

• IdentityCheck On
Do an ident lookup for each incoming request.

Figure 48. mod_log_config: Logging escape sequences

• %t: Time of the request

• %h: Remote hostname

• %r: First line of the request

• %s: Status code

• %B: Data bytes sent

The CustomLog directive takes two arguments. The first is the file name to log to and the second is the format of the log itself. The format line consists of a series of ‘escape sequences’ (the things starting with percentage characters). Each of these is replaced by some piece of information about the request or the server’s response to it. There is no reason why you should not have more than one log file; you just have multiple CustomLog lines each defining a different log file.
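Because the format string fixes the shape of every log line, the lines are easy to take apart again. The sketch below parses lines written with the format "%t %h \"%r\" %>s %B" from Figure 46. The sample line, the host name in it and the function name `parse_line` are made up for illustration.

```python
import re

LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] '   # %t   time of the request, in square brackets
    r'(?P<host>\S+) '          # %h   remote host name (or address)
    r'"(?P<request>.*)" '      # %r   first line of the request, quoted
    r'(?P<status>\d{3}) '      # %>s  status code finally returned
    r'(?P<bytes>\d+)$'         # %B   data bytes sent
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record


sample = '[27/Jun/2002:12:09:00 +0100] client.example.org "GET /index.html HTTP/1.0" 200 1024'
print(parse_line(sample))
```

If the format string in the CustomLog line changes, the regular expression has to change with it, which is one reason to settle on a format early.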

The simple escape sequence is ‘%X’ for some value of ‘X’. See the figure for the most useful examples. It is possible to log an arbitrary header from the query or response. For the server it is usually of more use to see the incoming headers. See the syntax description for some examples. Of most use in log files is the referring page. For example, you could strip out just those log lines with status code 404 (page not found) and check the referring page. If it’s an internal page you can fix the link and if it’s external you can contact the webmaster responsible.
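The broken-link hunt just described might be sketched as follows. Note the assumptions: the log format here is our usual one with "%{Referer}i" appended in quotes, which is not the format used elsewhere in these notes, and the sample lines and the name broken_links are invented for the illustration.

```python
import re
from collections import Counter

# Assumed format: %t %h "%r" %>s %B "%{Referer}i"
LINE = re.compile(
    r'\[[^\]]+\] \S+ '
    r'"\S+ (?P<url>[^\s"]+)[^"]*" '
    r'(?P<status>\d{3}) \d+ '
    r'"(?P<referer>[^"]*)"$'
)


def broken_links(lines):
    """Count (referring page, missing URL) pairs among the 404 responses."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match is None or match.group("status") != "404":
            continue
        referer = match.group("referer")
        if referer not in ("", "-"):    # "-" is logged when the header is absent
            counts[(referer, match.group("url"))] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample = [
    '[17/Apr/2000:10:10:25 +0100] hostname "GET /index.html HTTP/1.0" 200 1316 "-"',
    '[17/Apr/2000:10:11:00 +0100] hostname "GET /bogus.html HTTP/1.0" 404 0 "http://server/index.html"',
]
for (page, url), hits in broken_links(sample).items():
    print(f"{page} links to missing {url} ({hits} hits)")
```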

The %h code requires the server to perform a lookup in the DNS to turn the IP address of the incoming request into a name. This is not an expensive operation, but if your web site is very heavily used you may want to avoid it. There are two ways to go about this. You can use %a instead. This just logs the IP address and attempts no lookup. Alternatively you can use %h but set the directive HostnameLookups Off. Under these circumstances %h behaves like %a. However, if you want to do access control based on client host name you must have HostnameLookups On, hence the provision of %a.

The %l escape also requires some explanation. The ident protocol provides a means for the server to ask of the client the name of the user on the client (or some tag uniquely identifying the user) who is making the connection. This is only possible if the client system is running the corresponding ident server. This server is quite common on multi-user systems and almost unknown on single-user systems. Again, the load is small for a lightly loaded web server but potentially severe for a heavily loaded one. (Far more so than for the hostname lookups.)

Finally we need to explain the ‘%>s’ construction. We will see in a later section that some modules run a page through quite intricate processing. ‘%s’ is the status code for the processing of the query and ‘%>s’ is the status code finally returned to the client. The latter is typically what we really want. The figure below lists the most commonly seen status codes. The full set can be found in RFC 2616 [http://www-uxsup.csx.cam.ac.uk/netdoc/rfc/rfc2616.txt].

Figure 49. Common status codes

200 OK

301 Moved Permanently

307 Temporary Redirect

400 Bad Request

401 Unauthorized

403 Forbidden

404 Not Found

500 Internal Server Error

505 HTTP Version Not Supported


Figure 50. mod_log_config: Common logging escape sequences

• %a: Client’s IP address

• %B: Bytes sent, excluding HTTP headers.

• %f: The name of the file served.

• %h: Remote hostname, or IP address if hostname lookups are off.

• %l: Remote logname from identd if IdentityCheck is on.

• %r: The first (typically only) line of the request.

• %s: Status code of the request.

• %T: Number of seconds taken to service the request.

• %t: Time of the request.

• %U: The URL requested.

• %u: The userid used if this is a page that requires userid/password.

• %{header}i: Argument of header in the incoming request.

• %{header}o: Argument of header in the outgoing response.

The escape sequences can be more involved than this. Full details are in the Apache documentation [http://www.apache.org.uk/docs/mod/mod_log_config.html].

The %{header}i logging option records the value of an incoming request header. The most commonly useful headers are given below.

Figure 51. HTTP request headers

• Authorization: Access rights to restricted pages.

• From: E-mail address of the user making the request. (Often blank.)

• If-Modified-Since: Only send the data if necessary.

• Referer: The URL of the referring page.

• User-Agent: The web client. Many lie.


Figure 52. Some example log lines

[17/Apr/2000:10:10:25 +0100] hostname "GET /index.html HTTP/1.0" 200 1316
[17/Apr/2000:10:11:00 +0100] hostname "GET /bogus.html HTTP/1.0" 404 0
[17/Apr/2000:10:12:00 +0100] hostname \
    "GET http://elsewhere/index.html HTTP/1.0" 200 1316
[17/Apr/2000:10:30:23 +0100] hostname \
    "GET /cgi-bin/phf?Qalias=x%0a/bin/cat/%20/etc/passwd HTTP/1.0" 404 0

The figure has four example log lines in the format defined in our configuration file.

[17/Apr/2000:10:10:25 +0100] hostname "GET /index.html HTTP/1.0" 200 1316

The first line shows a successful transfer of the URL http://machine/index.html. Note that the client need only request the local part of the URL, having determined for itself what machine to connect to.

[17/Apr/2000:10:11:00 +0100] hostname "GET /bogus.html HTTP/1.0" 404 0

The second line shows an unsuccessful transfer request. The file being looked for does not exist (status code 404). Note that the logged number of bytes sent back is 0.

[17/Apr/2000:10:12:00 +0100] hostname "GET http://elsewhere/index.html HTTP/1.0" 200 1316

The third line is an example of someone trying to use the server as a proxy server. If a request comes in for a fully qualified URL some servers (and Apache if you configure it appropriately) will act as a web client, fetch that URL and pass it back to you. By default Apache does not do this. Instead, it ignores the http://elsewhere component and treats it as a request for the local URL /index.html. Note that this request generates a status code 200 and returns 1316 bytes, exactly the same number as in line one.

[17/Apr/2000:10:30:23 +0100] hostname \
    "GET /cgi-bin/phf?Qalias=x%0a/bin/cat/%20/etc/passwd HTTP/1.0" 404 0

The fourth line is an example of an unsuccessful hacking attempt. The phf script has a hole permitting arbitrary shell commands to be run. Note that these would have run as the user apache which has no special privilege, but it is still a way in.

The Data Protection Act (1998). The Data Protection Commissioner’s office has advised us that machine names and IP addresses that can be used to identify an individual (e.g. that of the computer in a student’s room) may constitute personal data in the meaning of the DPA(98). Until there is an expensive test case and some ignorant, senile, senior judge pronounces precedent we won’t know for certain.


Log rotation

In this section we consider what we can do with the logs and, in particular, how to stop them growing out of control.

Figure 53. /etc/logrotate.conf

# rotate log files weekly
weekly

# keep 4 weeks worth of backlogs
rotate 4

# send errors to root
errors root

# create new (empty) log files after rotating old ones
create

# RPM packages drop log rotation information into this directory
include /etc/logrotate.d

Figure 54. /etc/logrotate.d/apache—as shipped

/var/log/httpd/access_log /var/log/httpd/error_log {
    missingok
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
}

Red Hat Linux provides a service called ‘log rotation’ which provides a uniform mechanism to stop log files growing out of control over time. At regular intervals (nightly, weekly and monthly are all common) the log file error_log, say, is renamed to error_log.1. If there was a previously existing error_log.1 it is renamed to error_log.2, error_log.2 to error_log.3 and so on up to some limit. The default frequency of this operation is defined in the file /etc/logrotate.conf to be weekly and the number of old log files kept is set to default to 4, so error_log.4 is discarded rather than renamed to error_log.5. A new error_log is created.

The directory /etc/logrotate.d/ contains the rotation instructions specific to the log files for a particular package. The instructions for the apache package are kept in the file /etc/logrotate.d/apache. The log files are given as /var/log/httpd/access_log and /var/log/httpd/error_log and they share the single block of instructions in the braces that follow. The missingok line means that it is not an error if a log file is absent and the sharedscripts line means that the script is run once for the pair rather than once per log file. The lines between postrotate and endscript identify a (single line) shell script that should be run after the log files have been rotated. This sends the HUP signal to the web daemon which causes it to reopen all its log files so that it is now logging to the newly created log files rather than the .1 versions.
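The renaming scheme is easier to follow when it is played out on some throwaway files. The Python sketch below imitates it in a temporary directory; it is not how logrotate itself is written, the names rotate and demo are ours, and the step of signalling the daemon is left out.

```python
import os
import tempfile


def rotate(path, keep):
    """Rename path -> path.1 -> path.2 ..., keeping at most `keep` old copies."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)                # the oldest copy is discarded
    for n in range(keep - 1, 0, -1):     # shift the higher numbers first
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{n + 1}")
    if os.path.exists(path):
        os.rename(path, f"{path}.1")
    open(path, "w").close()              # the "create" step: a new, empty log


def demo():
    """Write to a log for four 'weeks', rotating each time with keep=2."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "error_log")
        for week in range(1, 5):
            with open(log, "a") as handle:
                handle.write(f"week {week}\n")
            rotate(log, keep=2)
        return sorted(os.listdir(tmp))


print(demo())
```

However many times the rotation runs, the directory never holds more than the live log and `keep` numbered copies.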

While this course does not consider the analog log analysis program, we will remark that the log rotation script is a good place to run it from. Each time the system rotates a log file, analog gets to process it. We might also want to address the DPA(98) issues here by insisting that the log files not be world-readable. The create line stipulates that when the log files are rotated a new, empty one is to be created which is read/write to root, read-only to members of group webadmins and not readable at all by anyone else.

Figure 55. /etc/logrotate.d/apache—as modified

/var/log/httpd/access_log /var/log/httpd/error_log {
    missingok
    sharedscripts
    create 0640 root webadmins
    postrotate
        /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
}


Aliases

Figure 56. Resolving a URL to a file via an alias

By default, the ‘local part’ of any URL is converted to a file name by simply resolving it as a file name relative to DocumentRoot, which is /var/www/html/ on a Red Hat Linux installation. So, for example, the URL http://server/wombat/index.html would resolve to the file /var/www/html/wombat/index.html.

However, sometimes we want a URL to point out of the DocumentRoot directory tree. For example we can see that the Red Hat Linux Apache installation puts a collection of GIF files in /var/www/icons/ which is not below /var/www/html/. We might want the URL http://server/icons/new.gif to resolve to the file /var/www/icons/new.gif which it won’t by default.

We can accomplish this in two ways: either we create a symbolic link from /var/www/html/icons to /var/www/icons/ or we tell Apache to override the DocumentRoot setting in certain regards. As this is an Apache course, we will do the latter.


Figure 57. httpd.conf: Aliases in Apache configuration

# Aliases

LoadModule alias_module modules/mod_alias.so
AddModule mod_alias.c

Alias /icons/ /var/www/icons/

As before, to add functionality to Apache we need a module. In this case it is the mod_alias module. This module adds a number of keywords to the configuration syntax but we need only one for now. In the figure the Alias directive maps a set of URLs with local parts starting /icons/ to the directory /var/www/icons/.
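The effect of the Alias line on URL resolution can be modelled with a short Python function. This is a sketch of the behaviour described in this section rather than of mod_alias itself; WEB_ROOT, ALIASES and resolve are names chosen for the illustration.

```python
WEB_ROOT = "/var/www/html"                    # the default tree for web pages
ALIASES = [("/icons/", "/var/www/icons/")]    # from the Alias line in the figure


def resolve(local_part):
    """Map the local part of a URL to a file name, trying aliases first."""
    for prefix, target in ALIASES:
        if local_part.startswith(prefix):
            # Swap the URL prefix for the aliased directory.
            return target + local_part[len(prefix):]
    # No alias applies: resolve relative to the default tree.
    return WEB_ROOT + local_part


print(resolve("/icons/new.gif"))
print(resolve("/wombat/index.html"))
```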


Handling directories

Figure 58. Access log: Failing to read a directory

[27/Apr/2000:15:47:11 +0100] hostname "GET /index.html HTTP/1.0" 200 2537
[27/Apr/2000:15:48:09 +0100] hostname "GET / HTTP/1.0" 404 0

• http://server/index.html works

• http://server/ doesn’t

At the moment, while our web server can handle files, determine their MIME content and encoding types from their names’ extensions and log their transfer, it still can’t handle URLs that resolve to directories. Attempts to get such a URL (e.g. the top level URL for the site as a whole) give 404 errors. This is clearly unacceptable.

There are two ways to handle this and most sites implement both.

The first is to provide automatic indexing. Given a URL corresponding to a directory, the web server will create an HTML web page giving a list of all the entries in that directory. These can be annotated with icons (or their ALT text) to identify the corresponding MIME content types. They can be labelled with sizes, titles etc. or left completely plain. We will start with the basic functionality (and the relevant module) and slowly add in some flashier functions.

The other approach is to nominate one or more filenames so that if such a file exists within a directory then that file will be displayed instead. The name index.html is traditional for this, but is not compulsory.


Automatic indexing

Figure 59. httpd.conf: Module for automatic indexing

# Automatic indexing of directory URLs

LoadModule autoindex_module modules/mod_autoindex.so
AddModule mod_autoindex.c

Options +Indexes

Figure 60. Browser’s view of automatic indexing

Index of /

* Parent Directory
* index.html
* poweredby.png

If we simply add the automatic indexing module and enable automatic indexing with an Options statement then we see lists of contents for directory URLs (including index.html). Notice that the three links shown are one directory, one HTML file and a graphic in PNG format but there is no indication of the MIME content type in the page shown. Each entry is simply preceded by a bullet.

Figure 61. httpd.conf: Fancy indexing

IndexOptions +FancyIndexing

Figure 62. Browser’s view of fancy indexing

Index of /

Name              Last modified       Size  Description
__________________________________________________________________

Parent Directory  25-Apr-2000 14:00      -
index.html        25-Apr-2000 18:08     2k
poweredby.png     01-Mar-2000 18:37     1k
__________________________________________________________________


Figure 63. httpd.conf: Fancy indexing options

IndexOptions +SuppressLastModified +ScanHTMLTitles

Figure 64. Browser’s view of fancy indexing options

Index of /

Name              Size  Description
__________________________________________________________________

Parent Directory     -
index.html          2k  Test Page for the Apache Web Server on Re>
poweredby.png       1k
__________________________________________________________________

The mod_autoindex module adds a large number of directives to the allowed set. We’ll start with just IndexOptions. This allows us to modify the displayed format. Almost always it is passed the FancyIndexing suboption which turns on the “long form” listing seen in the figure. In conjunction with this are a number of other options to modify this long form of the output, as shown in the figure. A later figure lists the more useful options to IndexOptions.

Figure 65. httpd.conf: Adding icons to the fancy listing

IndexOptions IconWidth IconHeight

AddIconByType (HTM,/icons/layout.gif) text/html
AddIconByType (TXT,/icons/text.gif) text/*
AddIconByType (IMG,/icons/image2.gif) image/*
AddIconByType (MOD,/icons/world2.gif) model/*
AddIconByType (SND,/icons/sound2.gif) audio/*
AddIconByType (VID,/icons/movie.gif) video/*

We can very usefully augment the automatic listings by adding icons (or the corresponding alternative text) to the lines of output depending on the MIME content types of the files. The directive AddIconByType is provided for this purpose. Its first argument is a pair: the ALT text and the icon. Its second argument is the MIME content type or types it should be used for. Note that wild cards can be used for the MIME content subtype.
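The matching of MIME content types against these lines can be sketched in Python as below. Two things here are assumptions rather than statements about mod_autoindex: that the first matching line wins (which is why text/html is listed before text/*), and that Python's fnmatch is an adequate stand-in for Apache's wild card matching. The names ICON_RULES and icon_for are ours.

```python
from fnmatch import fnmatchcase

# (MIME pattern, ALT text, icon), in the order of the AddIconByType lines.
ICON_RULES = [
    ("text/html", "HTM", "/icons/layout.gif"),
    ("text/*", "TXT", "/icons/text.gif"),
    ("image/*", "IMG", "/icons/image2.gif"),
    ("model/*", "MOD", "/icons/world2.gif"),
    ("audio/*", "SND", "/icons/sound2.gif"),
    ("video/*", "VID", "/icons/movie.gif"),
]


def icon_for(mime_type):
    """Return (ALT text, icon URL) for the first rule matching mime_type."""
    for pattern, alt, icon in ICON_RULES:
        if fnmatchcase(mime_type, pattern):
            return alt, icon
    return None                       # no rule applies to this type


print(icon_for("text/html"))
print(icon_for("text/plain"))
print(icon_for("application/pdf"))
```

Under the first-match assumption, moving the text/* line above text/html would leave the HTML icon unused.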

Whenever an image is included in a page it should have its WIDTH and HEIGHT parameters explicitly specified but Apache doesn’t have the facility to parse the image files it serves to determine these numbers automatically so a compromise is made. All the icons shipped with Apache are the same size. The IndexOptions parameters IconHeight and IconWidth instruct Apache to include these values (which are wired in to the module’s source). All the Apache icons have width 20 pixels and height 22 pixels. If you choose to replace the icons you are strongly recommended to make them all the same size and to use the line

IndexOptions IconWidth=X IconHeight=Y

in the httpd.conf file, to supply their values.

In this example I use one icon for HTML pages (by far the most common, we might expect) and another icon for all the other text subtypes. If the distribution of your MIME content types is different you might choose a different strategy. One place where this might make sense is with the application subtypes, where lumping them all together as “application content types” is not particularly useful.

Figure 66. httpd.conf: Application subtypes

AddIconByType (_PS,/icons/a.gif) application/postscript
AddIconByType (PDF,/icons/a.gif) application/pdf
AddIconByType (HQX,/icons/binhex.gif) application/mac-binhex40
AddIconByType (DVI,/icons/dvi.gif) application/x-dvi
AddIconByType (TEX,/icons/tex.gif) application/x-tex
AddIconByType (TAR,/icons/tar.gif) application/x-tar
AddIconByType (BIN,/icons/binary.gif) application/octet-stream
AddIconByType (XXX,/icons/unknown.gif) application/*

There is a vast array of application subtypes. Every application-specific data type can claim one using the “x-” extension subtypes. The mainstream applications have applied for “real” application subtypes. The application types you have on your website should be represented by useful icons (there are plenty) and the default (unknown.gif in our case) should only be used very rarely. The image in file /var/www/icons/icon.sheet.gif shows all of them in a single picture.

Figure 67. httpd.conf: Directories

AddIcon (_UP,/icons/back.gif) ..
AddIcon (DIR,/icons/folder.gif) ^^DIRECTORY^^
AddIcon (---,/icons/blank.gif) ^^BLANKICON^^

Directories don’t have MIME types so we need to explicitly add an icon for these. To do this, we use AddIcon which associates icons with items either by name or by special controls. For example, we can match on the name “..” to provide an icon for the reference to the parent directory. There are also some special controls, written “^^DIRECTORY^^” and “^^BLANKICON^^”, which match directories and places where no icon would be used (to get the formatting right).


Figure 68. Browser’s view of a fully labelled web page

Index of /

      Name              Size  Description
__________________________________________________________________________

[_UP] Parent Directory     -
[HTM] index.html          2k  Test Page for the Apache Web Server on Re>
[DIR] manual/              -
[IMG] poweredby.png       1k
__________________________________________________________________________

On the subject of formatting, we need to point out a few problems. Because Apache generates PREformatted pages rather than tables it is important that all the icons be the same size and that all the ALT text be the same length (traditionally three characters). It doesn’t appear possible to put spaces in the ALT text so I tend to use underscores for spaces and three dashes for the blank icon (because it precedes a horizontal rule which in text browsers is written with a row of hyphens).

It is possible to modify the widths of the displayed columns. The IndexOptions directive has suboptions NameWidth=x and DescriptionWidth=y. The variables x and y can be either an explicit number of characters or an asterisk. In the latter case the name column is made as wide as its widest element and the description column is sized to make the whole thing 79 columns wide.

Figure 69. mod_autoindex: IndexOptions suboptions

• FancyIndexing: Turns on the “long” format.

• ScanHTMLTitles: Display the HTML title of web pages as their description. This can be intensive on the disc.

• SuppressDescription: Turn off the description column altogether.

• SuppressLastModified: Turn off the column for the last modification date and time.

• SuppressSize: Turn off the column for the size of documents.

• IconWidth[=X]: Specify the width of all the icons in pixels (defaults to 20).

• IconHeight[=Y]: Specify the height of all the icons in pixels (defaults to 22).

• NameWidth=X: Width in characters of the file name column. An asterisk means “as wide as the widest element”.

• DescriptionWidth=Y: Width in characters of the “description” or “title scan” column. An asterisk means that the whole row should be 79 characters wide.


Figure 70. httpd.conf: Headers and footers

HeaderName HEADER.html
ReadmeName README.html

Figure 71. Browser’s view of headers and footers

This is some text to go at the top of the page above the listing.

      Name              Size  Description
__________________________________________________________________________

[_UP] Parent Directory     -
[HTM] HEADER.html         1k
[HTM] README.html         1k
[HTM] index.html          2k  Test Page for the Apache Web Server on Re>
[DIR] manual/              -
[IMG] poweredby.png       1k
__________________________________________________________________________

In addition to customising the listing itself, we can also add information above and below the listing. The mod_autoindex module provides two directives, HeaderName and ReadmeName, for this purpose. The HeaderName directive specifies the name of a file whose contents are placed above the listing and the ReadmeName directive a file whose contents go beneath it.

The filenames must correspond to a MIME content text type. If it is text/html then they are included directly into the generated HTML directory listing. If they are text/plain then they are included within a PRE block.

Note that the text above the listing replaces the original text “Index of /”. Also note that the HEADER.html and README.html files appear in the listing. To deal with this we turn to the last directive from the mod_autoindex module we will consider, IndexIgnore. This takes a number of shell-style wild card patterns following it. Files that match one or more of these patterns are not listed in the index.

Figure 72. httpd.conf: Suppressing files from the listing

IndexIgnore .??* *~ *# HEADER* README* SCCS RCS CVS
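To see which names the line in the figure hides, we can try the same patterns against a sample directory in Python. This is only a sketch: it assumes the patterns are shell-style wild cards and uses Python's fnmatch as a stand-in for Apache's own matcher, and the directory contents are invented.

```python
from fnmatch import fnmatchcase

# The patterns from the IndexIgnore line in the figure.
IGNORE = [".??*", "*~", "*#", "HEADER*", "README*", "SCCS", "RCS", "CVS"]


def visible(names):
    """Return the names that no IndexIgnore pattern matches."""
    return [n for n in names if not any(fnmatchcase(n, p) for p in IGNORE)]


# A hypothetical directory, for illustration only.
directory = ["..", ".htaccess", "index.html", "index.html~",
             "HEADER.html", "README.html", "CVS", "poweredby.png"]
print(visible(directory))
```

The pattern “.??*” needs at least two characters after the dot, so it hides dot files such as .htaccess while leaving the “..” entry for the parent directory in place.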


Default directory index files

Figure 73. httpd.conf: Default files

# Default files in directory URLs

LoadModule dir_module modules/mod_dir.so
AddModule mod_dir.c

DirectoryIndex index.html index.htm

The other approach to dealing with directory URLs is to define a filename such that if that file appears within the directory it is displayed instead of the directory itself. The mod_dir module provides precisely this functionality.

It provides the DirectoryIndex directive which gives a list of names which should be tried. Note that it can take an absolute local path. In the example, if a directory URL were requested then its index.html file would be used if it existed. If it didn’t exist then, if the file index.htm existed, it would be used. Finally, if neither existed and the mod_autoindex module was loaded then the directory listing would be given. If the module was not loaded then a 404 “file not found” error would be given.
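The order of decisions just described can be written down as a small Python function. It is a model of the behaviour set out in this section, not of the modules' source, and serve_directory and its arguments are names invented for the sketch.

```python
def serve_directory(entries, index_names, autoindex_loaded):
    """Decide what a directory URL returns, given the names in the directory."""
    for name in index_names:            # tried in the order listed
        if name in entries:
            return ("file", name)
    if autoindex_loaded:                # fall back to a generated listing
        return ("listing", sorted(entries))
    return ("error", 404)               # nothing can handle the directory


names = ["index.html", "index.htm"]     # from the DirectoryIndex line
print(serve_directory({"index.htm", "logo.png"}, names, True))
print(serve_directory({"logo.png"}, names, True))
print(serve_directory({"logo.png"}, names, False))
```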

If you use both the mod_autoindex and mod_dir modules then, in the configuration file, mod_autoindex must precede mod_dir. If they are placed in the other order then mod_dir is ignored. The author has no idea why this is and assumes it is a bug.


Writing HTTP rather than HTML

We saw in the logging section that HTTP (the transfer protocol, not the language of the web pages) has the concept of status codes, with 200 being the ‘OK’ response and 404 being the ‘file not found’ response. From time to time, we may want to force the generation of a particular error message or status code. There are two ways to go about doing this.

The core Apache system has a directive called ErrorDocument. This lets us specify exactly what page will be sent back to accompany a 404, say, status code.

Figure 74. httpd.conf: Setting the 404 error document

ErrorDocument 404 /errors/404.html
ErrorDocument 500 "Oops, server goof."

Figure 75. Syntax: Specifying error messages

• ErrorDocument nnn "text": If the server generates status code nnn then a text/plain page will be returned with that status code and text as the text.

• ErrorDocument nnn URL: If the server generates status code nnn then the local web page at URL will be returned along with status code nnn.

This depends on the server generating a specific status code. You will recall that status code 403 corresponds to ‘forbidden’. We might want to indicate that trying to fetch a particular URL was expressly forbidden rather than just not present. For example, given a directory URL, we might want to display an index.html file if one exists but give a 403 status code if one does not. So we need a way to generate pages with status codes of other than 200. We could do this just by turning off or on the indexing option but the mechanisms described here provide more flexibility.

This functionality is provided by a module called mod_asis. This lets us provide web pages that aren’t HTML or any other MIME type but which are the entire HTTP response to a query. This allows us to add status codes and other HTTP metadata beyond just the HTML content.

First let’s see what a full HTTP session looks like.


Figure 76. Faking a browser with telnet

$ telnet draig.csi.cam.ac.uk 80
Trying 131.111.10.224...
Connected to draig.csi.cam.ac.uk.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 16 May 2000 08:54:29 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
Last-Modified: Tue, 25 Apr 2000 17:08:10 GMT
ETag: "f242-9e9-3905d0fa"
Content-Length: 2537
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
...
</BODY>
</HTML>

Figure 77. HTTP response headers

• HTTP/1.1 200 OK: The HTTP protocol version number (our query was version 1.0 but the server is entitled to reply with version 1.1), followed by the status code and a text explanation of the status code.

• Date: The timestamp of the response.

• Server: A description of the responding server.

• Last-Modified: When the page was last modified.

• ETag: ‘Entity tag’: a key used to uniquely identify this version of the page for caches etc.

• Content-Length: Number of bytes in the body of the response (i.e. the HTML page, but not the HTTP headers).

• Connection: Whether the TCP connection should be kept open after this transfer to allow further requests.

• Content-Type: The MIME content type of the following document.

• Blank line: The separator between the headers and the body of the web page.


So, if we are going to generate a status code 403, say, then we will need to create that first line and perhaps some others. The module will assist us with many of them, though.

The module works as follows: we create a new, fake MIME content type called httpd/send-as-is and associate it with files ending with one or more suffixes (.asis, traditionally). The module then causes the server to process these files as nearly raw HTTP rather than as HTML or some other MIME content type. Because httpd/send-as-is is not a true MIME type, we don’t want to define it in the /etc/mime.types file, so we use the AddType directive of the mod_mime module to define it purely within the web server. This gives us a module dependency: the mod_asis module cannot be used without the mod_mime module already being added.

Figure 78. Adding the mod_asis module

# Send .asis files "as is"

AddType httpd/send-as-is asis

LoadModule asis_module modules/mod_asis.so
AddModule mod_asis.c

Now, if we wanted to provide for forbidding directory indexing in certain directories as opposed to providing an index.html file, we could provide the DirectoryIndex line

DirectoryIndex index.html index.asis

Then, if a user creates an index.html file it is treated as usual. If there is no index.html file but there is an index.asis file it is used and sent ‘as is’. If there is neither then the directory is autoindexed.

Let’s now look at constructing a plausible index.asis file.

Figure 79. A plausible index.asis file

Status: 403 Directory searching is prohibited
Content-Type: text/html

<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/strict.dtd">

<HTML>
<HEAD><TITLE>Security policy violation</TITLE></HEAD>
<BODY>
<H1>Security policy violation</H1>
<P>This web site’s security policy prohibits the autoindexing of this
directory. Your request has been logged.</P>
</BODY>
</HTML>


A more useful page would give links to a search engine and the such like. More importantly, observe the headers at the start of the page, split from the body by the first blank line of the page. (The line is truly empty; there are no spaces or other whitespace characters in it.) The Status: header introduces the status code and the explanatory text message. We don’t get to specify the HTTP version being spoken; the server will take care of that for us. Any following lines (before the blank line) that look like HTTP headers will be passed through untouched and must be valid HTTP header lines. The server will add the Server:, Date: and Connection: lines and we should not write these.
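The layout of an .asis file can be checked with a few lines of Python that split it the way the paragraph above describes. This is a sketch for examining our own files, not a copy of what mod_asis does; the name parse_asis is ours, and treating a file with no Status: header as a 200 is an assumption made for the illustration.

```python
def parse_asis(text):
    """Split an .asis file into (status code, reason, headers, body)."""
    head, _, body = text.partition("\n\n")    # the first truly empty line
    status, reason, headers = 200, "OK", []   # assumed default without Status:
    for line in head.splitlines():
        name, _, value = line.partition(":")
        if name.lower() == "status":
            code, _, reason = value.strip().partition(" ")
            status = int(code)
        else:
            headers.append((name, value.strip()))
    return status, reason, headers, body


sample = (
    "Status: 403 Directory searching is prohibited\n"
    "Content-Type: text/html\n"
    "\n"
    "<HTML>...</HTML>\n"
)
print(parse_asis(sample))
```

A line containing only spaces would not count as the separator here, which matches the warning that the blank line must be truly empty.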

Figure 80. Faking a browser with telnet again

$ telnet draig.csi.cam.ac.uk 80
GET /two/ HTTP/1.0

Trying 131.111.10.224...
Connected to draig.csi.cam.ac.uk.
Escape character is '^]'.
Connection closed by foreign host.
HTTP/1.1 403 Directory searching is prohibited
Date: Tue, 16 May 2000 11:30:40 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/strict.dtd">

<HTML>
<HEAD><TITLE>Security policy violation</TITLE></HEAD>
<BODY>
<H1>Security policy violation</H1>
<P>This web site’s security policy prohibits the autoindexing of this
directory. Your request has been logged.</P>
</BODY>
</HTML>

If we inspect the access log file we will see the 403 lines there too.

[16/May/2000:12:06:30 +0100] hydra.csi.cam.ac.uk "GET /two/ HTTP/1.0" 403 345

This is where we get to see the difference between the logging code ‘%s’ and ‘%>s’. The former would log a status code of 200 because the .asis file was processed correctly. The latter shows 403 because that is the ultimate status code after all the internal reprocessing is complete.


Users’ own web pages

Figure 81. httpd.conf: User directories

# Users’ web pages

LoadModule userdir_module modules/mod_userdir.so
AddModule mod_userdir.c

UserDir public_html

Apache contains a mechanism (read: module) that allows users to supply their own web pages. It is not uncommon for a web server to offer nothing but these and to have no ‘central’ web pages at all (except perhaps for a top level index.html file). The mod_userdir module provides a single directive, UserDir. It can be used in a number of ways, however.

Figure 82. user_dir: Remapping http://server/~user/index.html

• UserDir public_html: Maps URL to ~user/public_html/index.html.

• UserDir /home/userpages: Maps URL to /home/userpages/user/index.html.

• UserDir /home/*/webstuff: Maps URL to /home/user/webstuff/index.html.

• UserDir http://other/home/userpages: Maps URL to http://other/home/userpages/user/index.html.

• UserDir http://other/*/webstuff: Maps URL to http://other/user/webstuff/index.html.
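The five forms in the figure follow three simple rules, which the Python sketch below spells out. It models only the mappings listed in the figure; the function name map_user_url is ours, and the assumption that a user's home directory is /home/user stands in for the real lookup of the home directory.

```python
def map_user_url(userdir, user, rest, home="/home"):
    """Map the URL /~user/rest according to a UserDir argument."""
    if "*" in userdir:
        # The asterisk marks where the user name is to be substituted.
        base = userdir.replace("*", user, 1)
    elif userdir.startswith(("/", "http://")):
        # An absolute path or URL: the user name is appended.
        base = f"{userdir}/{user}"
    else:
        # A relative name: a subdirectory of the user's home directory
        # (assumed here to be /home/user).
        base = f"{home}/{user}/{userdir}"
    return f"{base}/{rest}"


for setting in ["public_html", "/home/userpages", "/home/*/webstuff",
                "http://other/home/userpages", "http://other/*/webstuff"]:
    print(setting, "->", map_user_url(setting, "bob", "index.html"))
```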


Delegating the controls for certain pages

So far, our editing of the file httpd.conf has set parameters for the entire server. On occasion it is appropriate to have one set of parameters for one set of web pages and another for other parts. We need some way to pass directives applicable just to a certain set of pages. There are a number of ways to describe subsets of pages and Apache supports them all. We will restrict ourselves to just the simplest in this course.

The simplest is by considering subtrees of the web pages. We can tag a directory with some special options and have those apply to every web page beneath it. For example we might want to restrict access to everything under /var/www/html/restricted/.

We might want to tag multiple directory trees by applying these overrides to every directory that matches a regular expression, rather than by specifying its explicit name. For example, anything under any directory called restricted might get special rules.

We might just specify a regular expression that matches files and apply the rules to any files (as opposed to directories) that match. So any web page called special.html might get nonstandard rules.

Alternatively, rather than specify the restriction by file name (after the URL has been resolved) we might change the rules according to the URL quoted before this gets mapped onto a file (or directory) name. Again this could be a subtree of URLs or the set of URLs that match a regular expression.

Using the directory structure to control options also permits the placing of special files in the directory structure to control the trees beneath them (traditionally called .htaccess). These control files, in turn, might benefit from the filename matching rules to stop their being fetched by clients.

While these all make sense in isolation, the combination of rules governing directory trees, URLs, filename regular expressions and URL regular expressions is a recipe for trouble. We are going to approach this issue from the KISS (‘Keep It Simple, Stupid!’) standpoint and restrict ourselves to directory subtrees here.

Figure 83. A simple restriction example

• By default:

  • index.html files to be respected.

  • Automatic indexing permitted.

• Under /var/www/html/fubar/:

  • index.html files to be respected.

  • Automatic indexing forbidden.

Our configuration file will run with

DirectoryIndex index.html

but we need

Options +Indexes

for the default case and

Options -Indexes

for the /fubar/ subdirectory. The next element of the configuration file we will examine provides precisely this functionality.

Figure 84. httpd.conf: Restricting options to subdirectories

# Default
Options +Indexes

# Subdirectory restriction
<Directory /var/www/html/fubar/>
Options -Indexes
</Directory>

The <Directory> tag limits the application of parts of the configuration to just those files and directories beneath /var/www/html/fubar/.

We start to see a problem here, though. Inevitably, the directory structure will get larger and larger. The set of overrides and rules will get longer and longer. More and more people will need access to the httpd.conf file. More and more lines will get added to it. This is bad. What is needed is a way to delegate the controls over a directory tree to the directory itself. This facility exists, using control files in some directories, traditionally called .htaccess. The file can, however, have any name we choose to give it. Before we start delegating control, we might want to restrict just what configurations in the httpd.conf file we are prepared to have overridden in the delegated control files.

Figure 85. httpd.conf: Delegation of (some) control

AccessFileName .config

<Directory /var/www/html>
AllowOverride AuthConfig FileInfo Indexes
</Directory>

The delegated control file was originally used to control access to subtrees of web pages (and we’ll see how to do this soon) and the name of the directive that sets it (AccessFileName) reflects that history. It is a more general overriding facility, though, so to reinforce that, we’ll use the AccessFileName directive to set the name of the delegated configuration file to be .config.

The second line specifies what facets of the Apache configuration can be overridden in the .config files. This aspect of the Apache control mechanism is not as refined as it might be, unfortunately. Any directive that appears in the httpd.conf file and which ‘makes sense’ applied to a directory tree (more precisely, any directive that could appear in a <Directory>...</Directory> block) can be placed in this subconfiguration file.

Figure 86. Core functionality: Delegation of (some) control

• AccessFileName fname
  Within the document tree, a file called fname will override the default behaviour with the behaviour specified within it (insofar as is permitted).

• AllowOverride suboptions
  This directive specifies exactly what aspects of the configuration may and may not be overridden in the files named by the AccessFileName directive.

Figure 87. Core functionality: AllowOverride suboptions

• AuthConfig
  Control the mechanisms used for authenticating users for access to restricted documents. See the section on access control for more on this option.

• FileInfo
  This permits the use of the directives found in the MIME module to change or add MIME types.

• Indexes
  This permits the use of the directives found in the two directory modules.

• Options
  Allow the use of the Options directive in the delegated control files.

• All
  Permit all overrides.

• None
  Permit no overrides. Ignore the delegated control files.
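
The effect of the AllowOverride suboptions can be pictured as a lookup: each directive belongs to a category, and a directive in a delegated control file is honoured only if its category has been permitted. The following is a minimal Python sketch of that idea, not Apache's implementation. The table lists only a handful of directives chosen for illustration, and the function name permitted is hypothetical.

```python
# Illustrative mapping from a directive to the AllowOverride category that
# must be enabled before the directive may appear in a delegated control file.
# Only a few directives are listed; this is an assumption-laden sample table.
OVERRIDE_CATEGORY = {
    "AuthType": "AuthConfig",
    "AuthName": "AuthConfig",
    "AuthUserFile": "AuthConfig",
    "require": "AuthConfig",
    "AddType": "FileInfo",
    "IndexOptions": "Indexes",
    "DirectoryIndex": "Indexes",
    "Options": "Options",
}


def permitted(directive, allow_override):
    """Return True if `directive` may be used under the given suboptions.

    `allow_override` is the list of words following AllowOverride.
    """
    allowed = set(allow_override)
    if "None" in allowed:
        return False  # delegated control files are ignored entirely
    if "All" in allowed:
        return True   # every override is permitted in this sketch
    return OVERRIDE_CATEGORY.get(directive) in allowed
```

With the suboptions from Figure 85 (AuthConfig FileInfo Indexes), permitted("IndexOptions", ...) is true but permitted("Options", ...) is false.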

Now let’s return to the case study in the slide. We will drop the subdirectory directives entirely and instead specify what overrides we will permit and in what file. We then have a .config file in the fubar directory that changes the options again.

Figure 88. httpd.conf: Restricting options to subdirectories

# Default
Options +Indexes
AccessFileName .config
<Directory /var/www/html>
AllowOverride Options
</Directory>

Figure 89. /var/www/html/fubar/.config contents

Options -Indexes

Now we start to see Apache creak at the seams. Note that to change the nature of indexes (using the IndexOptions directive) we would need to allow the override Indexes. However, because turning automatic indexing on or off (Options +Indexes or Options -Indexes) is handled by the Options directive we have to permit the Options override. This is unfortunate because there are other suboptions to Options that we might not want to delegate. Mercifully, indexing is the exception rather than the rule in this regard. In most other cases the controls over delegation do make sense.

The next question to address is how these files nest. If I have a default state of Options +Indexes and a file /var/www/html/fubar/.config containing Options -Indexes, what happens if I have a file /var/www/html/fubar/snafu/.config with Options +Indexes? As you might expect, the fubar/snafu/.config overrides the fubar/.config file for the contents of fubar/snafu/.
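
The nesting rule can be sketched as a walk from the document root down to the directory being served, letting each control file met on the way replace the setting inherited from above. The Python below is a simplified model under stated assumptions: only the Indexes option is tracked, the control files are represented by a dictionary rather than read from disk, and the function name effective_indexes is hypothetical.

```python
from pathlib import PurePosixPath


def effective_indexes(directory, default, control_files, root="/var/www/html"):
    """Return True if automatic indexing is on for `directory`.

    `control_files` maps a directory path (no trailing slash) to "+Indexes"
    or "-Indexes", standing in for the Options line of that directory's
    .config file. `default` is the server-wide setting.
    """
    root_path = PurePosixPath(root)
    target = PurePosixPath(directory)

    # Build the chain of directories from the root down to the target.
    chain = [root_path]
    for part in target.relative_to(root_path).parts:
        chain.append(chain[-1] / part)

    # Apply the settings outermost first, so deeper files win.
    state = default
    for d in chain:
        setting = control_files.get(str(d))
        if setting == "+Indexes":
            state = True
        elif setting == "-Indexes":
            state = False
    return state
```

With fubar set to -Indexes and fubar/snafu set to +Indexes, indexing is off in fubar and in any other subdirectory of it, but on again in fubar/snafu.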

Access control by client IP address

There are basically two ways to restrict access to web pages: by the client’s IP address or by making the client quote a userid and password. For the time being we will control the entire web site. We can then use the previous section to control just subsets of the site. In this section we will restrict by IP address and in the following section we will describe a superior system.

So, first of all, a warning about IP access restrictions: web proxies can really spoil your day. A web proxy is a system that forwards web requests on to another server. So if www.inst.cam.ac.uk restricts access to clients inside cam.ac.uk it is vulnerable to proxies within cam.ac.uk. If randompc.example.com makes a direct request it will be rejected. However, if randompc.example.com makes a request of a web proxy, proxy.college.cam.ac.uk, then the proxy forwards the request to www.inst.cam.ac.uk. The latter sees a query coming from within cam.ac.uk and honours it. The proxy then forwards the answer back to randompc.example.com. The moral of this tale is to use client address restriction only when you restrict to a set of machines you control (enough to restrict proxies on them). Don’t use it blindly.
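
The weakness comes down to which machine the server believes it is talking to. As a toy model in Python (the function names are hypothetical, and the domain check is a plain suffix test rather than anything Apache does), the server can only judge the peer that opened the connection:

```python
def peer_seen_by_server(client, proxy=None):
    """Return the host name the server sees at the other end of the connection.

    With a proxy in the path, the server's peer is the proxy, not the client.
    """
    return proxy if proxy is not None else client


def server_allows(peer_hostname, allowed_domain=".cam.ac.uk"):
    """Address-based restriction: judge only the connecting machine."""
    return peer_hostname.endswith(allowed_domain)


direct = server_allows(peer_seen_by_server("randompc.example.com"))
via_proxy = server_allows(
    peer_seen_by_server("randompc.example.com", proxy="proxy.college.cam.ac.uk")
)
```

Here direct is False and via_proxy is True, although the same outside machine made both requests.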

To give an example of how hard this is, the Computing Service discovered it was running an unintended proxy which allowed the CS minutes (restricted to the CS internal network by IP address) to be read from any machine in the world by anyone who knew about our proxy. The CS friendly probing suite now probes for web proxies so you won’t be surprised by yours the way we were by ours.

And now, we will demonstrate how to add client address access restrictions in the Apache configuration file using the mod_access module.

Figure 90. httpd.conf: Access restrictions

# Access control by IP address

LoadModule access_module modules/mod_access.so
AddModule mod_access.c

order deny,allow
allow from .csi.cam.ac.uk
deny from all
allow from .csx.cam.ac.uk

The order line is read first. The ‘deny,allow’ argument specifies that

1. initially all requests will be honoured

2. then all the deny lines will be applied

3. then all the allow lines will be applied

regardless of the order the lines appear in.

In the example given, access is permitted to the site from clients in the two domains csi.cam.ac.uk and csx.cam.ac.uk but no others. Note the use of the leading dot to indicate that, for example, .csx.cam.ac.uk is a domain and not a hostname. Also note that for access control by domain name to work you need to have HostnameLookups set to On.

Figure 91. Request from randompc.example.com

1. Initial state: Access allowed

2. deny from all: Access denied

3. allow from .csi.cam.ac.uk: Inapplicable (no change)

4. allow from .csx.cam.ac.uk: Inapplicable (no change)

5. Final state: Access denied

Figure 92. Request from ghoul.csi.cam.ac.uk

1. Initial state: Access allowed

2. deny from all: Access denied

3. allow from .csi.cam.ac.uk: Applicable (access allowed)

4. allow from .csx.cam.ac.uk: Inapplicable (no change)

5. Final state: Access allowed
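
The two traces above can be reproduced with a few lines of Python. This is a sketch of the evaluation model described in the text, not of mod_access itself: matching is limited to ‘all’, a leading-dot domain and an exact host name, and the function names matches and access_allowed are hypothetical.

```python
def matches(rule, hostname):
    """Match 'all', a '.domain' suffix, or an exact host name."""
    if rule == "all":
        return True
    if rule.startswith("."):
        return hostname.endswith(rule)
    return hostname == rule


def access_allowed(order, allow, deny, hostname):
    """Evaluate allow/deny lines in the sequence fixed by `order`.

    The position of the lines in the configuration file plays no part;
    only the order argument decides which set is applied last.
    """
    allow_hit = any(matches(rule, hostname) for rule in allow)
    deny_hit = any(matches(rule, hostname) for rule in deny)

    if order == "deny,allow":
        state = True           # initially all access allowed
        if deny_hit:
            state = False      # then the deny lines
        if allow_hit:
            state = True       # then the allow lines
    elif order == "allow,deny":
        state = False          # initially all access denied
        if allow_hit:
            state = True       # then the allow lines
        if deny_hit:
            state = False      # then the deny lines
    else:
        raise ValueError("unknown order: " + order)
    return state
```

Swapping the order to allow,deny while keeping ‘deny from all’ would shut everyone out, because the deny lines would then be applied last.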

Figure 93. mod_access: allow directives

• order deny,allow

  1. Initially all access allowed,

  2. then apply all deny lines,

  3. then apply all allow lines.

• order allow,deny

  1. Initially all access denied,

  2. then apply all allow lines,

  3. then apply all deny lines.

• allow from all

  • All requests are allowed.

• allow from host.inst.cam.ac.uk

  • Requests from the host are allowed. Requires HostnameLookups On.

• allow from .inst.cam.ac.uk

  • Requests from hosts within the domain are allowed. Requires HostnameLookups On.

• allow from 131.111.11.84

  • Requests from the host are permitted.

• allow from 131.111.11.0/255.255.255.0

  • Requests from any IP address starting 131.111.11. are allowed.

• allow from 131.111.11.0/24

  • Requests from any IP address starting 131.111.11. are allowed. (The first three numbers correspond to the first 24 bits of the IP address quoted.)
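
The equivalence of the netmask and /24 forms can be checked with Python’s standard ipaddress module. This is only an illustration of the address arithmetic, not of how Apache parses its configuration; the helper name ip_rule_matches is hypothetical.

```python
import ipaddress


def ip_rule_matches(rule, client_ip):
    """Return True if `client_ip` falls inside the address or network `rule`.

    `rule` may be a single address, address/netmask or address/prefix-length.
    strict=False tolerates host bits being set in the rule.
    """
    network = ipaddress.ip_network(rule, strict=False)
    return ipaddress.ip_address(client_ip) in network


# A 255.255.255.0 netmask fixes the first 24 bits, so both forms
# describe the same set of 256 addresses.
same_network = (
    ipaddress.ip_network("131.111.11.0/255.255.255.0")
    == ipaddress.ip_network("131.111.11.0/24")
)
```

A single address such as 131.111.11.84 is treated as a network containing just that one host.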

Figure 94. mod_access: deny directives

• deny from ...

• As per allow from ...

Access control by user authentication

As said before, the author advises very strongly against restricting access to .cam.ac.uk. You may be able to get away with restrictions to inst.cam.ac.uk if you rule your institution with an iron fist. Often it is far more useful to require authorised users to authenticate themselves against the server. The mechanisms for this are split over a variety of modules and core Apache functionality depending on how you want to run the authentication. We will take the simplest approach here and authenticate against a text password file. Equivalent modules exist for authenticating against more complex databases. This becomes necessary if the database gets too big and linear text file searching too slow.

Figure 95. httpd.conf: Restricting access to authenticated users

LoadModule auth_module modules/mod_auth.so
AddModule mod_auth.c

<Directory /var/www/html/restricted>
AuthType Basic
AuthName wombat
AuthUserFile /etc/httpd/conf/passwd
require valid-user
</Directory>

Figure 96. Creating an Apache password file

$ touch /etc/httpd/conf/passwd
$ ls -l /etc/httpd/conf/passwd
-rw-rw-r-- 1 root webadmin 0 Jun 1 10:12 passwd
$ htpasswd /etc/httpd/conf/passwd demouser
New password: dem0user
Re-type new password: dem0user
Adding password for user demouser

First let’s consider what we’ve done to the httpd.conf file. We have included a module mod_auth whose function is to permit checking IDs against a plain text password file. This module provides us with the AuthUserFile directive which specifies the location of that password file. The AuthName and AuthType directives belong to the core Apache functionality because they are independent of the supporting database. The AuthType directive specifies the mechanism that is going to be used to transmit the ID and password. If we are going to use the mod_auth module we must specify Basic as the authentication type because this is the only one widely understood. This sends passwords unencrypted over HTTP.

Basic authentication is best illustrated by using telnet as our web client again.

Figure 97. Basic authentication uncovered—1

$ telnet hydra.csi.cam.ac.uk 80
Trying 131.111.11.148...
Connected to hydra.csi.cam.ac.uk.
Escape character is '^]'.
GET /restricted/ HTTP/1.0

HTTP/1.1 401 Authorization Required
Date: Thu, 01 Jun 2000 10:29:37 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
WWW-Authenticate: Basic realm="wombat"
Connection: close
Content-Type: text/html; charset=iso-8859-1

...
Connection closed by foreign host.

So our attempt to get the /restricted/ URL fails with a status code 401 ‘Authorization Required’. Note the HTTP header line

WWW-Authenticate: Basic realm="wombat"

On receipt of this status code and header line a sensible browser will prompt the user for an ID and password for the server, quoting the realm ‘wombat’. The concept of realms allows us to split the web site into more than one distinctly controlled area. For one directory tree (/var/www/html/restricted/) we can demand IDs and passwords for one realm (wombat) and for another tree we can demand a different set of IDs and passwords.

The browser will then send back the same request as before but this time quoting the ID and password given, Base64 encoded. (The Base64 encoding of ‘demouser:dem0user’ is ‘ZGVtb3VzZXI6ZGVtMHVzZXI=’.)
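
The encoding quoted above is easy to reproduce, which also makes the point that Base64 is an encoding and not encryption: anyone who sees the header can reverse it. A small Python sketch using the standard base64 module follows; the helper name basic_credentials is hypothetical.

```python
import base64


def basic_credentials(userid, password):
    """Return the Base64 token a browser sends for Basic authentication."""
    token = f"{userid}:{password}".encode("ascii")
    return base64.b64encode(token).decode("ascii")


# The header line a client would add to its repeated request.
header = "Authorization: Basic " + basic_credentials("demouser", "dem0user")

# Decoding needs no key, so the password is effectively sent in the clear.
recovered = base64.b64decode("ZGVtb3VzZXI6ZGVtMHVzZXI=").decode("ascii")
```

Here recovered is the original ‘demouser:dem0user’ string.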

Figure 98. Basic authentication uncovered—2

$ telnet hydra.csi.cam.ac.uk 80
Trying 131.111.11.148...
Connected to hydra.csi.cam.ac.uk.
Escape character is '^]'.
GET /restricted/ HTTP/1.0
Authorization: Basic ZGVtb3VzZXI6ZGVtMHVzZXI=

HTTP/1.1 200 OK
Date: Thu, 01 Jun 2000 11:09:15 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
Last-Modified: Thu, 01 Jun 2000 10:28:10 GMT
ETag: "6b543-144-39363aba"
Accept-Ranges: bytes
Content-Length: 324
Connection: close
Content-Type: text/html

...

The browser will typically remember the userid and password for realm ‘wombat’ and if challenged for the same realm again won’t reprompt the user.

Figure 99. ID-based access restriction logic

• Authenticate the ID

• Is the ID allowed access?

To date we have just explained how Basic authentication authenticates a web user. We still haven’t really explained why the user is subsequently let in. There are two sides to the permissions: First, the client must authenticate themselves to the server as a particular ID. Second, the ID must, of itself, have permission to access the pages. This second stage is covered with the require directive. The line in our example file

require valid-user

means that any user from the /etc/httpd/conf/passwd file is allowed access if they can quote the password.
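
The two stages can be laid out as a short Python sketch of what a server does with the Authorization header. It is a model only, with loud assumptions: stand_in_hash uses SHA-256 purely as a placeholder and is not the hashing scheme htpasswd uses, the password table is a dictionary rather than a file, and all function names are hypothetical.

```python
import base64
import hashlib


def stand_in_hash(password):
    """Placeholder hash for the sketch. NOT the scheme used by htpasswd."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def parse_basic_header(value):
    """Split 'Basic <token>' into (userid, password), or return None."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        return None
    decoded = base64.b64decode(token).decode("utf-8")
    userid, sep, password = decoded.partition(":")
    return (userid, password) if sep else None


def check_request(header_value, password_table):
    """Return the HTTP status for a request under 'require valid-user'."""
    creds = parse_basic_header(header_value) if header_value else None
    if creds is None:
        return 401  # no usable credentials: challenge the client
    userid, password = creds

    # Stage 1: authenticate. Does the password match the stored hash?
    if password_table.get(userid) != stand_in_hash(password):
        return 401

    # Stage 2: authorise. With valid-user, any authenticated ID passes.
    return 200
```

A request with no header, or with a wrong password, gets the 401 challenge; the header from Figure 98 gets 200.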

Figure 100. An example /etc/httpd/conf/passwd file

demouser:RGMhGsfmvLQeE
bob:ylxjJ83Fx7p8E
tom:C6QeAIpNqz9IE
dick:yfPWrksACScys
harry:tXFkoaIYJqbrk

The password file maintained by the htpasswd program uses the same password hashing algorithm as the traditional Unix password file, but note that you cannot use the system password file for the Apache system. This file must be maintained separately. Also note that the IDs used in this file are not login names. There need be no relation at all between the IDs used for web authentication and the system’s login names.

Figure 101. A more refined access control

• /var/www/html/restricted/alpha: Any valid user

• /var/www/html/restricted/beta: tom, dick, harry

• /var/www/html/restricted/gamma: bob, tom

Figure 102. httpd.conf: Finer grained access control

LoadModule auth_module modules/mod_auth.so
AddModule mod_auth.c

<Directory /var/www/html/restricted>
AuthType Basic
AuthName wombat
AuthUserFile /etc/httpd/conf/passwd
</Directory>

<Directory /var/www/html/restricted/alpha>
require valid-user
</Directory>

<Directory /var/www/html/restricted/beta>
require user tom dick harry
</Directory>

<Directory /var/www/html/restricted/gamma>
require user bob tom
</Directory>

In the slide we see an alternative use of require. Here, we set up a single mechanism to authenticate the clients for the directory /var/www/html/restricted and three different schemes for determining who (once authenticated) is allowed in.

Figure 103. httpd.conf: Access control by groups

LoadModule auth_module modules/mod_auth.so
AddModule mod_auth.c

<Directory /var/www/html/restricted>
AuthType Basic
AuthName wombat
AuthUserFile /etc/httpd/conf/passwd
AuthGroupFile /etc/httpd/conf/group
</Directory>

<Directory /var/www/html/restricted/alpha>
require valid-user
</Directory>

<Directory /var/www/html/restricted/beta>
require group betagrp
</Directory>

<Directory /var/www/html/restricted/gamma>
require group gammagrp
</Directory>

Figure 104. An example /etc/httpd/conf/group file

betagrp: tom dick harry
gammagrp: bob tom

There is one level of sophistication above lists of users: lists of groups. In addition to the password file for web IDs to be authenticated there can be a group file assigning these web IDs to web groups. Again, these are completely independent of the Unix login groups and note that the web group file has a different format from the Unix group file.
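
The three forms of require seen so far (valid-user, user and group) can be compared side by side in a short Python sketch. It assumes the ID has already been authenticated, reads the group file format shown in Figure 104, and uses hypothetical function names; it is not how mod_auth is written.

```python
def parse_group_file(text):
    """Parse 'groupname: id1 id2 ...' lines into a dict of sets."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, members = line.partition(":")
        groups[name.strip()] = set(members.split())
    return groups


def is_authorised(userid, require, groups):
    """Decide whether an already authenticated `userid` is let in.

    `require` is the text following the require directive, for example
    'valid-user', 'user bob tom' or 'group betagrp'.
    """
    kind, *names = require.split()
    if kind == "valid-user":
        return True
    if kind == "user":
        return userid in names
    if kind == "group":
        return any(userid in groups.get(g, set()) for g in names)
    raise ValueError("unsupported require form: " + require)


groups = parse_group_file("betagrp: tom dick harry\ngammagrp: bob tom\n")
```

With these groups, dick is admitted to beta but not to gamma, while tom is admitted to both.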

It’s worth recalling that anything that appears in a <Directory> block can also appear in the directory’s corresponding .htaccess (or whatever you chose to call it with the AccessFileName directive) file.

Figure 105. mod_auth: Directives

• AuthType Basic : Specifies the ‘basic’ authentication mechanism.

• AuthName realm : Specifies the ‘security realm’.

• AuthUserFile file : Specifies the web ID password file.

• AuthGroupFile file : Specifies the web group file.

• require valid-user : Any authenticated ID may have access.

• require user user1 user2 : ID must be authenticated and be one of user1 or user2 to have access.

• require group grp1 grp2 : ID must be authenticated and be in group grp1 or grp2 to have access.

Virtual hosts

Figure 106. HTTP request headers

GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.14-6.1.1 i686)
Host: hydra.csi.cam.ac.uk
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: es, en
Accept-Charset: iso-8859-1,*,utf-8

A header that is part of the HTTP/1.1 spec. but has been a standard extension to HTTP/1.0 in all browsers is the Host: header. This identifies by name the host the browser was trying to connect to.

At first glance this is pointless. If the browser hadn’t been trying to connect to the server it wouldn’t have been connecting to this instance of Apache in the first place! However, it is possible to have multiple different names all pointing to the same IP address and hence the same instance of Apache.

There are several ways to do this in the DNS but the most common and easiest is to have a single real name for the IP address (its ‘canonical name’) specified in the DNS by an A record (so called because it looks up the address corresponding to a name) and one or more aliases. These aliases are other names defined to be equivalent to the real name by CNAME records in the DNS (so called because they look up the canonical name of the alias).

Figure 107. DNS entries

www-uxsup.csx.cam.ac.uk. 1D IN CNAME nymph.csi.cam.ac.uk.
nymph.csi.cam.ac.uk. 1D IN A 131.111.10.245
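
How an alias ends up at the same IP address as the canonical name can be traced with a toy resolver in Python. The two records are those of Figure 107; the table layout and the function name resolve are assumptions made for the sketch, and a real resolver does a great deal more.

```python
# The two records from Figure 107, keyed by name: (record type, value).
RECORDS = {
    "www-uxsup.csx.cam.ac.uk.": ("CNAME", "nymph.csi.cam.ac.uk."),
    "nymph.csi.cam.ac.uk.": ("A", "131.111.10.245"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record yields an address."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value   # reached the canonical name's address
        name = value       # CNAME: continue with the canonical name
    raise RuntimeError("too many CNAME hops")
```

Both names resolve to 131.111.10.245, so a browser asking for either one connects to the same server.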

By explicit inclusion of the originally requested host name it is possible to have a multiplicity of websites each corresponding to different names for the same host. This is managed in the configuration file with the <VirtualHost> directive.

Figure 108. httpd.conf: Setting up a virtual host

# Virtual host example
<VirtualHost cockatrice.csi.cam.ac.uk>
DocumentRoot /var/www/cock
</VirtualHost>

The slide shows the setting up of a virtual host with the system definitions but with a different document root. You might want to create separate Unix user groups for the control of the content of the virtual host data and the ‘canonical’ host data.
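
The role of the Host: header in choosing a document root can be sketched in Python as a dictionary lookup with a fallback. This is a model of the idea only: the function name document_root is hypothetical, and /var/www/html is assumed here as the document root of the ‘canonical’ host.

```python
def document_root(request_headers, virtual_hosts, default_root):
    """Pick a document root from the Host: header of a request.

    `virtual_hosts` maps a host name to that virtual host's document root.
    Requests naming no known virtual host fall back to `default_root`.
    """
    host = request_headers.get("Host", "")
    # Drop any ':port' suffix and ignore case before looking the name up.
    host = host.split(":", 1)[0].strip().lower()
    return virtual_hosts.get(host, default_root)


vhosts = {"cockatrice.csi.cam.ac.uk": "/var/www/cock"}
```

A request quoting Host: cockatrice.csi.cam.ac.uk is served from /var/www/cock, while one quoting any other name, or none, gets the default tree.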

On systems that run multiple virtual hosts, it is very common for the canonical document root to have nothing but a home page saying ‘go to one of these virtual hosts’ and for all the data to be under the document trees for the various virtual hosts.
