introduction to apachebooks.mhprofessional.com/downloads/products/0072223448/...chapter 1...


Post on 27-Jun-2020



Chapter 1: Introduction to Apache

Complete Reference / Apache Server 2.0: TCR / Bloom / 222344-8 / Chapter 1

P:\010Comp\CompRef8\344-8\ch01.vpFriday, June 07, 2002 9:09:43 AM


This chapter introduces you to Apache and shows you how the Apache developers work. If you’ve already downloaded Apache or you’re familiar with the Apache development process, you can skip this chapter and go directly to Chapter 2.

What Is Apache?

Apache is the most popular web server on the Internet today. First released in December 1995, Apache became the leading web server less than one year later. This book leads you through an exploration of the second major release of this web server: Apache 2.0. Many changes have occurred between Apache 2.0 and the previous releases. While an experienced Apache user can pick up 2.0 and immediately start to use it, many new features and optimizations exist that people accustomed to Apache are likely to overlook. This book aims to ensure that nothing in Apache 2.0 is missed.

How Web Servers Work

Before you can understand Apache, you have to understand web servers in general. So, let’s start by looking at what a web server does, and then walk through a general overview of how Apache itself works. We can go through the details throughout the rest of the book, but it’s best if we all start at the same location.

Perhaps the best place to begin is to look at a standard web page, so you can watch how a web server works from a user’s perspective. Later in this book, you look at the internals of Apache but, for now, the goal is to understand what happens when you visit a web page. Figure 1-1 is a portion of Yahoo.com’s front page, which is the example page.

At its most basic, a web browser is a piece of software that requests information, and a web server is simply a piece of software that responds to requests for information. Each time you surf to a web page, your browser sends a message to the server. The server then finds the information on the server’s hard disk and responds with the requested page, as shown in the following illustration.
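The request-and-response exchange described above can be sketched in a few lines of Python. This is only an illustration, not how Apache is implemented: the function name handle_request, the default page index.html, and the choice of status codes are assumptions made for the example.

```python
import os


def handle_request(document_root, url_path):
    """Map a requested URL path to a file on disk and return (status, body).

    Hypothetical helper for illustration only; a real server does far more.
    """
    # Assumption: a request for "/" is answered with index.html.
    relative = url_path.lstrip("/") or "index.html"
    root = os.path.realpath(document_root)
    full = os.path.realpath(os.path.join(root, relative))

    # Refuse paths that escape the document root (for example "/../secret").
    if os.path.commonpath([root, full]) != root:
        return 403, b"Forbidden"

    try:
        with open(full, "rb") as page:
            return 200, page.read()
    except (FileNotFoundError, IsADirectoryError):
        return 404, b"Not Found"
```

Calling handle_request with a document root and the path "/" returns the status 200 and the bytes of index.html if that file exists, which is the "find the information on disk and respond" step in miniature.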

Of course, most web sites aren’t this simple. Let’s look at Figure 1-1 again. Notice that many small images are on this page. Images are added to web pages using an HTML tag, which makes adding any image to your web page easy. However, images aren’t as easy for web servers. When your browser makes a request for a page, it gets that HTML file back in response. The browser then parses the entire file, looking for tags. If any image tags are found in the HTML file, the browser must make additional requests to the server to retrieve those images. In the example you see, this means a request for this one page will result in at least 14 requests to the server. To help lessen the workload on the web server, the browser is allowed to make multiple requests over the same connection. While this makes wonderful sense from a purely technical viewpoint, it means the web page will appear to load more slowly because each request must wait for the previous request to finish before it can start. Of course, anyone who has surfed the web realizes that when web pages are loaded, multiple images are retrieved at once. This happens because web browsers open multiple connections to the server to request the components of the page.
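To make the arithmetic concrete, the sketch below parses an HTML document with Python’s standard html.parser module and estimates how many requests a browser would need: one for the page plus one per distinct image. The class and function names are invented for this example, and counting each distinct image URL once is a simplifying assumption; real browsers also fetch style sheets, scripts, and other resources.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def requests_needed(html):
    """Return 1 (the page itself) plus the number of distinct images."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return 1 + len(set(collector.sources))
```

A page with 13 distinct images would give 14 under this estimate, which matches the "at least 14 requests" figure quoted for the example page.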

Figure 1-1. The front page of Yahoo.com, an example of a standard web page

When the World Wide Web was in its infancy, just having images on your site was considered a big feat, but those days are long gone. Now, a great web site must dynamically customize itself to the person requesting the information. This means you can no longer simply store the web page on the server’s hard drive. Instead, you need a much larger infrastructure to serve your web site. This tends to mean CGI scripts or Java servlets, referencing a database on the back end, as seen here.

When a request is received for a dynamic page, the server executes the program that should generate the page. Assuming everything works correctly, the program sends the page through the server back to the client. Of course, having that program run on every request makes the server work harder to satisfy those requests. Later in the book, you learn the best way to lighten the load on your server if you must use dynamic pages.

That covers an easy setup (of course, nothing in the world is ever easy). What happens when you have so many requests for your site that you require multiple machines to keep up with the load? You can share a single web site among multiple machines in many ways. The first, and easiest, is to have one machine for each type of content: one for images, one for static pages, one for Java servlets, and so forth. This is the first step most sites use when they’re trying to architect a large site, but this only works for so long. At some point, you need to have multiple machines serving the same content. In general, this is done by having one machine sit in front of the computers serving the content. That one machine is responsible for sending the requests to the computer with the least load on it at that moment. In this way, you can replicate your web site across multiple machines and allow them each to serve a portion of your load, as you can see in the next illustration.
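The "send each request to the machine with the least load" idea can be sketched as follows. This is a toy model rather than a description of any particular load balancer: the Backend class, the use of the number of in-flight requests as the measure of load, and the tie-breaking rule (first machine in the list wins) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One content-serving machine behind the front-end machine."""
    name: str
    active_requests: int = 0  # assumed load metric: requests in flight


def pick_least_loaded(backends):
    """Return the backend with the fewest active requests."""
    return min(backends, key=lambda backend: backend.active_requests)


def dispatch(backends):
    """Assign one incoming request and return the chosen backend's name."""
    chosen = pick_least_loaded(backends)
    chosen.active_requests += 1
    return chosen.name
```

With three machines currently handling 2, 0, and 1 requests, the next request goes to the idle machine; once the loads even out, requests are spread across all three.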


I realize this all looks complex, and we’ve barely scratched the surface of what’s possible with Apache. My hope is that this book will explain how to make Apache work for your site in whatever situation you find yourself. This book is best used as a reference for Apache. As you run into problems while implementing your web site, you can refer to specific portions of this book to find the answers.

The Apache Software Foundation

The history of Apache is the history of web servers in general. One of the earliest and most widely used web servers was developed at the National Center for Supercomputing Applications (NCSA) by Rob McCool, but development stalled when Rob left the NCSA in mid-1994. By this time, many administrators had privately patched their copies of the server to fix bugs and provide extensions to the server itself. A small group of those administrators organized around a mailing list and some space on a shared computer. That group of people became known as the Apache Group and began working to produce the best web server on the Internet.

[Illustration labels: Browser, Request, Response, Load balancer, CGI scripts, Static pages, Images]

Using the NCSA server as a base, the original Apache Group worked hard to improve it, adding all the published bug fixes and as many enhancements as looked useful. Once this was tested on their servers, the first Apache server, version 0.6.2, was released. This happened in 1995. This was also when the NCSA restarted development on its server. To ensure the best possible quality in both products, two of the NCSA developers joined the Apache mailing list to share ideas and fixes. This might seem odd to people used to a corporate environment. The Apache developers have never been interested in having the most popular software. Their interest is to make the best software possible. This goal allows for a tremendous amount of freedom because they have no marketing people to contend with, just technical correctness.

Although the first release of Apache was met with wonderful success, all the developers realized the design of the server wasn’t going to work in the long term. Most of the developers continued to work on version 0.7.x, while one developer, Robert Thau, designed a new architecture. This new server included the now well-known modular structure and the pool-based memory allocator. The group switched to this new code base in July 1995 and added all the features that had been added to the 0.7.x branch. This was then released as version 0.8.8 in August 1995. After that code base was ported to more platforms, a new set of documentation was written, and features were added in the form of more standard modules, version 1.0 of Apache was released on December 1, 1995.

In July 1996, seven months later, Apache 1.1 was released. This release included a lot of performance improvements, as well as some new features. The biggest addition was probably the added caching proxy module. Although that module was never well maintained, a problem that’s now fixed in 2.0, many people use Apache as a simple proxy server. This release also included the capability to have one instance of the server listen to multiple ports.

The next major release was Apache 1.2, which included support for HTTP/1.1 and CGI scripts run as a specific user, as well as many other useful features. The most interesting information about this version is that it was supposed to be the final release of the 1.x branch of Apache. As this version was being completed, the group had already begun to discuss the new feature set for Apache 2.0.

Of course, Apache 1.2 wasn’t the final 1.x release. Apache 1.3 was released in May 1998. The big reason to release Apache 1.3 was to include support for Windows NT, which, until this time, had been the one major platform that couldn’t run Apache. Unfortunately, the Windows port has never been optimal in the 1.3 series. The server was written to run on Unix-like platforms, and ports to wildly different platforms have always suffered.

During this time, while Apache was improving, other projects were created that extended the web server and grew to take on a life of their own. One of those projects is mod_perl, a module that embeds a Perl interpreter in the server process. Another is mod_jserv, a project to connect a Java app server to an Apache web server. While the popularity of these projects grew, the Apache developers realized legal issues existed that a group of volunteers couldn’t handle. For example, the source code has always been available free of charge. The only requirement for distribution is that the Apache developers are acknowledged and that you can’t call your product Apache without our consent. Of course, many people have ignored the license and distributed versions of Apache the group didn’t sanction. Another problem the developers faced was that people wanted to donate money to help Apache progress. To help provide legal advice and to provide a way for people to donate money and hardware to the developers, the Apache Group incorporated in June 1999 as a nonprofit organization, known as the Apache Software Foundation (ASF).

The ASF continues to be a meritocracy, allowing anyone to be a member, as long as that person contributes to at least one of the projects. The ASF has also expanded the types of projects it encompasses. Along with the web server project and the Perl, PHP, and Java projects, there are a Tcl project, a set of XML projects, and a project to create a portable run-time library. The hope is that all these projects are useful to the programming community at large.

Like most Open Source projects recently, the ASF has been faced with the problem of being too popular. Just before the ASF was officially created, both IBM and Sun wanted to join the ASF to help develop web-based solutions. IBM wanted to stop developing its own proprietary web server and adopt Apache as its official web server. Sun wanted to release its Java app server as an Open Source project. The ASF wouldn’t accept the companies themselves as members; however, the ASF did let developers from Sun and IBM join as individual members who are paid to work on ASF projects. By drawing this distinction, the companies aren’t given any real control because each developer is acting on her own.

Immediately after beginning work on Apache 1.3, IBM realized performance issues existed with Apache and AIX. To help resolve these issues, IBM asked three of its developers to devote themselves full-time to Apache 2.0. This work began in earnest in December 1999, and has been ongoing ever since.

How Is Apache Developed?

Apache is an Open Source project, which is developed by a group of individuals from around the world. The Open Source model has become popular recently, with more companies adopting the Linux operating system. Although Linux is probably the best-known Open Source project, it isn’t the best example of how well Open Source projects can compete against commercial products. Apache has been the dominant web server on the Internet for the past few years, running over 19 million web sites, which is just under 60 percent of the Web. This speaks highly for Apache, and Apache 2.0 continues this tradition of high-quality software.

Until a few years ago, the Apache developers had never met together in one place to discuss the future of Apache or its design. All the work was done online, using a mailing list and web pages to coordinate the effort. The development model itself is relatively straightforward: all decisions are made on the public mailing list. Everyone is encouraged to subscribe to the mailing list and to participate in the discussion. When a developer has a change to make, he does one of two things. If the server is in review-then-commit mode, he posts the change to the list for other people to review and, when it’s deemed acceptable, the change is committed to the repository. If the server is in commit-then-review mode, then the change is committed directly to the repository, which results in an e-mail being sent to the repository mailing list, to be reviewed by the other committers. Nearly all of the time, the server is in commit-then-review mode because it allows development to proceed much faster. We only go into review-then-commit mode when we absolutely want to get a stable version of the server. Of course, a developer is welcome to post patches for review even if we are in commit-then-review mode. This is encouraged for large patches and it’s the only way for new developers to get patches into the core.

New committers are expected to post patches to the development list for experienced developers to review. If one of those developers reviews a patch and likes it, she is free to commit it to the tree. It’s important to realize that most of the core developers are volunteers, doing this in their spare time. If you post a patch to the development list, you need to understand it can take weeks before someone even reviews the patch, let alone commits it. If you post a patch and it isn’t committed within a few weeks, re-post the patch with a polite note stating you’d appreciate some feedback, so the patch can be applied. People often get upset when patches aren’t committed and they forget the developers are doing this for fun. No one enjoys being yelled at for not doing well enough at a volunteer job. It also isn’t enough simply to post a patch and walk away. The key to Open Source is its collaborative nature. If you post a patch, stick around and take part in the development. Having people review the patch makes the code better and, if you keep an open mind, it can make you a better programmer.

The voting system is at the heart of the Apache development model. Everyone on the mailing list gets a vote, but only votes from people with commit access are counted toward the final tally. Some people believe that because their vote doesn’t count, they shouldn’t post it, but nothing could be further from the truth. Many of the committers review other people’s opinions and consider those when casting their own votes. Each developer gets one vote on a given topic or patch and there are three possible votes: minus one, zero, and plus one. A plus one vote indicates the developer is in favor of whatever is being discussed. A zero vote indicates the developer doesn’t have strong feelings one way or the other. Sometimes, a zero is prefixed with a plus or minus to indicate that while the developer doesn’t have strong feelings, he definitely leans in one direction. Finally, a minus one vote is a veto. No way exists to override a veto: either the developer who vetoed something must change her mind or the discussion is over. However, a veto is only valid if there’s technical justification for it. No veto has ever been overridden because it wasn’t backed up with a valid reason, but it’s considered bad form to hold up development with an invalid veto. To help streamline the development process, we skip voting most of the time. We use lazy voting to allow things to keep moving. If someone doesn’t vote on a particular topic, that’s considered an implicit plus one vote.
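The voting rules above can be modeled in a short sketch. It is a simplification for illustration, not a tool the Apache developers use: the Vote type and decide function are invented names, a veto is treated as valid only when it carries a justification string, and a proposal is treated as accepted whenever no valid veto exists. The last two points are assumptions, since the text does not spell out those mechanics.

```python
from typing import NamedTuple, Optional


class Vote(NamedTuple):
    value: int                           # -1 (veto), 0, or +1
    justification: Optional[str] = None  # technical reason, needed for a veto


def decide(committers, votes):
    """Tally votes; only people with commit access are counted.

    votes maps a name to a Vote. Votes from non-committers are ignored in
    the tally, and a committer who did not vote counts as an implicit +1
    (lazy voting).
    """
    vetoed_by = []
    plus_ones = 0
    for name in committers:
        vote = votes.get(name, Vote(+1))  # lazy voting: silence means +1
        if vote.value == -1 and vote.justification:
            vetoed_by.append(name)        # assumed: a veto needs a reason
        elif vote.value == +1:
            plus_ones += 1
    return {
        "accepted": not vetoed_by,
        "vetoed_by": vetoed_by,
        "plus_ones": plus_ones,
    }
```

A single justified minus one from any committer is enough to block a change here, while a minus one from someone without commit access is visible on the list but does not affect the outcome.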

This voting model works well for Apache because of the level of trust that exists between the committers and the core members. We each know everyone else is most concerned with what’s best for Apache, even though we might not always agree about what that is. The reason this trust exists is that we don’t let just anyone have commit access. To get commit access to the source repository usually takes somewhere between three and six months of consistently good patches. Once you receive commit access, becoming a part of the core group of developers takes another three to six months. Without such stringent rules for giving people commit access, the core developers would use their veto power more often. Whenever a change is vetoed, however, bad feelings always occur because this is essentially saying to another developer that he isn’t trusted to make the right decision. Because everything is based on trust, this is a slippery slope to traverse.

Who Is Using Apache?

Now that you understand the basics of web servers, the people who develop the software, and the model used, you’ll look at who’s using Apache. A common misconception exists that enterprise companies don’t use Apache, and that Apache is relegated to small sites run by companies and groups that can’t afford to pay for commercial software. This section gives some examples of sites currently running Apache, and then you learn how you can examine your favorite site to determine what web server it’s using.

Yahoo!

The Apache Software Foundation held the first Apache Conference in September 1999, in San Francisco. As part of this conference, David Filo, the cofounder of Yahoo!, was invited to discuss how Yahoo! uses Open Source software in its business. At one time, Yahoo! tried to use proprietary operating systems and its own homegrown web server. However, Yahoo! found those solutions couldn’t scale well enough to meet its needs. Yahoo! also decided early on that it couldn’t afford to be beholden to any one company to solve a problem. If Yahoo! encountered a situation the software couldn’t handle, it wanted to be able to look at the source code and fix the problem on its own.

IBM

Because IBM helps develop Apache, it stands to reason that IBM runs Apache on its site. What’s most likely a shock, though, is that IBM ran Apache before it started working on Apache! In fact, IBM ran Apache until a policy was developed stating that groups within IBM had to look to IBM solutions before using any other solution.

Amazon.com

Amazon.com is the first big success story of the Internet economy and it has been using Apache as its backbone since the beginning. When Amazon.com’s web server fails, it loses money. This means having Amazon.com trust its whole business model to Apache is a big vote of confidence.

Hotmail.com

Hotmail.com doesn’t use Apache anymore, but it did run Apache for a long time, even after it was bought by Microsoft. Immediately after being purchased, the site administrators tried to migrate the site to IIS and Windows NT. Windows and IIS couldn’t keep up with the load, however, and the servers required rebooting to keep the site running. Not long after this attempt, the servers were moved back to FreeBSD and Apache. Since that time, IIS and Windows have caught up, and Hotmail has been moved to that platform.

Determining What a Site Is Running

Whenever a web server responds to a request, it includes the exact version of the server in the response. Although it is possible to disguise this information, most web site administrators do not bother to do so. This means that it is possible to determine what most sites on the Internet are using for the web server. The easiest way to do this is to use a site on the Internet called Netcraft. As well as allowing people to query which server a site is using, the Netcraft web site also keeps track of how many sites are using each server. The front page of the Netcraft web site appears in Figure 1-2. You can enter any part of a web site address in the dialog box: the site will provide a list of all matching web sites and a link to determine which web server that site is running.

Figure 1-2. The front page of the Netcraft web site, a common place to find out which server a site is running
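The version information mentioned above travels in the Server response header defined by HTTP, so you can also look at it yourself. The sketch below shows two hedged approaches in Python: one function that extracts the header from the raw text of a response, and one that asks a live site using the standard http.client module. The function names are made up for this example, and the second function needs network access and reports only whatever the contacted machine chooses to send.

```python
import http.client


def server_header_from_text(raw_response):
    """Return the Server header value from raw HTTP response text, or None."""
    head = raw_response.split("\r\n\r\n", 1)[0]  # headers end at a blank line
    for line in head.split("\r\n")[1:]:          # skip the status line
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "server":
            return value.strip()
    return None


def fetch_server_header(host, timeout=10):
    """Ask a live site for its headers and return the Server value, if any."""
    connection = http.client.HTTPConnection(host, timeout=timeout)
    try:
        connection.request("HEAD", "/")
        return connection.getresponse().getheader("Server")
    finally:
        connection.close()
```

Given a response whose headers include a line such as "Server: Apache/2.0.36 (Unix)" (a made-up sample value), the first function returns the text after the colon. An administrator can configure the server to send something else, so treat the result as a hint rather than proof.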

Netcraft doesn’t always get the results correct, although it is right more often than not. Netcraft can be fooled into reporting the incorrect server in many ways. If the administrator configures the server to report incorrect information, then Netcraft will report the server type incorrectly. Another potential problem is that many smaller sites aren’t set up in the DNS system to have requests go directly to their computers. Those sites use a redirector to get requests for their domains to the correct computer. In situations like this, Netcraft will report the server running on the redirector. For example, my own site, http://www.rkbloom.net, runs the latest version of Apache 2.0. However, Netcraft reports it as Apache 1.3.9, because this is what the web server at my provider is running. The results of querying my site can be found in Figure 1-3.


Figure 1-3. The output from requesting www.rkbloom.net


Downloading Apache

Now that you understand how Apache is developed, the rest of this chapter covers downloading Apache for the first time. The first thing you must think about when downloading Apache is whether you want a binary distribution or the source code. If you decide you want a binary distribution, you have to decide if you want a commercial build of Apache or the Open Source build. This section covers all the available options for getting Apache to your machine. Before we go into how to download Apache, however, you need to understand how the Apache developers release the software.

Apache 2.0 Release Model

Starting with the tenth release of Apache 2.0, the release model changed drastically. The new model is similar to the one used with Apache 1.0, but radically different from the one used by most other software projects. The key to this release model is that the source tree is never frozen; the developers should always be making forward progress. When most software gets to the point where it’s about to be released, the developers tend to stop making changes, so the source code stabilizes and no more bugs are introduced. With Apache 2.0, the developers decided the bugs would be fixed faster and more reliably if we let people continue to make whatever changes they wanted. The reasoning is simple: people work on what they enjoy doing. If a bug is in the product, then one of the developers will hit that bug and he’ll have to fix it because it will delay him from doing work.

To make this possible, we changed how releases are tagged. Instead of advertising to the development mailing list and notifying developers of how much time they have to commit all their changes, tags are applied whenever any developer believes the tree is stable enough to be released. At this point, rather than create a tarball, the tag is announced to the development mailing list and everyone is encouraged to test the tree and report her results.

If the general feeling is positive about the release, a tarball is created and posted to the testers’ mailing list. At this point, the tarball is labeled as an alpha release. As the testers become more familiar with the release, they’re asked to report on their experience with the release. If all the reports are positive and no glaring bugs are found in the release, then the tarball is rebranded as a beta release. After a tarball has been in beta for some period, it can be bumped up to General Availability. It’s important that people realize a single version of Apache 2.0 can be released multiple times, each time at a different level. This lets the developers react to real-world use of the software and it also lets us continue to make improvements without having to stop development to release the software.
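One way to picture the release levels described above is as an ordered list that a single version moves through. The sketch below is only a mental model: the level names follow the text, but the LEVELS list and the promote function are invented for this example, and the real decision to promote a tarball is made by people on the mailing lists, not by code.

```python
# Levels a single version can pass through, in order, as described above.
LEVELS = ["tagged", "alpha", "beta", "general availability"]


def promote(level):
    """Return the next release level, or raise if there is none."""
    position = LEVELS.index(level)  # raises ValueError for unknown levels
    if position == len(LEVELS) - 1:
        raise ValueError("already at the highest release level")
    return LEVELS[position + 1]
```

The same version number keeps its identity throughout; only its label changes as testers report positive results.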

Downloading Source from CVS

Two ways exist to get the source code for Apache 2.0. The first method is to get it directly from the CVS source code repository. This allows anyone to get the latest development code base. If you aren’t a developer, you should make sure you only download released versions of the source code. See the following for reasons most people don’t want a copy of the latest development tree. You’ll need to have a copy of CVS installed on your machine and have it in your path to download Apache from the CVS repositories. Copies of CVS for most platforms can be found at the CVS web site at http://www.cvshome.org/downloads.html. The first step in checking out the CVS repository is to log in:

%> cvs -d :pserver:[email protected]:/home/cvspublic login

Use anoncvs when prompted for the password. You only need to log in to this machine once, and CVS will keep track of the password you used. Once you successfully log in, you need to check out the actual code. Three packages are required from the ASF in order to build Apache 2.0: the httpd-2.0 source code, the Apache Portable Runtime (APR), and the Apache Portable Runtime Utility Library (APR-util). If either of the last two isn’t found when trying to configure the Apache build process on Unix, the configuration will fail with an error message telling you to download the project you’re missing. On Windows, you’ll receive an error message when trying to load the project into Microsoft Visual C++. To download the source code, after logging in, execute the following commands:

%> cvs -d :pserver:[email protected]:/home/cvspublic co httpd-2.0

On Unix:

%> cd httpd-2.0/srclib

On Windows:

%> cd httpd-2.0\srclib

%> cvs -d :pserver:[email protected]:/home/cvspublic co apr

%> cvs -d :pserver:[email protected]:/home/cvspublic co apr-util

Apart from the path separator in the cd step, these commands work on both Windows and Unix. This downloads the latest development version of Apache 2.0, which most people don’t want. The latest development source code is modified every day and, while all of the developers strive to ensure it always builds on all platforms, the developers do make mistakes. To make matters worse, no single Apache developer has access to every operating system (OS) Apache runs on. This means ensuring Apache can always run anywhere is impossible. If you download the development tree and it doesn’t work, the Apache developers are unlikely to fix the problem immediately unless a simple solution exists. The developers are much more likely to fix the problem if you can provide a patch that simply needs to be applied.

To download a specific version of Apache, you must specify the correct CVS tag for that release. The CVS tags always follow a common format, APACHE_2_0_XX, where XX is the version of Apache 2.0. The easiest way to find the latest version of Apache 2.0 is to check the web interface to the CVS repository for the STATUS file. Whenever a release is made, the STATUS file is updated to note it. To find the STATUS file, browse to http://cvs.apache.org/viewcvs.cgi/httpd-2.0/STATUS and look at the latest version of the file. Figure 1-4 is an example of this file.

Notice that two different statuses exist for Apache 2.0: rolled and released. If a version is rolled, then a tarball was created, but it wasn’t released. You don’t want a version that says it’s rolled because the Apache developers have already decided this version has a problem that disqualifies it for a full release. If the STATUS file says it was released, then a tarball was created and made generally available. To download a specific version of Apache, you need to tell CVS which version you want. If you want to download Apache 2.0.25, you would use the following commands.

%> cvs -d :pserver:[email protected]:/home/cvspublic co -r APACHE_2_0_25 httpd-2.0

On Unix:

%> cd httpd-2.0/srclib

On Windows:

%> cd httpd-2.0\srclib

%> cvs -d :pserver:[email protected]:/home/cvspublic co -r APACHE_2_0_25 apr

%> cvs -d :pserver:[email protected]:/home/cvspublic co -r APACHE_2_0_25 apr-util

Notice all three projects use the same tag in CVS for the time being. When APR is released as a separate project, it will no longer use the tag from the web server project. The developers haven’t yet determined how they’ll document which version of APR should be used with each version of the server. APR should be backward-compatible, however, so the latest version of APR should always work.
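Because the tag format is mechanical, the tag for a release can be derived from its version number. The following is a minimal sketch; version_to_tag is a hypothetical helper name, not a CVS or Apache tool, and it assumes the APACHE_2_0_XX format described above still applies to the release you want.

```shell
#!/bin/sh
# Sketch only: build the CVS tag for an Apache 2.0 release number.
# version_to_tag is a hypothetical helper, not an Apache or CVS tool.

version_to_tag() {
    # 2.0.25 -> APACHE_2_0_25: replace each dot with an underscore.
    echo "APACHE_$(echo "$1" | tr '.' '_')"
}

# Prints the tag used in the commands above.
version_to_tag 2.0.25
```

The result is the value to pass to co -r for httpd-2.0, apr, and apr-util.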

Downloading Source as Tarballs

If you don’t want to use CVS to download an Apache distribution, you can download full-source code tarballs in two ways. The first method for downloading source code tarballs will get the latest version from CVS. For the same reasons mentioned previously, most people don’t want or require the absolute latest source code. If you do, though, you can get it without using CVS yourself.

Every six hours, a script on the Apache computers checks out the latest version of the code from the CVS repository and packages it into a snapshot tarball. It’s important to note that this script doesn’t build a full tarball. The automated tarball is missing APR and APR-util. The good news is the same script that creates tarballs for Apache 2.0 also creates tarballs for APR and APR-util. The other problem with these tarballs is they’re created on Unix machines. This means people on Windows machines won’t be able to use them because of the differences in text files between Windows and Unix. There’s no way for Windows users to get a source tarball this way.

To download the snapshot tarballs on a Unix machine, point your browser to http://cvs.apache.org/snapshots/. In this directory, you can find subdirectories for the httpd-2.0, apr, and apr-util tarballs. Download the latest of each to your computer. Once you have them all, you need to set things up, so you can build the source from those tarballs. The first step is to unzip and untar the httpd-2.0 tarball.

%> gunzip httpd-2.0_200*

%> tar -xvf httpd-2.0_200*

Figure 1-4. The STATUS file as it appears in CVS

Once you untar the web server itself, you have to untar APR and APR-util in the correct location. Just as with the CVS checkout, APR and APR-util belong in the srclib directory under the httpd-2.0 directory. The following commands finish setting up the source code directories.

%> cd httpd-2.0/srclib

%> mv ../../apr*200* .

%> gunzip apr*

%> tar -xvf apr*
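The whole snapshot setup can also be scripted. The following is a minimal sketch under stated assumptions: unpack_snapshots is a hypothetical helper, the archives are assumed to end in .tar.gz and to sit in the current directory, and the httpd archive is assumed to unpack to a directory named httpd-2.0. Because tar reads a single archive per -f option, the sketch extracts the APR and APR-util archives one at a time instead of passing both to one tar command.

```shell
#!/bin/sh
# Sketch only: unpack snapshot tarballs into the layout described above.
# Assumptions: the archives end in .tar.gz, sit in the current directory,
# and the httpd archive unpacks to a directory named httpd-2.0.

unpack_snapshots() {
    for f in httpd-2.0_*.tar.gz; do
        [ -f "$f" ] || continue          # the pattern matched nothing
        gunzip -c "$f" | tar -xf - || return 1
    done
    # Stop if the web server tree is not in place yet.
    [ -d httpd-2.0/srclib ] || return 1
    # tar reads one archive per invocation, so handle each file in turn.
    for f in apr*.tar.gz; do
        [ -f "$f" ] || continue
        gunzip -c "$f" | (cd httpd-2.0/srclib && tar -xf -) || return 1
    done
}
```

Run unpack_snapshots from the directory that holds the three downloaded archives. Unlike the commands above, it leaves the compressed files in place.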

Downloading Releases

Now that you’ve seen how to download the source code tarballs the hard way, let’s look at the easy method for downloading the source code. If you’re looking for the latest release of Apache 2.0, the easiest way is to download the source package the developers put together when the release was made. This has the advantage that it’s complete: everything required for a build is in one place. The other major advantage to this method is a Windows Zip file is available, so Windows developers can also download the source code.

The latest source package can always be found at http://httpd.apache.org/dist/httpd. Once this package has been downloaded, you need to extract the software, so you can build it. On Unix, this means you need to unzip and untar the package, as in the previous explanation, while on Windows you need to unzip it using either pkzip or WinZip.

Other Required Packages

If you downloaded the Apache 2.0 source package, you’ll need more tools to compile the package on your platform. Unix and Windows require different packages, and this section covers all the different packages required on each platform to complete a build. If any one of these packages is missing, your build of Apache is unlikely to work.

An ANSI C Compiler

Regardless of which platform you’re compiling for, you need an ANSI C compiler. The most common compiler on Unix platforms is gcc. Depending on the version of Unix you’re using, gcc might even come standard on your platform. Some Unix versions, such as Solaris, come with a compiler, but it isn’t ANSI-compliant. If you try to compile Apache with this compiler, it will fail. I want to be very clear about this: you can purchase a compiler for Solaris machines that is ANSI-compliant, but the one that comes with the operating system doesn’t fit the bill.

Although multiple compilers exist for Windows, Apache still requires Microsoft Visual C++. We made this decision because the Microsoft compiler generates the best executables of all the Windows compilers. The other problem is detecting which compiler is available on a Windows machine. We looked at using the Borland compiler and gcc, through cygwin, on Windows and, while they might work, they aren’t well supported. The Apache developers worked hard, however, to support three different versions of Microsoft Visual C++: 5.0, 6.0, and 7.0. Developers are using all three versions, so they’re all likely to work with Apache for some time to come.
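On Unix, a quick way to find out whether a given compiler is likely to cope is to feed it a tiny program that uses ANSI function prototypes. The following is a rough sketch, not the test Apache’s configure script performs: ansi_check is a hypothetical helper, and defaulting to gcc is only an assumption for illustration.

```shell
#!/bin/sh
# Sketch only: see whether a compiler accepts ANSI C function prototypes.
# ansi_check is a hypothetical helper; $1 is the compiler to try.

ansi_check() {
    cc=${1:-gcc}
    dir=$(mktemp -d) || return 2
    # A pre-ANSI (K&R) compiler rejects these prototype-style definitions.
    cat > "$dir/t.c" <<'EOF'
#include <stdio.h>
static int add(int a, int b) { return a + b; }
int main(void) { printf("%d\n", add(1, 2)); return 0; }
EOF
    if "$cc" -o "$dir/t" "$dir/t.c" >/dev/null 2>&1; then
        echo "$cc: accepts ANSI C prototypes"
        rc=0
    else
        echo "$cc: missing or not ANSI-compliant"
        rc=1
    fi
    rm -rf "$dir"
    return "$rc"
}

ansi_check "${CC:-gcc}" || echo "try another compiler before configuring"
```

A passing result only shows that prototypes compile; it doesn’t guarantee the compiler can build all of Apache.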

awk

Awk is a standard Unix utility that Apache uses to modify files as part of the build system. While awk isn’t a standard part of Windows, it is required on Windows if you’re going to do your own build. Windows doesn’t have any good tools, by default, to automatically modify a file without any input from the user. If you don’t have awk on Windows, you can download a copy of awk from the cygwin web site at http://www.cygwin.com. If you can’t get a copy of awk, Apache will still compile, but the installation will fail. You can copy all the files to the correct location by hand and modify the default configuration file by hand, but most people prefer to install awk and forget about it. If you aren’t building your own Apache, but are downloading the binary distribution for Windows, you won’t need to have awk on your machine.
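To show the kind of unattended file edit awk makes possible, here is a minimal sketch. It isn’t taken from Apache’s build scripts: fill_template is a hypothetical helper, and the @@ServerRoot@@ placeholder and the install path are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: replace a placeholder in a template without user input.
# fill_template, the placeholder, and the path are illustrative assumptions.

fill_template() {
    # Read a template on stdin and write the filled-in copy to stdout.
    awk -v root="$1" '{ gsub(/@@ServerRoot@@/, root); print }'
}

printf 'ServerRoot "@@ServerRoot@@"\n' | fill_template /usr/local/apache2
```

The example prints ServerRoot "/usr/local/apache2". In practice, the input would be a template configuration file and the output would be redirected to the installed copy.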

Libtool

To make building programs and libraries easier on all the different versions of Unix available today, Apache 2.0 has moved to using libtool for its build process. Apache 1.3 had custom logic to build dynamic modules correctly on all platforms, but that was a major source of bugs for Apache 1.3. The developers decided it was better to focus on building the best web server possible and to use another Open Source project to build libraries correctly. Apache’s build system will work with either libtool 1.3 or 1.4, although I recommend always using the latest version of libtool because of the number of bugs that are fixed in each version. For platforms that libtool doesn’t support, such as OS/390 and OS/2, the Apache developers have created their own libtool look-alikes that will work with our build system. Libtool can be found on the GNU Project’s (FSF) ftp site at ftp://ftp.gnu.org.

Autoconf

Autoconf isn’t strictly required to build Apache, but I’ll include it here for completeness. If you download a source package that wasn’t released by the Apache developers or you use the source code from CVS, then you require autoconf on Unix machines. Autoconf is an Open Source tool that creates scripts to examine the machine it’s run on to determine what’s available. Because every version of Unix is slightly different, autoconf is used to determine the features for the version of Unix you’re using and to modify the build system, so Apache can be built. Apache 1.3 used a custom script to do all this work. Just like libtool, however, the developers decided it made more sense for them to write a web server and let others write the configuration script. Autoconf can also be found on the FSF ftp site.

Perl

Perl is another package that isn’t strictly required but, if it’s there, the Apache build system takes advantage of it. Apache comes with helper scripts written in Perl. If the autoconf script can find Perl on your machine, then it automatically replaces the first line in every script to refer to your Perl installation.
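The effect of that first-line replacement can be illustrated with a short sketch. This isn’t how Apache’s configure script does the substitution: set_perl_path is a hypothetical helper, and /usr/bin/perl is an assumed location for the interpreter.

```shell
#!/bin/sh
# Sketch only: point a script's first line at a given Perl interpreter.
# set_perl_path is a hypothetical helper; it reads stdin and writes stdout.

set_perl_path() {
    # Replace line 1 with "#!<path>" and copy every other line unchanged.
    awk -v p="$1" 'NR == 1 { print "#!" p; next } { print }'
}

printf '#!/replace/with/path/to/perl\nprint "ok\\n";\n' | set_perl_path /usr/bin/perl
```

The first line of the output becomes #!/usr/bin/perl and the rest of the script passes through untouched.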

Doxygen

Doxygen is the final package used when building Apache. This package also isn’t required to build a full installation of Apache 2.0. However, the Apache developers worked hard to document every API in Apache, APR, and APR-util with doxygen. If you want to extract that documentation, you need to have doxygen installed. In Apache 1.3, some of the developers have begun to document the API online, but those docs are often neglected when the source code is modified. To help solve that problem in Apache 2.0, the API documentation is inline with the source code. If a function prototype is changed in a header file, modifying the doxygen documentation at the same time is easy. This makes it more likely that the docs will be kept current. Doxygen is an Open Source project and binary downloads for every major platform, including Windows, can be found at its web site at http://www.stack.nl/~dimitri/doxygen.
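Before starting a build on Unix, it can help to see which of the tools from this section are already on your path. The following is a minimal sketch; check_tools is a hypothetical helper, and the split into required and optional tools simply follows the descriptions above.

```shell
#!/bin/sh
# Sketch only: report which build helpers named in this section are on PATH.
# check_tools is a hypothetical helper, not part of the Apache build system.

check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every named tool was found.
    [ "$missing" -eq 0 ]
}

check_tools awk libtool || echo "a tool the build depends on is missing"
check_tools autoconf perl doxygen || echo "an optional tool is missing"
```

The check only confirms that a command exists; it says nothing about whether the installed version is recent enough.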

Commercial Options

If you decide not to compile your own copy of Apache, you still have a couple of options when downloading Apache. This section describes the commercial options available to Apache users. The other option is to download a binary release from the Apache web site, which is covered in Chapter 5, along with how to install the packages.

If you’re looking for a version of Apache that has gone through an official QA cycle and is supported by a commercial company, there are companies that sell Apache. Each commercial vendor uses a different installer, so refer to the vendor’s documentation to determine how to install Apache.

IBM HTTP Server

IBM became involved in Apache in July 1999 by helping to develop the product. Since that time, IBM has had many members of its Apache team invited to join the Apache core team and even more have commit access. You can download a free version of IBM HTTP Server from the WebSphere web site at http://www.ibm.com/websphere.

The biggest downside to IBM’s version of Apache is that IBM made some internal modifications to Apache 1.3 itself, and it’s likely IBM will also make changes to Apache 2.0. These changes were made to allow the IBM server to support its own proprietary modules. Although IBM provides versions of the most-popular Apache modules compiled for use with its server, you might have problems if you’re going to add less-popular modules to your server and you aren’t comfortable compiling the code yourself.

IBM HTTP Server supports AIX, Windows NT, zSeries, iSeries, HP/UX, and Linux.

Covalent FastStart and Enterprise Ready

Covalent Technologies has released many products based on the Apache web server. The base package is Covalent FastStart, which is included in all the other packages. FastStart includes many Open Source projects in an easy-to-install package. Included with Apache are mod_perl, mod_PHP, Tomcat, Covalent SSL, and mod_DAV. Enterprise Ready Apache includes everything in FastStart, as well as proprietary modules for Apache and a Management Console.

Covalent has been involved with Apache development since Covalent’s inception. Randy Terbush, the founder of Covalent, is also one of the original eight members of the Apache Group. Covalent has hired many other Apache developers, allowing it to provide support for all versions of Apache. Covalent works hard to ensure it maintains binary compatibility with stock Apache by not making any changes to the core server. This means any binary module you download from the Internet will work with a Covalent Apache installation.

Covalent supports Solaris, AIX, HP/UX, Linux, and Windows. Evaluations of Covalent’s products can be downloaded from its web site at http://www.covalent.net.

Operating System Vendors

Every Unix operating system comes with a web server these days. Most of them are shipping a version of Apache. Even Sun, a part of the iPlanet coalition, ships a copy of Apache with every copy of Solaris sold. Some of those companies have engineers working on Apache to ensure it works well on their platforms. Although some OS vendors, such as Red Hat, sell versions of Apache separately from the OS itself, most distribute Apache as part of the OS.
