A Two-Tiered Model for Analyzing Library Website Usage Statistics, Part 1: Web Server Logs
Laura B. Cohen
portal: Libraries and the Academy, Volume 3, Number 2, April 2003, pp. 315-326 (Article)
Published by The Johns Hopkins University Press
DOI: 10.1353/pla.2003.0028
http://muse.jhu.edu/journals/pla/summary/v003/3.2cohen.html




portal: Libraries and the Academy, Vol. 3, No. 2 (2003), pp. 315–326. Copyright © 2003 by The Johns Hopkins University Press, Baltimore, MD 21218.

A Two-Tiered Model for Analyzing Library Website Usage Statistics, Part 1: Web Server Logs

Laura B. Cohen

abstract: The author proposes a two-tiered model for analyzing website usage statistics for academic libraries: one tier for library administrators that analyzes measures indicating library use, and a second tier for website managers that analyzes measures that aid in server maintenance and site design. The author discusses the technology of website usage statistics, including several caveats about the accuracy of derived counts, and recommends important measures for each tier. Part 1 describes Web server logs and the challenges inherent in their analysis. Part 2 presents recommendations for conducting log file analysis to obtain meaningful information, benefiting administrators and site managers.

An academic library website is a key element in an institution's provision of information resources and services. Assessment of the website is an important component of the provision of resources and services. Libraries need to establish whether the website is meeting its mission and is functioning successfully. Creation and maintenance of a site also consume considerable staff time and commitment. Therefore, a website's usage statistics are an important measure of library activity. However, the library community is still in the early stages of developing a coherent approach to gathering, analyzing, reporting, and applying these statistics. Libraries may be overlooking these issues entirely, or may be generating usage reports but are not sure what to do with them. The complexity of the topic is little understood by library administrators who may be requesting and reporting usage data. Website managers also need to be aware of the measures that can be derived and the options for interpreting and applying them. This paper has been written with both librarian and website manager constituencies in mind.


The literature in library and information science has paid only scattered attention to this topic. Moreover, what does exist is incomplete and treats the issue piecemeal. The scholarly literature tends to address the usage measures of page views or user sessions and promotes these as data important to library administrators.1 A notable exception is a 1997 article by John Carlo Bertot and three colleagues, which gives an overview of the data offered by Web server log files and addresses "maintainers, policy makers, and stakeholders" by analyzing the usage statistics for the Government Information Locator Service (GILS) website.2 Other articles, mainly appearing in the trade literature, discuss the vagaries of Web server log file interpretation and enumerate measures that can help website administrators plan and design a site.3 In conjunction with this second approach, current library literature does not adequately address the counting problems posed by such phenomena as Web crawlers, viruses, caching, and dynamic applications, or the challenges of interpreting such data as remote visits and log file time stamps. In addition, measures rejected by most writers, e.g., total hits to the site, are overlooked as useful data.

Libraries need to incorporate both approaches and institute a comprehensive program of gathering and applying website usage statistics. Accordingly, this paper unifies the existing literature and proposes a two-tiered model for this activity: one tier, for library administrators, analyzes measures indicating library use, and a second tier, for website managers, analyzes measures that aid in site design and server maintenance. In so doing, this paper presents an in-depth discussion of the technology of website usage statistics and enumerates measures of importance for each tier.

Data Needs

Library administrators and website managers need to pay close attention to the usage data that can be derived from Web server log files. By and large, these two groups require a distinct set of data for different purposes. Administrators need usage statistics for a number of reasons, including one or more of the following:

• to fulfill reporting requirements, both to external entities and to entities within the library and on campus

• to determine the use of website resources and services to justify, to modify, or to expand existing efforts

• to justify expenditures for website development

• to verify off-campus use of library resources and services

• to identify the clientele of the library's website and the activity level of these users

• to track usage activity by day of the week and hour of the day in order to determine staffing needs for user support

Website managers design, organize, and maintain the site. If a library runs its own Web servers, the site infrastructure and related technologies must also be configured and maintained. Website managers therefore need usage statistics for a variety of purposes, including one or more of the following:


• to determine if the Web server is meeting usage demands

• to ascertain the efficiency of site organization by determining entry and exit pages, single access pages, and paths through the site

• to set standards for HTML coding as well as programming scripts and applications by tracking browsers and operating systems accessing the site

• to identify problems encountered during visitor sessions

• to note search engine keywords used prior to visiting the site to establish metadata practices

• to track external pages linking to the site in order to contact other Web managers when their links go bad or to announce changes to the library's site

This paper discusses the ways in which Web server log files can provide these measures to serve the needs of both library administrators and website managers. As will be shown, these data can be problematic to interpret. However, they provide important and illuminating information that cannot be derived in any other way. At the same time, log file data comprise only one measure of library website use and do not evaluate the user experience or establish outcomes. Libraries should supplement log file data with such instruments as user surveys, focus groups, usability studies, interviews, and voluntarily submitted feedback in order to derive a user-centered context for the website experience.

Log Files and What They Count

Web server log files generate data when a browser requests a file from the server. This is distinct from actual usage. For example, a user's browser might request a particular page from the server, but the user might (for instance) end up answering the telephone rather than reading the document. Server logs record computer behavior, not visitor behavior.4 In addition, log analysis provides measures on the file level rather than on the level of individual links. Librarians wishing to gather usage data on links will need to investigate other means, such as redirect scripts, of taking such measures.

Log files are plain text files and come in many varieties. The current version of the popular Web Trends Log Analyzer lists nearly forty possible formats.5 Web servers usually have the capacity to generate logs in different formats. Log files themselves can also be configured to record certain measures.

In the early years of Web servers, the Common Log File format, standardized by the World Wide Web Consortium (W3C), was in frequent use. This format typically consisted of four separate files:

• Transfer or access log that records a status code (e.g., 200 for a successful file transfer), the user's machine IP address, the date and time of access, the file transferred, and the number of bytes sent and received

• Agent log that records visits by browsers and operating systems

• Error log that records 404 "file not found" messages, as well as server, application, and other errors

• Referrer log that records the address of remote websites from which visits originate6


Currently, many servers support the Combined or Extended Log File Format, also standardized by the W3C, which combines these elements into one file. A single line in a Combined Log File can record any of the measures listed above, as shown in the following example of a successful file transfer.

2002-07-15 00:04:26 66.66.208.26 - 169.226.11.130 GET /hours/ulibsummer.html - 200 2891 318 63 HTTP/1.1 library.albany.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - http://library.albany.edu/hours/

This statement can be broken down into sections and stated in plain English, as follows:

2002-07-15 00:04:26 66.66.208.26 - 169.226.11.130 GET

On July 15, 2002, at 00:04:26 (12:04 a.m.), the client computer located at 66.66.208.26 accessed from the server located at 169.226.11.130

/hours/ulibsummer.html - 200

the file /hours/ulibsummer.html successfully,

2891 318 63 HTTP/1.1 library.albany.edu

the server sending 2891 bytes to the client and having received from the client a request of 318 bytes, taking 63 seconds to transfer the file using HTTP protocol version 1.1 from the host site library.albany.edu,

Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)

the client using Internet Explorer 6 (which identifies itself with the Mozilla/4.0 compatibility token) on a computer running the Windows XP (NT 5.1) operating system,

http://library.albany.edu/hours/

and requesting the file from the referring page http://library.albany.edu/hours/.
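A line in this format can be broken into its fields programmatically. The following is a minimal sketch, not production code: it assumes the space-delimited field order of the example above, whereas real W3C extended logs declare their own field order in a #Fields directive that a robust parser would read instead.

```python
# Minimal sketch: split one extended-format log line into named fields.
# The field order below is assumed from the example above; real logs
# declare their order in a "#Fields:" directive.

FIELDS = [
    "date", "time", "client_ip", "user", "server_ip", "method",
    "uri", "query", "status", "bytes_sent", "bytes_received",
    "time_taken", "protocol", "host", "user_agent", "cookie", "referrer",
]

def parse_line(line: str) -> dict:
    """Map whitespace-separated tokens onto the assumed field list."""
    tokens = line.split()
    return dict(zip(FIELDS, tokens))

line = ("2002-07-15 00:04:26 66.66.208.26 - 169.226.11.130 GET "
        "/hours/ulibsummer.html - 200 2891 318 63 HTTP/1.1 "
        "library.albany.edu "
        "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - "
        "http://library.albany.edu/hours/")

entry = parse_line(line)
print(entry["uri"], entry["status"])  # /hours/ulibsummer.html 200
```

Once the fields are named, the counting of hits, page views, and visits discussed below reduces to grouping and filtering these records.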

Log file analysis software can use such data to count hits, page views, and visits. These distinct categories are the building blocks of website usage reporting, and should therefore be clearly understood.

Hits. A hit is any file transferred from the server to a user's browser. This might include separate image files transferred from a single page as well as individual multimedia files. The transfer of a single page can therefore generate multiple hits.7

Page views. A page view, sometimes called a page access, page hit, page request, or page impression, is a transfer of any page to the browser as a single entity, regardless of how many files, e.g., images or multimedia, it contains.8 When analyzing website use, this is a more desirable measure than hits. This measure, however, is not without its problems. Page views can encompass "dummy" requests produced by a virus. For example, the Code Red virus registers views of the file default.ida, a file that exploits a vulnerability in Microsoft's Internet Information Server. This file was neither created for nor used by a website's visitors. When this virus was active, these page views could add up to hundreds per day on any given site.

Page view counts can also be affected by site design. For example, the presence of a home button or navigation bar throughout the site, and the number of clicks separating sequential starting and ending points, will affect the count of total page views.9
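The distinction between hits and page views can be made concrete with a small filter. The sketch below is illustrative only: the list of component-file extensions is an assumption that would need to be tuned to a site's actual content.

```python
# Illustrative sketch: every request is a hit, but only requests for
# page-like resources count as page views. The extension list is an
# assumption, not a standard.

COMPONENT_EXTS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js",
                  ".swf", ".mp3", ".avi", ".ico"}

def is_page_view(uri: str) -> bool:
    """Treat directory requests and page files as page views."""
    path = uri.split("?")[0].lower()
    dot = path.rfind(".")
    ext = path[dot:] if dot > path.rfind("/") else ""
    return ext not in COMPONENT_EXTS

requests = ["/hours/", "/hours/ulibsummer.html", "/images/logo.gif",
            "/images/banner.jpg", "/styles/main.css"]

hits = len(requests)                                 # every transfer is a hit
views = sum(1 for r in requests if is_page_view(r))  # only page-like files
print(hits, views)  # 5 2
```

Here a single page carrying two images and a stylesheet generates five hits but only two page views, which is why the two measures diverge so sharply in practice.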

Visits. A visit, also called a session, is a series of consecutive requests from an individual browser. A timeout period, often set at 30 minutes, must be specified in the log analysis software to define a visit.10 This means that a 30-minute period of inactivity will end a visit. The time frame needs to be defined consistently, as it will affect the visit count: a timeout of 30 minutes of inactivity will result in a different number of visits than one defined by a 60-minute period of inactivity.

Visits may be tracked by cookies, registered user names, or host names.11 The first two can be problematic. The use of cookies requires expertise and is complicated by technical issues, including the fact that users can disable or delete them with their browsers. The second option, requiring users to authenticate themselves to access the website, is not a practice generally adopted by academic libraries. Most library websites are freely accessible to their user communities and the public.12 While it is increasingly common for licensed resources requiring authentication to be linked from these sites, most content is available for unrestricted access. For these reasons, the host name, represented in the log as the IP address of the computer from which the user gained access to the site, is the most common measure employed by libraries to track visits.
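The timeout rule can be sketched as a simple grouping of time stamps by host. This is a hypothetical illustration rather than any particular analyzer's algorithm: a request from an IP address arriving more than the timeout period after that address's previous request opens a new visit.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)  # the commonly recommended visit timeout

def count_visits(requests):
    """requests: iterable of (ip, datetime) pairs, assumed time-sorted.
    A gap longer than TIMEOUT from the same IP opens a new visit."""
    last_seen = {}   # ip -> time of that ip's most recent request
    visits = 0
    for ip, when in requests:
        prev = last_seen.get(ip)
        if prev is None or when - prev > TIMEOUT:
            visits += 1
        last_seen[ip] = when
    return visits

log = [
    ("66.66.208.26", datetime(2002, 7, 15, 0, 4, 26)),
    ("169.226.11.45", datetime(2002, 7, 15, 0, 5, 0)),   # different host
    ("66.66.208.26", datetime(2002, 7, 15, 0, 10, 0)),   # same visit
    ("66.66.208.26", datetime(2002, 7, 15, 1, 0, 0)),    # >30 min gap: new visit
]
print(count_visits(log))  # 3
```

Changing TIMEOUT to 60 minutes would merge the last two requests from 66.66.208.26 into one visit, which is exactly why the time frame must be defined consistently across reports.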

Technical Challenges: Undercounts, Overcounts and Other Issues

The gathering and interpretation of website usage statistics derived from log files are bound up in the technological realities of both Web servers and the Internet itself. These statistics are not necessarily what they appear to be. They are, in fact, estimates of website use, but they are indispensable estimates. "Use" in this sense refers to the purposeful activity of visitors on the library's site. It bears repeating that Web server log files measure computer behavior. Given this, librarians are faced with the challenge of extrapolating counts from the data that yield reasonable representations of human use.

This challenge is complicated by the fact that log file data reflect both overcounts and undercounts. An undercount occurs when less use of the website is recorded in the log file than actually has taken place. An overcount is the opposite situation, in which the log file records greater use of the website than actually has taken place. Unfortunately, these counts do not cancel each other out, nor can they be assumed to appear in the same proportion from one Web server to the next.

Undercounts

The major causes of undercounts are caching, which affects hits and page views, and IP addressing, which affects visits. Both these phenomena are indigenous to the Internet, and their effects will be reflected in the measures recorded by Web server log files.


Hits and Page Views

Caching. Everyone who uses the Web wishes for a quick response to a mouse click. It takes time, however, for the data packets that make up a file to travel through a network from the Web server to a user's browser. Further, busy Web servers on crowded networks respond to browser requests more slowly than servers responding to fewer requests on less traveled networks. Caching addresses these issues.

Caching refers to the temporary storage of files that were previously downloaded by a browser. This activity can take place on several levels. On the local level, a user's Web browser performs caching by storing a Web page and any component graphics after the initial request. It is the first request that registers in the server's log file. Subsequent requests will retrieve the page, and much more quickly, from the browser's cache. Since these requests are not made to the server, the log file will not record them. Thus multiple page views, even those occurring on different days, may not appear in the log. Caching is complicated by the fact that browsers can be configured to cache documents for specified time frames or until a certain number of documents use up the space allotted to the cache.13

On the network level, caching can occur if a campus or other local firewall is in place. Firewalls cache previously requested pages in order to cut down on the network traffic they must process. ISPs, including America Online, also employ caches, often through the use of proxy servers.14 In addition, commercial networks frequently use caches to optimize the efficient flow of their traffic.

It should be noted that good website design takes advantage of caching in order to limit unnecessary traffic to the host server and its supporting network. If a site links to a file on multiple pages, all the links should point to a single file stored in one location rather than to the same file placed in multiple locations. The single file will be cached and then served from the cache for some portion of the subsequent requests. In sum, what is advantageous to the network and the server is deleterious to the accuracy of usage counts.

Visits

IP addressing. Computers connected to the Internet are given a unique identifying address. For example, the address 169.226.11.197 identifies a computer in the domain 169.226, which can be translated by a Domain Name Server to the more easily readable name of albany.edu. IP addresses are recorded in the log file with each hit.

When IP addresses are used to count the number of visitors to a site, several factors come into play that can reduce the accuracy of this data. Counts are accurate when an IP address is assigned to a private workstation used by only one person. This one-to-one relationship, however, is not always the case. An undercount occurs when a single workstation represents many users even though only one address is recorded in the log. Several people can also utilize the same IP address assigned by an ISP or represented by a proxy server or firewall.15 For example, a user connecting through an ISP is assigned an IP address. After disconnecting, the same IP address is assigned to someone else.16 The log file will record these as visits from the same user even though a second user has come on board. Ultimately, there is no way to determine exactly how many distinct users are visiting a site when IP addresses are used for visitor identification.


Overcounts

Overcounts are caused by a variety of factors. As with the undercounts just described, overcounts affect measures of hits and page views as well as visits. Common factors that affect hits and page views include viruses, link checkers, Web crawlers, unsuccessful requests, and frames. IP addressing and visit timeouts can cause an overcount of visits.

Hits and Page Views

Viruses. Viruses are an unfortunate fact of life on the Internet. If properly protected, a Web server will remain unscathed by a virus, but its logs may still record the virus's activity. Some viruses are easy to detect; Code Red's request for default.ida is a good example. Other viruses are not as transparent in an analysis report, but their activity is recorded in an inflated measure of total hits to the site. This data is vitally important to website managers, but can interfere with an administrator's goal of measuring website use.

Link checkers. Responsible website administrators run regular link checks of their sites in order to clean up broken URLs. Link checking programs register page views in the log file even though these views do not reflect user activity. Many of these programs can be configured to identify themselves to the server or will do this by default. This identity can be found in a log analysis report by locating the section on users or visiting spiders and observing the number of views generated by the software.

Web crawlers. Crawlers, also known as spiders, worms, or robots, are sent out by search engines to identify pages to include in their indexes. Many library sites are visited daily by these applications. It is a straightforward matter to identify visits by the user "googlebot" as the Google crawler. Other crawlers are not as easily identified. Good analysis software makes this identification and lists these visits in a separate section of its report. As is the case with viruses and link checkers, this data represents access to the website but not the activity of human users.

Unsuccessful requests. A count of total page views is not the same as a count of all successful page transfers from the server to the browser. Default or custom "File Not Found" pages may also be sent to the browser when a requested page is not at the specified address or does not exist.17 Since transfer aborts, incorrect requests, and broken URLs are common, a count of a site's total page views can be assumed to be an overcount. Sophisticated log analysis software offers the option to filter reports by status codes, which represent successful or unsuccessful requests. As will be shown later on, however, data on unsuccessful requests is useful to Web managers.

Frames. Some websites employ frames, in which a single screen is divided into sections, each containing a separate Web page. The loading of the initial screen registers hits on each of the individual files; e.g., if the screen consists of a three-part frame, the transfer of three files plus any included graphics is recorded in the log.18 This is the case even if the individual file in a frame does not carry information specifically requested by the user, for example, a frame containing a graphical logo or a footer. Frames will skew the counts for hits and page views depending on how extensively they are employed on a site.
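Before counts are reported as measures of use, the non-human and failed traffic described in this section can be screened out. The sketch below uses made-up signature lists purely for illustration; commercial analyzers maintain far larger robot and virus databases, and a real filter would be built from those.

```python
# Simplified sketch of pre-count filtering. The signature lists are
# illustrative placeholders, not a complete robot/virus database.

ROBOT_AGENTS = ("googlebot", "slurp", "linkchecker")
VIRUS_URIS = ("default.ida", "cmd.exe")  # e.g., Code Red-style probes

def is_human_success(entry):
    """entry: dict with 'status', 'uri', and 'user_agent' string keys."""
    agent = entry["user_agent"].lower()
    if any(bot in agent for bot in ROBOT_AGENTS):
        return False                   # crawler or link checker
    if any(sig in entry["uri"] for sig in VIRUS_URIS):
        return False                   # virus probe, not a visitor
    return entry["status"] == "200"    # drop 404s and other failures

log = [
    {"status": "200", "uri": "/hours/", "user_agent": "Mozilla/4.0"},
    {"status": "200", "uri": "/default.ida?XXX", "user_agent": "Mozilla/4.0"},
    {"status": "404", "uri": "/oldpage.html", "user_agent": "Mozilla/4.0"},
    {"status": "200", "uri": "/", "user_agent": "Googlebot/2.1"},
]
kept = [e for e in log if is_human_success(e)]
print(len(kept))  # 1
```

The filtered-out records are not discarded outright: the rejected crawler, virus, and error traffic is precisely the data that the site-manager tier of the model needs for maintenance reports.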


Visits

Visit timeouts. A visit is subject to an overcount if a user exceeds the defined timeout period, e.g., 30 minutes, but subsequently resumes a session. The log analysis software will assign a new visit even though it is the same user making a new request after a period of inactivity.

IP addressing. Although visitor counts derived from IP addresses are subject to an undercount, this measure may also reflect an overcount. This occurs when users connect to the site through an ISP or network that practices dynamic addressing. In this situation, multiple IP addresses can be assigned to the same user within a single visit. For example, a user may request a Web page through one assigned IP address, and make a follow-up request with a second assigned address.19 In this case, the log file records two users, i.e., IP addresses, when in fact there is only one.

Other Issues with Data Analysis

Many of the phenomena just discussed are reflected in Web server logs by virtue of the technology of the Internet. Such factors as caching, dynamic addressing, viruses, and Web crawlers cannot be controlled or modified by those who maintain the website or its log files. In addition to these factors, there are other data recorded in the log that must be interpreted with care. This section extends the discussion of data analysis challenges by addressing dynamic applications, IP addresses and visitor identification, and log file time stamps. Concluding this section is a discussion of issues related to the counting of local versus remote users.

Dynamic Applications

A growing number of library websites serve at least part of their content dynamically. For example, lists of databases and electronic journals can be stored in database tables and written to Web page templates when users conduct a search or click a link that queries the database. Depending on the underlying program, the resulting filename may include parameters containing the request, e.g., http://library.albany.edu/databases/search.asp?Letter=E, where "Letter=E" is the parameter. This filename is recorded in the log file as a page view, and reveals specific information requested by the user. This type of functionality is distinct from that of fixed, static pages written in HTML and created manually.

The user manual of the Web Trends analysis software notes that many people remove these views from general traffic reports and run parameter reports separately.20 Such reports can be extremely valuable in showing the information requested by visitors, and their use is encouraged. Removing all measures of the dynamic pages from the main report, however, can result in a significant undercount of website use, especially if the dynamically generated pages are popular.
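A parameter report of this kind can be tallied directly from the logged filenames. The following is a minimal sketch using standard URL parsing, assuming parameters recorded in the query-string style shown in the example above.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def parameter_report(uris):
    """Count the values requested for each query parameter.
    uris: logged request strings, e.g. '/databases/search.asp?Letter=E'."""
    counts = Counter()
    for uri in uris:
        query = urlsplit(uri).query       # empty string for static pages
        for name, values in parse_qs(query).items():
            for value in values:
                counts[(name, value)] += 1
    return counts

log = ["/databases/search.asp?Letter=E",
       "/databases/search.asp?Letter=E",
       "/databases/search.asp?Letter=M",
       "/hours/ulibsummer.html"]
print(parameter_report(log).most_common(1))  # [(('Letter', 'E'), 2)]
```

Because the tally is built from the same request records that feed the main report, the dynamic page views themselves can remain in the site totals while the parameters are summarized separately.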

Another issue to consider is the architecture of the generated pages. Applications may feature navigational buttons (First, Previous, Next, Last) that allow users to click through a series of pages to see an entire results set. Each time one of these buttons is selected, a new page view is recorded in the log.21 It is only possible to identify these as navigational counts, however, if the application was written to generate parameters, i.e., query strings, when the buttons were selected. A good case can be made that these page views are worthy of being counted as website use, with or without a parameter identification. A group of related static pages may require navigation from one to the next, especially if the Web designers have been diligent about avoiding the construction of lengthy pages. It is therefore recommended that navigational page views be counted in the totals for the site.

Dynamically generated pages can produce potentially different site structures, and therefore different usage statistics, than sites consisting of pages that are created manually. Christy Hightower and colleagues contend that keyword searches of database-driven pages probably produce different browsing and query behaviors. For this reason, they argue that usage statistics based on these types of page requests cannot be compared with the statistics generated by the use of sites that consist of only static pages.22 While this rationale is worth considering, it has also been shown that users commonly employ a website's search engine in place of site navigation to locate desired information.23 Whether a site's search engine generates more or less use than the search feature of an internal database is something that will vary from one site to the next. User search behavior, and its relationship to site functionality, is a complex issue that needs more study.

IP Addresses and Visitor Identification

Without user authentication, it is not possible to determine whether an off-campus visitor is a member of a library's affiliated constituency. For this reason, measures of off-campus use must be interpreted with care. For example, a user accessing a site with an America Online IP address may be anyone. It is more reasonable to interpret most visits from different states or countries as coming from users outside of a library's direct user base. On-campus IP addresses create less of a problem unless access to public workstations is not restricted to affiliated users. In short, it is difficult to identify definitively through IP addresses all the users visiting a site who represent a library's affiliated or target audience.

Time Stamps

Log files record the amount of time spent on a particular page before the user either makes a subsequent request or leaves the site. This measure actually represents the time spent downloading the file to the browser and any time thereafter that a user may spend reading the file. These time measures can be calculated and reported by log analysis software. As was noted earlier, a user may download a file but not actually read it. Further, summary counts of visit durations are difficult to interpret. For example, the fact that a certain percentage of visits were two minutes in length might mean that these users found what they wanted quickly or else left the site in frustration. Moreover, certain pages are designed for brief visits while others contain material that needs careful examination. For these reasons, reported time measures should be considered an approximation of user behavior. Perhaps the most advantageous use of this data is in the context of particular page analysis or in path analysis, which tracks the views of a series of pages in sequence.
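A page's reported duration is simply the gap between its time stamp and that of the next request in the same visit; the last page of a visit has no following request, so its reading time cannot be derived from the log at all. A hypothetical sketch:

```python
from datetime import datetime

def page_durations(visit):
    """visit: time-ordered (uri, datetime) pairs from one visit.
    Returns (uri, seconds) for every page except the last, whose
    duration cannot be derived from the log."""
    out = []
    for (uri, t1), (_, t2) in zip(visit, visit[1:]):
        out.append((uri, (t2 - t1).total_seconds()))
    return out

visit = [("/", datetime(2002, 7, 15, 0, 4, 26)),
         ("/hours/", datetime(2002, 7, 15, 0, 5, 0)),
         ("/hours/ulibsummer.html", datetime(2002, 7, 15, 0, 7, 0))]
print(page_durations(visit))
# [('/', 34.0), ('/hours/', 120.0)]
```

Note that nothing in these numbers distinguishes reading time from download time or an unattended browser, which is why they are best read as approximations within a path analysis rather than as measures of attention.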


Counting Local vs. Remote Users

Academic library websites commonly receive visits from users who are located outside of the library building and off campus. Some researchers have recommended that in-library or staff access be excluded from consideration in any usage report.24 As a proposed standard, the measure of "virtual visits to the library's website and catalog" constitutes the only website statistic (outside of digital collections) included in the ARL E-Metrics Phase II Report. This measure is defined as "user visits to the library's website or catalog from outside the physical library premises, regardless of the number of pages or elements viewed."25 Although the report recommends also gathering counts of in-library visits accessed from Internet workstations, the external visits alone fit the definition. The authors state:

Use of the website or catalog from outside the library reflects interest in library services. The role of networked services is to expand the reach of libraries beyond their physical boundaries. This statistic helps describe the significance of networked services use by measuring the number of virtual accesses. This will also give an opportunity for the library to compare the demand placed on their networked resources with that for other popular information-oriented websites (such as Excite, Lycos, etc.).26

Because of its potential impact on the library community, this statement is worth examining. Excluding in-library visits from usage reports is a common but little-debated recommendation. Besides accesses from Internet workstations, reference desk, help desk, and other service point workstations also record user-generated accesses of a library's site. Staff use behind the scenes may be the more expendable measure, but even here an argument can be made in favor of recording these statistics, at least in separate reports. Staff may utilize the library's website to provide services to users or to perform jobs that provide a service. Libraries may want to reconsider their site missions in the light of any role allocated for staff use. This point is in need of discussion within the library community. There is also the practical issue of excluding IP ranges within a library from a report if consecutive assignments have not been made or if assignments are changed over time. Finally, it is difficult to agree with the E-Metrics contention that the statistic of virtual visits will help a library compare its resource demands with that of information services such as Excite and Lycos. The latter are meant to attract general audiences, as opposed to the more specialized and limited draw of most library websites. In any case, popular information services do not necessarily release reliable figures of their use.

In their article on developing benchmarks of library website use, Christy Hightower and colleagues recommend creating separate reports of page requests generated by on-campus and off-campus users. Their recommendation is a good one, and is endorsed in this paper, but their rationale contains only a partial truth: that separate counts are useful because library sites are designed and marketed with local or remote target audiences in mind.27 This assumes that the target audience is the same as the actual audience. Such an assumption overlooks the reality of the Web as a networked worldwide phenomenon. It is not unusual for users from around the world to visit a site created for a local community. Such resources as subject gateways and instructional modules can attract users regardless of location and affiliation.
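Separating on-campus from off-campus requests reduces to testing each logged address against the institution's network ranges. The sketch below uses a made-up campus netblock for illustration; as noted above, real library and campus IP assignments may not be consecutive or stable, which is the practical weakness of this approach.

```python
from ipaddress import ip_address, ip_network

# Hypothetical campus range; substitute the institution's real netblocks.
CAMPUS_NETWORKS = [ip_network("169.226.0.0/16")]

def is_on_campus(ip: str) -> bool:
    """True if the logged address falls inside any campus network."""
    addr = ip_address(ip)
    return any(addr in net for net in CAMPUS_NETWORKS)

requests = ["169.226.11.130", "66.66.208.26", "169.226.45.2"]
local = [ip for ip in requests if is_on_campus(ip)]
remote = [ip for ip in requests if not is_on_campus(ip)]
print(len(local), len(remote))  # 2 1
```

Such a split supports the separate on-campus and off-campus reports recommended above, while leaving both populations in the overall totals.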


This article will conclude with Part 2 [Log File Analysis] in portal: Libraries and the Academy 3, 3 (July 2003).

Laura B. Cohen is the Network Services Librarian/Webmaster at the State University of New York, Albany; she may be contacted via e-mail at: [email protected].

Notes

1. For useful examples, see: Wonsik Shim, Charles R. McClure, Bruce T. Fraser, John Carlo Bertot, Arif Dagli, and Emily H. Leahy, Measures and Statistics for Research Library Networked Services: Procedures and Issues, ARL E-Metrics Phase II Report (Washington, DC: Association of Research Libraries, 2001). Available: <http://www.arl.org/stats/newmeas/emetrics/phasetwo.pdf> [February 24, 2003]; Christy Hightower, Julie Sih, and Adam Tilghman, "Recommendations for Benchmarking Web Site Usage Among Academic Libraries," College & Research Libraries 59 (January 1998): 61–79; Joanne Ren, Mickey Zemon, and Ken Rodriguez, "A Model for Monitoring Use of Your Library's Web Site," College & Undergraduate Libraries 7 (2000): 1–9.

2. John Carlo Bertot, Charles R. McClure, William E. Moen, and Jeffrey Rubin, "Web Usage Statistics: Measurement Issues and Analytical Techniques," Government Information Quarterly 14, 4 (October 1997): 378–379.

3. For useful examples, see Thomas Dowling, "Lies, Damned Lies, and Web Logs," Library Journal Net Connect Supplement (Spring 2001): 34–35; T. van der Geest, "Evaluating a Web Site with Server Data," Document Design 1 (1999): 131–132; Kathleen Bauer, "Who Goes There? Measuring Library Web Site Usage," Online 24 (January/February 2000): 25–31.

4. van der Geest.

5. Web Trends Analysis Suite Advanced Edition v. 7.0 (Portland, OR: NetIQ Corporation, 2002).

6. For a useful description of these files, see Rick Stout, Web Site Stats: Tracking Hits and Analyzing Traffic (Berkeley, CA: Osborne/McGraw-Hill, 1997), 13–49.

7. Bertot et al., 375.

8. Ibid.; Hightower et al., 73.

9. Hightower et al., 71.

10. A 30-minute timeout period is recommended by Shim et al. in the ARL E-Metrics Phase II Report, 67.

11. Stout, 75.

12. While some libraries require authentication to access public workstations, this is usually a workstation management issue rather than a practice of restricting access to a site.

13. Dowling, 34.

14. Ibid.

15. van der Geest, 131.

16. Dowling, 28.

17. van der Geest, 131.

18. Stout, 65–66.

19. Dowling, 34.

20. WebTrends Enterprise Suite (Portland, OR: Web Trends Corporation, 1999): 101.

21. If a digital collection features a search capability, the ARL E-Metrics Phase II Report regards these measures as an overcount of searches that should be discarded. See Shim et al., 76.

22. Hightower et al., 71–72.

23. Jakob Nielsen, "Search and You May Find" (Nielsen's Alertbox, July 15, 1997). Available: <http://www.useit.com/alertbox/9707b.html> [February 24, 2003]; Susan Augustine and Courtney Greene, "Discovering How Students Search a Library Web Site: A Usability Case Study," College & Research Libraries 63 (July 2002): 354–365.

24. Tova Stabin and Irene Owen, "Gathering Usage Statistics at an Environmental Health Library Web Site," Computers in Libraries 17 (1997): 34; Roswitha Poll, "Performance Measures for Library Networked Services and Resources," The Electronic Library 19 (2001): 312; Bertot et al., 394; Ren et al., 5.

25. Shim et al., 66.

26. Ibid.

27. Hightower et al., 72.
