Improving Database Vendors’ Usage Statistics Reporting through Collaboration between Libraries and Vendors

Wonsik Shim and Charles R. McClure

Wonsik Shim is an Assistant Professor in the School of Information Studies at Florida State University; e-mail: [email protected]. Charles R. McClure is the Francis Eppes Professor and Director of the Information Use Management and Policy Institute in the School of Information Studies at Florida State University; e-mail: [email protected].

The article reports the results from the Association of Research Libraries (ARL) E-Metrics study to investigate issues associated with the usage statistics provided by database vendors. The ARL E-Metrics study was a concerted effort by twenty-four ARL libraries to develop and test statistics and measures in order to describe electronic resources and services in ARL libraries. This article describes a series of activities and investigations that included a meeting with major database vendors and the field-testing of usage statistics from eight major vendors to evaluate the degree to which the reports are useful for library decision-making. Overall, the usage statistics from the vendors studied are easy to obtain and process. However, the standardization of key usage statistics and reporting format is critical. Validation of reported statistics also remains a critical issue. This article offers a set of recommendations for libraries and calls for continuous collaboration between libraries and major database vendors.

The move to a networked environment has significantly increased the range of services and resources that the library provides its users. The library has become a twenty-four-hour-a-day access point to information services where users obtain services and resources on their terms and when they want such services. Often users do not enter the library physically nor do they interact directly with library staff. The costs of providing these networked services and resources can be significant. As a result, library managers are seeking ways to measure the use of these digital services and resources.

One of the results of the networked information provision environment is that libraries increasingly depend on external providers of academic and scholarly information content and services. Recent statistics estimate that in 2000–2001, research libraries spent, on average, 16.25 percent of their materials budget on electronic resources, a sharp increase from a mere 3.6 percent in 1992–1993.1 This information has traditionally existed in the library as subscription print journals, print indexes and abstracts, books, and so on. However, there is a big difference in terms of ownership and control between traditional and digital information content.

With physical media, the library owned the objects and controlled their use. For example, the library catalog, be it card catalog or online catalog, represented what the library owned and could make available to its users. But with electronic media, the library is only one of many access points to the information resources. As a result, the library has much less control over use.2 The library catalog now includes many pointers to external information sources that, in some cases, may no longer exist when the user tries to access them.

Figure 1 depicts a simplified view of the differences between the traditional library environment and the networked library environment, which is characterized by the Internet as the primary information delivery medium and by the growing presence of external electronic information resources and services in the library.

FIGURE 1
Changed Library Environment (see note 3)

In the traditional library, most library materials were housed in a physical library building, and users typically needed to come to the library to use its materials and services.4 Availability was an important concern because of the physical characteristics of the materials. In the networked library, however, library materials and services increasingly reside outside the physical library building. Libraries now depend, in large measure, on the publishers of electronic journals (e.g., Elsevier’s Science Direct and Academic Press’s IDEAL), electronic content aggregators (e.g., Ebsco and Gale), and other electronic information providers to meet user demands for resources and services.5 Availability has become less of an issue in the networked library environment because the electronic medium allows several people to use the same material at the same time.6

On an experiential basis, many academic librarians describe the use of their networked information services with terms such as “exponential growth” or “we can’t keep up with demand.” At the same time, a number of academic libraries have seen stagnant or declining statistics for traditional indicators of library service, such as turnstile counts, in-house reference transactions, and circulation.


Librarians need reliable and accurate statistics that will allow them to make good resource allocation decisions (e.g., cost-benefit analysis, contract negotiation, justification of expenditure), meet user needs (e.g., identifying barriers to access, understanding user behaviors), and develop strategic plans (e.g., user education, peer comparison) for the development and operation of electronic services and resources.

Although some progress has been made over the past several years, most notably the guidelines produced by the International Coalition of Library Consortia (ICOLC), the provision of usage statistics by electronic content providers remains problematic at best.7

This article focuses on the problem of acquiring and using the statistics provided by external, fee-based electronic content providers and describes the work done in the Association of Research Libraries (ARL) E-Metrics Project to standardize the usage statistics and promote dialogue among database vendors and research libraries.8

Previous Work

The growing presence of electronic information resources and networked services prompted interest and research in developing statistics and measures to describe the emerging information environment. The most relevant work is a manual published by the ALA in 2001.9 Written by John Carlo Bertot, Charles R. McClure, and Joe Ryan, the work is based on Developing Public Library Statistics and Performance Measures for the Networked Environment, a research project funded by the Institute of Museum and Library Services (IMLS). Intended primarily for public library managers, the manual not only contains step-by-step procedures to collect some key usage statistics but also provides a set of issues that library administrators need to consider in collecting and using those statistics. Many of the proposed statistics can be easily transferred into an academic library setting.

In an article published in 2000, Carol Tenopir and Eleanor Read offered an example of cross-institutional analysis of database use.10 Using data from a vendor for fifty-seven academic institutions, Tenopir and Read found that regardless of the type of academic library, user demands are concentrated in a fairly predictable span of time: “early in the week, at midday, in the month when term papers are due.” The authors also concluded that, compared with other electronic media such as chat rooms and general Internet resources, students underutilize electronic library databases. At the individual institution level, Deborah D. Blecic, Joan B. Fiscella, and Stephen E. Wiberley Jr. identified ways that libraries can use vendor-supplied usage statistics to understand the scope of use.11 Sally A. Rogers also has provided a good comparison of print and e-journal usage at Ohio State University.12 Recognizing the need for ways to compare patron usage of electronic and print materials, Kathleen Bauer proposed the use of indexes to combine multiple usage indicators of both electronic and print resources.13

A recent compilation by McClure and Bertot has provided an overview of a wide array of issues surrounding the evaluation of networked information services in several different contexts, including usage statistics from database vendors.14 As mentioned earlier, the ICOLC guidelines are widely recognized as the de facto standard regarding usage statistics supplied by database vendors.

Finally, the discussion of usage statistics of database vendors is not complete without mentioning two very active mailing lists that deal with the topic: Library License ([email protected]), hosted by the Yale University Library; and the Electronic Resources in Libraries list ([email protected]), hosted by the Binghamton University Library. Although these mailing lists do not cover vendor statistics exclusively, there have been a considerable number of postings and threads on both regarding the topic. These mailing lists also have been used as a catalyst to formulate the library community’s response to major challenges from database vendors.

The current work focuses on issues related to acquiring, processing, and using vendor usage statistics at research libraries under the ARL E-Metrics Project. It is important to point out that the E-Metrics Project is one of many initiatives that are working toward establishing standardized, comparable statistics for electronic contents and services. However, the ARL E-Metrics Project is unique in that it is a cooperative effort among a large number of research libraries and that it seeks the participation of major database vendors in attempting to find solutions.

ARL E-Metrics Project

Usage statistics in the context of electronic subscription-based databases mainly refer to the indicators of the volume of user access to the electronic resources and services available from database vendors. Examples of those indicators are a count of sessions in a specific database, the time per session in a specific database, the count of searches in a specific database, and the count of full-text downloads per time period per database. In addition, usage statistics can show a variety of information, including success or failure of user access (e.g., turn-aways per time period per specific database), user access methods (e.g., telnet versus browsers), access levels of one institution compared against peer institutions, cost of access (e.g., cost per downloaded item), and other items pertaining to user behaviors.
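
As a concrete illustration, the sketch below (Python) shows one way a library might represent and total a month of vendor-reported figures of this kind. The field names and numbers are hypothetical; neither the ICOLC guidelines nor any particular vendor prescribes this layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MonthlyUsage:
    """One vendor-reported row: usage of a single database for one month."""
    vendor: str
    database: str
    month: str                # e.g., "2001-04"
    sessions: int             # log-ins to the database
    searches: int             # queries run against the database
    items_requested: int      # full-text, abstract, or citation downloads
    turnaways: Optional[int]  # rejected log-ins; None if no simultaneous-user limit

rows = [
    MonthlyUsage("Vendor A", "Database X", "2001-04", 5200, 14800, 9100, None),
    MonthlyUsage("Vendor A", "Database Y", "2001-04", 1300, 3900, 2100, 45),
]

# Totals a library might fold into its own internal reporting.
totals = {
    "sessions": sum(r.sessions for r in rows),
    "searches": sum(r.searches for r in rows),
    "items_requested": sum(r.items_requested for r in rows),
}
print(totals)
```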

According to a survey conducted with the participants of the February 2000 ARL Project Planning Session on Usage Measures for Electronic Information Resources, held in Scottsdale, Arizona, the following problems are associated with usage reports from database vendors:15

• Reports do not provide detailed information about usage. For example, many vendors did not provide usage figures by journal or database title.

• Reports are inconsistent. For example, vendors use their own terminologies and do not provide adequate explanations to understand the reported statistics.

• Reports are not comparable. Because usage reports come in different formats and contain different statistics, it is impossible to compile accurate statistics within the library and to compare with other libraries.

However, the biggest problem with usage reports is that many vendors simply do not provide any data at all.

The ARL E-Metrics Project was a concerted effort by selected members of the research library community to investigate various issues and problems related to collecting and using data on electronic materials and services. The project, which began in April 2000 and finished in December 2001, was funded by a group of twenty-four ARL libraries. Figure 2 identifies the project’s participants.

FIGURE 2
ARL E-Metrics Project Participants

University of Alberta
Arizona State University
Auburn University
University of Chicago
University of Connecticut
Cornell University
University of Illinois-Chicago
University of Manitoba
University of Maryland-College Park
University of Massachusetts
University of Nebraska-Lincoln
University of Notre Dame
University of Pennsylvania
Pennsylvania State University
University of Pittsburgh
Purdue University
University of Southern California
Texas A&M University
Virginia Polytechnic Institute and State University
University of Western Ontario
University of Wisconsin-Madison
Yale University
Library of Congress
New York Public Library, The Research Libraries

One of the aims of the E-Metrics Project was to engage in a collaborative effort with selected database vendors to establish an ongoing means of producing selected descriptive statistics on database use, users, and services. A complete project description, project reports, and the data collection manual are available at the ARL E-Metrics Project site at http://www.arl.org/stats/newmeas/emetrics/index.html.

The E-Metrics Project should be viewed in the context of a number of related initiatives, both national and international, that are under way to assist libraries in assessing their networked resources and services. Although these initiatives take different approaches, focus on different types of libraries, and work within various operating environments, they all focus on developing library electronic statistics and performance measures. These efforts include:

• International Coalition of Library Consortia (ICOLC): Since the mid-1990s, this international coalition of libraries, predominantly academic, has been working toward a standard set of definitions for subscription online contents. It published the first guidelines in November 1998 (see http://www.library.yale.edu/consortia/webstats.html) and a revised version in December 2001 (see http://www.library.yale.edu/consortia/2001webstats.htm).

• National Information Standards Organization (NISO): NISO is updating its Z39.7 Library Statistics standard to include network services and resources statistics and performance measures. The draft standard was completed in 2002 (see http://www.niso.org/emetrics/current/complete.html).

• National Commission on Libraries and Information Science (NCLIS): Over the years, NCLIS has continued its work in standardizing online database usage statistics and reporting mechanisms. This project largely focuses on the public library environment (see http://www.nclis.gov).

• Institute of Museum and Library Services (IMLS): IMLS sponsored a project to develop national network statistics and performance measures for public libraries. The project resulted in a network statistics manual for public libraries.16

• Project COUNTER (Counting Online Usage of Networked Electronic Resources): COUNTER is supported by a group of publishers, library associations, and other library-related national bodies whose primary aim is to formulate an international code of practice governing the recording and reporting of usage statistics. The release of the first code of practice is expected by the end of 2002 (see http://projectcounter.org).

• National Clearinghouse for Library and Information Center Networked Statistics: Proposed by Charles R. McClure and his associates at the Information Use Management and Policy Institute, Florida State University, establishment of the clearinghouse will facilitate the sharing and dissemination of primary data, tools, education, and research regarding statistics of networked resources and services (see http://www.ii.fsu.edu).

One important issue regarding these initiatives is the extent to which the initiatives and organizations coordinate with one another. For a host of reasons, including vendor cooperation, library reporting requirements, and library management needs, more coordination and cooperation are necessary throughout these projects. The authors are involved in a number of the projects mentioned above and, to the extent possible, will cooperate with other groups.

ARL Meeting with Database Vendors

A meeting with a select group of large database vendors occurred on March 14, 2001, in conjunction with the ACRL annual meeting in Denver. The goal of this meeting was to engage the community of vendors, publishers, and libraries in building consensus for reporting data on the use of vendor databases and to promote an understanding of what can and cannot be done vis-à-vis the provision of data from the vendor community. The meeting served as a discussion forum for:

• sharing information about the development and standardization of selected statistics that describe users and uses of databases;

• reaching agreement on the important data elements and definitions;

• engaging vendors in a test of data elements being designed;

• understanding the issues that affect vendor-supplied statistics describing database use and users;

• developing a process so that the library community and the vendor community can work together in developing and standardizing a core set of statistics.

A total of nine vendors attended the meeting, as shown in figure 3.

FIGURE 3
Database Vendors Attending the ARL Meeting

Elsevier/ScienceDirect
netLibrary
OCLC/FirstSearch
JSTOR
Pro Quest
Ovid
Lexis-Nexis
Gale Group
EBSCO

During the meeting, both the vendor and the library representatives agreed that the reported statistics should be based on the ICOLC guidelines. It was noted that the market is increasingly diversified in terms of business models, content provided by vendors, and other factors. Accordingly, developing a standardized set of statistics that covers all of these will continue to be a challenge.

Everyone agreed that technologies and technology changes have a lot to do with what statistics can be collected and reported, and how. For instance, Z39.50 clients do not allow statistics to be collected. Solutions, such as digital certificates, also are technology based. However, in most cases, the costs of buying and implementing these technologies may be difficult to justify, even when they would produce more reliable and detailed data.

It also appears that different vendors use different counting mechanisms. As a result, the compiled statistics have limited reliability and validity. Additional investigation into these and related questions is needed.

Overall, the meeting was very useful in that it brought libraries and vendors together and established a dialogue.17 The meeting also was a necessary first step for the upcoming field-testing of proposed statistics developed by the E-Metrics study team. As a result of the meeting, all of the vendors present agreed to participate in the vendor statistics field-testing.

Vendor Statistics Field-testing

The primary goal of the field-testing was to assess usage statistics from major database vendors in terms of comparability of statistics and their definitions, breakdown of data, and report formats.

Methodologies

Invitations were sent to several vendors, including those that participated in the ARL meeting. All the vendors contacted, twelve in all, agreed to participate in the field-testing. The invitation explained the goals and objectives of the field-testing and provided a brief summary of expected deliverables from each participating vendor.

A set of field-testing guidelines was developed and an electronic copy distributed to the vendors. In addition, project participants (libraries) were contacted and their participation in the field-testing was solicited. Because not all field-testing libraries subscribed to all of the services, three or four vendors were assigned to each library based on its subscription matrix. The intent was to alleviate the burden on the libraries of evaluating too many vendor reports. In addition, from the standpoint of vendors, it seemed to make sense to concentrate on a few libraries rather than all of the libraries subscribing to their services.

The guidelines asked specifically for four deliverables from each vendor:

1. a monthly report (April 2001) in a standardized text format (specific guidelines were given for data elements and their arrangement);

2. a detailed, step-by-step description of the process used to collect the statistics, including the rules and assumptions applied in the process;

3. a monthly (April 2001) raw data log file;

4. issues and suggestions related to providing usage statistics.

The vendors were asked to send the field-testing data to their assigned libraries and to the authors at Florida State University by the last week of May 2001. A separate evaluation questionnaire was developed and distributed to the field-testing libraries.

Field-test Findings

Vendor statistics change constantly and can therefore be considered a moving target. The information presented here is for illustration purposes only and may not correctly reflect the current practices and offerings of the database vendors mentioned in this report.

A total of eight vendors participated in the field-testing. Table 1 shows the data formats in which the field-test reports were provided by the vendors and the availability of documentation received from the vendors with regard to the definitions of the statistics provided and information on how data were collected, filtered, and aggregated.

TABLE 1
Vendor Statistics Field-testing Participation

Vendors          Data Format            Availability of Documentation
Academic Press   txt, Excel             n.a.
Pro Quest        Excel, txt, PDF        Yes
Ebsco            txt                    Yes
Gale Group       csv                    Yes
Lexis-Nexis      zip (csv), Word, txt   Yes
NetLibrary       zip (txt), csv         Yes
Science Direct   txt                    Yes
SilverPlatter    csv                    n.a.

n.a.: Not available from the vendor during the field-testing.

The majority of vendors investigated provided usage reports in a text format as well as other formats. Compared with the results from the vendor statistics analysis during the E-Metrics Project, the evidence indicates that vendors have made good efforts, especially in the area of making documentation available.18

Many vendors simply did not have any documentation about usage statistics at all when the authors initially analyzed their reports. However, many of the vendors’ documentation did not provide enough details concerning the definitions of reported statistics to aid in an understanding of those statistics.

Table 2 shows key ICOLC statistics included in each vendor’s field-testing report. It is important to understand that no attempt has been made to validate compliance with the ICOLC guidelines. Aside from the ICOLC guidelines, there are many instances where the same statistics from different vendors are not equal measures. An obvious example is how vendors apply time-out parameters to compute session counts. Vendor documentation indicated a wide range of time-outs (e.g., Gale, 6 minutes; Ebsco, 10 minutes; and Science Direct, 30 minutes).
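
To see how the time-out parameter alone can move session counts, consider the following sketch (Python). It groups one user's request timestamps into sessions, starting a new session whenever the gap between consecutive requests exceeds the time-out. The timestamps are invented for illustration, and the counting rule is only an assumption about how a vendor might sessionize its logs, not a description of any particular vendor's method.

```python
from datetime import datetime, timedelta

def count_sessions(timestamps, timeout_minutes):
    """Count sessions for one user: a new session starts whenever the gap
    between consecutive requests exceeds the time-out."""
    times = sorted(timestamps)
    if not times:
        return 0
    sessions = 1
    for previous, current in zip(times, times[1:]):
        if current - previous > timedelta(minutes=timeout_minutes):
            sessions += 1
    return sessions

# One user's requests over a single morning (illustrative only).
requests = [datetime(2001, 4, 2, 9, minute) for minute in (0, 4, 12, 35, 41)]

for timeout in (6, 10, 30):  # the range of time-outs cited above
    print(f"{timeout}-minute time-out -> {count_sessions(requests, timeout)} sessions")
```

Under these invented timestamps the same activity yields three, two, or one session depending only on the time-out, which is why identically labeled session counts from different vendors are not directly comparable.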

A more serious case results from the fact that similar vendors use different methodologies to count the same user activity. As a result, even the most seemingly simple statistics, such as searches and items requested, might not be duly compared. Is a search to multiple database packages (as in the case of Gale or Ebsco) counted as a single search or as a separate search for each database chosen? Is browsing a secondary database such as an author, subject, or journal list counted as a search or a menu selection? Does the vendor take into consideration multiple requests for the same document in a short time period (say, less than 10 seconds) and treat them as one request or as multiple requests? Is clicking the next button to retrieve the next set of results counted as a separate search? The list of questions goes on and on. The answers to all of these questions can significantly inflate or deflate the reported usage counts. Furthermore, what happens if a vendor changes its counting methodology and does not disclose it?

There is widespread suspicion among librarians that even identically labeled statistics are counted differently. A close examination of the vendor documentation provided seems to suggest that the suspicion is not unfounded. Indeed, the answers to the above-mentioned questions differ among vendors. Another important problem has to do with the fact that most vendors do not provide detailed information to libraries, making it difficult for librarians to determine whether two comparable statistics from two different vendors refer to the same thing and can be compared accordingly. All of these issues seriously undermine the usefulness of usage statistics and threaten the validity of data. (The issue of validity is addressed later in this article.)

Because the types of content available through vendors are increasingly diverse and the terms referring to information items have not been fully standardized, cross-comparison of the items-requested statistic can be difficult. For example, netLibrary, which is gaining an increased presence in research libraries, does not lend itself easily to the kinds of statistics with which we are now familiar. This presents a challenge if libraries try to aggregate the total number of items accessed for cross-vendor comparison or to gauge the total amount of information transfer from licensed materials available at their institutions.

TABLE 2
Key ICOLC Statistics Included in the Vendor Reports (by vendor)

Vendors                Items Requested                                             Searches   Sessions   Turn-aways
Academic Press/IDEAL   Full text, reference, abstract, table of contents           Yes        Yes        n/a
Pro Quest              Full text, abstract, citation                               Yes        No         n/a
Ebsco                  Full text, abstract                                         Yes        Yes        n/a
Gale Group             Full text, citation, abstract, hits, views, print station   Yes        Yes        Yes
Lexis-Nexis            Full text, document retrievals                              Yes        No         n/a
NetLibrary             Page view, browse, checkout, dictionary use                 Yes        Yes        Yes
Science Direct         Full text, abstract                                         Yes        Yes        n/a
SilverPlatter          Full text, abstract                                         Yes        Yes        Yes

n/a: Not applicable


The turn-away statistic has been useful in determining whether to increase the number of simultaneous user licenses. However, the statistic applies only to those vendors that impose such a restriction. Table 2 shows that, of the eight vendors, only three have simultaneous user limits, and all three report the turn-away measure.

Table 3 shows a breakdown of reported statistics according to the ICOLC-recommended categories. It also lists other breakdown categories that the vendors reported. It appears that vendors, in general, satisfied the title-level (journal, database, or book) breakdown requirement. The IP (Internet protocol) breakdown requirement also was generally respected, but in all cases the statistics were lumped at the subnet level (a block of IP addresses) rather than at the individual IP address level. The tabulation might not have been included in summary statistics anyway because it can be made available in log files. Unfortunately, most vendors were unable to furnish log data files because of technical and legal concerns. Half the vendors currently provide some time-related breakdowns.

TABLE 3
Breakdown of Statistics in the Vendor Reports (by vendor)

Vendors                By Journal or Database Title    IP    Time/Day    Other
Academic Press/IDEAL   Journal title                   Yes   No
Pro Quest              Database title, journal title   Yes   Time        Client ID
Ebsco                  Database title                  Yes   No
Gale Group             Database title, journal title   No    Time, Day
Lexis-Nexis            Database title                  Yes   Time, Day   Group and profile ID
NetLibrary             Book title                      Yes   Time, Day
Science Direct         Journal title                   Yes   No          Subscribed versus non-subscribed
SilverPlatter          Database title                  No    No          Peak time and duration

Libraries’ Evaluation of Vendor Reports

Overall, libraries reported that the data files were easy to read and process. The majority of libraries used Microsoft Excel to import and display data files. In one case, a vendor sent part of the data files in PDF format, which forced the recipient libraries to enter the numbers manually. The results show that libraries would prefer data formats, notably text formats, that can be easily imported into data analysis programs such as Excel and Lotus 1-2-3 without having to spend extra time and effort to manipulate or enter the data.

Although all participating libraries at least opened the data files, only a few attempted to analyze the data. There seemed to be several reasons why libraries were hesitant about in-depth analysis of the data. One library commented that it did not test the data because they were the summary data and not the raw data the library expected from the field-testing. The following comment from another library also explains why libraries have not done further analysis: “We currently place raw vendor statistics on our staff intranet and do not compile them for comparison purposes, as we have yet to define what statistics and what format would best suit our institutional needs for such a compilation.”

At least one library reported specifically how it processed the field-testing data. For each vendor analyzed, the library compared the session counts from the library redirect page (all requests to external vendor databases pass through a Web page that counts how many times different databases are accessed) and the vendor report. This produced, for each database, a rough idea of what portion of attempted log-ins (sessions) originated from people who bypass the library database Web page. The library also calculated the estimated cost per article viewed and the distribution of articles viewed by title, which confirmed that 25 percent of the titles account for 80 percent of articles viewed for the particular database.
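
A sketch of that kind of local analysis appears below (Python). All figures and names are hypothetical; the point is the arithmetic: compare locally counted redirect hits with vendor-reported sessions, compute cost per article viewed, and check how concentrated use is across titles.

```python
# Hypothetical monthly figures for one database (illustration only).
redirect_hits = 4200      # sessions counted by the library's own redirect page
vendor_sessions = 5600    # sessions reported by the vendor for the same month
monthly_cost = 2500.00    # subscription cost allocated to this month
articles_viewed = 7500    # vendor-reported full-text articles viewed

bypass_share = 1 - redirect_hits / vendor_sessions
cost_per_article = monthly_cost / articles_viewed
print(f"~{bypass_share:.0%} of sessions bypassed the library database page")
print(f"cost per article viewed: ${cost_per_article:.2f}")

# Concentration of use: what share of article views came from the top 25% of titles?
views_by_title = [1200, 900, 700, 400, 150, 80, 40, 30]   # hypothetical per-title counts
ranked = sorted(views_by_title, reverse=True)
top_quarter = ranked[: max(1, len(ranked) // 4)]
print(f"top 25% of titles account for {sum(top_quarter) / sum(ranked):.0%} of article views")
```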

The field-testing instructions provided guidelines in terms of essential data elements, data arrangement, and file format. Contrary to the authors’ expectations, all of the vendors simply repackaged their monthly usage reports and submitted them to the libraries. Therefore, the only practical difference between the field-testing report and the report that libraries normally accessed from the vendor Web site was that libraries received the data files directly from the vendors instead of retrieving them from the vendor Web sites. Several libraries appreciated the fact that they could receive data files in text format, which is much easier to handle than, say, HTML format. Another minor difference was the availability of data definitions and statistics collection processes from some of the participating vendors. In some cases, this was the first time that such explanations were available to the libraries. Typically, documents that contain definitions of statistics and other background information, if they are available at all, are provided on the vendors’ Web sites.

Even when the sets of data were available from vendors, it was difficult for the libraries to do valid comparisons of the data because of insufficient descriptions of data definitions and limited explanation of how the data sets were collected and summarized. Many libraries feared that, without explanatory information on what each data element in the vendor reports meant and how the counts were filtered, such a comparison would have been faulty at best. This suggests that until there is a satisfactory degree of assurance that the statistics provided by the different vendors, based on the documentation they provide, are consistent enough for cross-comparison, libraries will not commit major resources in an attempt to compile vendor data into a standardized format or repository.

Another problem with comparing data from multiple vendors was the inconsistent data formats. The task of combining data fields and adjusting data arrangement from even three or four vendors proved to be extremely time-consuming. What libraries want is a standardized usage report containing common data elements and arranged in a predetermined, agreed-upon order, provided separately from vendor-specific data elements or additional data. Even the different placement of field headings, in a column or in a row, requires special handling by the libraries.
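
The following sketch (Python) illustrates the kind of normalization step libraries end up writing for themselves: it maps each vendor's own column headings onto a common set of data elements so that reports can be combined. The vendor names, file names, and column labels are hypothetical, and real reports differ in more ways than headings alone (row versus column orientation, extra summary rows, and so on).

```python
import csv

# Hypothetical mappings from each vendor's column headings to common data elements.
COLUMN_MAPS = {
    "vendor_a_2001-04.csv": {"Database": "database", "Queries": "searches",
                             "Logins": "sessions", "FT Articles": "items_requested"},
    "vendor_b_2001-04.csv": {"Title": "database", "Searches Run": "searches",
                             "User Sessions": "sessions", "Downloads": "items_requested"},
}

def normalize(path, column_map):
    """Yield report rows keyed by the common element names, ignoring unmapped columns."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield {common: row[original]
                   for original, common in column_map.items() if original in row}

combined = []
for path, column_map in COLUMN_MAPS.items():
    combined.extend(normalize(path, column_map))
# 'combined' now holds one list of comparable records, ready for totals or cross-vendor review.
```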

The majority of respondents said that the data provided by these vendors are “necessary and valuable.” They liked the fact that the data are “very straightforward and easy to use” and, more important, that they provide some indication of the extent to which subscription-based services are being utilized. Of course, the relative value depends on the quality of the data and the importance of the database to the library (e.g., the amount of money the library spends for a particular database as compared with other databases to which it subscribes).

Although the majority of libraries believed that the usage reports provided by the individual vendors are useful, some questioned the cumulative value of usage reports combined across vendors. Given that typical ARL libraries deal with several dozen database vendors, normalizing the data from these vendors, in their current forms, would require a prohibitive amount of effort.

Usage reports deal almost exclusively with the specific use of vendor databases in terms of frequencies (e.g., searches and sessions), duration (e.g., connection time), and amount of information transfer (e.g., items requested), while largely ignoring another dimension that many libraries consider very important: information about user behavior. The current usage metrics provide information about user behavior to a degree, but not at the level many libraries would hope.

To be useful, information about user behavior would need to be correlated with individual user profiles. But the current environment for database access, which is heavily rooted in IP-based authentication, does not permit the kind of data collection that libraries expect. Although there is a desire to receive more detailed information about user behaviors, it conflicts with current practices and with libraries’ concern about user privacy.

Optimally, vendors would provide an option that allows libraries to access raw data log files that have sufficient information for useful analysis and standardized definitions, and that are collected consistently over time. Unfortunately, many vendors were unable to provide log data files because of technical, legal, or other concerns.

Because the field-testing dealt with only one month’s data (April 2001), it is difficult to know whether what was collected is typical. However, the authors have not heard from the field-testing libraries of any unusual discrepancy between the field-testing data and data they received before the field-testing. The authors realize that simply comparing data from the same vendors will not provide a satisfactory answer to collecting accurate, reliable, and standardized data.

During the course of writing this report, the authors came across an e-mail message from a major database vendor acknowledging errors in its usage reports. This suggests that libraries are not in a good position to know what exactly goes into the vendor reports. Some unusual numbers or patterns are relatively easy to identify, but consistent under- or overcounts are harder to detect.

The authors believe that the data provided by the vendors studied are easy to obtain and manipulate. Most vendors offer several data formats, including text format (e.g., comma-separated files) and spreadsheet format (e.g., MS Excel), in addition to standard HTML format for easy viewing in Web browsers. Also, many vendors offer an ad hoc report generation facility whereby libraries can customize the fields and set the time periods they want to examine. However, processing reports from multiple vendors may become a burden on libraries in terms of time and staff effort because the formats and data arrangements vary considerably from vendor to vendor.

Dealing with vendor usage reports raises a number of other issues. The market for electronic content providers is becoming more diverse and complicated, and the types of statistics that best serve libraries in this changing environment need to be considered. Companies such as netLibrary did not even exist when the ICOLC guidelines were first drafted. A related issue is the effect of mega-mergers taking place in the electronic content providers’ market and how these mergers will affect statistical reporting.

For the most part, libraries have relied on the ICOLC guidelines as the de facto standard for usage statistics for licensed materials. Indeed, the guidelines brought the issue of usage statistics to full view for many practicing librarians and database vendors. Although most vendors included in the study claimed a high level of compliance with the guidelines, some librarians remain skeptical, citing the differences in the way statistics are collected by different vendors (e.g., different time-outs) and the lack of concrete documentation. The ICOLC guidelines are concerned mainly with defining basic usage statistics and do not contain detailed information that can be used to validate whether the vendor reports adhere to the standard. In addition, the library community may have different opinions about how statistics should be counted. What level of specificity are we pursuing in the standardized reports? And who is going to ensure that a vendor report meets the accepted standard?

The validity of usage statistics is a critical issue that needs to be addressed seriously. First, there should be more detailed information with which to analyze the validity of reported statistics from database vendors. The current documentation, albeit improved, is simply not adequate. In this regard, Project COUNTER is an important initiative because it attempts to define an agreed-upon code of practice. All related parties need to work together to draw up the specifics of the practices to dispel the persistent suspicion that even the same statistics are counted differently. For this to happen, libraries, publishers, and aggregators need to continue a healthy dialogue regarding their expectations. Vendors need to be more forthcoming in the discussion and better describe what they do, and how they do it, in usage reporting. Because practitioners themselves sometimes do not agree on what is valid, the library community needs to determine what a valid metric is. Establishment of the National Clearinghouse for Library and Information Center Networked Statistics described earlier can help formulate consensus among practitioners. Finally, an external validation service or organization can be considered as a part of the solution. The validating service would then enforce compliance with industry standards and monitor actual use. The authors mention this simply as a possibility in the long term and suggest that it be thoroughly examined before being put forward for implementation.

This study has not dealt with issues related to usage reporting in consortial arrangements. As such arrangements become increasingly common in research libraries, librarians will need to make sure that the individual members of a consortium receive the same level of usage statistics for their institutions as they would under individual site-licensing agreements.

Usage statistics currently provided by vendors give useful information regarding the utilization of external subscription-based information services. Libraries use the data for a variety of purposes: usage trends over time, justification for expenditures, cost analysis, modification of service provision, and so on. Related to the issue of the value of the data is the trustworthiness (reliability) of the data. And, as discussed earlier, there also is some concern about the lack of user-related information in usage statistics.

Recommendations

Based on the findings of this study, the authors make several suggestions that may be useful for ARL libraries (and perhaps other libraries) to consider in dealing specifically with vendor statistics, including:

• Focusing data analysis on high-impact databases: Libraries should not treat all databases equally when it comes to data analysis. Because of inconsistencies in data elements and report delivery, it is difficult to normalize usage statistics from all vendors who report data. Instead, libraries need to investigate the usage patterns of “major” databases, whatever those might be locally, and ways that improvements can be made in terms of access and use of materials.

• Collecting locally obtainable data for external databases: Although libraries need to depend on database vendors for usage statistics, they have several ways (e.g., through redirect page counters for licensed databases or through proxy server logs) to capture at least partial information on user access to the external databases (e.g., attempted log-ins); a minimal sketch of such a redirect counter appears after this list. This kind of internal data helps libraries spot-check the reliability of vendor-supplied usage statistics. Moreover, because the data will be under the control of libraries, they will be more consistent than measures reported by different vendors.

• Keeping track of aggregate key statistics and using them: Libraries often find themselves in need of gross figures of user access to external licensed databases for various internal and external reporting. The aggregate numbers are good indicators of overall trends in user demand for, and access to, external databases. It is important to keep some level of consistency in the way the gross figures are calculated and reported. One way to maintain consistency is to gather data from the same pool of database vendors or database titles over a specified period of time (e.g., “Total searches conducted in existing licensed databases grew by 20% in 2000, to 1,200,000, as compared to the 1999 total of 1,000,000 searches. The data are based on the same thirty-five vendors that report the statistic.”).

• Validating reliability: The library community needs to consider concrete ways (e.g., third-party validation) to ensure consistent and reliable reporting from vendors.

• Demanding documentation: Libraries should demand better documentation of the data collection and filtering process from the various vendors. Such documentation should describe how the sets of data are collected and defined, and discuss any issues or concerns in the reporting of these data to libraries.

• Organizing the library for data collection, analysis, and working with the vendors: Many libraries simply lack adequate staff, or the staff members lack adequate knowledge and training, to work effectively with the statistics and information that some of the vendors can supply. Library staff need to have an understanding of the statistics and to know how to manipulate the files and how to organize and report such data. In addition, the library needs to be able to commit organizational resources to working with and using such vendor statistics.
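
As a rough illustration of the redirect-page counter mentioned in the second recommendation, the sketch below (Python, using the Flask microframework) logs each attempted log-in locally before forwarding the user to the vendor. The database keys, target URLs, and log file location are hypothetical.

```python
# A minimal redirect-page counter: log the request locally, then send the user
# on to the licensed database. Database keys and URLs are hypothetical.
from datetime import datetime, timezone
from flask import Flask, abort, redirect

DATABASES = {
    "vendor_a": "https://search.vendor-a.example.com/",
    "vendor_b": "https://portal.vendor-b.example.com/login",
}

app = Flask(__name__)

@app.route("/go/<db_key>")
def go(db_key):
    target = DATABASES.get(db_key)
    if target is None:
        abort(404)
    # One line per attempted log-in; tallying these lines gives a local count
    # to compare against the sessions the vendor reports for the same month.
    with open("redirect.log", "a", encoding="utf-8") as log:
        log.write(f"{datetime.now(timezone.utc).isoformat()}\t{db_key}\n")
    return redirect(target)

if __name__ == "__main__":
    app.run()  # development server only; a production deployment would differ
```

Counting the log lines per database key over a month yields the locally observed attempted log-ins that the field-test library described earlier compared against vendor-reported sessions.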

The use of different system parameters (e.g., time-outs), the application of different assumptions about user behavior (e.g., how to treat or count multiple clicks on the same document within a session), and the lack of adequate explanation in vendor documentation regarding specific definitions and data collection and filtering processes all contribute to the reporting problem. The comprehensive standardization of usage statistics and data delivery methods (e.g., file format and data arrangement) cannot be easily achieved in the short term. These are long-term goals toward which vendors and libraries need to work together. The ARL community should continue to make progress in this area by working among its members and with the database vendor community. Therefore, the authors recommend that comparisons be limited to data from the same vendors or to data that are known to be collected, defined, and reported similarly.

The authors strongly recommend that vendors report standardized usage statistics, such as those recommended by the ICOLC and those defined in the final manual that resulted from the project.19 These should appear in standardized column and row arrangements, with a separate report containing any additional vendor-specific data.

Continuing the Momentum

ARL libraries have needed consistent, comparable, easy-to-use, and useful usage statistics from content providers (database vendors) ever since they embraced the notion of maintaining statistics on the use of external licensed materials. The ARL E-Metrics Project provided an opportunity for the ARL community to look at the issues and problems related to vendor usage reporting in a more systematic way and to begin working toward developing more useful reports. However, much more work remains in this area.

Members of the study team found that some library staff had little knowledge about the vendor statistics, had limited training in manipulating and analyzing the reported data, and were quite surprised that such evaluation and manipulation of data required special training and knowledge. Some libraries were not organized for ongoing data collection and analysis of vendor statistics: it was unclear who was responsible for such efforts and whether resources were available to support them. And finally, most libraries simply had no management information system (even in the most basic sense of the term) for organizing, analyzing, and reporting such data. The study team found that, in general, libraries were not prepared to commit the necessary resources, staff time, training, and effort to the evaluation.20

Thus, one difficulty in some of the discussions with the vendors was a lack of knowledge and skills on the part of the librarians in using and analyzing the data. The fact of the matter is that both the library community and the vendor community have much to learn in terms of understanding how best to define, collect, analyze, report, and validate such statistics. For their part, many libraries simply do not have a culture of evaluation that supports the assessment effort needed to use vendor-based statistics successfully.21 Organizational development, staff training, and strategic planning for exploiting such data on the part of libraries will be key components in moving forward in this area.

Several organizations in the library and vendor communities, as well as national and international bodies, are currently working in this area. Although these initiatives do not overlap exactly in terms of goals and scope, there is a danger that they may result in conflicting reporting requirements. Specific ways to coordinate and encourage cooperation have yet to be developed. Indeed, the number and range of organizations interested in developing standardized statistics is significant.

From the vendors’ point of view, it is impossible to respond to multiple and conflicting requests for data from the library community. As one vendor commented, until the library community can decide how best it wants the data defined, collected, validated, and reported, vendors cannot provide endless responses to users in terms of offering “customized” data sets. Thus, to some degree, the members of the library community must continue to work among themselves to reach agreement regarding these standards.

In addition, different types of libraries (academic, school, public, special, etc.) need to reconsider the degree to which they think their needs are unique to their particular settings. A “full-text download” is not going to vary across library types. Those librarians who argue that they have unique or special data needs simply reinforce the view of some vendors that it is impossible to provide multiple data types, defined differently, for different libraries, and little progress will be made on standardizing these statistics. The members of the library community must work together in the development of such standards, definitions, and reporting.

Both vendors and the library community need to realize that the development, testing, refinement, and standardization of vendor-based statistics is an ongoing process. Given the changes in technology, database structures, and other factors, the life span of the statistics may be short (compared with more traditional library statistics). Thus, obtaining longitudinal data from vendors may be difficult, and there will be a need to be much more pragmatic as to the availability and use of vendor-based statistics. The National Clearinghouse for Library and Information Center Networked Statistics (http://www.ii.fsu.edu), being established at Florida State University’s Information Use Management and Policy Institute, will play a coordinating role in the collection, use, and analysis of network data sources including, but not limited to, database vendor statistics. The clearinghouse will help the various efforts to date build on one another and will integrate activities for meaningful library assessment in support of decision making and analysis.

One important accomplishment during the project was the initiation of conversations and cooperation with major database vendors. Currently, library leadership in this area is diffuse and lacks coordination. Work needs to continue, especially on the standardization of key usage statistics, data delivery, and better documentation of definitions and reporting procedures. An ongoing, more formalized mechanism is essential to ensure that such meetings take place, progress is made, and better standards for vendor-based statistics are developed.

Notes

1. Association of Research Libraries, ARL Supplementary Statistics 2000–2001 (Washington, D.C.: ARL, 2002). Available online from http://www.arl.org/stats/sup/sup01.pdf.
2. It is quite possible that the users may not even realize that the library evaluated the electronic sources and negotiated the licensing contracts. Most electronic databases validate legitimate user log-ins by examining the originating IP (Internet protocol) addresses included in the requests. Users can bypass the library Web site when they access external electronic databases as long as they use computers carrying the legitimate IP addresses. Remote users connecting through Internet service providers (ISPs) may need to set up a proxy server to access electronic databases; the proxy information allows them to use the institution’s IP addresses.
3. The arrows denote movement of information contents and location of user access. In the depiction of the traditional library, materials reside on library premises and users come to the library to use the collection. On the other hand, in the networked library, not all of the library collection resides within the library’s physical boundaries. Also, part of user access occurs outside the library, as in the case of access to most subscription-based, electronic-licensed materials.
4. These include off-site storage facilities libraries use to reduce the cost of warehousing less frequently requested materials.
5. The authors are not suggesting here that only external, licensed materials are of concern to scholars and libraries. They acknowledge that other freely available or community-based services are being heavily used by academic users.
6. There can be a set of arbitrary limitations, such as simultaneous log-in limits, but they do not originate from the characteristics of the electronic format.
7. International Coalition of Library Consortia, “Guidelines for Statistical Measures of Usage of Web-Based Information Resources” (ICOLC, 2001) [revised December 2001]. Available online from http://www.library.yale.edu/consortia/2001webstats.htm.
8. There is a large and diverse number of electronic content providers in the market, and it is difficult to describe them collectively. The term “database vendors” is used in this article to denote various content providers such as traditional journal publishers providing electronic counterparts, aggregators of full-text journals and reference databases (e.g., JSTOR, Project Muse, Ebsco, Gale, ProQuest), electronic book providers (e.g., Questia, netLibrary), and so on.
9. John Carlo Bertot, Charles R. McClure, and Joe Ryan, Statistics and Performance Measures for Public Library Networked Services (Chicago: ALA, 2001).
10. Carol Tenopir and Eleanor Read, “Patterns of Use and Usage Factors for Online Databases in Academic Libraries,” College & Research Libraries 61 (May 2000): 234–46.
11. Deborah D. Blecic, Joan B. Fiscella, and Stephen E. Wiberley Jr., “The Measurement of Use of Web-based Information Resources: An Early Look at Vendor-supplied Data,” College & Research Libraries 62 (Sept. 2001): 434–53.
12. Sally A. Rogers, “Electronic Journal Usage at Ohio State University,” College & Research Libraries 62 (Jan. 2001): 25–34.
13. Kathleen Bauer, “Indexes as Tools for Measuring Usage of Print and Electronic Resources,” College & Research Libraries 62 (Jan. 2001): 36–42.

14. Charles R. McClure and John Carlo Bertot, eds., Evaluating Networked Information Services: Techniques, Policy, and Issues (Medford, N.J.: American Society for Information Science and Technology, 2001).
15. The meeting served as the planning session for the project that later became known as the ARL E-Metrics Project. Thirty-five ARL institutions were represented at the meeting. A survey questionnaire was sent out before the meeting, and twenty-one libraries responded. The survey contained four open-ended questions on the data needs for electronic information resources, the status of data collections at participating libraries, and the expectations for the meeting. The summary presentation from the meeting (in Microsoft PowerPoint format) is available online from http://www.arl.org/stats/newmeas/scottsdale/jeffshim/.

16. Bertot, McClure, and Ryan, Statistics and Performance Measures for Public Library Networked Services.
17. Additional meetings between vendors and other members of the library community regarding statistics have occurred at ALA Midwinter 2001 and 2002. These meetings also have attempted to better coordinate and validate vendor-based statistics that are being reported to libraries. The National Commission on Libraries and Information Science has sponsored these meetings; summaries, as well as other related reports, are available online from http://www.nclis.gov/statsurv/statsurv.html.
18. An interim Phase I report describing current practices of participating ARL member libraries related to network statistics and measures was issued November 7, 2000, and is available online from http://www.arl.org/stats/newmeas/emetrics/index.html.
19. Wonsik Shim, Charles R. McClure, Bruce T. Fraser, and John Carlo Bertot, Data Collection Manual for Academic and Research Library Network Statistics and Performance Measures (Tallahassee, Fla.: Information Use Management and Policy Institute, Dec. 2001); also available online from the ARL at http://www.arl.org/stats/newmeas/emetrics/index.html.
20. As part of the E-Metrics Project, the study team produced three PowerPoint presentations about preparing the organization for evaluation, the importance of evaluation and use of vendor statistics, and an overview of and introduction to the recommended statistics. These presentations are available from the ARL in Washington, D.C., and from http://www.arl.org/stats/newmeas/emetrics/index.html.
21. Amos Lakos, “The Missing Ingredient: Culture of Assessment in Libraries,” Performance Measurement and Metrics: The International Journal for Library and Information Services 1 (1): 3–7.