Kandidatnummer 602

________________________________

Discovery Systems as an Alternative

to Stand-Alone Databases

The Example of Oria at BI Norwegian Business School

Bacheloroppgave 2016

Bachelorstudium i bibliotek- og informasjonsvitenskap

Høgskolen i Oslo og Akershus, Institutt for arkiv-, bibliotek- og informasjonsfag

Abstract

Discovery systems consist of an intuitive search interface and a single central index that

gathers metadata harvested from various sources. They offer a unified search environment

that allows library users to search at once through all the content to which their library

subscribes, and more. As such, discovery services have a great potential as an alternative to

stand-alone databases. This paper evaluates that potential using

the version of Primo Ex Libris (Oria) implemented by BI Norwegian Business School.

Due to the size and coverage of Primo’s central index, users risk being faced with more hits

than they can handle, or with “off-topic” items. For systematic reviews or other types of

searches by advanced users, native databases may be better suited, due to additional search

features and a field-specific coverage.

But the central index and searching capabilities in Oria must not be underestimated. Field

searching proved to be quite efficient, despite different indexing standards and metadata

inconsistencies from one content provider to another. Provided that one knows the limits of

Oria and makes use of the available features of its advanced interface, it can turn out to be a great

starting point for exploratory searches.

Høgskolen i Oslo og Akershus, Institutt for arkiv-, bibliotek- og informasjonsfag

Oslo 2016

Acknowledgments

I would like to sincerely thank Kristin Askildsen, Anita Bergsvenkerud and Michael

Preminger for providing guidance and counselling during the whole process, since Autumn

2015, and for being understanding when my bachelor thesis was delayed due to medical

reasons.

I also want to thank all the librarians and employees at BI who patiently answered my many

questions while I was working there as a library assistant. I’ve really enjoyed my time there, which

is also why I decided to include BI in my thesis, and chose a topic that would give me

valuable experience for working in an academic library.

Finally, thanks to my friends for their help and support, both with the thesis and in general

over the last semester.

Table of Contents

Abstract
Acknowledgments
Table of Contents
    List of Figures
    List of Tables
1 - Introduction
2 - Some Definitions
3 - Literature Review
    Characteristics of Discovery Systems
    Processes Involved in the Constitution of the Central Index
    Metadata
    Agreements
    Relevance Ranking
    Discovery Services vs. Standalone Databases
4 - Presentation of the Databases
5 - Research Strategy and Description of the Experiment
    Search Types & Search Queries
    Additional Search Filters
    The Data Collection Process
6 - Results and Data Analysis
    Number of Co-References
    Scope of PCI
    Record Sources
    Coverage of the Four Databases in PCI
    Content Delivery
    Keywords in the PNX
    Field Weighting
    Metadata
    Recap
7 - Future Research
8 - Conclusion
References
Personal Communication
Appendices
    Appendix I - Search Parameters
    Appendix II - Search Logs
    Appendix III - Duplicate Records and Co-References
    Appendix IV - Items retrieved by the three searches
    Appendix V - Publication Dates for Items Retrieved with the First Boolean Search
    Appendix VI - Record Sources for Items Retrieved during Week 12

List of Figures

Figure 1 - Architecture of a discovery system
Figure 2 - Content in the central index (Hoeppner, 2012, p. 9)
Figure 3 - The "Boolean search" in Oria
Figure 4 - Resource type facet
Figure 5 - "Article"-filter in the search interface
Figure 6 - Folders in one of the EndNote libraries
Figure 7 - Access to the databases hosting the articles
Figure 8 - Metadata representing the article
Figure 9 - CrossRef as a content provider
Figure 10 - Record with SAGE Publications as a record source
Figure 11 - SD and Emerald in the Activation Wizard
Figure 12 - ABI in the Activation Wizard
Figure 13 - Link order in the "View it"-section
Figure 14 - Changing the scope of documents included in the result list
Figure 15 - Record in BSC without any of the keywords it was retrieved with
Figure 16 - Record from Web of Science in the user interface
Figure 17 - Display section of the same record's PNX
Figure 18 - Search section of the same record's PNX
Figure 19 - Representing record in the FRBR group
Figure 20 - Record from SD in Oria
Figure 21 - Original record in SD
Figure 22 - Weight of the different fields
Figure 23 - Title variations for the same article
Figure 24 - Record as viewed in Oria
Figure 25 - Same record in EndNote
Figure 26 - Author names in the different databases (EndNote)
Figure 27 - Name variations in Oria
Figure 28 - Searching for the author "Doug Williamson"

List of Tables

Table 1 - Number of hits retrieved for each search in Oria
Table 2 - Number of duplicate records in the three result lists in Oria*

1 - Introduction

Discovery systems are libraries’ latest response to an evolution that has taken place over the last

decades. The increasing use of the Internet, and especially of the Google search engine, as a

first stop to find information has shaped users’ search behavior and expectations as to what a

search interface should look like (Vaughan, 2011, p. 6). At the same time the nature of the

academic library collection itself has evolved dramatically. Online catalogs were once the

primary search interface, but a gateway to printed collections only (Breeding, 2010, p. 6).

They became less and less adequate, as the proportion of electronic resources libraries

subscribe to – through external databases – increased drastically (p. 7).

Discovery service technology, though not new, was first applied to library environments

around the late 2000s, as a way to address both issues. Discovery services centralize, in a

single index, metadata records of billions of items, gathered from numerous databases. This

index can be searched seamlessly at the end-user side through an intuitive interface. The very

same interface returns a single result list organized in a relevance-ranked order, and provides

further access to the full text article when possible (Vaughan, 2011, p. 6).

In 2013, the BIBSYS Consortium, which represents and provides services for research and

academic libraries in Norway, decided to implement the discovery service Primo Ex Libris

(Ex Libris, March 19th, 2013), which was renamed “Oria”. One of its members is the library

at BI Norwegian Business School (BI) (BIBSYS, n.d.b).

I conducted several searches in BI’s version of Oria in order to explore the characteristics of

Primo Ex Libris and of discovery systems in general. I performed the same searches in four

stand-alone databases that BI subscribes to. I collected and analysed the items in the

result lists returned by Oria, and compared them to the items collected from the four

databases. The main focus here was to understand the results returned by Oria, based on what

is known of how discovery services work and “behave”.

Ultimately, this paper aims to answer the following question: to what extent can the

discovery system Oria be used as an alternative to the stand-alone databases that the BI library

subscribes to?

2 - Some Definitions

A discovery system (DS), or discovery service, is composed of an intuitive search interface

and a central index. The central index includes pre-harvested content from multiple sources,

which the end user can simultaneously search through thanks to the front-end interface. The

end-user interface returns and displays results according to a relevance algorithm. Accessing

the retrieved e-books or electronic articles, and in some cases only viewing the records,

requires that the library subscribes to this particular content through agreements with the

different content providers (Hoeppner, 2012, p. 7). This technology differs from

“federated search”, which sends queries to the native databases before returning a single result

list (Narayanan & Mukundan, 2013, p. 2).

Figure 1 - Architecture of a discovery system

Aggregators by definition bring "together in a coherent collection disparate information

sources". Aggregators may provide licensed full-text content to libraries, but not necessarily

(Galyani Moghaddam & Moballeghi, 2007, p. 222).

A bibliographic or metadata record consists of the metadata elements describing and

representing a particular document in a library or database environment (UO Libraries, 2013).

The use of the word “record” instead of “hit” or “item” in this paper is deliberate: it refers

specifically to the information contained in the metadata elements, that is, to the representation of a

document rather than the document itself.

In a bibliographic environment, a duplicate record refers to two or more records

representing the same document (Sitas & Kapidakis, 2008, p. 287). In this paper it refers to

duplicate records found in the same result list in Oria, within the limit of the results I

collected. For documents represented in both Oria and at least one of the four databases, I

use the term co-reference.

3 - Literature Review

Discovery service technology, though not new, was first applied to library environments

around the late 2000s, as library online catalogs were evolving into so-called “next-

generation” library catalogs (Vaughan, 2011, p. 6). Four main discovery tools emerged

between 2007 and 2010: OCLC’s WorldCat Local, Summon by Serials Solutions, EBSCO

Discovery Services (EDS) and Primo by Ex Libris (Breeding, 2014, p. 13). As discovery

services, they all share characteristics that have been defined by several authors.

Characteristics of Discovery Systems

Vaughan (2011) describes DS as a “service capable of searching across a vast range of pre-

harvested and indexed content quickly and seamlessly”, contrary to federated search that

sends queries to various remote databases (pp. 5-6). This definition focuses largely on one

important trait of discovery services, which is the content. This content is searchable within

one unified central index, whose size is a crucial factor for libraries when choosing a DS: Ex

Libris’ unified index – Primo Central Index (PCI) – contains “hundreds of millions of

scholarly e-resources of global and regional importance” (Hoeppner, 2012, p. 8). Content

coverage is equally important, especially the comprehensiveness of pre-harvested commercial

content such as academic journals (Narayanan & Mukundan, 2013, p. 2).

This content is mainly provided by publishers through their publication platform – for

example Elsevier’s ScienceDirect and Thomson Reuters’ Web of Science – and by

aggregators, such as ProQuest, and their databases. Free scholarly materials are also included

in the central index (Renaville, 2015, p. 6). The index can also contain local resources,

imported from the library’s collections. The central index of the DS called Summon looks

very much like the drawing in Figure 2, where all content is blended into the

unified index (Vaughan, 2011, p. 23). Primo, however, harvests library collections into a

local index. Results from both this index and PCI are blended together in the front-end

interface (p. 40).

Another essential component of discovery services is the interface on the end-user side, also

called the discovery layer. This interface is a single entry point for both searching and

displaying the content (Hoeppner, 2012, p. 8). This also applies to the delivery of the

content, as records may be directly linked to the full-text articles, if the library subscribes to

this particular content (Narayanan & Mukundan, 2013, p. 5; Vaughan, 2011, p. 6).

The most recognizable features of DS are the Google-like single search box (although an

advanced search option is available as well), the relevancy ranking of the results, and faceted

browsing as a way of narrowing the search (Breeding, 2014, p. 9; Vaughan, 2011, p. 7). DS

interfaces are often described as “modern”, including “modern interface conventions”

(Breeding, 2014, p. 8) such as “design elements expected by today’s students”. This includes

the possibility of saving records in a “shopping cart” and the displaying of book covers

(Narayanan & Mukundan, 2013, pp. 5-6). This does not influence which content is retrieved

and how it is displayed, but contributes to making DS intuitive and user-friendly (p. 3).

Flexibility is also a characteristic of DS, which are more “open” in comparison to traditional

library systems: the content in the central index is remotely hosted and maintained by the

vendor. Because DS are not tied to a specific library management system, they can be

customized by each library to fit the user’s needs (Sadeh, 2011, p. 12; Vaughan, 2011, p. 7).

The front end in Primo can be configured and customized by libraries in the administration

interface “Primo Back Office” (Ex Libris Knowledge Center, n.d.a).

Figure 2 - Content in the central index (Hoeppner, 2012, p. 9).

This description of DS characteristics distinguishes between the content and the discovery

layer. Narayanan and Mukundan (2013) also identify content as being one of two essential

components, the other one being the technology that allows the user to find relevant content.

This technology includes the searching and retrieving functionalities, a relevancy algorithm,

as well as other aspects that come into play at the end-user side of the discovery service. It

also involves the processes of harvesting and normalizing the data into the central index database.

These processes play a role in the constitution of the index, prior to any search (p. 3).

Processes Involved in the Constitution of the Central Index

Harvesting means that data is gathered using protocols such as OAI-PMH and FTP, with a

frequency that varies depending on content sources (Hoeppner, 2012, p. 9). Data is harvested

periodically from multiple sources (p. 7). As a result, a single item can originate from various

databases, using different metadata standards.
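
To make the harvesting step more concrete, the following is a minimal sketch of how an OAI-PMH harvesting request can be put together. It only builds the request URL for the standard `ListRecords` verb and does not contact any server. The repository address is hypothetical, and nothing here is meant to describe how Ex Libris actually schedules or runs its harvests.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper).

    base_url is the repository's OAI-PMH endpoint; the one used in the
    example below is a made-up address, not a real content provider.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only records added or changed since this date are requested,
        # which is what makes periodic (incremental) harvesting possible.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


url = build_list_records_url("https://repository.example.org/oai",
                             from_date="2016-01-01")
print(url)
```

The `from` argument is what allows a harvester to return at a chosen frequency and fetch only what has changed since the previous visit.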

For this reason, the harvested data has to go through a normalization process, based on a

schema developed by each DS vendor. Data harvested by and into Primo is mapped and

converted to an XML file called “Primo Normalized XML” or PNX. The data is organized

into different sections, and might be duplicated from one section to another (Ex Libris,

2016c, p. 4). Some of these sections are the display, search, and delivery sections. Each of

them has a specific purpose. The display section includes the content that is viewable in the

user interface (p. 5). The search section is the part of the PNX that is indexed for search

purposes. The data is broken down to various subfields that enable searching with a particular

field – for example title or subject field – in the user interface (p. 17). It is possible to display

an item’s PNX by adding the following parameter at the end of its URL: &showPnx=true (Ex

Libris Knowledge Center, n.d.c).
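
As an illustration, the parameter can be appended to a record URL with a few lines of Python. This is only a sketch: the record URL below is invented for the example, and the only element taken from the documentation cited above is the parameter `showPnx=true` itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_show_pnx(record_url):
    """Return record_url with showPnx=true added to its query string."""
    parts = urlsplit(record_url)
    # Keep the existing parameters, dropping any earlier showPnx value
    # so that the parameter is not repeated.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "showPnx"]
    query.append(("showPnx", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical record URL, used for illustration only.
print(with_show_pnx("https://oria.example.org/display.do?doc=TN_abc&vid=BI"))
```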

In Primo, the normalization step is followed by one or both of the following processes:

- the de-duplication process: the records are matched against each other. If matching

records are found, a merged record is created. Only this new record is then indexed

and loaded into the central index, and will be used for display. The merged record is

based on a preferred record, which is chosen according to the number of fields in the

display section, and the “delivery category” (Ex Libris, 2016c, p. 99). Vaughan

(2011) notes that the record displayed is the publisher’s, if available (p. 40), but this

may be customized as libraries can choose which of the available records should be

displayed (Ex Libris, n.d.b, slide 18).

- the FRBRization process: the harvested records go through a FRBR-grouping process.

Records are assigned a “FRBR-group¹ ID” based on common fields, for example title

and author metadata. This is a way of linking the displayed record to all the records in

the FRBR group. Only one record, representing the members of the group, is

displayed in the user interface. This record is either one of the members in the group,

depending on the search query, or a generic record gathering information elements from

all records in the group (Ex Libris, 2016c, p. 118). All members are indexed and

searchable (Ex Libris Knowledge Center, n.d.d).

Both processes may be run one after the other, leaving fewer records to be “FRBRized” as

the rules for de-duplication are stricter. Both processes are deactivated in Oria for BIBSYS

consortium members’ local resources. However, de-duplication and FRBRization processes

are run directly by Ex Libris for records in PCI (Asbjørn Risan, email, June 22nd, 2016), such

as the records retrieved and examined for this paper.
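
The general idea behind both processes can be sketched in a few lines of Python. This is a simplified illustration and not Ex Libris' actual rules: the match key (normalized title plus first author surname), the record layout and the sample records are all assumptions made for the example. Only the idea of choosing a preferred record according to the number of display fields is taken from the description above, and the "delivery category" criterion is left out.

```python
import re
from collections import defaultdict


def match_key(record):
    """Hypothetical match key: normalized title and first author surname."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    surname = record["author"].split(",")[0].strip().lower()
    return (title, surname)


def group_records(records):
    """Group records that appear to represent the same document."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return groups


def preferred_record(group):
    """Pick the record with the most display fields (simplified rule)."""
    return max(group, key=lambda record: len(record["display"]))


# Invented sample records from two imaginary content providers.
records = [
    {"source": "provider_a", "title": "Discovery Tools: A Review",
     "author": "Smith, John", "display": {"title": "...", "creator": "..."}},
    {"source": "provider_b", "title": "Discovery tools - a review",
     "author": "Smith, J.",
     "display": {"title": "...", "creator": "...", "subject": "..."}},
]

for key, group in group_records(records).items():
    print(key, len(group), preferred_record(group)["source"])
```

A key this loose behaves more like FRBR grouping than like de-duplication: the stricter the matching rule, the fewer records are merged, which is why inconsistent title or author metadata can leave several records for the same article in a result list.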

The success of this preprocessing of information plays a role in how many records are shown

in the result list for one single article, which is a good example of how the discovery layer

interacts with the central index. However, this success, and the final result on the end user

side, depend a great deal on the metadata supplied to the DS vendor by all parties.

Metadata

As early as 2010, in a book aimed, among others, at libraries wishing to choose and

implement a discovery service, Marshall Breeding (2010) advocated for an effort “towards

clean and consistent metadata across all the resources that will be addressed by the discovery

interface” (p. 111). One way to reach this goal is for libraries to use authority records and

metadata standards to populate records (p. 112).

¹ "FRBR-group" here refers to a group of records in PCI identified as representing the same article,

and not to the three groups of entities conceptualized in IFLA's Functional Requirements for Bibliographic Records (OCLC.org, n.d.).

These resolutions are, however, easier to enforce at a library than at a central index level. The

number of inconsistencies across multiple content sources that are brought together is likely

to be higher; the use of authority control for author names in particular is a challenge

(Calarco, Conrad, Kessler & Vandenburg, 2014, p. 536). Spelling mistakes, wrong dates or

wrong content types are other examples of incorrect metadata. Such errors, especially textual,

can be difficult to identify among millions of records (p. 535).

These issues, as well as insufficient metadata, can have an impact on content discoverability

and delivery. Open access articles being marked as available in full text when they actually

are not is a problem that was reported by several libraries in Renaville’s paper (2015, p. 13).

In spite of being “the most mature and the most automated” and having “the benefit of the

greatest number of industry standards”, journal metadata lacks a standard metadata element

to identify open access articles. The absence of such a standard sometimes prevents PCI from

differentiating between abstract-only and full-text records (Calarco et al., 2014, p. 534). Such

issues regarding content delivery also apply to other article types, when the metadata required

to directly link the record to a full-text article is missing (p. 538).

One of the main issues for DS is the inconsistent application of metadata, from one content

provider to another. As mentioned by Calarco et al. (2014), metadata originate from disparate

sources using different standards and practices. The way author metadata is expressed by the

different providers makes it particularly challenging for DS to normalize data. According to

Rachel Kessler (Calarco et al., 2014) from Ex Libris, the less the fields in the harvested

record are broken down, the more ambiguous it is as to what the first or last name is. This

may be explained by the fact that no universal standard has yet been set for the format in

which metadata should be delivered by the different sources (p. 536).
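
The ambiguity described by Kessler can be illustrated with a short sketch. The splitting rules below are invented for the example and are not Primo's normalization rules: a name delivered as "Last, First" can be split with confidence, whereas a single unstructured string can only be split by guessing.

```python
def split_author(raw):
    """Split an author string into first and last name (illustrative only).

    The 'certain' flag marks whether the split relied on explicit
    structure (a comma) or on a guess about word order.
    """
    raw = raw.strip()
    if "," in raw:
        last, _, first = raw.partition(",")
        return {"last": last.strip(), "first": first.strip(), "certain": True}
    parts = raw.split()
    if not parts:
        return {"last": "", "first": "", "certain": False}
    # Guess: the final word is the surname. This is wrong for sources that
    # deliver "Last First" and for multi-word surnames.
    return {"last": parts[-1], "first": " ".join(parts[:-1]), "certain": False}


print(split_author("Williamson, Doug"))
print(split_author("Doug Williamson"))
```

Both calls return the same first and last name here, but only the first split is reliable, since the second depends on the provider having delivered the name in "First Last" order.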

Different practices regarding content labeling are also an issue: while some providers rightly

state that a given document is an “article” or “book chapter”, others label it as “text”. This

has an impact on the discoverability of resources, which may become invisible, especially if the

user applies a content type filter (Renaville, 2015, p. 9).
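
A small sketch shows how such a record disappears once a filter is applied. The records and the exact-match filter are invented for the illustration and do not reproduce the way Primo's facets are implemented.

```python
def filter_by_type(records, wanted):
    """Keep only the records whose content type label equals 'wanted'."""
    return [record for record in records
            if record["type"].lower() == wanted.lower()]


# Two invented records for journal articles, labeled differently
# by two imaginary providers.
records = [
    {"id": "rec1", "type": "article"},
    {"id": "rec2", "type": "text"},
]

visible = filter_by_type(records, "article")
print([record["id"] for record in visible])
```

Both records describe articles, but only `rec1` survives the filter, because `rec2` carries the generic label "text".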

One can also speak of metadata quality in terms of “depth”. Narayanan and Mukundan (2013)

distinguish between basic or “thin” metadata, which includes only a few record fields, and

“thick” metadata. “Thick” metadata may include “additional abstracting and indexing by

dedicated staff, or includes author-supplied subject headings and abstracts” (p. 4).

Other records comprise the full text of a document (Breeding, 2014, p. 13), which is

searchable but not displayed in its entirety in the result list. As a consequence, records can be

retrieved even though query terms do not match the displayed metadata, as long as they

match the full-text content. But this may confuse users and decrease their confidence in the

search tool (Calarco et al., 2014, pp. 538-539). Users may or may not be able to access the

article, as some content providers only provide their full-text metadata to make their content

discoverable in the DS, regardless of whether the library subscribes and has access to this particular

document (Narayanan & Mukundan, 2013, p. 4).

Agreements

The records displayed in the DS depend on the level of content that is indexed, but also on

partnerships created prior to harvesting, between the DS and as many content providers as

possible. The brokered agreement should allow harvesting, indexing and exposing of the

content through the DS (Breeding, 2014, p. 13; Hoeppner, 2012, p. 9; Narayanan &

Mukundan, 2013, p. 3; Vaughan, 2011, p. 9). The interest of entering such an agreement for

providers is that they can increase access and use of their content by library users (Breeding,

2014, p. 13). However, publishers have different policies as to who can harvest their

content and how much (full-text content or abstract and citation only), and some of them may

have exclusive agreements with some of the DS vendors. As a result, the richness of a record

from a specific publisher might differ from one DS to another (Vaughan, 2011, p. 11).

Agreements between vendors and providers are only one of the conditions for the content to be

viewable by a library user. The second condition is for libraries to subscribe to the content

itself, i.e. to the source databases, in order to access the full-text articles. As a result, full-text

content from a given database will only be accessible through the DS to authenticated users if

both the library and the DS vendor mutually license that database (Breeding, 2014, p. 14;

Hoeppner, 2012, p. 9; Renaville, 2015, p. 6).

In Primo, content has to be activated in PCI’s “Activation Wizard” in order to be searchable

and viewable in the discovery layer. The content that is shown in the result list and the full-

text content accessed through Primo are not necessarily the same. Publications from

providers that did not enter agreements with Ex Libris can be accessed through the DS – by

subscribing libraries – even though these publishers’ metadata cannot be shown in the user

interface. In the same way, publications may be represented by metadata from providers the

library only has citation-level access to, as long as this particular content is activated in the

Activation Wizard (Ex Libris, n.d.a, Activation Wizard).

When a library is a member of a consortium, as is the case with BI, some resources might be

activated by the consortium itself, on behalf of all member institutions. This mostly concerns

resources that are relevant to, and otherwise subscribed to by, all members (Ex Libris

Knowledge Center, n.d.b).

Libraries can choose whether they only want to display records of documents they have

access to, or also show content they do not subscribe to (Hoeppner, 2012, p. 10). In the latter case,

users would have access to citation-level – or more – metadata, without the library paying

anything to the publisher (Vaughan, 2011, p. 10).

This might explain why some publishers or aggregators refuse to enter agreements with DS

vendors. This is especially the case when the content is enriched by in-house staff with for

example abstracts and controlled vocabulary. As explained by Breeding (2014), this

additional metadata enhances searching capabilities within the provider’s own product, but is

still of great value in other search environments such as discovery services. As a

consequence, some providers are reluctant to see their proprietary content made available to

non-subscribers, as this could lead to a decrease in the use of and interest in their products (p.

14). Aggregators also have little control over where their content might end up in another

search service’s result list. This might also reduce access to their content (Kelley,

2012, p. 40).

Some of the subject database providers are also DS vendors. Unsurprisingly they do not want

the whole of their content to appear in their competitors’ central indexes (Hoeppner, 2012, p.

9). Two of the four main DS vendors, EBSCO and ProQuest, are also major producers of

subject indexes. Breeding (2014) and Kelley (2012) reported that these vendors did not fully

cooperate with DS competitors at the time of writing, and that the enriched metadata

available in EBSCOhost databases was not included in PCI and other central indexes (2014,

p. 14; 2012, p. 37). Several libraries were asked in a survey to describe major resources they

did not believe were addressed by the discovery product they were using. EBSCO and

ProQuest were largely mentioned, even though some of them confirmed subscribing to these

two providers’ content (Breeding, 2014, p. 18).

It should be mentioned that ProQuest recently acquired Ex Libris (ProQuest, December 15th,

2015). Whether this has had an impact on the result lists in Primo has yet to be documented.

Relevance Ranking

The relevance ranking of results is a technology offered by all the main discovery services,

and has been made necessary by the tremendous amount of items in central indexes. As

opposed to traditional online library catalogs, which sorted results either chronologically or

alphabetically (Breeding, 2010, p. 16), results in DS are by default sorted according to how

relevant they are thought to be by the system. The relevance is calculated based on how well

a record matches the query terms, with the records matching the most closely appearing first

(Narayanan & Mukundan, 2013, p. 4).

If the search query comprises several words, items that contain as many of the terms as

possible will be considered more relevant, especially if one of them contains the exact same

phrase (Breeding, 2010, p. 16). As a result, documents for which the full text was indexed

might be favored over basic-metadata records. This shows the limitations of a keyword-only

approach (Breeding, 2014, p. 21). Discovery services take other criteria into consideration in their

algorithms, such as scholarly value and disciplinary focus (Vaughan, 2011, p. 9), but also “usage

data as an indicator of popularity or impact factor [and] citation frequency” (Breeding, 2014,

p. 21).

In Primo’s algorithm, some of the factors taken into account are term frequency, currency,

field weighting, how often a document has been accessed, and peer-review status (Vaughan,

2011, p. 40). Other vendors might put emphasis on other factors – for example subject

heading – as each DS provider has developed its own relevancy algorithm. This implies

different results from one DS to another with a similar query (Narayanan & Mukundan, 2013,

p. 4).
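
As a rough illustration of how factors such as term frequency, field weighting and peer-review status could be combined into one score, the following Python sketch ranks records against a query. It is not Primo's algorithm, which is proprietary: the field names, the weights and the peer-review multiplier are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical field weights: a match in the title counts more than one in the full text.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "abstract": 1.5, "fulltext": 1.0}


def score(record, query, peer_review_boost=1.2):
    """Return a toy relevance score: weighted term frequency, boosted if peer-reviewed."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        # Count how often each query term occurs in this field of the record.
        words = Counter(record.get(field, "").lower().split())
        total += weight * sum(words[term] for term in terms)
    if record.get("peer_reviewed"):
        total *= peer_review_boost  # assumed multiplier, for illustration only
    return total


def rank(records, query):
    """Sort records so that the highest-scoring ones come first."""
    return sorted(records, key=lambda record: score(record, query), reverse=True)
```

With these assumed weights, a record that matches "talent management" only in its title scores 6.0, while a record whose indexed full text repeats the phrase four times scores 8.0 and is ranked first. This mirrors the observation that full-text records can be favored over basic-metadata records under a keyword-based approach.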

These algorithms being proprietary, it can be difficult for client libraries to find out how and

why some materials appear on top of the result list (Kelley, 2012, p. 39). This secrecy is

justified by economic interests but raises concerns as to how unbiased the ordering of results

is. In particular, DS vendors that are also content providers might favor their own content in the

ranking of results (Breeding, 2014, p. 16). Several libraries reported that EDS seemed to

prioritize EBSCO resources (p. 22). Vendor neutrality toward external content providers may also

be a cause for concern. These providers hope to see their content highly ranked in search results:

“should we not expect [them] to have vested interests in making sure that [happens]?”

(Kelley, 2012, p. 38).

More generally, Kelley (2012) notes that the nature of agreements between a DS vendor

and content providers may create unbalanced results as detailed subject indexing allows a

greater deal of precision when it comes to retrieval of relevant documents. The fact that some

items are indexed with citation-level metadata while full text is available for others, can have

consequences for the quality of the result set (p. 36).

This lack of transparency may leave libraries underserved. But institutions are also given the possibility

by some DS vendors to influence the algorithm, or to boost some items in particular, making

them appear higher on the result list. Libraries using Primo are for example able to define the

weight of some record fields (Vaughan, 2011, p. 40), or make their own content appear

before content from other institutions (Renaville, 2015, p. 4).

Another aspect that may influence the result set at an institution level is social data. Usage

statistics are taken into consideration in some DS algorithms, including Primo’s (Vaughan,

2011, p. 40). The number of times an item has been accessed is a good indicator of how

relevant users at a research institution might think it is (Breeding, 2010, p. 16). The

“document authority boost”, which consists of obtaining updated information on the number of

clicks, is also something libraries can implement (Ex Libris, n.d.b, slides 10-11). However,

usage data might be more meaningful when gathered across institutions, as the total number

of clicks is likely to be more representative of the research community in general (Sadeh,

2011, p. 16).

The result list may also be personalized, as some algorithms take into account information

about the user. In a DS, this information consists of the person’s discipline and academic

level. This is a way to adjust the result set and retrieve items that, in theory, are more adapted

to the researcher’s information need, in terms of complexity of the documents and subject

(Sadeh, 2011, p. 16).

Namei and Young (2015) advocate for going a step further in the incorporation of user data

into the relevancy algorithm. Not unlike what is done by Google, they encourage vendors to

take into account previous searches performed by the user at an individual level in order to

improve relevancy (p. 529). However, the use of information about individual users brings up

privacy issues, which libraries may be concerned about, hence a limited personalization of

the results (p. 530; Sadeh, 2011, p. 17).

Discovery Services vs. Standalone Databases

Even though DS harvest content from diverse resources, making searching in more than one

place optional, they are not meant to replace stand-alone databases. Sadeh (2011) argues that

DS technology is not very flexible given the variety of content types and resources it has

to accommodate. As such, it cannot become “the ultimate search entry point for many users”

(p. 15).

As mentioned by Vaughan (2011), metadata may be obtained from a specific search

environment, adapted to a particular content type or discipline, for example health databases.

As a result, the additional value provided by controlled vocabulary and other search features

is lost in the DS. Furthermore, DS do not necessarily cover the entirety of collections

owned and subscribed to by the library as additional indexes provided by local library

systems may not be included in the central index (p. 9).

In addition, highly educated users, such as researchers, with information needs within a

specific discipline, are more likely to continue using native interfaces (Breeding, 2014, p. 13).

Calarco et al. came to the same conclusion: being up to date on the latest literature in their

area of expertise, faculty members are very aware of whether a key article is missing from

the first page of results in a DS. As a consequence, they tend to opt out of the library’s DS and

use specific – and updated – databases instead (2014, p. 539).

Rather than a replacement for stand-alone databases, DS serve other purposes and are

especially appealing for “novice” users, with their simplified search environment (Breeding,

2014, p. 13). This is particularly true for searchers with low information literacy, who could

be tempted to find literature on unreliable web pages. They can instead benefit from an

intuitive interface that returns high quality and reliable scholarly content (Vaughan, 2011, p.

8).

Hanneke and O'Brien (2016) compared three DS with the health database

PubMed/MEDLINE. They were surprised to find that all three DS retrieved relevant

literature that was not found otherwise with a precision search in MEDLINE. The queries they

used in the DS were simple keywords that could potentially have been used by “inexpert

users”. The authors concluded that DS were particularly effective for this kind of audience

and for other purposes than systematic literature reviews (p. 115).

Newcomer (2011) stated that, due to their coverage of various fields and disciplines, DS are

very suitable for interdisciplinary content that may not be indexed by subject databases. They

are also great as a starting point for a new search, to gain “a general sense of the information

available” (p. 143). By making this information easily discoverable, DS may also contribute

to making content from stand-alone databases more visible and more used (Vaughan, 2011, p.

8).

4 - Presentation of the Databases

The four databases I compared Oria to are the following: ABI/Inform (from now on ABI),

Business Source Complete (BSC), Emerald Management (Emerald) and ScienceDirect (SD).

They cover subject areas that are relevant for students and researchers at BI, for example

business and management (BI Handelshøyskolen, n.d.c).

ABI and BSC are similar in terms of scope and coverage and each of them reports featuring

thousands of full-text journals, as well as a variety of scholarly and non-scholarly content,

such as reports and case studies (ProQuest, n.d.a; EBSCOhost, n.d.). ABI is a database

product developed by the company ProQuest, while BSC is one of many EBSCOhost

databases, owned by EBSCO Information Services. Both these content providers belong to

the category of full-text aggregators (Galyani Moghaddam & Moballeghi, 2007, p. 222).

Emerald Management, on the other hand, contains only journals managed by the publishing

company of the same name. The database is said to contain articles about management,

human resources, marketing and economics (BI Handelshøyskolen, n.d.a). But its scope is

necessarily much narrower than ABI’s and BSC’s, as the number of journals in Emerald’s

portfolio peaks at 300 (Emerald Group Publishing, n.d.b).

SD is a platform developed by Elsevier, a publisher and provider of information solutions.

The database includes specifically journals and books published by Elsevier and contains

over 3000 journals (Elsevier, n.d.). SD differs from the three other databases by the subject

areas it covers. SD is relevant to BI as it includes journals about economics, but it also covers

topics within medicine, science and technology (BI Handelshøyskolen, n.d.c, ScienceDirect).

Common to the four databases is that their content includes articles from scholarly or

academic journals. According to ProQuest (n.d.b), a scholarly journal’s goal is to “report on

or support research needs as well as advance one's knowledge on a topic or theory”. Such

publications are likely to be peer-reviewed by external experts, but not necessarily.

EBSCOhost (n.d.b) uses a similar and equally broad definition for “academic journals”, which

includes both content types. Both definitions also agree on the fact that these journals are

mostly aimed at academics and professional researchers (ProQuest, n.d.b, Overview; EBSCO

Support, n.d.b).

This is also the type of audience targeted by Elsevier and Emerald, which means articles

provided through their platforms also qualify as scholarly or academic content (Elsevier, n.d.;

Emerald Group Publishing, n.d.a). As a consequence, content type from one base to another

can be said to be fairly homogeneous, which facilitates comparison between

them and Oria.

Their basic search features are also quite similar to Oria’s. The five interfaces support

Boolean search, and search in specific fields such as title, author, and keyword or subject

fields. They all allow searching for exact expressions with the use of the phrase function.

Only ABI and BSC enable searching with controlled vocabulary.

5 - Research Strategy and Description of the Experiment

The aim of this paper is to determine to what extent the DS at BI can be used as a substitute

for databases the library subscribes to. The research laid out in the literature review partly

answered that question, and found that DS can serve as a good

substitute for stand-alone databases as a one-stop search environment, at least in some

situations (Breeding, 2014, p. 13; Hanneke & O'Brien, 2016, p. 115; Newcomer, 2011, p.

143).

My initial hypothesis was the following: Because Oria is a DS, a search should return hits in

accordance with what is known of how discovery services work, and provide an alternative to

databases BI subscribes to. The method I used aimed to test and verify this

hypothesis by running different types of searches in Oria. Similar searches were performed in

four stand-alone databases for control purposes.

The main features of the project were discussed with two librarians at BI, Anita

Bergsvenkerud and Kristin Askildsen. We agreed that I would conduct three different article

searches in both Oria and four databases: ABI, BSC, Emerald and SD. The first 50 results

from each database were to be retrieved and examined, regardless of the total number of hits.

The number retained for Oria, 200, is equal to the total number of items retrieved in the four databases.

ABI, BSC, Emerald and SD are journal databases BI subscribes to, and their content can be

delivered through Oria. They cover topics within economics (SD), management (Emerald

Management), or both (ABI and BSC) (BI Handelshøyskolen, n.d.c). The three searches were

also supposed to reflect this range of topics. However, many of the economics-related

keywords I tried in Emerald retrieved fewer than 50 hits, which is why I changed the focus to

management only. Topics and keywords were found by browsing the thesauri in ABI and

BSC.

Search Types & Search Queries

The starting point of my experiment was to run similar searches in different search tools, in

order to, among other things, evaluate the degree of overlap between them. Five articles,

written by Craven et al. (2014), Flatley, Lilla and Widner (2007), Kent (2005), Read and

Smith (2000) and Vinson and Welsh (2014) had the same premise. They observed different

features of the databases by performing various kinds of searches. The formulation of search

queries was also discussed in each publication. I decided to conduct three different searches

combining various aspects from the five articles.

The Boolean and Subject Search

Both Kent (2005) and Read and Smith (2000) performed searches using subject headings. In

each case the terms were adapted according to each database’s thesaurus (2005, p. 33; 2000,

p. 121). Conducting searches based on controlled keywords only is not possible for this paper

as Oria, Emerald Management and ScienceDirect do not have a thesaurus.

However, ABI and BSC do, and the first search query was largely inspired by Read and

Smith (2000), who compared three bases that are available through the same online system

(p. 120). As a result, they were able to search all three of them at the same time, using a

unique search statement. Although the authors stressed that no attempt was made to “include

all possible ways of specifying a topic” (p. 122), they did take into account each database’s

specific topic formulation. The subject heading “Online information retrieval, online

searching” is common to all three databases’ subject lists, but with various formulations in

each of them. The search query for this subject was formulated as follows: “(online or

on(w)line or electronic?(1n)(retriev? or search?)/ti” (p. 121).

Quite similarly, Oria returns results from databases that have different ways of formulating

one topic. For example, “Human Resource Management” is expressed as such in ABI’s

thesaurus, but as “Personnel Management” in BSC’s. One can imagine that other resources

possibly use other labels. Not unlike Read and Smith (2000), I gathered these two

formulations into one search statement. As was done by Vinson and Welsh (2014), I built a

Boolean query and combined it with the use of the “subject” field (p. 119) (Figure 3).

Figure 3 - The "Boolean search" in Oria

With this search strategy, I was hoping to observe Oria’s treatment of the keyword-field data

from stand-alone databases. I also expected ABI and BSC to stand out compared to Emerald

and SD, as I specifically used their controlled keywords.

The Phrase Search

Kent (2005) used a simple keyword search with no specific field, but pointed out that this

strategy provided inefficient and frustrating results (p. 33). Flatley et al. (2007) ran such a

search as well, but limited the number of hits with the use of the phrase function, retrieving

only items with the exact searched expression (p. 50). I also chose this strategy, but used it

along with a term referring to something specific within management: “talent management”

is the practice of attracting, developing and retaining skilled employees (Johns Hopkins

University, n.d.). Choosing a specific term allowed me to further limit the number of hits.

This point was inspired by Craven et al. (2014, p. 57) and reproduced for the title search, but

with one keyword only.

As mentioned before, some items may be retrieved because the search query matches the full

text of the article, rather than the other fields of the metadata record (Calarco et al., 2014, p.

538). It is this phenomenon I had in mind when I chose a search type that would not be

narrowed by any metadata field, leaving the DS free to find “talent management” in any part

of a record.

The Title Search

Read and Smith (2000) called title search a “more consistent” method, as quality in indexing

and abstracting articles can vary from one database to another (p. 121-122). In other words,

not all articles on a specific topic may be retrieved with one subject search, as we have seen

databases use different keywords, while the title words remain the same. Title search is also

one of the various strategies used by Vinson and Welsh (2014) in order to find out to what

extent the different bases overlap with each other (p. 116). Overlap between Oria and other

bases, or co-references, being one of my focuses, this type of search was an obvious choice.

The keyword chosen here, “downsizing”, is a specific term in the context of management and

means reducing the number of employees in a company (Oxford Dictionaries, n.d.).

Additional Search Filters

Several authors chose to narrow their searches to scholarly articles (Kent, 2005, p. 33) and

more specifically to peer-reviewed articles issued from academic journals (Vinson & Welsh,

2014, p. 116). I considered using the same strategy, as a post-search “peer-reviewed” facet

exists in Oria. But this is not the case in Emerald and SD, which is something I had to take

into account in order to retrieve comparable sets of results from each database. However, as

previously mentioned, all bases contain articles from scholarly journals.

When running a search without the content-type filter, Oria distinguishes “articles” from

“newspaper articles” (Figure 4). A test search indicated that the “articles” retrieved originated

from scholarly journals. As a result, “article” as a material type was applied to all three

searches in Oria (Figure 5). The searches in Oria, and in the bases where it was possible, were

also narrowed to articles in English to further limit the results to potentially comparable

items.

While all the queries were formulated the same way in all bases, thanks to similar interfaces

and features, not all of them offered the same ways of narrowing a search as Oria. As this paper

focuses on Oria, I adjusted the search parameters in the other databases based on default

parameters or search limitations defined in Oria (see Appendix I).

The Data Collection Process

Each search was conducted from the point of view of a guest user, to eliminate potential

information about the searcher that could have been taken into account by the relevancy

algorithm and influenced the search results (Sadeh, 2011, p. 16). Searching as a guest meant that this had to be

done at BI, in order to be identified as an authorized institution, which is the condition to gain

access to resources that require subscription for search and display (Hoeppner, 2012, p. 9).

Figure 4 - Resource type facet Figure 5 - "Article"-filter in the search interface

All three searches were run between March 9th and March 16th, 2016 (Weeks 10 & 11) (See

Appendix II for search logs).

All of the hits retrieved for analysis were exported to the reference management software

EndNote, which made it easier to search through and organize the references. I created a new

EndNote library for each search, and a folder for each set of results, by database, in each

library, as in Figure 6. I then identified co-references across the different folders and duplicate

records in Oria, and exported them to Excel for further comparison and analysis (see Appendix III).

I kept track of the rank order in which the records were arranged in the various search tools by

saving each result list as a PDF file (see Appendix IV for the complete lists of references in

EndNote and as PDF files). This information had to be added manually to the Excel files, as well

as the record source for each item in

Oria.
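
The identification of co-references described here was done by hand in EndNote and Excel. Purely as an illustration of the same comparison, the sketch below matches the titles in an Oria result list against the titles retrieved from each database. Matching on normalized titles is an assumption made for the example (a DOI would be a more robust key where available), and the function names are hypothetical.

```python
import re


def normalize(title):
    """Lowercase a title and collapse punctuation and whitespace so minor variants match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def co_references(oria_titles, database_titles):
    """Return the Oria titles shared with each database, and the number of distinct shared titles.

    oria_titles: list of titles from the Oria result list.
    database_titles: dict mapping a database name to its list of titles.
    """
    oria = {normalize(title): title for title in oria_titles}
    shared = {}
    for name, titles in database_titles.items():
        keys = {normalize(title) for title in titles} & oria.keys()
        shared[name] = sorted(oria[key] for key in keys)
    # An article found in several databases is counted once in the overall total.
    distinct = {title for hits in shared.values() for title in hits}
    return shared, len(distinct)
```

Counting distinct titles separately from the per-database lists matters because one Oria record can be a co-reference of more than one database.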

The three searches were performed again in Oria up to a week later (Week 12), in order to

check the record source for all 200 highest-ranked items each time, regardless of other

bibliographic information. This is the only information that was checked this time, and the

records were not exported to EndNote again. The title and phrase searches were conducted

off campus, and their results differ from those obtained with the previous searches. Web of Science

was for example totally absent, even though it was observed to be a record source the

previous weeks (10 & 11). However, these off-campus searches gave a good picture of the

variety of resources represented, and are therefore included in my research material. Some

individual records also had to be searched and checked again later in the process.

Since results in Oria and Primo may also be influenced by local choices, a look at BI’s

configurations was also necessary. I had a meeting with Askildsen and Bergsvenkerud on

June 14th. Future references to information they provided are from that meeting, unless

otherwise stated. We particularly discussed findings and results I could not explain with

literature on discovery services only, and we had a look at some of the configuration

interfaces used by BI that have a direct influence on the result lists in Oria, such as the PCI

Activation Wizard. As mentioned in the literature review, this is where the library “activates”

subscribed resources, in order to make their metadata viewable in Oria.

Figure 6 - Folders in one of the EndNote libraries

Due to personal reasons, this meeting and the second part of the research project took place

three months after the data collection had started. This has been a challenge as PCI is updated

weekly (Vaughan, 2011, p. 40), making it necessary to collect and analyse data in a limited

period of time. Some elements were impossible to go back to or had clearly changed.

An example of this is the retrieval of 46 items with Web of Science as a record source in

March. When the same references were checked again on June 5th, only seven would still

originate from Web of Science. Bergsvenkerud supposed it had nothing to do with a usual

update of PCI, but was unable to determine the exact cause of the problem.

However, the data collected before the project was brought to a halt was thorough enough,

and is the main basis for the findings presented in the next chapter. Most of the individual

items checked in the later part of the project remained the same, and could be examined again

for metadata, as this had nothing to do with their position in the result lists from March.

6 - Results and Data Analysis

Number of Co-References

I did not use any date filter for the very first Boolean search I conducted, which returned

results with very diverse publication years across the interfaces. Oria retrieved no item older

than 2003, with a concentration of articles from 2011 and 2012. BSC and SD returned articles

published in the last couple of years, while publications from the other two databases were more

evenly spread, dating back to 1988 for the oldest (see Appendix V). As a consequence, none

of the 200 records in Oria overlapped with those of any of the four databases, which was not much of

a surprise considering the fact that the first search returned as many as 76 131 hits.

However, after narrowing the search to items published since 2015, the number of hits

dropped to 1520. As a result, the three searches have a comparable number of hits in

Oria, as shown in Table 1.

Table 1 - Number of hits retrieved for each search in Oria

Result lists in Oria                    First Boolean search   Boolean 2015-16   Phrase search   Title search
Total number of hits                    76 131                 1 520             1 102           2 008
Number of co-references (out of 200)    0                      13                73              62

Yet only 13 individual articles, retrieved from Oria with the Boolean search, were also found in at

least one of the four databases (see “co-references” in Table 1). This number reached 73 and

62 for the two other searches, but was not evenly distributed between the bases. For the title

search, only seven out of the 62 co-references – and out of the 200 records from Oria – were

found in ABI, while 29 were from SD.

No logical pattern was found in where co-references were placed in each result list, which

suggests that they were ranked in each search interface according to different relevancy

algorithms (co-references’ position in the result list can be checked in Appendix III). In

addition, each search retrieved more than the 200 or 50 records that were to be examined,

which means the degree of overlap between Oria and ABI was potentially higher.
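
One way to quantify whether co-references occupy similar positions in two result lists is a rank correlation. The sketch below computes Spearman's coefficient for paired positions, for example the position of each co-reference in Oria and in one database. This is not the method used in this paper, where positions were compared by inspection; it is a minimal illustration that assumes each position occurs only once per list, which holds for result lists.

```python
def spearman(positions_a, positions_b):
    """Spearman rank correlation for paired list positions without ties.

    Returns a value close to 1 when the shared items appear in the same order
    in both lists, close to -1 when the order is reversed, and close to 0 when
    there is no monotonic relation.
    """
    n = len(positions_a)
    if n != len(positions_b) or n < 2:
        raise ValueError("need two equally long lists with at least two items")

    def to_ranks(values):
        # Convert raw positions (e.g. 3, 17, 45) into ranks 1..n among the shared items.
        order = sorted(range(n), key=lambda i: values[i])
        ranks = [0] * n
        for rank, index in enumerate(order, start=1):
            ranks[index] = rank
        return ranks

    ranks_a, ranks_b = to_ranks(positions_a), to_ranks(positions_b)
    squared_diffs = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))
```

A coefficient near zero for the co-references of a given search would be consistent with the two interfaces ordering the same articles according to unrelated criteria.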

The low number of co-references also illustrates the fact that the databases selected for this

paper were only four among many others, whose metadata is also harvested into PCI.

Scope of PCI

Another example regarding both the size and scope of PCI is the variety of topics among the

articles retrieved with the title search. The keywords used in the three searches were

supposed to reflect subjects within management. This was the case with “downsizing” which,

in this context, means reducing the number of employees in a company. But a more general

definition of downsizing also means making something smaller (Oxford Dictionaries, n.d.).

As a result, this term is used in numerous disciplines. For 65 out of 200 records in Oria, it

could be said, on the basis of the “details” section, that they represented articles about topics

other than management, organizations and job loss. Some of the subject areas they referred to

were biodiversity, energy, chemistry, medicine, engineering, nutrition, pharmacology, but

also aging and household.

These are not disciplines that are taught or researched at BI. BI’s core areas are business,

management, marketing, finance, law, accounting and auditing, which are otherwise reflected

in the results in Oria (BI Handelshøyskolen, n.d.b). As pointed out by Bergsvenkerud on June

14th, 2016, this emphasis is due to the coverage of the databases BI subscribes to, and not to

any particular customization by the library. Individual users can influence the ranking of

some items by choosing “preferred disciplines” when logging into their account for the first

time, or for a specific search session. Bergsvenkerud specified this was not done at an

institutional level at BI, which means that articles about business and management are not

ranked higher in the result list.

As a consequence, articles that fulfill the two conditions – being indexed in PCI and

occurring in a free repository or a database the library subscribes to – appear in the result list

in Oria, regardless of the topic of the article. Some of these databases cover multiple

disciplines. SD for example, includes journals about economics but also medicine, science

and technology (BI Handelshøyskolen, n.d.c, ScienceDirect). This, and the fact that the word

“downsizing” is not specific to management, explains the presence of “off-topic” articles in

the result list in Oria.

In a similar title search run a couple of weeks after the first one, on March 21st, 15 out of the

65 non-business-related items were found to have metadata originating from SD. These items

were identified by checking the record source in the “Details” section.

Record Sources

The record source is not the same as

content delivery or access to databases that

contain the full-text articles. As shown in

Figure 7, there can be multiple databases

hosting the article referred to in the record.

But the record that is displayed is based on

metadata from one publisher or database,

referred to as the “record source” (Figure

8).

Figure 7 - Access to the databases hosting the articles

Figure 8 - Metadata representing the article

The range of content providers that may be listed as the record source can be very wide. For

the title search conducted on March 21st, 26 different record sources were identified among

the first 200 items. This search was performed off campus and did not show resources that

require a subscription to be searched, such as Web of Science. As a consequence, this

number could potentially have been higher. Some of the databases and publishers in the list

are part of the same publishing group, such as “Routledge, Taylor & Francis” and “Informa

Taylor & Francis” (Taylor & Francis Group, n.d.). Some others are less obvious and reflect

the intricate network of data providers that have agreements with Ex Libris.

CrossRef is worth mentioning, as it appears both

on its own and alongside other content

providers, as in Figure 9.

CrossRef is an association representing

publishers of scholarly content. Its mission is

to facilitate linking to scholarly literature

using the DOI technology. DOI stands for

“Digital Object Identifier” and is paired with

a persistent URL in CrossRef bibliographic

records. This system makes it easier to

identify and deliver electronic articles

(crossref.org, 2012). According to an internal

file at BI that documents resources indexed

by Ex Libris and/or subscribed to by BI,

records from CrossRef are available in PCI.

They represent content from primary publishers that are not yet indexed by Ex Libris, even

though the library might have access to the content through the publisher.

The record source in Figure 8, Cengage Learning, is a company that owns Gale, a

provider of research resources (Gale Cengage Learning, n.d.). Gale produces numerous

databases such as Academic OneFile and General OneFile (EBSCO Support, n.d.a). The

internal file previously mentioned reveals that both these databases as well as other Gale

products are indexed in PCI. However, BI does not subscribe to any of them.

Figure 9 - CrossRef as a content provider

The records displayed in Oria are created based on a preferred source among all of the

sources indexed into PCI (Ex Libris, 2016c, p. 99), regardless of local subscriptions, as

confirmed by the many Cengage records found throughout the three result lists in Oria. In

other words, even if all of the articles retrieved by an article search in Oria are directly

accessible by BI users, the displayed record can originate from a content provider BI does not

subscribe to, as long as Ex Libris was allowed to show the metadata to non-subscribers

(Hoeppner, 2012, p. 9).

Cengage was found to be the record source for 102 items for the second Boolean search,

which is about half of the hits that were collected and examined for this paper. The number

was even higher – 164 – before the search was narrowed to articles from 2015-2016. The

phrase search conducted during Week 11 retrieved 60 out of 200 items with Cengage as the

record source. When the same search was run off campus five days later (Week 12), this

number went up to 70. Seventy-six Cengage records were retrieved with a title search

performed off campus the same day (record sources for items retrieved during Week 12 can

be consulted online in Appendix VI).

For every search, the proportion of Cengage records was the highest, and higher by far than

the second record source. Askildsen and Bergsvenkerud assumed this was due both to the

great amount of metadata from Cengage indexed in PCI, and its quality. This is a plausible

explanation as the amount of metadata is taken into account when choosing and creating the

merged record (Ex Libris, 2016c, p. 99). I also observed that the record representing an

article, and its source, could change, depending on the search query. The example in Figure

10 was retrieved with a known-item search; the record source is SAGE Publications. But by

combining the same search with a subject search on “personnel management”, which the

SAGE record does not have as a keyword, the one item that appeared had

MEDLINE/PubMed as a record source. A look into the PNX confirmed they had the same

FRBR-group ID, which is why they appeared interchangeably and not one after the other in

Oria.

Figure 10 - Record with SAGE Publications as a record source

Coverage of the Four Databases in PCI

I looked specifically at the record source for co-references. The number of articles represented

in both Oria and at least one of the four databases varied between seven and 29 for the title

search, and 14 and 35 for the phrase search. It quickly became clear that not a single co-

reference had BSC or ABI as the record source. This was further confirmed by the

verification of all 200 records per search in Oria (Appendix VI).

The absence of metadata from BSC in Oria records may be explained by the fact that only

ABI, Emerald and SD are indexed in PCI (Ex Libris, 2016a, p. 3, 54 & 138). Their content is

searchable, contrary to other providers that do not have an agreement with Ex Libris (p. 1).

This is the case with EBSCO’s BSC, although up to 82 % of its data is otherwise available through

alternative providers. Those providers are Scopus and three ProQuest databases, which the

library has to subscribe to and activate, and ECONIS and three Gale databases, whose data is

freely available in PCI (Ex Libris, 2016b, p. 19 & 1). This means that even though the

resources from BSC are still deliverable through Oria (as shown in Figure 8), the

metadata on display will always come from a third party.

BSC does not appear in BI’s Activation Wizard, unlike the three other databases. To activate

a database and make its data available for search and/or delivery, the checkbox has to

be ticked, as in Figures 11 and 12. Both Emerald and SD are impossible to uncheck, which is

why the checkbox is grey. This means the databases were activated by BIBSYS on behalf of

the libraries in the BIBSYS Consortium, Askildsen and Bergsvenkerud explained. Even then, the

local institutions still have to subscribe to such databases, either through direct agreements

with the provider or through CRIStin consortia agreements. Such agreements are negotiated

on behalf of several institutions from the public sector for specific databases. Different

subscriptions are governed by distinct agreements with different participants (CRIStin, n.d.). SD

is an example of a CRIStin-negotiated agreement, while access to Emerald was negotiated

directly by BI, according to the internal file documenting BI’s resources and agreements.

However, access to ABI was neither negotiated through CRIStin, nor activated by BIBSYS,

which means BI is free to switch it on and off. Bergsvenkerud had reasons to believe that

ABI was deactivated in the PCI Activation Wizard when I ran the search in March, due to

access problems to ABI through Oria. The Boolean search was performed again in Oria on

June 17th, first without and then with the date filter. In each case, the record source was

checked for the first 50 hits. Six and then two items with ProQuest as a record source were

found. This never happened for the Boolean searches run in March, nor for any of the other

searches. Since ProQuest is obviously represented in Oria otherwise, this seems to confirm

that ABI was not activated at the time of the first searches. This probably had consequences

for the result lists.

Figure 11 - SD and Emerald in the Activation Wizard

Figure 12 - ABI in the Activation Wizard

Content Delivery

Direct access to full-text articles is managed in the library management system Alma, as

Bergsvenkerud and Askildsen showed me. The links presented under “View it” in Oria are

generated in Alma by a link resolver, whose role is to connect records on the one

hand, and content associated with local resources and library subscriptions on the other hand

(Narayanan & Mukundan, 2013, p. 5). In order to be suggested as one of the alternatives to

access an article under “View it”, a resource has to be activated by the library in the link

resolver.

This is independent of the activation of a resource in the PCI Activation Wizard, which

controls which metadata is shown in Oria, as mentioned before. As a result, records could

have led to ABI even though ProQuest was not activated in the PCI Activation Wizard. The

converse is also true: neither of the two items found with the date-refined search on June 17th

was linked to ABI under “View it”. This is because BI is only granted access to

publications prior to, and excluding, 2015 (Linnea Lund Jacobsen, email, June 17th, 2016).

For content they have access to,

libraries can also influence how these

resources appear under “View it”. It

is no coincidence that Emerald always

appears first for items linked to both

Emerald Management and

ABI/Inform (Figure 13), or that

Taylor & Francis Group comes

before BSC (Figure 8) in

BI’s Oria. This is due to BI’s configuration of the online service order in Alma, which determines

which provider should be linked to first under “View it”. Askildsen and Bergsvenkerud

further explained the criteria motivating the ordering of content providers: publishers

followed by individual databases come first, both due to more stable links and because

any embargoes are shorter than for database packages. Archives, newspapers and

bibliographic databases come last. Same-category providers are then listed alphabetically.
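The ordering rule reported by the librarians can be sketched as a two-level sort: first by provider category, then alphabetically. This is only an illustration, not Alma's actual configuration. The category ranks, the position of database packages between individual databases and archives, and the category assigned to each provider below are assumptions made for the example.

```python
# Hypothetical category ranks, following the order reported above:
# publishers, then individual databases, then (assumed) database packages,
# with archives, newspapers and bibliographic databases last.
CATEGORY_RANK = {
    "publisher": 0,
    "individual database": 1,
    "database package": 2,
    "archive/newspaper/bibliographic": 3,
}


def order_links(providers):
    """Sort (name, category) pairs by category rank, then alphabetically."""
    return sorted(providers, key=lambda p: (CATEGORY_RANK[p[1]], p[0].lower()))


# The categories given to these providers are assumptions for illustration.
links = [
    ("Business Source Complete", "database package"),
    ("Taylor & Francis Group", "publisher"),
    ("Emerald Management", "publisher"),
    ("ABI/Inform", "database package"),
]
for name, category in order_links(links):
    print(name, "-", category)
```

Under these assumptions, Emerald Management is listed before ABI/Inform and Taylor & Francis Group before BSC, which matches what was observed under “View it”.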

Figure 13 - Link order in the "View it"-section

This list only influences linking to different

databases for each record, and not the order in

which records appear in the result list. However,

activation of content in the link resolver has an

impact on which items are returned by Oria. The

link resolver works here as a filter allowing Oria to retrieve only items BI has full-text access

to. Checking the box under “expand my results” (Figure 14) removes this filter and returns a

different set of results (Asbjørn Risan, email, June 20th, 2016). This is an example of how

choices made by libraries at different levels can impact result lists in a DS.

Figure 14 – Changing the scope of documents included in the result list

No effort was put into more direct customization at BI, Askildsen and Bergsvenkerud

reported: the reason is that changes implemented at a local level, for example field weighting,

could disappear after each update by BIBSYS.

Keywords in the PNX

The Boolean search, with the use of the subject field, aimed at observing Oria’s treatment of

subject metadata from other databases. The subject terms used in the search query were also used

in at least two different databases, ABI and BSC. However, this search did not turn out as

expected, as only one reference had BSC’s “Personnel Management” as a keyword.

Unsurprisingly, and for the previously explained reasons, this record did not have BSC, but

MEDLINE as a record source. This article is available in, and linked to BSC, through Oria.

But it could not possibly have been retrieved by the Boolean search directly in BSC as neither

“Personnel Management” nor “Human Resource Management” was among the subject terms

there (Figure 15).

Figure 15 - Record in BSC without any of the keywords it was retrieved with

A quick search in EndNote revealed that 145 of the exported references had “Human

Resource Management” in their keywords, while two references had other keywords than

those searched for. The remaining references were exported without keywords. Among them,

46 had Web of Science as a record source. All of them did have “Human Resource

Management” as a keyword when checked directly in Web of Science, but this information

was not displayed in Oria, as shown in Figure 16. This illustrates the

structure and role of the PNX: while most records have similar entries in different

sections of the PNX, the display section attached to Web of Science records shows little

information (see Figure 17). The search section, however, happens to be more comprehensive

(see Figure 18). This is also the section that is indexed and searched, which explains why

46 hits with apparently no keywords were retrieved (Ex Libris, 2016c, p. 17).
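The difference between the two sections can be illustrated with a minimal sketch. The record below is invented and heavily simplified: real PNX records contain many more sections and fields, and only the section names “display” and “search” and the “subject” field are taken from the description above. The sketch shows how a record can match a subject search although no subject is visible to the user.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up record: not an actual PNX file from Oria.
PNX = """
<record>
  <display>
    <title>Example article</title>
  </display>
  <search>
    <title>Example article</title>
    <subject>Human Resource Management</subject>
  </search>
</record>
"""


def subjects(record, section):
    """Return the subject values stored in one section of the record."""
    return [el.text for el in record.findall(f"./{section}/subject")]


def matches_subject(record, term):
    """Match a term against the search section only, ignoring the display section."""
    return any(term.lower() in s.lower() for s in subjects(record, "search"))


record = ET.fromstring(PNX)
shown = subjects(record, "display")    # what the user interface could show
indexed = subjects(record, "search")   # what the index can match against
print(shown, indexed, matches_subject(record, "human resource management"))
```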

It has now been established that at least 191 out of 200 items were indexed with the subject

term “Human Resource Management”. Since none of the records originated from ABI, from

which these search terms were taken, it shows that “Human Resource Management” is

used by other content providers than ProQuest, for example Taylor & Francis and SD (see

record sources for the Boolean search in Appendix VI). As mentioned earlier, SD does not use

controlled vocabulary, unlike other databases such as ABI. It seems that keywords

representing the topic of an article are handled by the “subject” field in the PNX’s search

section regardless of their nature (Ex Libris, 2016c, p. 20). As a result, the extra value

provided by controlled vocabulary in a given database disappears in Oria.

Figure 16 - Record from Web of Science in the user interface

Figure 17 - Display section of the same record's PNX

Figure 18 - Search section of the same record's PNX

A known-item search, conducted on June 20th to find a specific article, retrieved a record

originating from Nature Publishing Group, as shown in Figure 19. The same search was run

again and combined with one of the subject terms used in ABI for the same article,

“Knowledge Management”. It retrieved first a record from Cengage with the following PNX

field: “<subject>Knowledge Management – Methods</subject>”. A final search using

the phrase function finally retrieved a record from ProQuest, as it was the only one of the

three PNX records with the exact formulation “<subject>Knowledge

Management</subject>”. All three PNX records had the same FRBR-group ID.

This example shows that the representing record in a FRBR group varies depending on the

search query. It implies that one could still use a database’s specific formulation of a topic

and be able to retrieve items linking to that same database, even though Oria does not support

controlled vocabulary. But this is only valid for FRBRized records, when members of a

FRBR group are hidden, but indexed in PCI and searchable otherwise (Ex Libris Knowledge

Center, n.d.d). Had the records been matched together during the de-duplication process, the

new merged record would possibly have been based on Nature Publishing Group’s record.

Being the only one indexed into PCI (Ex Libris, 2016c, p. 99), it would not have been

retrievable with the Boolean search, even though the subject terms it was indexed with in

ABI show that it deals with human resources.

It seems that searching with the subject field is quite efficient in Oria: for the Boolean search,

the vast majority of the items were found based on their keyword metadata. In addition, for

all three searches, the exact same keywords were found for co-references in Oria and SD, or

Emerald, when the record source in Oria was this particular database. This further

demonstrates that some articles are findable by using the same keywords in Oria as in their

original database.

Figure 19 - Representing record in the FRBR group

In some cases, however, the record in Oria had extra words or codes together with the subject

terms (Figure 20). A look at the original record in SD showed that “G32” and “G34” were

classification codes, and were assigned a distinct category from “Keywords” (Figure 21).

Classification codes were also used in another record in ABI. Contrary to the example from

SD, those codes were not listed in the display section and not viewable in the end user

interface in Oria. This slightly different treatment of the metadata is difficult to explain

without seeing how it was originally presented and marked up by the databases themselves. But in

both cases the different types of data were marked as “subject” in the search section of the

PNX.

It is not unlikely that the diversity of resources harvested into PCI makes it difficult for Ex

Libris to accommodate all of their specificities when converting them to PNX. The two

examples mentioned here suggest a certain simplification in the semantic markup of

the metadata.

Field Weighting

The following screenshot was taken during the meeting on June 14th, from the Primo Back

Office interface (Figure 22):

Figure 20 - Record from SD in Oria

Figure 21 - Original record in SD

Libraries are able to change the weight

of the different fields from this interface,

but BI chose to keep Primo’s default

configuration of field weighting,

Askildsen and Bergsvenkerud reported.

Figure 22 shows that the title field is

assigned the highest weight, much higher

than the description field. This is

consistent with, and may explain, the outcome of the phrase search, run without using any

special field.

A search through all Oria references in EndNote revealed that all of them, except for one, had

“talent management” (or “talent-management”) in the title. The one exception had “talent

management” as a keyword in Web of Science, where the record originated from. Ninety-eight

other references with the keyword “talent management” were found in EndNote. This

number is potentially higher, as we have seen keywords sometimes fail to be exported to

EndNote. One hundred sixteen items had the subject terms in the abstract. It is unclear whether

one or several of these attributes guarantee a high ranking in the result list; the one

record without “talent management” in the title ranked 72nd.

Since almost half of the items combined the three properties, it is possible that records

containing the search string in one or several of those three fields rank higher than records

that do not. As a result, and because only 200 out of 1102 items were analysed, I could not

observe records that were retrieved based on the full-text metadata only.

In any case, knowing the weight differences between the various fields may help librarians

adapt their use of the interface, by deliberately searching the fields that are assigned less weight,

whereas a title search requires less planning.
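The effect of field weighting on ranking can be sketched as follows. This is not Primo's relevancy algorithm: the scoring rule (adding up the weights of all fields that contain the search phrase) and the numeric weights are invented for illustration. Only the fact that the title field weighs much more than the description field comes from Figure 22.

```python
# Hypothetical weights: the numbers below are invented for the example.
FIELD_WEIGHTS = {"title": 10.0, "subject": 5.0, "description": 1.0}


def score(record, phrase):
    """Sum the weights of every field that contains the search phrase."""
    phrase = phrase.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if phrase in record.get(field, "").lower()
    )


# Invented records, not taken from the result lists in this paper.
records = [
    {"id": "C", "title": "Staff surveys",
     "description": "Mentions talent management once."},
    {"id": "B", "title": "Retaining staff", "subject": "Talent management",
     "description": "On talent management practices."},
    {"id": "A", "title": "Talent management in banks",
     "description": "A case study."},
]
ranked = sorted(records, key=lambda r: score(r, "talent management"), reverse=True)
print([r["id"] for r in ranked])
```

With such weights, a record with the phrase in the title alone outranks a record that has it in both the subject and description fields, which is in line with the dominance of title matches among the first 200 hits of the phrase search.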

Figure 22 - Weight of the different fields

Metadata

Some of the result lists, especially from the title and phrase searches, presented a certain

number of duplicate records within the same set of results in Oria (see Appendix III).

Table 2 - Number of duplicate records in the three result lists in Oria*

Out of 200 items in Oria                  Boolean 2015-16   Phrase search   Title search
Total number of duplicate records**       0                 12              24
Number of articles represented 3 times    0                 1               3

* Duplicate records overlap with co-references
** 2 or more occurrences of the same article

Some of them had slightly different journal titles, author names, year or page numbers,

visible directly in Oria. More frequently, it seemed like the records were not identified as

representing the same article, due to small differences in the title of the article. Titles in

Cengage records in particular were often completed with information about the nature of the

article, as in the following examples (Figure 23):

Figure 23 - Title variations for the same article

Differences in the viewable section of the PNX alone cannot explain duplicate records.

Duplicate records were analysed based on the data exported to EndNote, which, on several

occasions, did not match the content in the display section. The reference exported from the

record in Figure 24 appears in EndNote as shown in Figure 25: It seems that the article title

was misinterpreted as the journal title. This illustrates the issue of inconsistent metadata in

DS, not only from one record to another but also within the same record.

Figure 24 - Record as viewed in Oria

Figure 25 - Same record in EndNote

Inconsistencies are the reason duplicate records are not identified as representing the same

resource. This problem is especially hard to deal with as it does not necessarily take much for

records to be considered different, as with two identical-looking records from

Cengage.

The following reference was found twice in the same result list in Oria:

- Claussen, B., Naess, O., Reime, L. J., & Leyland, A. H. (2013). Proof firm

downsizing and diagnosis-specific disability pensioning in Norway.(Research article).

BMC public health, 13, 27.

An explanation as to why these two records were not “merged” was only found after

comparing their PNXs: https://www.diffchecker.com/ge2abpqo (last checked July 28th 2016).

Even though both items had Cengage as a record source, they were harvested from different

Gale databases, most probably Academic OneFile (“ofa”) and Health Reference Center

Academic (“hrca”) (EBSCO Support, n.d.a). Apart from distinct information and metadata as

to where the records came from, the only difference between the two PNX files seemed to lie

in the additional data section. This section “contains data elements that are required for a

number of functions in Primo that cannot be extracted from other sections of the PNX” (Ex

Libris, 2016c, p. 32). One of the title subfields included the additional “Research article”

(https://www.diffchecker.com/ge2abpqo, lines 93-94). The two records were assigned different

FRBR-group IDs (lines 67-68), presumably as a consequence.
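Why such a small addition can be enough to keep two records apart may be shown with a naive matching key. The two functions below are hypothetical and do not reproduce the matching rules used by Ex Libris; they only show that a key built from the full title string differs as soon as “(Research article)” is appended, and that the two titles match again if such a trailing note is ignored.

```python
import re


def match_key(title: str) -> str:
    """Build a naive comparison key: lowercase, alphanumerics only."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_key_ignoring_note(title: str) -> str:
    """Same key, but drop a trailing parenthesised note such as '(Research article)'."""
    stripped = re.sub(r"\.?\s*\([^()]*\)\s*$", "", title)
    return match_key(stripped)


a = "Proof firm downsizing and diagnosis-specific disability pensioning in Norway"
b = ("Proof firm downsizing and diagnosis-specific disability pensioning "
     "in Norway.(Research article)")

print(match_key(a) == match_key(b))
print(match_key_ignoring_note(a) == match_key_ignoring_note(b))
```

The first comparison fails and the second succeeds, which illustrates how little variation is needed for two records of the same article to be treated as different.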

The inconsistency in the way author names are expressed is also a problem. I observed that

all records from Emerald used the non-inverted form, contrary to other databases (Figure 26).

The way author names appear in Oria depends on where the record originates from (Figure

27).

Figure 26 - Author names in the different databases (EndNote)

Figure 27 - Name variations in Oria

Oria seems to handle this quite well: a search with either form of the name retrieved both

records. However, applying the phrase function returned different numbers of hits. Not using

this function may help recall records with both forms, but is a double-edged strategy:

searching for the author “Doug

Williamson” retrieved the

record shown in Figure 28 on

the first page of results.

Knowing this may help

in deciding how to narrow the search,

and for understanding the

results.

Recap

The subject terms in the search queries were chosen based on the coverage of ABI, BSC,

Emerald and SD, with the expectation that the result lists in Oria would overlap with the

databases’. The first Boolean search did not go as planned, as no co-references across the

search tools were retrieved.

The same can be said of the title search: the keyword “downsizing” was chosen for being a

specific concept within the management discipline, as a way to limit the number of hits. It did so

in the specialized databases, but in Oria it also returned many items unrelated to management.

This is indicative of both the scope and coverage of PCI, which includes much more

content than expected from stand-alone databases.

The occurrence of duplicate records within the same result list in Oria is also revealing of the

nature of this content. Metadata in PCI comes from multiple resources. Slight differences and

inconsistencies between their metadata records were often the reason many records were not

identified as representing the same article. In addition, these resources sometimes

differ in the metadata elements and categories they use to describe a document. It

seems this has led, in some cases, to a certain simplification of normalized records’ semantic

structure.

Figure 28 - Searching for the author "Doug Williamson"

A closer look at the source of records showed that not all databases were represented in the

same way in Oria. Emerald and SD, automatically activated by the BIBSYS Consortium in the

PCI Activation Wizard, were found to be the record source for numerous items. ABI,

however, is supposed to be activated directly by BI. The absence of ABI metadata from the

result lists retrieved in March suggests that the resource was off in the Activation

Wizard at that time. Metadata from BSC is not available in PCI, as Ex Libris does not have

any agreement with EBSCO (Ex Libris, 2016b). As a result, this database was not

represented in Oria either.

Examining the record source showed that metadata and content delivery were independent

from each other, as the record chosen for display could originate from a resource not

subscribed to by BI, as long as this content was indexed in PCI. But even though metadata

from PCI and content delivery are administered through two different and independent interfaces,

only metadata associated with deliverable content is returned by default in Oria.

The different search types were meant to observe the efficiency of field searching in Oria.

The Boolean or subject search proved to be quite efficient as keywords in the native

interfaces were transposed to the subject field in PNX records. In some cases, Oria was even

able to retrieve metadata records from a specific resource by using this database’s controlled

vocabulary. However, Oria treats this particular kind of keyword as any other, which means

that this special feature is only of value in the original database. Unsurprisingly, the title

search retrieved only items with “downsizing” in the title. The one search without any

specific field did almost as well, uncovering the special weight assigned to the title field in

Primo’s relevancy algorithm.

7 - Future Research

This paper explored only a few aspects of Oria’s search features, but gave insight into how to

use these particular features in the formulation of a strategy for exploratory searches.

Knowing more about other aspects of the discovery service may also contribute to improving

information retrieval. Future research projects could for example examine how Oria handles

known-item searching. Breeding (as cited in Namei & Young, 2015) declared that this type

of search was problematic when using common words, especially for one-word titles (p. 523).

A new experiment designed to test known-item retrieval could help evaluate this particular

aspect, and adapt search strategies accordingly.

My paper focused on the title and subject fields, but a few quick searches were also run with

the author field. A search for the same author returned different sets of results, depending

on how the name was written: the way this metadata element is expressed can greatly impact recall in the

DS, which makes it clear that this particular side of known-item search could benefit from

being looked into more deeply.
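
The sensitivity of author searches to the written form of a name can be sketched as follows. The function `author_key` is hypothetical and is not part of Oria or Primo; it only shows how several spellings of one name might be reduced to a common key before comparison.

```python
import unicodedata


def author_key(name):
    """Reduce an author name to 'surname initial' for loose matching."""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    plain = plain.replace(".", " ").lower()
    if "," in plain:
        # Form "Surname, Given names"
        surname, _, given = plain.partition(",")
    else:
        # Form "Given names Surname": treat the last token as the surname.
        parts = plain.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    given_tokens = given.split()
    initial = given_tokens[0][0] if given_tokens else ""
    return f"{surname.strip()} {initial}".strip()


variants = ["Breeding, Marshall", "Breeding, M.", "M. Breeding", "Marshall Breeding"]
print({author_key(v) for v in variants})  # {'breeding m'}
```

A key of this kind remains fragile: multi-word surnames written without a comma, or two authors sharing a surname and an initial, would be handled wrongly, which is one reason unique author identifiers are preferable.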

Other types of filters could also be worth testing. The present experiment focused on articles,

that is, records that were assigned the content type “article” by the DS. When checking

individual items, I sometimes ran into other records that represented the same article but

were defined as “text resource”. This means a searcher potentially misses out on a lot of

relevant resources when applying a filter; at the same time, content type filters are an

effective way of narrowing a search. It is important for users of the DS to be aware of this

issue, in order to make the most of the search tool.
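
The trade-off described above can be made concrete with a small sketch. The records and field names are invented for the example and do not come from the experiment; the point is only that a strict content type filter hides records that were assigned a different type.

```python
# Hypothetical records; "type" stands for the content type assigned by the DS.
records = [
    {"title": "Downsizing and survivor commitment", "type": "article"},
    {"title": "Downsizing and survivor commitment", "type": "text resource"},
    {"title": "Rightsizing the public sector", "type": "text resource"},
    {"title": "Layoffs in the news", "type": "newspaper article"},
]


def filter_by_type(records, accepted):
    """Keep only the records whose content type is in the accepted set."""
    return [r for r in records if r["type"] in accepted]


strict = filter_by_type(records, {"article"})
broad = filter_by_type(records, {"article", "text resource"})

# Titles that only the broader filter retrieves.
missed = {r["title"] for r in broad} - {r["title"] for r in strict}
print(len(strict), len(broad), sorted(missed))
# 1 3 ['Rightsizing the public sector']
```

The broader filter recovers the missed title, but it also returns the same article twice, so widening a filter increases the need for duplicate handling.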

8 - Conclusion

The experiment described in this paper showed that Oria may be less suited than stand-alone

databases for specialized searches, where special search features or a field-specific

coverage are an advantage. But Oria is nonetheless a smart choice as a one-stop search

engine, as it potentially gathers a great number of relevant documents on various topics in

one place. Indeed, the size and coverage of PCI make it a tool with great potential, provided

that one makes use of the search features and narrowing options in the interface. Those

examined for the purpose of the experiment showed the semantic structure of original records

was respected quite well during the normalization process. Even though controlled subject

terms from native databases are treated as ordinary keywords in the DS, Oria can, to a certain

extent, retrieve results equivalent to these databases’ when using similar subject terms. As a

result, Oria can be used as an alternative to stand-alone databases for tasks other than

comprehensive literature searches, for example exploratory searches, or in order to get

a sense of the available content.

Due to regular updates by BIBSYS (n.d.a) and the changing nature of PCI, the results of this

experiment may not be reproducible. Nor can they be generalized to implementations

of Primo other than BI’s. But as many findings are consistent with the

literature about DS, they can be useful when searching Oria at BI or at other institutions.

Knowing which databases are indexed in PCI, or which fields are assigned the most weight, can

hopefully help librarians and other users adapt their search strategies, in order to use the

discovery tool to its full potential.

However, the experiment also revealed that an optimal use of the interface is not enough to

produce duplicate-free result lists, as this depends a great deal on the quality and consistency

of metadata. Development in this area seems to be going in the right direction, as Ex Libris

is concerned with metadata quality and is working, for example, towards incorporating unique

identifiers for author names (ORCID) into PCI (Calarco et al., 2014, p. 537). In

the meantime, a better knowledge of the different aspects that come into play when searching

Oria can help understand the result lists it returns.
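
The dependence of duplicate detection on metadata quality can be illustrated with a minimal sketch. It assumes hypothetical records with `id`, `doi`, `title` and `year` fields and a simple matching rule (DOI if present, otherwise normalized title and year); it is not the method used by Primo or in this experiment.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a comparison key: the DOI if present, else normalized title and year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lowercase the title and collapse punctuation and spacing differences.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title, record.get("year"))


def group_duplicates(records):
    """Return lists of record ids that share the same comparison key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Invented records; the DOIs are made up for the example.
records = [
    {"id": "r1", "doi": "10.1000/X1", "title": "Downsizing: A Review", "year": 2015},
    {"id": "r2", "doi": "10.1000/x1", "title": "Downsizing, a review", "year": 2015},
    {"id": "r3", "doi": None, "title": "Talent management!", "year": 2014},
    {"id": "r4", "doi": None, "title": "talent  management", "year": 2014},
    {"id": "r5", "doi": None, "title": "Rightsizing", "year": 2014},
]

print(group_duplicates(records))  # [['r1', 'r2'], ['r3', 'r4']]
```

The rule fails as soon as the metadata is inconsistent: a record carrying a DOI is never matched with a copy of the same article that lacks one, and a differing publication year separates otherwise identical titles.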

References

BI Handelshøyskolen. (n.d.a). Emerald Management. Retrieved July 4th, 2016, from

https://www.bi.no/bibliotek/Databaser/Emerald-Management-Xtra/

BI Handelshøyskolen. (n.d.b). Master. Retrieved June 19th, 2016, from

http://www.bi.edu/master/

BI Handelshøyskolen. (n.d.c). Tidsskriftdatabaser. Retrieved June 19th, 2016, from

https://www.bi.no/bibliotek/Databaser/Tidsskriftdatabaser/

BIBSYS. (n.d.a). Arkivet for Juni 2016 sortert på tittel. Retrieved July 23rd, 2016,

from http://epostlister.bibsys.no/pipermail/oria/2016-June/subject.html

BIBSYS. (n.d.b). Oversikt over bibliotek i konsortiet. Retrieved June 18th, 2016, from

http://www.bibsys.no/oversikt-over-biblioteker/

Breeding, M. (2010). Next-gen library catalogs (Vol. 1). London: Facet.

Breeding, M. (2014). Library resource discovery products: context, library

perspectives and vendor positions. Library Technology Reports, 50(1), 1-32.

BusinessDictionary. (n.d.). talent management. Retrieved July 28th, 2016, from

http://www.businessdictionary.com/definition/talent-management.html

Calarco, P., Conrad, L., Kessler, R., & Vandenburg, M. (2014). Metadata Challenges

in Library Discovery Systems. Paper presented at the Proceedings of the Charleston

Library Conference.

Craven, J., Jefferies, J., Kendrick, J., Nicholls, D., Boynton, J., & Frankish, R. (2014).

A comparison of searching the Cochrane library databases via CRD, Ovid and Wiley:

implications for systematic searching and information services. Health Information &

Libraries Journal, 31(1), 54-63.

CRIStin. (n.d.). CRIStins avtaler. Retrieved June 18th, 2016, from

http://www.cristin.no/konsortieavtaler/avtaler/

crossref.org. (2012). Library benefits. Retrieved June 26th, 2016, from

http://www.crossref.org/03libraries/index.html

EBSCO Support. (n.d.a). What is Gale Cengage Learning? Retrieved July 7th, 2016,

from http://support.epnet.com/knowledge_base/detail.php?id=6930

EBSCO Support. (n.d.b). What is the difference between Academic Journals and

Scholarly (Peer Reviewed) Journals in EBSCOhost? Retrieved July 4th, 2016, from

http://support.epnet.com/knowledge_base/detail.php?id=2721

EBSCOhost. (n.d.). Business Source Complete. Retrieved July 4th, 2016, from

https://www.ebscohost.com/academic/business-source-complete

Elsevier. (n.d.). Research Platforms. Retrieved July 4th, 2016, from

https://www.elsevier.com/research-platforms

Emerald Group Publishing. (n.d.a). Emerald eJournals Collections. Retrieved July

4th, 2016, from

http://www.emeraldgrouppublishing.com/products/collections/index.htm

Emerald Group Publishing. (n.d.b). Emerald Journals. Retrieved July 4th, 2016, from

http://www.emeraldgrouppublishing.com/products/journals/index.htm

Ex Libris. (2016a). Primo Central Index Collection List. Retrieved June 18th, 2016,

from

https://knowledge.exlibrisgroup.com/@api/deki/files/40853/Primo_Central_Index_Collection_List_%5BApril_2016%5D.pdf

Ex Libris. (2016b). Primo Central Index Collection List - Alternative Coverage

Analysis. Retrieved June 18th, 2016, from

https://knowledge.exlibrisgroup.com/@api/deki/files/40854/Primo_Central_Index_Collection_List_-_Alternative_Coverage_%5BApril_2016%5D.pdf

Ex Libris. (2016c). Primo - Technical Guide. Retrieved June 27th, 2016, from

https://knowledge.exlibrisgroup.com/@api/deki/files/42351/Primo_Technical_Guide.pdf

Ex Libris. (March 19th, 2013). BIBSYS Consortium Selects Ex Libris Primo and SFX.

Retrieved July 23rd, 2016, from

http://www.exlibrisgroup.com/category/PressReleases2013

Ex Libris. (n.d.a). Primo Central Index Storyline [video]. Retrieved July 12th, 2016,

from

https://d417806c05e90ef4b04f77340f833b8d9910252b.googledrive.com/host/0B7X8rosCUUJlRHVHTFF2WGJ3bm8/

Ex Libris. (n.d.b). Primo Ranking Customization. [PowerPoint slides]. Retrieved

from https://knowledge.exlibrisgroup.com/@api/deki/files/25180/Handouts.pdf

Ex Libris Knowledge Center. (n.d.a). 01 Primo Back Office Overview. Retrieved July

12th, 2016, from

https://knowledge.exlibrisgroup.com/Primo/Training/Primo_Administration/01_Primo_Back_Office_Overview

Ex Libris Knowledge Center. (n.d.b). Consortium Activation in Primo Central.

Retrieved June 27th, 2016, from

https://knowledge.exlibrisgroup.com/Primo_Central/Product_Documentation/PC_Index_Configuration_Guide/Primo_Central_Index_Registration/Consortium_Activation_in_Primo_Central

Ex Libris Knowledge Center. (n.d.c). Displaying PNX Records from Primo Front

End. Retrieved July 23rd, 2016, from

https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/Back_Office_Guide/070Monitoring_and_Maintaining_Primo/Displaying_PNX_Records_from_Primo_Front_End

Ex Libris Knowledge Center. (n.d.d). Overview of the Publishing Process. Retrieved

July 12th, 2016, from

https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/System_Administration_Guide/System_Architecture/Overview_of_the_Publishing_Process

Flatley, R. K., Lilla, R., & Widner, J. (2007). Choosing a Database for Social Work:

A Comparison of Social Work Abstracts and Social Service Abstracts. Journal of

Academic Librarianship, 33(1), 47-55.

Gale Cengage Learning. (n.d.). About Us - Gale. Retrieved June 17th, 2016, from

http://solutions.cengage.com/gale/about/

Galyani Moghaddam, G., & Moballeghi, M. (2007). The importance of aggregators

for libraries in the digital era. Interlending & Document Supply, 35(4), 222-225.

Hanneke, R., & O'Brien, K. K. (2016). Comparison of three web-scale discovery

services for health sciences research. Journal of the Medical Library Association:

JMLA, 104(2), 109.

Hoeppner, A. (2012). The ins and outs of evaluating web-scale discovery services:

librarians around the world are trying to learn what WSD services are and how they

work. Computers in Libraries, 32(3), 6.

Kelley, M. (2012). Coming into focus: Web-scale discovery services face growing

need for best practices. Library Journal, 137(17), 34-40.

Kent, D. (2005). Retrieving Scholarly Articles: A Database Comparison. Alki, 21(1),

33-34.

Namei, E., & Young, C. A. (2015). Measuring Our Relevancy: Comparing Results in a

Web-Scale Discovery Tool, Google & Google Scholar. In ACRL 2015 Proceedings,

March 25–28, 2015, Portland, Oregon (pp. 522-535).

Narayanan, N., & Mukundan, R. (2013). Cloud Web Scale Discovery services

Landscape: An overview. Paper presented at the International conference on

Academic Libraries, New Delhi. Retrieved April 2, 2016, from

https://www.researchgate.net/profile/Nikesh_Narayanan/publication/281811595_Cloud_Web_scale_discovery_landscape_an_overview/links/55f92b3208aeba1d9f181615.pdf

Newcomer, N. L. (2011). The Detail Behind Web-Scale: Selecting and Configuring

Web-Scale Discovery Tools to Meet Music Information Retrieval Needs. Music

Reference Services Quarterly, 14(3), 131-145.

OCLC.org. (n.d.). OCLC Research Activities and IFLA's Functional Requirements for

Bibliographic Records. Retrieved July 25th, 2016, from

http://www.oclc.org/research/activities/frbr.html

Oxford Dictionaries. (n.d.). downsize. Retrieved June 19th, 2016, from

http://www.oxforddictionaries.com/definition/english/downsize?q=downsizing

ProQuest. (December 15th, 2015). News 2015 - ProQuest Completes Acquisition of

Ex Libris. Retrieved July 11th, 2016, from

http://www.proquest.com/about/news/2015/ProQuest-Completes-Acquisition-of-Ex-Libris.html

ProQuest. (n.d.a). ABI/INFORM Collection. Retrieved July 4th, 2016, from

http://www.proquest.com/documents/ABIINFORM-Collection-Brochure.html

ProQuest. (n.d.b). ProQuest Help. Retrieved from ProQuest database.

Read, E. J., & Smith, R. C. (2000). Searching for library and information science

literature: A comparison of coverage in three databases. Library computing, 19(1-2),

118-126.

Renaville, F. (2016). Open Access and Discovery Tools: How do Primo Libraries

Manage Green Open Access Collections? In K. Varnum (Ed.), Exploring Discovery:

The Front Door to Your Library's Licensed and Digitized Content (pp. 233-256).

Chicago: ALA Editions. Advance online publication. Retrieved from

http://arxiv.org/abs/1509.04524

Sadeh, T. (2011). Discovery and management of scholarly materials: new-generation

library systems. ProInflow(2), 4-22.

Sitas, A., & Kapidakis, S. (2008). Duplicate detection algorithms of bibliographic

descriptions. Library hi tech, 26(2), 287-301.

Taylor & Francis Group. (n.d.). Our History. Retrieved June 17th, 2016, from

http://taylorandfrancis.com/about/history/

UO Libraries. (2013). Data Documentation and Metadata. Retrieved July 5th, 2016,

from https://library.uoregon.edu/datamanagement/metadata.html

Vaughan, J. (2011). Web Scale Discovery Services: A Library Technology Report.

Chicago: ALA Editions.

Vinson, T. C., & Welsh, T. S. (2014). A Comparison of Three Library and

Information Science Databases. Journal of Electronic Resources Librarianship, 26(2),

114-126.

Personal Communications

Kristin Askildsen, Librarian, BI Library

Anita Bergsvenkerud, Senior Librarian, BI Library

Linnea Lund Jacobsen, Senior Librarian, BI Library (email, June 17th, 2016)

Asbjørn Risan, Product Owner, BIBSYS (email, June 20th, 2016)

Appendices

Appendix I - Search Parameters

Advanced search

| Parameter | Oria | ABI/Inform | Business Source Complete | Emerald Management | ScienceDirect |
| --- | --- | --- | --- | --- | --- |
| Title field | "in the title" | "Document title - TI" | "TI Title" | "Content Item Title" | "Title" |
| Subject field | "in subject" | "Subject heading (all) - SU" | "SU Subject Terms" | "Keywords" | "Keywords" |
| Phrase | "phrase"/quotation marks | Quotation marks | Quotation marks | Quotation marks | Quotation marks |
| Language | English | English | English | Not possible to choose/default | Not possible to choose/default |
| Document type | "Articles" (as opposed to "Newspaper articles") | "Article" | "Article" | "Articles and chapters" | "Article" |
| Publication type | | "scholarly journals" (Source type) | "Academic Journal" | Not possible to choose/default | "Journal" tab |
| Materials BI has access to | default | "full text" | "full text" | "Only content I have access to" | "Subscribed journals" and "Open Access articles" |

Post-search filter

Appendix II - Search Logs

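| March 2016 | Search type | Oria | ABI | BSC | Emerald | SD |
| --- | --- | --- | --- | --- | --- | --- |
| 9th | Boolean search | 76 131 | 7 430 | 23 671 | 2 447 | 370 |
| 12th | Boolean search, since 2015 | 1 520 | 153 | 828 | 89 | 65 |
| 14th | Title search | 2 008 | 377 | 477 | 129 | 134 |
| 16th | Phrase search | 1 102 | 1 685 | 514 | 981 | 333 |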

Appendix III - Duplicate Records and Co-References

The three Excel sheets documenting co-references and duplicate records for each search can

be downloaded from the following links:

https://drive.google.com/open?id=0B2f3RDbKcsE-U09PM0JWWTRCREE (direct link)

https://goo.gl/nwQ0e4 (folder containing all online appendices)

Appendix IV - Items Retrieved by the Four Searches

Appendix IVa – EndNote Libraries

The four EndNote libraries can be downloaded from the following links:

https://drive.google.com/open?id=0B2f3RDbKcsE-R3JXR2k3Z2NoQjg (direct link)

https://goo.gl/nwQ0e4 (folder containing all online appendices)

N.B: It is recommended to extract all the files before opening the libraries in EndNote.

Appendix IVb – Result Lists

Boolean Search 2015-16: https://drive.google.com/open?id=0B2f3RDbKcsE-ZHp5bW5id1lBUlk

Title Search: https://drive.google.com/open?id=0B2f3RDbKcsE-MFJoUmZ4Z1hzeDA

Phrase Search: https://drive.google.com/open?id=0B2f3RDbKcsE-U1JFcDEtdkdvcjg

Appendix V - Publication Dates for Items Retrieved with the First Boolean Search

Oria ABI BSC Emerald SD

1988 1

1989 1 6

1990 4

1991 1

1992 1 7

1993 2 4

1994 2

1995 3 3

1996 2 1

1997 3

1998 2

1999 2 1

2000 6 1

2001 5 1

2002 2

2003 3 1 3

2004 3 2 2

2005 2 1 2

2006 1 1

2007 3 1

2008 3

2009 1 3

2010 15 2

2011 85 1 2

2012 55 4 2 1

2013 17 4

2014 4 1 1

2015 14 37 35

2016 1 13 6

n.d. 7

200 50 50 50 50

Appendix VI - Record Sources for Items Retrieved during Week 12

The Excel file documenting the record sources for all three result lists can be consulted or

downloaded from the following links:

https://docs.google.com/spreadsheets/d/1Q9gdlunFOzQawk_xfJT_IUIdrHPQJ-4bIq1kMrLHLM0/edit#gid=1668623962 (direct link)

https://goo.gl/nwQ0e4 (folder containing all online appendices)