Page 1: Implementing FRBR on Large Databases

Implementing FRBR on Large Databases

Thomas Hickey
Diane Vizine-Goetz

OCLC Research

Page 2: Implementing FRBR on Large Databases

CNI 2002 Fall Task Force

What is FRBR
• IFLA study group report: Functional Requirements for Bibliographic Records
• Bibliographic model independent of cataloging rules
• Clusters bibliographic items into a four-level structure
  • Work
  • Expression
  • Manifestation
  • Item

Page 3: Implementing FRBR on Large Databases

Control of Entities in FRBR

Entities: Item, Manifestation, Expression, Work, Corporate Body, Person, Concept, Place, Event, Object
Surrogates: Uniform titles, Citations, Names, Subjects

Page 4: Implementing FRBR on Large Databases

Why FRBR?
• Potential to improve:
  – Cataloging
  – Discovery
  – Delivery
• By
  – Bringing versions of works together
  – Showing relationships of various kinds
  – Enabling users to navigate to level of interest

Page 5: Implementing FRBR on Large Databases

Research on FRBR & WorldCat
• Subsets
  – By library, region
  – Example/problem sets
    • Shakespeare, the Bible
    • Humphry Clinker
    • 1,000 random works
  – By genre
    • Dissertations
    • Fiction
• Whole file, 47 million bibliographic records

Page 6: Implementing FRBR on Large Databases

Our Approach
• Concentrating on work-level
  – Problems with expression-level clusters
• Efficient, maintainable, understandable
• Few, if any, false matches with correct cataloging
  – Err on the side of missed matches
  – Some accommodation of frequent variants
• Compare with manually clustered

Page 7: Implementing FRBR on Large Databases

The Algorithm
• A key is generated for each record
• Extract author, title
  – Look up in NACO authority file
  – Added entry information as needed
• Form a key from bibliographic record
  – Author, title, added entry information
  – These can be sorted, compared
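
The key-building steps on this slide might be sketched as follows. This is a minimal illustration, not OCLC's code: `normalize` is a simplified stand-in for NACO normalization, the authority-file lookup and added-entry handling are omitted, and all function names and sample records are hypothetical. The key layout (author subfields joined by `\`, then `/`, then the title) is inferred from the sample keys shown on later slides.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text, keep_commas=False):
    """Simplified stand-in for NACO normalization (an assumption, not the real rules)."""
    # Decompose accented letters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and delete apostrophes outright ("Alice's" -> "alices").
    text = text.lower().replace("'", "").replace("\u2019", "")
    # Replace every other punctuation mark with a space.
    allowed = "a-z0-9, " if keep_commas else "a-z0-9 "
    text = re.sub(f"[^{allowed}]+", " ", text)
    # Collapse runs of whitespace and trim stray spaces or commas.
    return re.sub(r"\s+", " ", text).strip(" ,")


def work_key(author_subfields, title):
    """Build an author/title key; author_subfields may be empty."""
    author = "\\".join(normalize(s, keep_commas=True) for s in author_subfields)
    title_part = normalize(title)
    return f"{author}/{title_part}" if author else title_part


def cluster(records):
    """Group record ids by key. Each record is (record_id, author_subfields, title)."""
    clusters = defaultdict(list)
    for record_id, author_subfields, title in records:
        clusters[work_key(author_subfields, title)].append(record_id)
    return dict(clusters)


# Hypothetical records; titles are assumed to arrive without leading articles.
records = [
    (1, ["Twain, Mark,", "1835-1910."], "Adventures of Huckleberry Finn"),
    (2, ["Twain, Mark,", "1835-1910"], "Adventures of Huckleberry Finn."),
    (3, [], "Arabian nights"),
]
clusters = cluster(records)
for key in sorted(clusters):
    print(len(clusters[key]), key)
```

Records 1 and 2 differ only in punctuation, so they receive the same key and fall into one cluster. Because the keys are plain strings, they can be sorted and compared directly, as the slide notes.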

Page 8: Implementing FRBR on Large Databases

Problems
• Many (17%) records do not have
  – Author main-entry
  – Uniform title
• In general these cannot be matched
  – Look at added entries
  – Information at the expression and manifestation levels
  – Handled separately
  – 180,000 clusters involving ~400,000 records

Page 9: Implementing FRBR on Large Databases

Top 10 WorldCat Clusters

# Recs  Author/Title Key
8,383   bible\n t
8,055   bible
6,174   bible\authorized
4,033   bible\o t\psalms
3,964   haggadah
3,477   great britain/treaties etc
2,402   bible\o t
2,248   koran
2,153   arabian nights

Page 10: Implementing FRBR on Large Databases

Top 10 from a Public Library

# Recs  Author/Title Key
89      bible\authorized
85      mother goose
84      chopin, frederic\1810 1849/piano music
81      schulz, charles m/peanuts
63      davis, jim/garfield
61      moore, clement clarke\1779 1863/night before christmas
60      mozart, wolfgang amadeus\1756 1791/instrumental music
58      bach, johann sebastian\1685 1750/cantatas
57      beethoven, ludwig van\1770 1827/sonatas
56      twain, mark\1835 1910/adventures of huckleberry finn

Page 11: Implementing FRBR on Large Databases

Results
• Manual estimate: 1.5 manifestations/work in WorldCat
• Algorithm: ~1.3
• 25,844 clusters have 20 or more records
• 401,659 clusters have 5 or more records
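
The figures on this slide are summary statistics over cluster sizes. A minimal sketch of how such numbers could be derived from clustering output is given here; the function name, the dictionary keys, and the toy sizes are made up for illustration and are not WorldCat data.

```python
def cluster_stats(cluster_sizes, thresholds=(5, 20)):
    """Summarize a list of cluster sizes (number of records per cluster)."""
    total_records = sum(cluster_sizes)
    stats = {
        "clusters": len(cluster_sizes),
        "records": total_records,
        # Average number of records (manifestations) per work cluster.
        "records_per_cluster": total_records / len(cluster_sizes),
    }
    # Count the clusters at or above each size threshold.
    for threshold in thresholds:
        stats[f"clusters_with_{threshold}_or_more"] = sum(
            1 for size in cluster_sizes if size >= threshold
        )
    return stats


# Toy data: nine clusters, most of them small.
sizes = [1, 1, 1, 2, 3, 5, 7, 20, 40]
stats = cluster_stats(sizes)
for name, value in stats.items():
    print(name, value)
```

A mean close to 1 alongside a few very large clusters is the shape the slide describes: most works have a single record, while a small number of works account for many.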

Page 12: Implementing FRBR on Large Databases

Preliminary Plans
• Build structures for FRBR into new catalog
• Expose FRBR clustering for searching
• Make visible in cataloging
  – As consensus on implementation is developed
  – As cataloging rules accommodate FRBR

Page 13: Implementing FRBR on Large Databases

Spin-offs
• NACO normalization code
  – Testbed
  – Server
• Authority work
  – ePrints UK
• FRBR in other projects
  – FictionFinder
  – NDLTD union catalog

Page 14: Implementing FRBR on Large Databases

Fiction Subset
• 2,665,662 WorldCat records
• 1,758,479 work clusters
• 1.5 records/cluster
• 3,866 clusters have 20 or more records
• 50,540 clusters have 5 or more records

Page 15: Implementing FRBR on Large Databases

Top 10 clusters for fiction

# Recs  Author/Title Key
1,296   defoe, daniel\1661 1731/robinson crusoe
1,248   carroll, lewis\1832 1898/alices adventures in wonderland
971     cervantes saavedra, miguel de\1547 1616/don quixote
828     stevenson, robert louis\1850 1894/treasure island
689     twain, mark\1835 1910/adventures of huckleberry finn
624     twain, mark\1835 1910/adventures of tom sawyer
618     swift, jonathan\1667 1745/gullivers travels
600     andersen, h c\hans christian\1805 1875/tales
581     stowe, harriet beecher\1811 1896/uncle toms cabin
570     arabian nights

Page 16: Implementing FRBR on Large Databases

FictionFinder
• Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction
• Indexes records at the work level and organizes displays by work and expression (primarily language)
• Includes records for textual items; additional modes of expression (moving image, sound) to be added later
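
Organizing a display by work and then by expression can be pictured as a two-level grouping. The sketch below assumes each record already carries a work key and a language code, and uses language as the only proxy for expression (the slide says "primarily language"). The function name, record ids, and language codes are hypothetical, and this is not FictionFinder's implementation.

```python
from collections import defaultdict


def group_for_display(records):
    """Group (work_key, language, record_id) tuples by work, then by language."""
    display = defaultdict(lambda: defaultdict(list))
    for work_key, language, record_id in records:
        display[work_key][language].append(record_id)
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {work: dict(languages) for work, languages in display.items()}


# Hypothetical records for two work clusters.
records = [
    ("crichton, michael\\1942/jurassic park", "eng", 101),
    ("crichton, michael\\1942/jurassic park", "eng", 102),
    ("crichton, michael\\1942/jurassic park", "spa", 103),
    ("crichton, michael\\1942/sphere", "eng", 104),
]
grouped = group_for_display(records)
for work in sorted(grouped):
    print(work)
    for language in sorted(grouped[work]):
        print(f"  {language}: {len(grouped[work][language])} records")
```

A real system would need more than language to separate expressions (translators, illustrators, and editors, for example), which is why the sketch treats language only as a first approximation.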

Page 17: Implementing FRBR on Large Databases

395 records for author “crichton, michael\1942” clustered into 17 entries

 23  airframe
 40  andromeda strain
  5  binary
 11  case of need
 44  congo
 26  disclosure
  5  disclosure a novel
 16  eaters of the dead
  7  eaters of the dead the manuscript of ibn fadlan relating his experiences with the northmen in a d 922
 27  great train robbery
 47  jurassic park
 25  lost world
 37  rising sun
 31  sphere
  7  sphere a novel
 19  terminal man
 25  timeline
395  total

Page 18: Implementing FRBR on Large Databases

Typical Results Set Display

Page 19: Implementing FRBR on Large Databases

Typical Work-level Display

Page 20: Implementing FRBR on Large Databases

Typical Results Set Display

Page 21: Implementing FRBR on Large Databases

Typical Work-level Display

Page 22: Implementing FRBR on Large Databases

Benefits
• Aggregated displays for works and expressions
• Enhancement of (fiction) records at work level
  – with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers)
  – with external data (e.g., literary prizes, prequels/sequels, evaluative content)

Page 23: Implementing FRBR on Large Databases

Challenges
• Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions
  – Works
    • Genre (graphic novel vs. novel)
    • Genre + mode of expression (audio book vs. radio play)
    • Degree of modification (abridgement of juvenile work vs. an adaptation for young children)
  – Expressions
    • translators, illustrators, editors

Page 24: Implementing FRBR on Large Databases

Next Steps
• FRBR algorithm
  – Explore applications
  – Refine algorithm as needed
• FictionFinder
  – Add records for sound and image
  – Conduct user studies

Page 25: Implementing FRBR on Large Databases

Links
• Functional Requirements for Bibliographic Records - Final Report
  – http://www.ifla.org/VII/s13/frbr/frbr.htm
• Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR)
  – http://www.dlib.org/dlib/september02/hickey/09hickey.html
• OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records
  – http://www.oclc.org/research/projects/frbr/index.shtm
• Implementing FRBR on Large Databases
  – http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm