
MARC Content Designation Use
Implications for indexing & interoperability

William E. Moen <[email protected]>

School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 72603

South Central Unicorn Users Group Annual Conference, October 17, 2003, Austin, Texas


Overview

- Context for the analysis -- interoperability
- Findings from the analysis
- Indexing and MARC
- Discussion


Context for the analysis

- Interoperability across library online catalogs
- Indexing of MARC records to support searching
- Richness of MARC content designation available
- Indexing guidelines prepared for the Z39.50 Interoperability Testbed (Z-Interop)
- Implications for indexing guidelines and policies


Interoperability

Systems and organizations will interoperate!

"One should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally."

Paul Miller, 2000


Factors affecting interoperability

- Multiple and disparate systems
  - Operating systems, information retrieval systems, etc.
- Multiple protocols
  - Z39.50, HTTP, SOAP, etc.
- Multiple data formats, syntax, metadata schemes
  - MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core
- Multiple vocabularies, ontologies, disciplines
  - LCSH, MeSH, AAT
- Multiple languages and character sets
- Indexing, word normalization, and word extraction policies


Libraries as Focal Community

- Information communities
  - Community agreements exist (e.g., standards, rules, etc.)
  - Interoperability factors reduced
  - Interoperability more easily achieved
- Relative homogeneity of data and systems
  - Standards-based MARC records
  - Content and structure prescribed by AACR
  - Commonly understood access points
  - Use of controlled vocabularies

Do we need additional agreements regarding indexing policies to improve interoperability?


Interoperability testbed project

Realizing the Vision of Networked Access to Library Resources: An Applied Research and Demonstration Project to Establish and Operate a Z39.50 Interoperability Testbed

- An Institute of Museum and Library Services National Leadership Grant
- Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing

For more information, visit the project website: http://www.unt.edu/zinterop/


Threats to Z39.50 interoperability

- Differences in implementation of the standard
- Differences in local information retrieval systems
  - Search functionality
  - Indexing policies

These threats can be addressed by

- Z39.50 specifications and configuration (i.e., profiles)
- Enhancing local information retrieval systems
- Recommendations for local indexing decisions


Components of the testbed

- Test dataset
  - 400,000+ MARC 21 records from OCLC's WorldCat
- Z39.50 reference implementations
  - Z-client (Bookwhere), Z-server & information retrieval system (Sirsi Unicorn)
- Test scenarios & searches
  - Searches with known result records from dataset
- Benchmarks
  - Results of test searches using reference implementations


MARC

- Record structure for encoding data for machine processing
- Standard structure (ANSI/NISO Z39.2 / ISO 2709)
  - Leader
  - Directory map
  - 3-digit tag to identify a field
  - 2 indicator values to provide additional processing information
  - 1 or more delimiters/codes to identify subfields
- Content designation: semantics
  - MARC 21
  - 245 00 $a [title] $h [format] : $b [subtitle]
- Rules
  - Anglo-American Cataloguing Rules and others
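The structural elements listed above (leader, directory, tag, indicators, subfield codes) can be illustrated with a small Python sketch. It assumes the usual ISO 2709 layout as used by MARC 21: a 24-character leader, 12-character directory entries (3-digit tag, 4-digit field length, 5-digit start position), and the control characters 0x1F, 0x1E, and 0x1D as subfield delimiter, field terminator, and record terminator. The helper names `build_record` and `parse_record` and the two-field sample are hypothetical, and the sketch handles ASCII data only.

```python
SUBFIELD_DELIM = "\x1f"   # precedes each subfield code
FIELD_TERM = "\x1e"       # ends the directory and every field
RECORD_TERM = "\x1d"      # ends the record


def build_record(fields):
    """Serialize (tag, data) pairs into an ISO 2709-style record (ASCII only)."""
    directory, body = "", ""
    for tag, data in fields:
        data += FIELD_TERM
        # Directory entry: tag (3) + field length (4) + start position (5).
        directory += f"{tag}{len(data):04d}{len(body):05d}"
        body += data
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    total = base + len(body) + 1        # + record terminator
    leader = f"{total:05d}cam  22{base:05d}   4500"   # 24 characters
    return leader + directory + FIELD_TERM + body + RECORD_TERM


def parse_record(raw):
    """Split a record into its leader and a list of (tag, indicators, content)."""
    leader = raw[:24]
    base = int(leader[12:17])           # base address of data
    directory = raw[24:base - 1]        # exclude the field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start: base + start + length - 1]  # drop terminator
        if tag < "010":                 # 00x control fields: no indicators/subfields
            fields.append((tag, None, data))
        else:
            indicators, *chunks = data.split(SUBFIELD_DELIM)
            fields.append((tag, indicators, [(c[0], c[1:]) for c in chunks]))
    return leader, fields


sample = build_record([
    ("001", "ocm00000003"),
    ("245", "10" + SUBFIELD_DELIM + "aIllegitimacy and adoption in Maine :"
            + SUBFIELD_DELIM + "breport of a study."),
])
leader, fields = parse_record(sample)
```

Here `fields` holds one control field (001) and one variable field (245) with its two indicators and its `$a` and `$b` subfields, which is the level of detail that content designation makes available to an indexer.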


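MARC 21 content designation

| MARC 21 Field Groups | Currently Defined | Obsolete | Total | MARC 1972 (Books Format Only) |
|---|---|---|---|---|
| 00x | 6 | 1 | 7 | 3 |
| 0xx | 238 | 7 | 245 | 28 |
| 1xx | 66 | 1 | 67 | 40 |
| 2xx | 137 | 32 | 169 | 15 |
| 3xx | 109 | 32 | 141 | 4 |
| 4xx | 69 | 0 | 69 | 37 |
| 5xx | 323 | 38 | 361 | 8 |
| 6xx | 184 | 5 | 189 | 66 |
| 7xx | 452 | 47 | 499 | 41 |
| 8xx | 141 | 20 | 161 | 36 |
| TOTAL | 1725 | 183 | 1908 | 278 |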


Z-Interop test dataset

- Approximately 1% sample of MARC records from OCLC's WorldCat database
- Weighted sampling based on number of libraries "holding" the object represented by the record
- 419,657 total MARC records
- 89% of records "full level" cataloging

Formats represented in test dataset

- Books: 91%
- Cartographic Materials: < 1%
- Electronic resources: < 1%
- Archival/Mixed Materials: < 1%
- Sound recordings: 4%
- Visual Materials: 1%
- Serials: 3%


MARC record

LDR 01019cam 2200265 4500^
001 ocm00000003^
003 OCoLC^
005 20010925133908.0^
008 690414s1963 nyu b 000 0 eng ^
010 $a63064323 ^
040 $aDLC $cDLC ^
050 04 $aHV700.5 $b.N37 ^
082 0 $a362.7/3 ^
110 2 $aNational Study Service. ^
245 10 $aIllegitimacy and adoption in Maine : $breport of a study made for the Maine Committee on Children and Youth. ^
260 $a[New York], $c1963. ^
300 $a24 p. ; $c28 cm. ^
500 $aCover title. ^
504 $aBibliographical footnotes. ^
650 0 $aIllegitimacy $zMaine. ^
650 0 $aAdoption $zMaine. ^
710 1 $aMaine. $bCommittee on Children and Youth. ^


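Decomposing MARC Records

| OCLC # | Tag | 1st Ind | 2nd Ind | SubFld | Fld Pos | SubFld Pos | Word Pos | Word |
|---|---|---|---|---|---|---|---|---|
| 3 | 1 | | | | 1 | 1 | 1 | Ocm00000003 |
| 3 | 3 | | | | 2 | 1 | 1 | OCoLC |
| 3 | 110 | 2 | | a | 11 | 1 | 1 | National |
| 3 | 110 | 2 | | a | 11 | 1 | 2 | Study |
| 3 | 110 | 2 | | a | 11 | 1 | 3 | Service |
| 3 | 245 | 1 | 0 | a | 12 | 1 | 1 | Illegitimacy |
| 3 | 245 | 1 | 0 | a | 12 | 1 | 2 | and |
| 3 | 245 | 1 | 0 | a | 12 | 1 | 3 | Adoption |
| 3 | 245 | 1 | 0 | b | 12 | 2 | 1 | Report |
| 3 | 650 | | 0 | a | 17 | 1 | 1 | Illegitimacy |
| 3 | 650 | | 0 | z | 17 | 2 | 1 | Maine |

400,000 MARC 21 records = 33 million decomposed records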
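The decomposition shown in the table can be sketched as follows. This is only an illustration of the idea, not the project's actual procedure: the function name `decompose`, the simple `\w+` word extraction, and the placement of the single indicator values for 110 and 650 are assumptions. Field positions are counted over the three fields listed in the example, so they differ from the `Fld Pos` values on the slide.

```python
import re

# A few fields from the sample record, as (tag, ind1, ind2, subfields).
# Indicator placement for 110 and 650 is an assumption based on MARC 21 usage.
RECORD = [
    ("110", "2", " ", [("a", "National Study Service.")]),
    ("245", "1", "0", [
        ("a", "Illegitimacy and adoption in Maine :"),
        ("b", "report of a study made for the Maine Committee "
              "on Children and Youth."),
    ]),
    ("650", " ", "0", [("a", "Illegitimacy"), ("z", "Maine.")]),
]


def decompose(oclc_number, fields):
    """Yield one row per word, carrying the content designation with it."""
    for fld_pos, (tag, ind1, ind2, subfields) in enumerate(fields, start=1):
        for sub_pos, (code, value) in enumerate(subfields, start=1):
            # Assumed word extraction: runs of letters/digits, punctuation dropped.
            words = re.findall(r"\w+", value)
            for word_pos, word in enumerate(words, start=1):
                yield (oclc_number, tag, ind1, ind2, code,
                       fld_pos, sub_pos, word_pos, word)


rows = list(decompose(3, RECORD))
for row in rows[:4]:
    print(row)
```

Each row keeps the tag, indicators, and subfield code next to the word, so counts per field/subfield can be produced later with a simple group-and-count over the rows.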


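Content designation in dataset

| MARC 21 Field Groups | Currently Defined | Obsolete | Unlikely Used | Total |
|---|---|---|---|---|
| 00x | 6 | 0 | 0 | 6 |
| 0xx | 96 | 1 | 33 | 130 |
| 1xx | 49 | 0 | 2 | 51 |
| 2xx | 81 | 0 | 19 | 100 |
| 3xx | 23 | 6 | 0 | 29 |
| 4xx | 10 | 0 | 30 | 40 |
| 5xx | 128 | 1 | 3 | 132 |
| 6xx | 104 | 1 | 7 | 112 |
| 7xx | 205 | 0 | 5 | 210 |
| 8xx | 105 | 3 | 8 | 116 |
| TOTAL | 807 | 12 | 107 | 926 |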


Summary frequency results

| Frequency (occurrences) | # of Fields/Subfields | % of All Occurrences |
|---|---|---|
| > 600,000 | 1 | 4.4% |
| 500,000 to 599,999 | 0 | 0% |
| 400,000 to 499,999 | 13 | 39.9% |
| 300,000 to 399,999 | 6 | 14.3% |
| 200,000 to 299,999 | 6 | 10.6% |
| 100,000 to 199,999 | 10 | 10.3% |
| TOTAL | 36 | 79.5% |

Total number of field/subfield occurrences in dataset = 13,849,499

Only 4% of all fields/subfields account for 80% of all occurrences; put the other way, 96% of all fields/subfields account for 20% of all occurrences.
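A concentration figure of this kind can be derived from any table of occurrence counts by sorting and accumulating. The sketch below is a minimal illustration; the function name `top_covering` and the counts are hypothetical and are not the Z-Interop figures.

```python
def top_covering(counts, target=0.80):
    """Return the most frequent keys whose combined share first reaches `target`."""
    total = sum(counts.values())
    running, chosen = 0, []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for key, n in ranked:
        if running >= target * total:
            break                      # target share already covered
        chosen.append(key)
        running += n
    return chosen, running / total


# Hypothetical occurrence counts, for illustration only.
counts = {"650$a": 500, "245$a": 320, "260$a": 80, "500$a": 60, "130$a": 40}
chosen, share = top_covering(counts)
print(chosen, share)
```

Applied to the slide's own numbers, the 36 most frequent fields/subfields out of the 926 that occur are about 3.9% of the total, which is the "4%" quoted above.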


Characteristics of top 36

- Most frequently occurring: 650 $a [Subject data]
- 2nd most frequently occurring: 040 $d [Cataloging source]
- 3rd & 4th most frequently occurring: 260 $a & $b [Publication information]
- 5th most frequently occurring: 245 $a [Title]
- Contain data useful to end users: 28
- Contain control numbers, etc.: 5
- Contain data useful to catalogers: 3


Indexing & MARC

- Indexing Guidelines to Support Z39.50 Profile Searches
  - Identified all MARC 21 fields/subfields that may contain author, title, or subject data
    - Author-related fields/subfields: 119
    - AuthorTitle-related fields/subfields: 21
    - Title-related fields/subfields: 253
    - Subject-related fields/subfields: 144
- 537 fields/subfields contain author, title, subject data
- Usefulness of indexing all possible fields?
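An indexing policy of the kind discussed here is, in effect, a mapping from each search index to the fields/subfields that feed it. The sketch below shows one way to express and apply such a mapping. `INDEX_SPEC` is a deliberately tiny, hypothetical subset built from fields in the sample record; it is not taken from the Z-Interop indexing guidelines, and the function name `index_terms` is likewise an assumption.

```python
# Hypothetical index definitions: index name -> {tag: subfield codes to index}.
INDEX_SPEC = {
    "author": {"110": "ab", "710": "ab"},
    "title": {"245": "ab"},
    "subject": {"650": "az"},
}


def index_terms(fields, spec=INDEX_SPEC):
    """Collect the subfield values that feed each index for one record."""
    terms = {name: [] for name in spec}
    for tag, subfields in fields:
        for name, tags in spec.items():
            wanted = tags.get(tag, "")   # empty string: tag not in this index
            terms[name] += [value for code, value in subfields if code in wanted]
    return terms


record = [
    ("110", [("a", "National Study Service.")]),
    ("245", [("a", "Illegitimacy and adoption in Maine :")]),
    ("650", [("a", "Illegitimacy"), ("z", "Maine.")]),
    ("650", [("a", "Adoption"), ("z", "Maine.")]),
    ("500", [("a", "Cover title.")]),    # not mapped to any index here
]
terms = index_terms(record)
print(terms)
```

Keeping the policy as data rather than code makes it easy to compare a full mapping (all 537 candidate fields/subfields) against a reduced one on the same records.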


Occurrences in test dataset

- Author, title, or subject fields/subfields in Z-Interop dataset: 381 of the 537 occur one or more times
  - Author-related fields/subfields: 86
  - AuthorTitle-related fields/subfields: 16
  - Title-related fields/subfields: 178
  - Subject-related fields/subfields: 101
- 19 of the 381 (5%) account for 80% of all occurrences
  - 9 of 19 are subject-related
  - 5 of 19 are author-related
  - 5 of 19 are title-related


Implications for indexing

- What difference do indexing decisions make?
  - Preliminary testing using the 19 fields/subfields: 95% - 100% of correct records retrieved!
- How much time would be saved in setting up indexing policies?
- Is there a systematic method to identify the "best" fields/subfields to index?
  - Per format of materials?
  - Per user (librarians and end users) needs?
- Good enough search results?
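The slides do not say how the "95% - 100% of correct records retrieved" figure was computed. One plausible reading is a recall-style measure: the share of a test search's known result records that is still returned when only the reduced set of fields/subfields is indexed. The sketch below illustrates that reading under this assumption; the function name `recall` and the record numbers are hypothetical.

```python
def recall(expected_ids, retrieved_ids):
    """Share of the known-correct records that a search actually returned."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("benchmark result set is empty")
    return len(expected & set(retrieved_ids)) / len(expected)


# Hypothetical record numbers: benchmark result set for one test search.
benchmark = [3, 17, 42, 58, 90, 101, 250, 311, 400, 512,
             600, 731, 802, 915, 1001, 1200, 1304, 1420, 1555, 1609]

# Result set from a reduced index: one benchmark record missed, one extra hit.
reduced = [n for n in benchmark if n != 1609] + [9999]

score = recall(benchmark, reduced)
print(f"{score:.0%} of correct records retrieved")
```

Extra records in the reduced result set do not affect this measure; judging "good enough" search results would also need a look at precision.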


References

- Z39.50 Interoperability Testbed: http://www.unt.edu/zinterop/
- Indexing Guidelines to Support Z39.50 Profile Searches: http://www.unt.edu/zinterop/Documents/IndexingGuidelines1Feb2002.pdf