bea2014 - understanding new developments in metadata

Post on 30-Oct-2014

DESCRIPTION

You may have heard about ONIX 3.0, THEMA, or ISNI, but are unsure how these terms relate to you or your publishing program. Do you have to convert your ONIX 2.1 to 3.0 now? If you do not have ONIX, should you start with 3.0? Is ISNI mandatory to sell to major retailers? If you assign BISAC codes, do you also need to assign THEMA codes? This panel of experts on these new metadata developments, moderated by Richard Stark of Barnes & Noble, will share the key points. Attendees will learn about implementation dates and where they can gain assistance in learning more about these new metadata standards.

Moderator: Richard Stark, Director of Product Data, Barnes & Noble

Speakers: Laura Dawson, Product Manager, Identifier Services, Bowker; Chris Saynor, Metadata Manager and Project Manager, GiantChair; Kempton Mooney, Senior Analyst of Market Research and Business Development, Hachette Book Group

TRANSCRIPT

Understanding New Developments in Metadata

BEA Conference 2014

Richard Stark (Moderator), Director of Product Data, Barnes & Noble

Laura Dawson (Speaker), Product Manager, Identifier Services, Bowker

Chris Saynor (Speaker), Metadata Manager and Project Manager, GiantChair

Kempton Mooney (Speaker), Research and Analytics Director, Nielsen Book

ISNI Disambiguating Public Identities

What Is ISNI

• ISO Standard, published in 2012

• International Standard Name Identifier

• Numerical representation of a name
– 16 digits
– Assigned to public figures, contributors of content (researchers, authors, musicians, actors, publishers, research institutions) and subjects of that content (if they are people or institutions)
– Example: 0000 0004 1029 5439
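The slide only says that an ISNI has 16 digits. As a hedged illustration, the sketch below assumes the final character is a check character computed with ISO 7064 MOD 11-2 (the scheme commonly described for ISNI, where the check character can be "X"). The function names are hypothetical and not part of any ISNI tooling.

```python
def isni_check_character(first_15_digits: str) -> str:
    """Compute an ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    value = (12 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_isni(isni: str) -> bool:
    """True if the string has 16 characters (spaces ignored) and a matching check character."""
    compact = isni.replace(" ", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return isni_check_character(base) == compact[15]


# The example ISNI from the slide passes; changing the last digit fails.
print(is_valid_isni("0000 0004 1029 5439"))  # True
print(is_valid_isni("0000 0004 1029 5438"))  # False
```

A check like this only catches typing errors. It says nothing about whether the ISNI has actually been assigned to anyone.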

Who is ISNI

• Founding members
– IFRRO (International Federation of Reproduction Rights Organisations)
– CISAC (International Confederation of Societies of Authors and Composers)
– SCAPR (Societies’ Council for the Collective Management of Performers’ Rights)
– OCLC
– CENL (Conference of European National Librarians), represented by the British Library and the National Library of France
– ProQuest, represented by Bowker

ISNI Organizational Structure

[Diagram: Board of Directors; Members; Quality Team; ISNI Assignment Agency; Registration Agencies (ongoing assignments / general public)]

How Does ISNI Registration Work

• Publisher submits names for assignment through a Registration Agency (RA).

• The RA works with the publisher to ensure the data feed is well formatted, and sends that feed to the Assignment Agency (AA).

• The AA assigns ISNIs to as many of the names in the feed as it can, using complex algorithms and business rules that evolve with each feed.

• The AA returns a file of names with ISNIs attached to them.
– This may not be the full file of names.
– Ambiguous names are held for review by the Quality Team (QT).
– QT assignments and other exceptions (assignments as a result of improvements to the algorithm) are returned to the RA quarterly.
– The process is not instant. Assignment may be immediate if the name and other information are unique, but frequently assignments take a week or two.

Stage One

Stage Two

Stage Three

Display

• Only minimal metadata is displayed

• Not meant as a comprehensive profile

• ISNI is a tool for linking data sets, collocation, and disambiguation

• Enhancements to the record can be made but are not required

Sample Public ISNI Record

ISNI links

Who is using ISNIs?

• Wikipedia/Wikidata
• VIAF
• Access Copyright
• Scholar Universe and Pivot
• British Library
• JISC
• MusicBrainz
• Macmillan (Digital Science)
• BookNet Canada (piloting)
• Authors Guild (piloting)

Einstein’s Wikipedia Page

How many names in the ISNI database?

• Over 8,300,000 assigned

• 10,112,931 provisional (awaiting a match from another data set for corroboration)

• Your author names may well already have ISNIs: http://www.isni.org/search

Use Case: Publisher

Use Case: Cross-Domain Linking

Data Quality

• Based on matching names to existing records in the database (over 18 million names)

• Strict criteria for assigning ISNIs to names

• Quality team oversight (manual edits)
– British Library
– National Library of France
– OCLC

Assignment Criteria

• If on the common surname list:
– Birth date
– Death date
– ISBN(s)
– Title(s)
– Co-authors or institutional affiliation

• If not on the common surname list:
– Title(s)
– Birth date
– Death date
– Any other distinguishing factors (“is not”)

• If unique:
– Immediate assignment

ISNI and ORCID

• ORCID numbers are a subset of ISNI’s database

• Working towards alignment, with ultimate goal of single assignment

• There is ISNI representation on the ORCID Technical Steering Group, and ORCID representation on the ISNI Technical Committee

• A researcher may have both an ORCID and an ISNI

Do You Have An ISNI?

Laura.Dawson@bowker.com

What is ONIX?

• ONIX stands for ONline Information eXchange.

• There are over 200 data elements.

• ONIX is an international metadata standard for communicating book product information.

• This electronic information is distributed between publishers, distributors, wholesalers, bookstores, online retailers, libraries, book data aggregators and anyone else involved in the supply chain.

• ONIX allows global communication regardless of language.

• Book information can be communicated between organizations with different technical infrastructures.

• ONIX is not a database; it uses XML to structure the data being exchanged.
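To make the "XML message, not a database" point concrete, here is a minimal sketch that reads a tiny ONIX-style message with Python's standard library. The element names follow ONIX 3.0 reference tags as best I recall them. The sample is deliberately incomplete and is not schema-valid, the ISBN and publisher are placeholders, and `summarize_products` is a hypothetical helper. Real messages may also declare an XML namespace, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# A deliberately tiny, illustrative message (placeholder values throughout).
SAMPLE = """<ONIXMessage release="3.0">
  <Header>
    <Sender><SenderName>Example Publisher</SenderName></Sender>
    <SentDateTime>20140529</SentDateTime>
  </Header>
  <Product>
    <RecordReference>example.com.0001</RecordReference>
    <NotificationType>03</NotificationType>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>"""


def summarize_products(xml_text):
    """Pull the record reference, ISBN-13 and title out of each Product."""
    root = ET.fromstring(xml_text)
    summaries = []
    for product in root.findall("Product"):
        isbn13 = None
        for identifier in product.findall("ProductIdentifier"):
            # ProductIDType 15 is assumed here to mean ISBN-13.
            if identifier.findtext("ProductIDType") == "15":
                isbn13 = identifier.findtext("IDValue")
        summaries.append({
            "record_reference": product.findtext("RecordReference"),
            "isbn13": isbn13,
            "title": product.findtext(
                "DescriptiveDetail/TitleDetail/TitleElement/TitleText"
            ),
        })
    return summaries


print(summarize_products(SAMPLE))
```

The receiver loads what it needs from the message into its own system. The message itself is only the carrier.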

ONIX. A history.

With the growth of the internet and e-commerce in the 1990s there was a compelling need to create a standard digital format to communicate book information.

The goal was to create a universal, international format with which publishers large and small could exchange information about their books.

• ONIX was developed jointly in the late 1990s by EDItEUR with the Book Industry Study Group (BISG) in the US and Book Industry Communication (BIC) in the UK.

• ONIX for Books 1.0 was published in January 2000.

• ONIX for Books 2.1 (revision 02) was published in 2004.

• ONIX for Books 3.0 was released in January 2009.

• ONIX is governed by an International Steering Committee, with local committees providing information, support and feedback internationally.

• There are national ONIX groups in Australia, Belgium, Canada, China, Egypt, Finland, France, Germany, Italy, Japan, Korea, the Netherlands, Norway, Russia, Spain, Sweden, the UK and the USA. It is also used in many other countries.

Why use ONIX?

• ONIX is a message – not a database.

• ONIX is a standard – a common language.

• ONIX is international.

• ONIX can communicate your title information with everyone.

Why ONIX 3.0?

• With the growth of new digital formats, ONIX needed revision.

• ONIX 2.1 had a lot of deprecated elements left over from earlier versions of ONIX 2.

What is different about ONIX 3.0?

• ONIX 3.0 reflects the changed global book market.

• ONIX 2.1 and 3.0 share many common traits. About 66% of a typical ONIX 2.1 message does not need significant changes to make it valid ONIX 3.0.

• Outdated and deprecated elements have been removed.

Product supply information now better reflects the global nature of the market

• ONIX 3.0 pushes you to express all market data, even if it is to say “Not known for these countries”.

• Can express much more detailed pricing information on a global scale.

• Can express dates and availability by market.

[Map: percentage of population who speak English. Source: Wikipedia]

Digital products can now be described more completely

• Formats changed to express method of delivery.

• Information on DRM and usage constraints.

• Accessibility information.

• Rental information and conditions.

“Set” and “Series” replaced by a more general notion of “Collections”

• It is easier to express a shared identity.

Title information can be expressed and defined more clearly

In Search of Lost Time
Volume 1
Swann’s Way

A Storm of Swords
A Song of Ice and Fire
Book 3
Game of Thrones

Better expression of data for marketing material

• Text content – any text included in your metadata.

• Cited content – any third party content you make reference to that could improve sales.

• Supporting resources – any material a publisher wishes to make available in their metadata to support the sale of the title.

“I always wanted to be a writer.”

Multilingual data

• Can repeat and send textual information in different languages and different scripts.

• Add a note about a product in English, French, Spanish and other languages.
– Not suitable for children under 36 months, due to small parts
– No apto para niños menores de 36 meses, debido a las piezas pequeñas
– Ne convient pas aux enfants de moins de 36 mois, en raison de petites pièces
– Nicht geeignet für Kinder unter 36 Monaten, wegen verschluckbarer Kleinteile
– Не подходит для детей в возрасте до 36 месяцев, в связи с мелких деталей

• Send your author’s biography in English and Spanish.

– Miguel de Cervantes Saavedra (29 September 1547 (assumed) – 22 April 1616) was a Spanish novelist, poet, and playwright. His magnum opus, Don Quixote, considered to be the first modern European novel, is a classic of Western literature, and is regarded amongst the best works of fiction ever written. His influence on the Spanish language has been so great that the language is often called la lengua de Cervantes ("the language of Cervantes"). He was dubbed El Príncipe de los Ingenios ("The Prince of Wits").

– Miguel de Cervantes Saavedra (Alcalá de Henares, 29 de septiembre de 1547 – Madrid, 22 de abril de 1616) fue un soldado, novelista, poeta y dramaturgo español. Es considerado una de las máximas figuras de la literatura española y universalmente conocido por haber escrito Don Quijote de la Mancha, que muchos críticos han descrito como la primera novela moderna y una de las mejores obras de la literatura universal, además de ser el libro más editado y traducido de la historia, sólo superado por la Biblia. Se le ha dado el sobrenombre de «Príncipe de los Ingenios».
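A rough sketch of how parallel text in several languages might be carried in one record. It assumes the ONIX 3.0 pattern of a repeatable `Text` element with a `language` attribute inside a `TextContent` composite. The tag names, the TextType value "03" (description), the ContentAudience value "00" (unrestricted) and the three-letter language codes are my recollection of the standard and should be checked against the specification. `build_text_content` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_text_content(texts_by_language, text_type="03", audience="00"):
    """Build one TextContent element with one Text child per language."""
    content = ET.Element("TextContent")
    ET.SubElement(content, "TextType").text = text_type
    ET.SubElement(content, "ContentAudience").text = audience
    for language, text in texts_by_language.items():
        node = ET.SubElement(content, "Text", {"language": language})
        node.text = text
    return content


notes = {
    "eng": "Not suitable for children under 36 months, due to small parts",
    "spa": "No apto para niños menores de 36 meses, debido a las piezas pequeñas",
}
element = build_text_content(notes)
print(ET.tostring(element, encoding="unicode"))
```

The same text type is sent once, and only the `Text` element repeats, so a receiver can pick the language it needs.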

Block updates

• Send updates for part of the product instead of sending the whole product file.

• So updates can be sent as smaller files.
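The idea of a block update can be sketched with plain dictionaries: compare the old and new versions of a record block by block, and send only the blocks that changed. The six block names and the NotificationType value "04" (partial update) are my understanding of ONIX 3.0 rather than something stated on the slide. The record layout and function names are hypothetical.

```python
# Assumed ONIX 3.0 block names (to be confirmed against the specification).
BLOCKS = (
    "DescriptiveDetail",
    "CollateralDetail",
    "ContentDetail",
    "PublishingDetail",
    "RelatedMaterial",
    "ProductSupply",
)


def changed_blocks(old_record, new_record):
    """Return only the blocks whose content differs between two versions."""
    return {
        name: new_record[name]
        for name in BLOCKS
        if name in new_record and new_record[name] != old_record.get(name)
    }


def build_block_update(record_reference, old_record, new_record):
    """Assemble a partial update carrying just the changed blocks."""
    update = {"RecordReference": record_reference, "NotificationType": "04"}
    update.update(changed_blocks(old_record, new_record))
    return update


old = {"DescriptiveDetail": {"title": "An Example Title"},
       "ProductSupply": {"price": "9.99"}}
new = {"DescriptiveDetail": {"title": "An Example Title"},
       "ProductSupply": {"price": "7.99"}}

# Only the supply block changed, so only that block is included.
print(build_block_update("example.com.0001", old, new))
```

A price change then travels as one small block instead of the full product record.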

Even better resources

• Very comprehensive ONIX 3.0 Global Best Practice and implementation documents available.

• For developers, ONIX 3.0 has XSD and RNG schemas.

More about best practices

• BISG – Best Practices for Product Metadata: Guide for North American Senders and Receivers.

• BISG – Best Practices for Keywords in Metadata: Guide for North American Senders and Receivers.

• EDItEUR – Implementation and Best Practice Guide

To find out more about ONIX

www.editeur.org

www.bisg.org

http://www.booknetcanada.ca/

Thema: The First Global Subject Category Codes

May 2014

*Contains information from Howard Willows' LBF 2014 presentation

Thema…

• What is it?

• How will it help?

• What are its implications?

• What does it look like?

Thema: What is it?

• Thema is a subject category classification system.

• Thema is made for all members of the supply chain to use.

• Thema is meant for use with physical and digital products.

• Thema is an international standard for the global book trade.

Thema: How will it help?

• Book trade subject schemes tend to be national, not international.

• We can now clearly communicate all product data – except subject classification.

• Thema can replace the need for endless mappings & conversions.

It is live!

• Version 1.0 was released November 2013.

• Sunrise date was December 2013.

Thema: How will it help?

• Facilitate international transactions

• Increase understanding in international markets

• Reduce subject code confusion

• Increase discoverability

Thema Committee Structure

[Diagram: committee structure. Surviving labels: “a subcommittee of”, “(Maintained by EDItEUR)”]

Thema countries: LBF 2014

Current Participants (as of London Book Fair 2014)

• AIE
• Amazon
• Australian PA
• Baker & Taylor
• Barnes & Noble
• BIC
• BISG
• Bokrondellen
• Booknet Canada
• Bowker
• BTLF
• CB
• Danish PA
• Dilve
• Editis
• Electre
• Elkotob.com
• Elsevier
• GiantChair
• Guild of Book Dealers (Russia)
• Hachette
• HarperCollins
• Informazioni Editoriali
• Ingram
• Japan Publishers Organisation
• Kobo
• Kogan Page
• Libri
• MVB
• Nielsen Book
• Norske Bokdatabasen
• NTCPDS China
• Penguin Random House
• Springer
• Waterstones

Implications for BISAC Subject Heading Users

• Thema will reduce mappings to BIC, BISAC, CLIL, etc.

• Thema and BISAC will operate in parallel.

• No timeline for BISAC being deprecated.

• There is a BISAC-to-Thema mapping. (Can use BISAC to select a Thema code.)

What does Thema look like?

Subject Headings

Code   Heading               Notes
F      Fiction & related
FJ     Speculative fiction
FJB    Dystopian fiction     Use for any fiction set in dysfunctional or degraded society; use with FL or FB codes if appropriate

Because of hierarchy, F is implied in FJB.
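The hierarchy point can be shown with a few lines of code: each shorter prefix of a subject code is a broader heading that the longer code implies. This is a minimal sketch for plain subject category codes only. It does not attempt qualifiers with national extensions (codes containing hyphens), and the function names are hypothetical.

```python
def implied_codes(code):
    """List the broader subject codes implied by a Thema subject code."""
    if "-" in code:
        raise ValueError("this sketch only handles codes without extensions")
    return [code[:length] for length in range(1, len(code))]


def falls_under(code, broader_code):
    """True if `code` is the same as, or narrower than, `broader_code`."""
    return code.startswith(broader_code)


print(implied_codes("FJB"))      # ['F', 'FJ']
print(falls_under("FJB", "F"))   # True
print(falls_under("FJB", "A"))   # False
```

A retailer that files a title under FJB can therefore also surface it under FJ and F without the sender supplying those codes.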

What does Thema look like?

Subject Headings – More Examples

Code   Heading
AGA    History of art
FRX    Erotic romance
XAMC   Manga: Kodomo
NHW    Military history
QRRF   Zoroastrianism
KJMP   Project management
LWKF   Shariah law: family relations
MKE    Dentistry
UGB    Web graphics & design
WBB    TV / celebrity chef cookbooks
YBC    Children's picture books

What does Thema look like?

Qualifiers

Geographic Code   Heading
1K                The Americas
1KBB              United States of America, USA
1KBB-US-NAKC      New York City

Time Period Code  Heading
3M                c 1500 onwards to present day
3MPQS             c 1960 to c 1969
3MPQS-US-P        USA: Civil Rights Movement

What does Thema look like?

Qualifiers – More Examples

Type                       Code    Heading
Geographic                 1HFGU   Uganda
Language (about, not in)   2ACSC   Icelandic
Time Period                3MD     16th century, c 1500 to c 1599
Education                  4GH     For International GCSE (IGCSE)
Interest                   5AG     Interest age: from c 6 years
Artistic Style             6BA     Baroque

Diving Deeper: Technical specs

Summary of Elements

Element                    Code begins   May contain   Length   Mandatory / Optional
Categories                 A-Y           A-Z, 1-9      1-8      Mandatory
Geographical Qualifiers    1             1, A-Z, -     2-19     Optional
Language Qualifiers        2             2, A-Z, -     2-19     Optional
Time Period Qualifiers     3             3, A-Z, -     2-19     Optional
Educ Purpose Qualifiers    4             4, A-Z, -     2-19     Optional
Interest Qualifiers        5             5, A-Z, -     2-19     Optional
Artistic Style Qualifiers  6             6, A-Z, -     2-19     Optional

Thema in ONIX: use the following values from code lists 26 & 27

93  Thema subject category
94  Thema geographical qualifier
95  Thema language qualifier
96  Thema time period qualifier
97  Thema educational purpose qualifier
98  Thema interest age / special interest qualifier
99  Thema style qualifier
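The summary table and the code-list values above can be combined into a small lookup: given a Thema code, decide which ONIX subject scheme value (93 to 99) it would be sent under. The regular expressions are my reading of the flattened summary table (leading character, allowed characters, length) and may be stricter or looser than the official schema. `onix_scheme_for` is a hypothetical helper.

```python
import re

# Leading digit of a qualifier -> ONIX scheme value listed on the slide.
SCHEME_BY_PREFIX = {"1": "94", "2": "95", "3": "96",
                    "4": "97", "5": "98", "6": "99"}

# Assumed shapes: categories start A-Y, then up to 7 more of A-Z or 1-9;
# qualifiers start 1-6, then 1 to 18 more of A-Z or hyphen.
CATEGORY_PATTERN = re.compile(r"[A-Y][A-Z1-9]{0,7}")
QUALIFIER_PATTERN = re.compile(r"[1-6][A-Z-]{1,18}")


def onix_scheme_for(code):
    """Return the ONIX scheme value for a Thema code, or raise ValueError."""
    if CATEGORY_PATTERN.fullmatch(code):
        return "93"
    if QUALIFIER_PATTERN.fullmatch(code):
        return SCHEME_BY_PREFIX[code[0]]
    raise ValueError(f"not a recognisable Thema code: {code!r}")


for example in ("FJB", "1KBB-US-NAKC", "3MPQS-US-P", "5AG"):
    print(example, onix_scheme_for(example))
```

This only checks the shape of a code. Whether the code actually exists in the Thema scheme still needs a lookup against the published list.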

Diving Deeper: Technical specs

• Only a Subject Category is mandatory; Qualifiers are optional.

• The first Subject Category entered is the primary subject.

• Thema is recognized in ONIX, and can be sent as part of any ONIX 2.* and ONIX 3.* messages, using standard ONIX practice for subject classification metadata.

• In product records and message formats (such as ONIX), only the code is required to be communicated.

• There is no defined upper limit of the number of Subject Category values or Qualifier values that may be assigned.

• It is expected that a maximum of 10 of each type would sufficiently cover all reasonable circumstances.

• Systems designers working with systems which require limits to be placed on data element lengths and/or number of occurrences are advised to provide for the full length of codes and recommended maximum number of occurrences.

Notes on Implementation

• The schema is now available via the EDItEUR website.

• Documentation on structure definitions is available.

• A document of basic user instructions is available.

• A BISAC-to-Thema mapping is available.

It is live! Version 1.0 was released November 2013

• Mappings from BIC & BISAC schemes completed

• Full translations into French, German and Norwegian

• Workshops & presentations for publishers in Germany

• Other groups working on translations into Italian, Spanish, Swedish etc…

• In the US, various supply chain partners have said they are working towards transmitting and receiving Thema

More on Thema

Official Thema Documentation
http://www.editeur.org/151/Thema/

The US Thema Working Group
www.bisg.org

BISAC to Thema Translator
http://bisactothema.biblioshare.org/

Kempton Mooney Research and Analytics Director, Nielsen
