Modeling the Complexity of Music Metadata in Semantic Graphs for Exploration and Discovery

Pasquale Lisena, Raphaël Troncy, Konstantin Todorov, Manel Achichi
@pasqlisena

ANR-14-CE24-0020

Digital Libraries for Musicology (DLfM) Workshop
28th October 2017 | Shanghai Conservatory of Music



https://list.indiana.edu/sympa/arc/mla-l/2017-08/msg00248.html

Information contained in librarian knowledge but not publicly available

Hard question for current music models and ontologies

Different practical implications (MIR, concert and radio programming, music recommendation)

Project Goals

• Improve music description to foster music exchange and reuse
• Connect sources, multiply usage, enrich user experience
• Music-specific data model
• Vocabularies and data publicly available as Linked Open Data
• Tools for visualization, interconnections, recommendation
• Experience and praxis for other institutions

Source Datasets

Works: 62 550 | XML
Scores: 9 154 | XML
Concerts: 340 609 | XML
Discs: 9 500 | XML

Works: 6 846 | UNIMARC
Scores: 30 319 | UNIMARC
Concerts: 5 164 | XML
Discs: 8 602 | XML

Works: 135 940 | INTERMARC
Scores: 89 184 | INTERMARC

Source Datasets

Dataset: Works, Scores, Concerts, Discs

Classic work, Jazz improvisation, Ethnic/World/Traditional music

How to manage this complex metadata?

State of the Art: MusicOntology

- One of the first examples of describing music using the Semantic Web
- Extends FRBR, Timeline Ontology, Event Ontology
- Uses vocabularies for Keys, Musical Instruments (by MusicBrainz), Genres (DBpedia)

Yves Raimond, Samer A. Abdallah, Mark B. Sandler, and Frederick Giasson. 2007. The Music Ontology. In 8th International Conference on Music Information Retrieval (ISMIR). 417–422

The DOREMUS model

Diagram: F15 Work, F22 Expression, F28 Expression Creation

- Music-specific extension of FRBRoo
- Triplet pattern: Work-Expression-Event
- Dynamic: every triplet is autonomous and linkable to the other ones
- Relies on Linked Data principles (everything is a URI, RDF model)

http://data.doremus.org/ontology
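The Work-Expression-Event pattern can be illustrated with plain subject-predicate-object tuples. This is only a minimal sketch: the `http://example.org/` identifiers and the function name are hypothetical, the class and property labels are copied from the slides where they appear there, and `R17 created` and `R19 created a realisation of` are FRBRoo property names assumed here because the slides do not name the links from the creation event.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumption: placeholder namespace, not the real DOREMUS URIs.
BASE = "http://example.org/"


def work_expression_event(slug: str, title: str, year: str) -> List[Triple]:
    """Build one autonomous Work-Expression-Event triplet for a piece."""
    work = f"{BASE}work/{slug}"
    expression = f"{BASE}expression/{slug}"
    event = f"{BASE}event/{slug}"
    return [
        (work, "rdf:type", "F14 Work"),
        (expression, "rdf:type", "F22 Expression"),
        (event, "rdf:type", "F28 Expression Creation"),
        (work, "R3 is realized in", expression),
        # Assumption: FRBRoo property names, not shown on the slides.
        (event, "R17 created", expression),
        (event, "R19 created a realisation of", work),
        (expression, "P102 has title", title),
        (event, "P4 has time span", year),
    ]


triples = work_expression_event("cello-sonata-1", "Sonata in F", "1796")
for subject, predicate, obj in triples:
    print(subject, "|", predicate, "|", obj)
```

Because each triplet has its own three identifiers, another triplet (for example a performance or a publication) can point at any of them without changing the ones already built.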

[Figure: example graph for a Beethoven cello sonata described with the DOREMUS model]

Classes: F14 Work, F15 Complex Work, F19 Publication Work, F22 Expression, F24 Publication Expression, F28 Expression Creation, F30 Publication Event, E7 Activity, M2 Opus Statement, M6 Casting, M23 Casting Detail, M42 Performed Expression Creation, M43 Performed Expression, M44 Performed Work

Properties: R3 is realized in, R10 has member, P4 has time span, P7 took place at, P9 consists of, P14 carried out by, P98 born, P100 died, P102 has title, P165 incorporates, U2 foresees mop, U4 had princeps publication, U5 had premiere, U12 has genre, U17 has opus statement, U30 quantity, U31 had function of, U38 has descriptive expression, U54 is performed expression of

Values:
- Title: “Sonate pour violoncelle et piano no 1”@fr, “Sonates”, “Sonata in F”
- Carried out by: Ludwig van Beethoven, Ludwig von Beethoven (born 1770, died 1827); function: composer, compositeur@fr, compositore@it
- Genre: Sonata (sonata@it, sonate@fr, klaviersonate@de)
- Key: F Major (F Dur@de, Fa majeur@fr, Fa maggiore@it, Fa mayor@es)
- Opus statement: 5, 1
- Casting: Piano (Pianoforte@it, Fortepian@pl), quantity 1; Cello (Violoncello@it, Violoncelle@fr), quantity 1
- Expression creation: time span 1796
- Performed expression creation (premiere): time span 1796, took place at Berlin
- Publication event (princeps publication): time span 1797, took place at Vienna

Controlled Vocabularies

<http://data.doremus.org/vocabulary/iaml/mop/wsa>

- Label: “Saxophone”@en
- Alternate labels: “Sax”@en
- Alternate languages: “Saxofone”@pt, “Sassofono”@it, “Saxophone”@fr
- Notes: “English term is preferred globally”
- Hierarchy: “Woodwinds”@en, “Legni”@it; “Baritone Saxophone”@en

APPLICATIONS
• Disambiguation
• Search
• Graph-based analysis
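The disambiguation and search applications rely on resolving any label variant to a single concept URI. A minimal sketch of that lookup follows, using the saxophone labels from the slide; the function names and the in-memory dictionary are assumptions made for illustration, not the project's actual tooling.

```python
import unicodedata
from typing import Dict, List, Optional, Set, Tuple

SAXOPHONE = "http://data.doremus.org/vocabulary/iaml/mop/wsa"

# Labels copied from the slide; a real vocabulary would hold many concepts.
VOCABULARY: Dict[str, List[Tuple[str, str]]] = {
    SAXOPHONE: [
        ("Saxophone", "en"),
        ("Sax", "en"),
        ("Saxofone", "pt"),
        ("Sassofono", "it"),
        ("Saxophone", "fr"),
    ],
}


def normalize(text: str) -> str:
    """Case-fold, trim, and strip diacritics so label variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold().strip()


def build_index(vocabulary: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Set[str]]:
    """Map every normalized label, in any language, to the URIs that carry it."""
    index: Dict[str, Set[str]] = {}
    for uri, labels in vocabulary.items():
        for text, _language in labels:
            index.setdefault(normalize(text), set()).add(uri)
    return index


def string_to_uri(text: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the concept URI for a label, or None when unknown or ambiguous."""
    matches = index.get(normalize(text), set())
    return next(iter(matches)) if len(matches) == 1 else None


INDEX = build_index(VOCABULARY)
print(string_to_uri("SASSOFONO", INDEX))
```

Returning `None` for ambiguous labels is a deliberate choice in this sketch: a label shared by two concepts is better left for manual review than silently resolved.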

Controlled Vocabularies

Genres: Diabolo (629), IAML (607), Itema3 (212), Redomi (313), RAMEAU (654)
Medium of performance: MIMO (2480), Itema3 (314), IAML (419), Diabolo (2117), RAMEAU (876), Redomi (179)
Musical keys: 29
Modes: 22
Catalogues: 151
Derivation types: 16
Functions: ~30

coming soon

http://data.doremus.org/vocabularies

Interlinking: Vocabularies

http://data.doremus.org/vocabulary/iaml/genre/cha “cha-cha-cha”
=
http://data.doremus.org/vocabulary/diabolo/genre/cha_cha_cha “cha cha cha”

http://yamplusplus.lirmm.fr/
String matching + graph traversal
Interface for validating the matching
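The string-matching half of this step can be sketched with the standard library alone. The snippet below is not YAM++ and does not cover graph traversal; it only shows how two genre labels that differ in punctuation can be recognized as the same term. The function names and the 0.9 threshold are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize_label(label: str) -> str:
    """Case-fold and replace runs of punctuation, underscores, or spaces by one space."""
    return " ".join(re.split(r"[\W_]+", label.casefold())).strip()


def label_similarity(left: str, right: str) -> float:
    """Similarity in [0, 1] between two labels after normalization."""
    return SequenceMatcher(None, normalize_label(left), normalize_label(right)).ratio()


def is_candidate_match(left: str, right: str, threshold: float = 0.9) -> bool:
    # Assumption: the threshold is arbitrary and would need tuning on real data.
    return label_similarity(left, right) >= threshold


print(is_candidate_match("cha-cha-cha", "cha cha cha"))
print(is_candidate_match("cha-cha-cha", "tango"))
```

Candidates produced this way would still go through the validation interface mentioned above, since string similarity alone cannot tell homonyms apart.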

001 FRBNF139081882FR
100 $313891295$w.0..b.....$aBeethoven$mLudwig van$d1770-1827
144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur

Field 144 annotations: LANG ($w), TITLE ($a), MOP ($b), OPUS ($p), KEY ($t)

“MARC must die” -- Roy Tennant, 2002
http://lj.libraryjournal.com/2002/10/ljarchives/marc-must-die/#_
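To make the field layout concrete, the annotated 144 field shown above can be split into its subfields with a few lines of Python. This is a sketch based only on the one record in the slide: the role names come from the slide annotations, the mapping of roles to subfield codes is read off their order, and the position of the language code inside `$w` is an assumption drawn from this single example.

```python
from typing import Dict, List, Tuple

# Annotation labels from the slide, keyed by the subfield code they appear to describe.
ROLES: Dict[str, str] = {"w": "LANG", "a": "TITLE", "b": "MOP", "p": "OPUS", "t": "KEY"}


def parse_field(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'TAG $xvalue$yvalue' into the tag and a list of (code, value) pairs."""
    tag, _, rest = line.partition(" ")
    subfields = [(chunk[0], chunk[1:]) for chunk in rest.split("$")[1:] if chunk]
    return tag, subfields


def describe(subfields: List[Tuple[str, str]]) -> Dict[str, str]:
    """Attach the slide's role names to the subfields that have one."""
    described: Dict[str, str] = {}
    for code, value in subfields:
        role = ROLES.get(code)
        if role == "LANG":
            # Assumption: the language code sits at offsets 6 to 8, as in this example.
            value = value[6:9]
        if role:
            described[role] = value
    return described


TAG, SUBFIELDS = parse_field("144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur")
RECORD = describe(SUBFIELDS)
print(TAG, RECORD)
```

The parser assumes that `$` never occurs inside a value, which holds for this record but would need checking on real data.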

MARC issues

• Different variants: UNIMARC, INTERMARC
• Free-text fields: different practices in describing the same information, e.g. “Op. 27 n. 2” vs. “Op. 27 no 2”
• Frequent mistakes in editorial work: wrong fields, typos, wrong punctuation
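The opus example above shows why free-text values need normalizing before they can be compared. A minimal sketch with a regular expression follows; the pattern and function name are illustrative assumptions that cover the variants quoted in these slides, not every practice found in real catalogues.

```python
import re
from typing import Optional, Tuple

OPUS_PATTERN = re.compile(
    r"op(?:us)?\.?\s*(\d+)"                     # "Op. 27", "op 27", "Opus 27"
    r"(?:\s*,?\s*n(?:o|r|°|º)?\.?\s*(\d+))?",   # optional ", no 2", " n. 2", " nr 2"
    re.IGNORECASE,
)


def parse_opus(text: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (opus number, number within the opus or None), or None if no opus is found."""
    match = OPUS_PATTERN.search(text)
    if match is None:
        return None
    opus = int(match.group(1))
    number = int(match.group(2)) if match.group(2) else None
    return opus, number


for variant in ("Op. 27 n. 2", "Op. 27 no 2", "Op. 27, no 2"):
    print(variant, "->", parse_opus(variant))
```

Once every variant maps to the same pair of integers, two records can be compared on the parsed value instead of the raw string.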

Data conversion

marc2rdf
https://github.com/DOREMUS-ANR/marc2rdf/

expert-made mapping rules + controlled vocabularies

TASKS
• Field parsing and mapping
• NLP techniques
• Graph generation
• String2URI
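Graph generation needs a stable identifier for every resource, so that converting the same record twice yields the same URI. The expression URIs quoted elsewhere in these slides carry version-3 (name-based) UUIDs, which suggests a deterministic scheme, but the seed actually used by marc2rdf is not stated here. The sketch below therefore uses a hypothetical seed format and the placeholder base `http://example.org`.

```python
import uuid

# Assumption: placeholder base, not the real DOREMUS namespace.
BASE = "http://example.org"


def mint_uri(kind: str, seed: str) -> str:
    """Derive a repeatable URI from a seed string using a name-based UUID (version 3)."""
    identifier = uuid.uuid3(uuid.NAMESPACE_URL, seed)
    return f"{BASE}/{kind}/{identifier}"


# Assumption: the seed combines a source name and a record identifier.
first = mint_uri("expression", "source-a:FRBNF139081882")
second = mint_uri("expression", "source-a:FRBNF139081882")
print(first)
print(first == second)
```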

Interlinking: Works

http://data.doremus.org/expression/d72301f0-0aba-3ba6-93e5-c4efbee9c6ea “Sonata quasi una fantasia”
=
http://data.doremus.org/expression/22679001-2cd0-3f84-b502-0f337429966f “Quasi una fantasia”

Legato
https://github.com/DOREMUS-ANR/legato

F-measure > 0.85
Precision > 0.87
Recall > 0.82
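For readers unfamiliar with the three scores, the sketch below shows how precision, recall, and F-measure are usually derived when a set of proposed links is compared with a reference alignment. The counts are hypothetical and are not the evaluation data behind the figures above.

```python
from typing import Tuple


def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float, float]:
    """Standard definitions; F-measure is the harmonic mean of precision and recall."""
    found = true_pos + false_pos        # links the tool proposed
    expected = true_pos + false_neg     # links in the reference alignment
    precision = true_pos / found if found else 0.0
    recall = true_pos / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 correct links, 10 wrong links, 15 missed links.
PRECISION, RECALL, F1 = precision_recall_f1(90, 10, 15)
print(round(PRECISION, 3), round(RECALL, 3), round(F1, 3))
```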

Interlinking: Works

1. Data cleaning: removing “noisy” properties, i.e. identifiers, comments, …
2. Instance profiling: represent each resource as a sub-graph
3. Instance indexing and matching: convert the sub-graph into a set of keywords in order to apply text document matching techniques
4. Post-processing: clustering of the datasets, identifying false positives from the previous steps
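Steps 1 to 3 can be sketched in a few lines: drop the noisy properties, flatten what remains into keywords, and compare the keyword bags with a text similarity measure. This is an illustration of the idea only, not Legato's code; the property names, the sample resources, and the choice of cosine similarity are assumptions, and step 4 is not covered.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumption: names of the properties treated as noise in step 1.
NOISY_PROPERTIES = {"identifier", "comment"}


def profile(resource: Dict[str, List[str]]) -> Counter:
    """Steps 1 to 3: skip noisy properties and turn the remaining values into keywords."""
    keywords: List[str] = []
    for prop, values in resource.items():
        if prop in NOISY_PROPERTIES:
            continue
        for value in values:
            keywords.extend(re.findall(r"\w+", value.casefold()))
    return Counter(keywords)


def cosine(left: Counter, right: Counter) -> float:
    """Cosine similarity between two keyword bags."""
    dot = sum(left[term] * right[term] for term in left.keys() & right.keys())
    norm = math.sqrt(sum(v * v for v in left.values())) * math.sqrt(sum(v * v for v in right.values()))
    return dot / norm if norm else 0.0


# Hypothetical resources from two sources describing the same work, plus an unrelated one.
A = {"identifier": ["a-001"], "title": ["Sonata quasi una fantasia"], "composer": ["Ludwig van Beethoven"]}
B = {"identifier": ["b-742"], "title": ["Quasi una fantasia"], "composer": ["Beethoven"]}
C = {"identifier": ["b-743"], "title": ["Cha cha cha"], "composer": ["Unknown"]}

print(cosine(profile(A), profile(B)), cosine(profile(A), profile(C)))
```

Removing identifiers first matters here: they differ between sources by construction and would only lower the similarity of true matches.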

Visualizing

http://overture.doremus.org

Prototype of a web app that uses the DOREMUS dataset

• Follow the links, like in the graph
• Enriched experience: DBpedia, GeoNames, …
• Timeline of related events
• Similar works recommendation

Future Work

• Pivot vocabularies of genres and MoPs, as a result of the interconnection task
• Recommendation system. First step: “Combining Music Specific Embeddings for Computing Artist Similarity” @ISMIR2017
• Schema.org injection in all pages. Goals: SEO, simplification of the data in order to extend their usage
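As an illustration of the Schema.org item, the sketch below builds a JSON-LD snippet that a work page could embed. The choice of the schema.org `MusicComposition` type with its `composer` and `musicalKey` properties is an assumption about how the simplified description might look, and the page URL is a placeholder.

```python
import json
from typing import Dict


def to_json_ld(title: str, composer: str, key: str, url: str) -> Dict[str, object]:
    """Describe a work with a small subset of schema.org properties."""
    return {
        "@context": "https://schema.org",
        "@type": "MusicComposition",
        "name": title,
        "composer": {"@type": "Person", "name": composer},
        "musicalKey": key,
        "url": url,
    }


# Values taken from the example in these slides; the URL is hypothetical.
MARKUP = to_json_ld("Sonata in F", "Ludwig van Beethoven", "F Major", "http://example.org/work/cello-sonata-1")
SCRIPT_TAG = '<script type="application/ld+json">' + json.dumps(MARKUP, ensure_ascii=False) + "</script>"
print(SCRIPT_TAG)
```

The flat structure is the point: it trades the expressiveness of the full model for something that search engines and casual consumers can read directly.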

But what about this?

Results

This and more questions: https://github.com/DOREMUS-ANR/knowledge-base/tree/master/query-examples

Links

DOREMUS Website: http://www.doremus.org/
GitHub page with tools, converters, ontologies, ...: https://github.com/DOREMUS-ANR/
Dataset & SPARQL Endpoint: https://data.doremus.org/sparql , https://data.doremus.org/fct
OVERTURE: https://overture.doremus.org/
This presentation: https://www.slideshare.net/squalelis
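The SPARQL endpoint listed above can be queried over HTTP using the standard SPARQL protocol. The sketch below only builds the request; `run_query` sends it when called. The query is a generic label search written for illustration, and whether the dataset exposes titles through `rdfs:label` is an assumption, so the query examples in the knowledge-base repository remain the better starting point.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://data.doremus.org/sparql"

# Assumption: resources carry an rdfs:label; the real data may use other properties.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label WHERE {
  ?resource rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "quasi una fantasia"))
} LIMIT 10
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Prepare a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str) -> list:
    """Send the query and return the result bindings (requires network access)."""
    with urlopen(build_request(query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


REQUEST = build_request(QUERY)
print(REQUEST.full_url[:60])
```

Calling `run_query(QUERY)` would return one dictionary per result row, each mapping a variable name to its value and type. An unanchored `FILTER` over all labels can be slow on a large dataset, so a real query would normally restrict the class of `?resource` first.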