Mapping, Interlinking and Exposing MusicBrainz as Linked Data

1st International Workshop on Semantic Music and Media (SMAM2013), Sydney, Oct 21, 2013, Peter Haase

Upload: peter-haase

Post on 08-May-2015


DESCRIPTION

Slides from my keynote at the 1st International Workshop on Semantic Music and Media (SMAM2013) http://iswc2013.semanticweb.org/content/smam-2013

TRANSCRIPT

Page 1: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Mapping, Interlinking and Exposing MusicBrainz as Linked Data

1st International Workshop on Semantic Music and Media (SMAM2013)

Sydney, Oct 21, 2013

Peter Haase

Page 2: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

What this talk is about: A Linked Data Perspective

[Figure: graph describing the talk as linked data. Edge labels: affiliation, affiliation (previous), participatesIn, isAbout, publishedTo, builtWith, worksOn.]

Page 3: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

EUCLID: EdUcational Curriculum for the usage of LinkedData

@euclid_project euclidproject euclidproject

http://www.euclid-project.eu

Other channels

eBook Course

Page 4: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

EUCLID Scenario

[Figure: EUCLID scenario architecture. Layer labels: Data acquisition, LD Dataset Access, Application. Component labels: Metadata, Streaming providers, Downloads, Musical Content, Other content, Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML, RDFa, Integrated Dataset (Interlinking, Cleansing, Vocabulary Mapping), SPARQL Endpoint, Publishing, Visualization Module, Analysis & Mining Module.]

Page 5: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

MusicBrainz

•  MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.

•  MusicBrainz aims to be:

    •  The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.

    •  The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.

•  Like Wikipedia, MusicBrainz is maintained by a global community of users and we want everyone, including you, to participate and contribute.

•  MusicBrainz is operated by the MetaBrainz Foundation, dedicated to keeping MusicBrainz free and open source.

Page 6: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Publishing Relational Databases as RDF: W3C RDB2RDF

Task: Publish data from relational DBMS as Linked Data

Approach: map from relational schema to semantic vocabulary with R2RML

Publishing: two alternatives

•  Translate SPARQL into SQL on the fly

•  Batch transform data into RDF, infer, index, integrate and provide SPARQL access in a triplestore

[Figure: pipeline diagram. Data acquisition: Relational DBMS, R2RML Engine. LD Dataset Access: Integrated Data in Triplestore (Interlinking, Cleansing, Vocabulary Mapping), SPARQL Endpoint, Publishing.]

Page 7: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Publishing MusicBrainz

[Figure: the MusicBrainz DB (https://wiki.musicbrainz.org/Next_Generation_Schema) is mapped to the Music Ontology (http://musicontology.com) via R2RML.]

Concrete Example Mapping: Table Recording(gid, length), via an R2RML Mapping, to Ontology concept mo:recording

Page 8: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

MusicBrainz Next Gen Schema

•  artist: as pre-NGS, but with further attributes

•  artist_credit: allows joint credit

•  release_group (cf. ‘album’) versus: release, medium

•  track

•  tracklist

•  work

•  recording

https://wiki.musicbrainz.org/Next_Generation_Schema

Page 9: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Music Ontology

OWL ontology with the following core concepts (classes) and relationships (properties):

Source: http://musicontology.com

Page 10: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

R2RML Class Mapping

Mapping tables to classes is ‘easy’:

    lb:Artist a rr:TriplesMap ;
      rr:logicalTable [rr:tableName "artist"] ;
      rr:subjectMap
        [rr:class mo:MusicArtist ;
         rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
      rr:predicateObjectMap
        [rr:predicate mo:musicbrainz_guid ;
         rr:objectMap [rr:column "gid" ;
                       rr:datatype xsd:string]] .
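To make the class mapping more concrete, the following is a minimal Python sketch of what an R2RML engine conceptually does with it: for each row of the artist table it expands the subject template, emits an rdf:type triple for mo:MusicArtist, and emits the gid as a typed literal. The sample rows and the helper names (`expand_template`, `class_mapping_triples`) are assumptions made for illustration; this is not the code of any actual R2RML engine.

```python
import re

# Hypothetical sample rows standing in for the MusicBrainz "artist" table.
ROWS = [
    {"gid": "00000000-0000-0000-0000-000000000001"},
    {"gid": "00000000-0000-0000-0000-000000000002"},
]

SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def expand_template(template, row):
    """Replace each {column} placeholder with the row's value for that column.

    A real engine also percent-encodes values so the IRI stays valid;
    that step is skipped here because UUIDs need no encoding.
    """
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def class_mapping_triples(row):
    """Return N-Triples lines for one row: a type triple and a GUID literal."""
    subject = expand_template(SUBJECT_TEMPLATE, row)
    return [
        f"<{subject}> <{RDF_TYPE}> <{MO}MusicArtist> .",
        f'<{subject}> <{MO}musicbrainz_guid> "{row["gid"]}"^^<{XSD_STRING}> .',
    ]


if __name__ == "__main__":
    for row in ROWS:
        for triple in class_mapping_triples(row):
            print(triple)
```

Each row yields exactly two triples here, one from rr:class and one from the single rr:predicateObjectMap, which is why triple counts grow quickly once more properties are mapped per table.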

Page 11: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

R2RML Property Mapping

Mapping columns to properties can be easy:

    lb:artist_name a rr:TriplesMap ;
      rr:logicalTable [rr:sqlQuery
        """SELECT artist.gid, artist_name.name
           FROM artist
             INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
      rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
      rr:predicateObjectMap
        [rr:predicate foaf:name ;
         rr:objectMap [rr:column "name"]] .

Page 12: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

NGS Advanced Relations

Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)

Each pairing of instances refers to a Link

Links have types (cf. RDF properties) and attributes

http://wiki.musicbrainz.org/Advanced_Relationship

Page 13: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

R2RML Mapping Editor

R2RML: Expose data from relational DBMS as RDF / via SPARQL Endpoint

[Figure: components Relational Database, R2RML Mappings, R2RML Engine, SPARQL Endpoint]

Problem: R2RML Mappings are hard to create

R2RML Editing Made Easy!

•  Hides vocabulary intricacies from end-user

•  Access to metadata about relational databases

•  Preview of generated triples and SQL queries

•  Very expressive (Supports most of R2RML)

See our R2RML Mapping Editor in the ISWC Demo Session on Wednesday!

Page 14: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Scale

MusicBrainz RDF derived via R2RML: 150M Triples

    lb:artist_member a rr:TriplesMap ;
      rr:logicalTable [rr:sqlQuery
        """SELECT a1.gid, a2.gid AS band
           FROM artist a1
             INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
             INNER JOIN link ON l_artist_artist.link = link.id
             INNER JOIN link_type ON link.link_type = link_type.id
             INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
           WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
      rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
      rr:predicateObjectMap
        [rr:predicate mo:member_of ;
         rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                       rr:termType rr:IRI]] .
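The join behind the mo:member_of mapping can be tried out on a toy database. The sketch below, in Python with the standard sqlite3 module, runs essentially the same SQL against a heavily simplified, hypothetical version of the NGS tables (only the columns the query touches) and turns each result row into one triple. The table contents, placeholder gids and function names are assumptions for illustration; only the link type gid and the IRI template come from the slide.

```python
import sqlite3

# Link type gid taken from the WHERE clause of the mapping on the slide.
MEMBER_OF_LINK_TYPE = "5be4c609-9afa-4ea0-910b-12ffb71e3821"
ARTIST_TEMPLATE = "http://musicbrainz.org/artist/{}#_"
MO_MEMBER_OF = "http://purl.org/ontology/mo/member_of"

# Simplified, hypothetical schema: only the columns used by the query.
SCHEMA = """
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER);
CREATE TABLE l_artist_artist (link INTEGER, entity0 INTEGER, entity1 INTEGER);
"""

QUERY = """
SELECT a1.gid, a2.gid AS band
FROM artist a1
  INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
  INNER JOIN link ON l_artist_artist.link = link.id
  INNER JOIN link_type ON link.link_type = link_type.id
  INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = ?
"""


def demo_db():
    """Build an in-memory database with one membership link and one other link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO artist VALUES (?, ?)",
                     [(1, "member-gid"), (2, "band-gid"), (3, "other-gid")])
    conn.executemany("INSERT INTO link_type VALUES (?, ?)",
                     [(10, MEMBER_OF_LINK_TYPE), (11, "some-other-link-type")])
    conn.executemany("INSERT INTO link VALUES (?, ?)", [(100, 10), (101, 11)])
    conn.executemany("INSERT INTO l_artist_artist VALUES (?, ?, ?)",
                     [(100, 1, 2), (101, 3, 2)])
    return conn


def member_of_triples(conn):
    """Run the join and expand both IRI templates, one triple per result row."""
    rows = conn.execute(QUERY, (MEMBER_OF_LINK_TYPE,)).fetchall()
    return [
        f"<{ARTIST_TEMPLATE.format(member)}> <{MO_MEMBER_OF}> <{ARTIST_TEMPLATE.format(band)}> ."
        for member, band in rows
    ]


if __name__ == "__main__":
    for triple in member_of_triples(demo_db()):
        print(triple)
```

Only the pairing whose link has the membership type produces a triple; the second pairing is filtered out by the WHERE clause, which is how one generic l_artist_artist table can feed many different RDF properties.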

Page 15: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Some Statistics: RDF Dump

(Lead) Table        Triples    Time (s)
area                  59798           2
artist             36868228         423
dbpedia              172017          13
label                201832           3
medium             18069143         163
recording          11400354         209
release_group       3050818          31
release             9764887         151
track              75506495         794
work                1728955          20
Total             156822527        1809
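The totals can be re-derived from the per-table figures. The short Python sketch below sums both columns and computes triples per second as a rough throughput measure; the numbers are copied from the slide, while the throughput calculation is an added illustration rather than something reported in the talk.

```python
# (triples, seconds) per lead table, copied from the statistics slide.
STATS = {
    "area": (59798, 2),
    "artist": (36868228, 423),
    "dbpedia": (172017, 13),
    "label": (201832, 3),
    "medium": (18069143, 163),
    "recording": (11400354, 209),
    "release_group": (3050818, 31),
    "release": (9764887, 151),
    "track": (75506495, 794),
    "work": (1728955, 20),
}


def totals(stats):
    """Return (total triples, total seconds) over all lead tables."""
    total_triples = sum(triples for triples, _ in stats.values())
    total_seconds = sum(seconds for _, seconds in stats.values())
    return total_triples, total_seconds


def throughput(stats):
    """Return triples generated per second for each lead table."""
    return {table: triples / seconds for table, (triples, seconds) in stats.items()}


if __name__ == "__main__":
    total_triples, total_seconds = totals(STATS)
    print(f"total: {total_triples} triples in {total_seconds} s")
    for table, rate in sorted(throughput(STATS).items(), key=lambda kv: kv[1]):
        print(f"{table:>14}: {rate:,.0f} triples/s")
```

The sums match the totals on the slide (156822527 triples in 1809 seconds), which works out to about 86,700 triples per second overall.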

Page 16: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: Platform for Linked Data Applications

§  Open standards and technologies

•  Semantic Wiki based frontend (Using SMW Syntax)

•  Supporting W3C standards (OWL, RDF, SPARQL, …)

•  Community Edition (Open Source) + Enterprise Edition (Commercial)

§  Semantics- & Linked Data-based integration of private and public data sources based on data providers

•  Generic and specific providers for various data formats and sources

•  Supports established mapping frameworks (e.g. R2RML, SILK, …)

•  Named graphs for managing contexts and provenance

§  Intelligent Data Access and Analytics

•  Flexible self-service UI

•  Visualization, exploration, dashboarding and reporting

•  Semantic search

§  Collaboration and knowledge management

•  Curation & authoring

•  Collaborative workflows

Page 17: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Realization within the Information Workbench Architecture

[Figure: architecture layers]

•  Data storage and management platform

•  Reusable UI and data integration components

•  Customized application solutions

•  External resources to reuse data and create mashups

Page 18: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

The “MusicBrainz Explorer” Application

[Figure: application building blocks. Labels: Data, Data Providers, Ontology, Templates, Widgets, Music Ontology, R2RML.]

Page 19: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Ontology as a “Structural Backbone”

[Figure: the Ontology (RDFS/OWL) defines the data structure of the RDF Data Graph, in which The_Beatles and Yesterday are linked via rdf:type to mo:Artist and mo:Track. UI templates (Template:mo:Artist, Template:mo:Track, Template: …) define the UI structure of the corresponding resource pages.]

Page 20: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: Browsing a Music Artist

Page 21: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: Visualization techniques

Page 22: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Navigation Through the Data

Source: http://musicbrainz.fluidops.net/resource/Analytical5

Page 23: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

SPARQL visualization

SPARQL Query:

    SELECT ?release
           ((SUM(xsd:double(?duration/60000))) AS ?avg)
    WHERE {
      <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
      ?release mo:record ?record .
      ?record mo:track ?track .
      ?track mo:duration ?duration .
    }
    GROUP BY ?release
    ORDER BY DESC(?avg)
    LIMIT 10

Result set: Top ten The Beatles releases according to the sum of track durations in minutes
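The aggregation that the query performs can be mirrored in a few lines of Python, which may help when checking a result set by hand. The sketch assumes the track durations are in milliseconds (which the division by 60000 suggests) and uses made-up release names and durations. Note that the query's ?avg alias holds a sum, not an average, which matches the caption.

```python
from collections import defaultdict

# Hypothetical (release, track duration in milliseconds) pairs.
TRACKS = [
    ("Release A", 180000),
    ("Release A", 240000),
    ("Release B", 600000),
    ("Release C", 90000),
]


def top_releases_by_minutes(tracks, limit=10):
    """Mirror GROUP BY ?release, SUM(?duration/60000), ORDER BY DESC, LIMIT."""
    totals = defaultdict(float)
    for release, duration_ms in tracks:
        totals[release] += duration_ms / 60000  # milliseconds to minutes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    for release, minutes in top_releases_by_minutes(TRACKS):
        print(f"{release}: {minutes:.1f} min")
```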

Page 24: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

SPARQL visualization

Top ten The Beatles releases according to the sum of track durations in minutes

Visualization: Bar chart

Widget:

    {{#widget: BarChart
     | query = 'SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE {
         <http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release .
         ?Release rdf:type mo:Release .
         ?Release dc:title ?label .
       } GROUP BY ?label ORDER BY DESC(?COUNT) LIMIT 20'
     | settings = 'Settings:barvertical_mb'
     | asynch = 'true'
     | input = 'label'
     | output = 'COUNT'
     | height = '300'
    }}

Page 25: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: SPARQL visualization

Top ten The Beatles releases according to the sum of track durations in minutes

Other visualizations of the same result set …

Line  chart:  

Pie  chart:  

Page 26: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Automated Widget Suggestion

[Figure: three numbered steps (1, 2, 3). Suggested visualizations: Bar chart, Line chart, Pie chart, Table, Pivot view. Captions: "Select a suggested visualization", "Visualization automatically built".]

Page 27: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Try it out!

R2RML Mappings

•  https://github.com/LinkedBrainz/MusicBrainz-R2RML

MusicBrainz RDF Dump

•  http://mbsandbox.org/~barry/

MusicBrainz Linked Data Demo system

•  http://musicbrainz.fluidops.net/

Information Workbench

•  http://www.fluidops.com/information-workbench/

Euclid Project

•  http://euclid-project.eu/

Page 28: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Acknowledgements

The Euclid Project, Barry Norton, Michael Meier, Andriy Nikolov, Yves Raimond, Kurt Jacobson, Thomas Gaengler, Juan Sequeda, Simon Dixon

(in no particular order)

Page 29: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Contact

Peter Haase

fluid Operations AG, Altrottstr. 31, Walldorf, Germany

+49 (0) 6227 358087-0

www.fluidops.com

[email protected]

Thank  you!