Linking Scientific Metadata (presented at DC2010)
DESCRIPTION
Linked entity data in metadata records builds a foundation for the Semantic Web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in the RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through a transformation program written in Extensible Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests that incorporating linked entity data into metadata records is methodologically feasible. The paper also argues for the need to change the scientific as well as the general metadata paradigm.
TRANSCRIPT
Linking Entities in Scientific Metadata
Jian Qin, Miao Chen, Xiaozhong Liu, & Andrea Wiggins
School of Information Studies, Syracuse University
The context: Islands of research information
04/10/2023
Linking Entities in Scientific Metadata -- DC2010 2
[Figure: islands of unlinked entities (data, projects, publications, research interests, researchers); the same entity appears on more than one island without being linked.]
Duplication of entity data entry
Seamless Daily Precipitation for the Conterminous United States
Metadata:
• Identification_Information
• Data_Quality_Information
• Spatial_Data_Organization_Information
• Spatial_Reference_Information
• Entity_and_Attribute_Information
• Distribution_Information
• Metadata_Reference_Information
What’s lacking in scientific metadata?
• Standards focus on describing datasets, not entities
• No mechanism is provided for linking entities
  – It is considered an implementation issue
• Islands of entities lead to duplication of data entry for the same entity
  – Increased costs and time in creating metadata
  – Effects on resource discovery and browsing
Defining the research problem
How can we build an interlinked network of entities for a scientific domain?
How can we associate the linked entities with their corresponding metadata records?
Linked Data: A solution
[Diagram: two starting points and one solution.
• Relational database containing entities and relationships. Problem: not related to metadata records.
• Metadata records in XML format. Problem: lack relationships between entities.
• Solution: convert the relational database to RDF triples (resource, property type, value) and embed the RDF triples into the metadata records.]
Linked data: How it works
Linked data is
“…a recommended best practice for exposing, sharing, and connecting
pieces of data, information, and knowledge on the Semantic Web
using URIs and RDF.”
--Wikipedia, http://en.wikipedia.org/wiki/Linked_Data
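To make the resource / property type / value pattern concrete, here is a minimal Python sketch that stores a few triples as tuples and follows URIs from one entity to another. It is only an illustration: the base URI, the property names, and the sample entities are hypothetical and are not taken from the HBES data.

```python
from collections import defaultdict

# Hypothetical base URI, property names, and entities (illustration only).
BASE = "http://example.org/"

triples = [
    (BASE + "people/p1", "name", "Example Researcher"),
    (BASE + "people/p1", "creatorOf", BASE + "datasets/d1"),
    (BASE + "datasets/d1", "title", "Example dataset"),
    (BASE + "datasets/d1", "partOf", BASE + "projects/j1"),
]


def index_by_resource(triples):
    """Group (property, value) pairs under the resource URI they describe."""
    index = defaultdict(list)
    for resource, prop, value in triples:
        index[resource].append((prop, value))
    return index


def linked_resources(index, uri):
    """Return the values of `uri` that are themselves described resources."""
    return [value for _, value in index.get(uri, []) if value in index]


index = index_by_resource(triples)
print(linked_resources(index, BASE + "people/p1"))
```

A value counts as a link here only when it is also the subject of some triple, which is why the project URI, described nowhere in this tiny sample, is not followed.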
A case study
Dataset collection search interface at HBES (http://hubbardbrook.org/data/dataset_search.php)
Hubbard Brook Ecosystem Study (HBES)
• Long-term ecological research site since the 1960s
• 3,160-hectare reserve
• Six principal organizations & 10 other participants:
  – USDA Forest Service
  – Cornell
  – Dartmouth
  – Syracuse
  – Yale
  – the Institute of Ecosystem Studies (IES)
  – the U.S. Geological Survey
• Over 300 datasets available and 2000 publications
HBES Data Collection
• Focused on entities on the HBES site:
  – Projects
  – Persons
  – Publications
  – Subject interests
  – Datasets
  – Events
• Verified Person and Project information against the Long-Term Ecological Research (LTER) directory where necessary
• Stored the entities in a relational database
• Metadata records in EML format
Ecological Metadata Language (EML) Structure and Modules
Conditions required for interlinking
• URI-identified entities
• Relationships between these entities
• Relationships between the entities and metadata records
Experiment stage 1: Data prep
• Two sets of data:
  – Entities and their relationships
    • Person, subject interest, project, dataset, and paper
    • Many-to-many relations between the entities
  – Sample EML records in XML format
    • Downloaded from HBES website
    • Entity URIs added to the corresponding XML files to be used as semantic identifiers and hyperlinks to the entities
    • 126 XML files in total
Entity relationships
Experiment stage 2: Converting to RDF
• Toolkit: D2R, a service for converting relational databases into RDF triples and publishing them on the web
  – Turn each table into a class
  – Turn each column into a class property
  – Make each value in a column an instance
  – Assign a URI to each class, property, and instance
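The mapping rules above can be sketched by hand in a few lines of Python. This is not D2R and does not use its mapping language. It is a simplified stand-in that reads a SQLite table and emits triples, assuming a hypothetical base URI, a hypothetical `person` table, and a URI naming scheme invented for this example. As a further simplification, each row gets an instance URI and the remaining column values are kept as plain literals.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not from the slides


def table_to_triples(conn, table, key_column):
    """Map one table to triples: table -> class, column -> property, row -> instance."""
    class_uri = f"{BASE}class/{table}"
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        # Each row becomes an instance identified by its key value.
        instance_uri = f"{BASE}{table}/{record[key_column]}"
        triples.append((instance_uri, "rdf:type", class_uri))
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            # Each non-key column becomes a property of the class.
            property_uri = f"{BASE}property/{table}_{column}"
            triples.append((instance_uri, property_uri, value))
    return triples


# Hypothetical sample table used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES ('p1', 'Example Researcher')")
person_triples = table_to_triples(conn, "person", "id")
for triple in person_triples:
    print(triple)
```

Many-to-many relations, such as person-to-dataset, would live in join tables and need an extra rule that turns each join row into a triple linking two instance URIs; that step is left out of this sketch.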
Experiment stage 3: Incorporating URIs into XML records
• Add the URIs generated from the D2R software to their corresponding entities in EML records by using an XSL program
• Transform the EML records with inserted URIs into the HTML format for display in browser
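The first step, adding URIs to entities in an EML record, was done with an XSL program in the experiment. The Python sketch below shows the same idea using only the standard library, under stated assumptions: the lookup table keyed by given name and surname is hypothetical (in the experiment the URIs came from the D2R output), and the `personURI` element name follows the example slide rather than the EML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from (given name, surname) to an entity URI path.
PERSON_URIS = {("Thomas G", "Siccama"): "page/people/tsiccama"}


def add_person_uris(root, person_uris):
    """Append a <personURI> child to every <individualName> with a known URI."""
    added = 0
    for name in list(root.iter("individualName")):
        # Normalize the given name so "Thomas G" and "Thomas G. " match.
        given = (name.findtext("givenName") or "").strip().rstrip(".")
        surname = (name.findtext("surName") or "").strip()
        uri = person_uris.get((given, surname))
        if uri is not None and name.find("personURI") is None:
            ET.SubElement(name, "personURI").text = uri
            added += 1
    return added


eml = ET.fromstring(
    "<creator><individualName>"
    "<givenName>Thomas G</givenName><surName>Siccama</surName>"
    "</individualName></creator>"
)
added_count = add_person_uris(eml, PERSON_URIS)
print(ET.tostring(eml, encoding="unicode"))
```

Matching on names is fragile when two people share a name; a real pipeline would more likely join on an identifier held in the entity database.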
Example of name with URI inserted
Original EML record without URI:
<individualName>
  <givenName>Thomas G</givenName>
  <surName>Siccama</surName>
</individualName>
URI added to individual name element:
<individualName>
  <givenName>Thomas G.</givenName>
  <surName>Siccama</surName>
  <personURI>page/people/tsiccama</personURI>
</individualName>
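Once a name element carries a URI, turning it into a hyperlink for browser display is a small transformation. The experiment did this in XSL; the Python sketch below is an illustrative equivalent, not the authors' stylesheet. The base URL is hypothetical, since the slide shows only a relative path, and the function name is invented for this example.

```python
import html
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/"  # hypothetical; the slide gives only a relative path


def name_to_html(individual_name, base_url=BASE_URL):
    """Render an <individualName> as an HTML link if it has a <personURI>."""
    given = (individual_name.findtext("givenName") or "").strip()
    surname = (individual_name.findtext("surName") or "").strip()
    label = html.escape(f"{given} {surname}".strip())
    uri = (individual_name.findtext("personURI") or "").strip()
    if not uri:
        # No URI available: fall back to plain text, as in the original display.
        return label
    href = html.escape(base_url + uri, quote=True)
    return f'<a href="{href}">{label}</a>'


record = ET.fromstring(
    "<individualName><givenName>Thomas G. </givenName>"
    "<surName>Siccama</surName>"
    "<personURI>page/people/tsiccama </personURI></individualName>"
)
link = name_to_html(record)
print(link)
```

Records without a `personURI` element render as plain text, so the same function can be applied to both the original and the URI-enriched records.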
[Screenshots: original display of EML record; RDF-enabled display of EML record]
Discussion
• Methodology for transforming islands of entities into linked scientific metadata
• A larger-scale data set needed to test its scalability
• Potentials:
  – Reducing duplicate entity data entry
  – Applicable to legacy metadata generated using older data models
  – Linking semantic data already published on the web
  – Facilitating data/metadata visualization??
DEMO
http://sdl.syr.edu/eml/