Post on 12-Jul-2020

FAIR data in a UM Law study

Large-scale analysis of EU court decisions

Kody Moodley, Pedro Hernandez-Serrano, Marcel Schaper, Michel Dumontier, Gijs van Dijck

Lambin et al. Radiother Oncol. 2013. 109(1):159-64. doi: 10.1016/j.radonc.2013.07.007

We need to build a social, ethical and technological infrastructure that facilitates the discovery and reuse of digital resources for people and machines

@micheldumontier::IDS-TRAINING:2018-10-30

An international, bottom-up paradigm for the discovery and reuse of digital content by and for people and machines

Improving the FAIRness of digital resources will increase their quality and their potential and ease for reuse.

Give unique names for ‘things’ in your data:

Globally unique: not just unique in your dataset

Persistent: don’t keep changing these names

Resolvable: make ‘things’ in your data discoverable on the Web (e.g. a webpage with more information about it)

Make machine-readable descriptions of your data

so we can use machines to index, search and filter it

Provide metadata describing your data that is accessible beyond its lifetime

Clearly define and communicate access and security protocols for your data

(FAIR != Open)

Represent your data and metadata using machine interpretable formats

Use common vocabularies for representing your data

Link your data to other related datasets

License: who can reuse your data, under what conditions, for what purpose?

Provenance: who generated the data? when and how did they do this?

Community standards: use the same data sharing and publishing platforms, and the same data formats, as your peers
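The points above about machine-readable descriptions, licence and provenance can be illustrated with a small metadata record. This is only a sketch, not the project's actual metadata: the function `describe_dataset`, the `example.org` identifier, the names, the date and the choice of licence are all hypothetical placeholders. The only non-invented parts are the Dublin Core Terms and PROV-O namespace IRIs.

```python
import json


def describe_dataset(identifier, title, license_url, creators, created, method):
    """Build a JSON-LD-style metadata record (illustrative field choice)."""
    return {
        "@context": {
            "dcterms": "http://purl.org/dc/terms/",  # Dublin Core Terms namespace
            "prov": "http://www.w3.org/ns/prov#",    # PROV-O namespace
        },
        "@id": identifier,                           # globally unique, resolvable name
        "dcterms:title": title,
        "dcterms:license": {"@id": license_url},     # who may reuse, under what terms
        "dcterms:creator": list(creators),           # who generated the data
        "dcterms:created": created,                  # when
        "prov:wasGeneratedBy": {"dcterms:description": method},  # how
    }


# All values below are placeholders, not real project metadata.
record = describe_dataset(
    identifier="https://example.org/dataset/court-decisions",
    title="Example court decision citations",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["Example Author"],
    created="2000-01-01",
    method="Extracted with hypothetical cleaning scripts",
)
print(json.dumps(record, indent=2))
```

Because the record is plain JSON, a machine can index and filter on the licence or creator without reading any free text.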

(A CDDI pilot study)

Large-scale analysis of EU court decisions

Community for Data-Driven Insights (CDDI)

CDDI investigates how Maastricht University can become the first FAIR university (2025) by implementing eScience, Technology, Expertise, and Services.

Team for this pilot study

Prof. Michel Dumontier, IDS @ UM: Project partner

Prof. Gijs van Dijck, Faculty of Law: Project director

Dr. Kody Moodley, IDS @ UM / Faculty of Law: Project manager

Pedro Hernandez-Serrano, IDS @ UM: Lead Data Scientist

Prof. Marcel Schaper, Faculty of Law: Court decision expert

Elden van Delft, Faculty of Law: Court decision expert

Marion Meyers, DKE / Faculty of Law: Data Scientist

Bogdan Covrig, Faculty of Law: Data Scientist

Andreea Grigoriu, IDS @ UM / Faculty of Law: Data Scientist

Goal

Long term

To build a FAIR data infrastructure that supports empirical legal research at the Faculty of Law, and makes this kind of research accessible for legal scholars with limited data science expertise.

Short term

To build a (FAIR) software platform to analyse court decisions

Data sources

2.6 million court decisions

Daily, weekly & monthly updated with decisions

Access via download links on website & API calls

Data

Metadata

Citations

Case code

Cited laws

Cited cases

Publication date

Court
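The fields listed above could be held in a simple typed record before any database is chosen. The sketch below is an assumption about how such a record might look, not the project's schema: the class name `CourtDecision`, its attribute names and all the values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class CourtDecision:
    """One decision with the metadata and citation fields named on the slide."""
    case_code: str
    court: str
    publication_date: date
    cited_laws: List[str] = field(default_factory=list)
    cited_cases: List[str] = field(default_factory=list)

    def citation_count(self) -> int:
        # Total number of outgoing citations (laws plus cases).
        return len(self.cited_laws) + len(self.cited_cases)


# Placeholder values only; these are not real cases or laws.
decision = CourtDecision(
    case_code="CASE-A",
    court="EXAMPLE-COURT",
    publication_date=date(2000, 1, 1),
    cited_laws=["LAW-1"],
    cited_cases=["CASE-B", "CASE-C"],
)
print(decision.citation_count())
```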

Data extraction

Data extraction & cleaning scripts

Metadata

Citations

Tested scripts on a sample of the 2.6 million decisions

Plans to scale the entire data extraction in the cloud

Data representation

Properties?

Entities?

Relations?

Data representation (common terms)

HCLS Dataset Descriptions

Bioschemas.org

PROV-O

Dublin Core

PAV ontology

Ontologies / Controlled Vocabulary (Community maintained)

Data representation (common terms)

EU Vocabularies (EuroVoc)

Common Data Model (CDM) ontology

Data representation (global identifiers)

62014CJ0587 ?

IW/2 1968/2 ?

Case C-16/18 ?

Identifiers for cases can change based on organisation (court) or database

ECLI:NL:CRVB:2014:952

European Case Law Identifier : Country : Court : Year : ID

Adopt the ECLI convention (uniquely identifies cases on EU level across organisations and databases)
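To make the ECLI structure concrete, here is a minimal parsing sketch that splits an identifier into the four parts named above. The function `parse_ecli`, the `Ecli` tuple and the regular expression are illustrative assumptions: the pattern is deliberately lenient and does not enforce every rule of the official ECLI specification.

```python
import re
from typing import NamedTuple, Optional

# Lenient pattern (an assumption, not the official grammar):
# ECLI : two-letter country : court code : four-digit year : ordinal ID
ECLI_PATTERN = re.compile(r"^ECLI:([A-Z]{2}):([A-Z0-9]+):(\d{4}):([A-Z0-9.]+)$")


class Ecli(NamedTuple):
    country: str
    court: str
    year: int
    ordinal: str


def parse_ecli(text: str) -> Optional[Ecli]:
    """Return the parts of an ECLI, or None if the text is not ECLI-shaped."""
    match = ECLI_PATTERN.match(text.strip().upper())
    if match is None:
        return None
    country, court, year, ordinal = match.groups()
    return Ecli(country, court, int(year), ordinal)


print(parse_ecli("ECLI:NL:CRVB:2014:952"))
print(parse_ecli("Case C-16/18"))  # not an ECLI, so None
```

The other identifier styles shown earlier (such as `62014CJ0587`) do not match the pattern, which is one way a cleaning script could flag records that still need mapping to an ECLI.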

Data representation (multiple formats)

Figure labels: entity, attribute, relation, type, instance

Publish our data in both relational AND graph database formats
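As a rough illustration of what publishing in both formats means, the sketch below stores the same citation pairs once as rows in a relational table and once as subject-predicate-object triples. The table name, the predicate label `cites` and the case identifiers are hypothetical. SQLite is used only because it ships with Python, and nothing here reflects the project's actual database choice.

```python
import sqlite3

# Placeholder citation pairs: (citing case, cited case).
citations = [("CASE-A", "CASE-B"), ("CASE-A", "CASE-C")]


def to_relational(pairs):
    """Load citation pairs into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citation (citing TEXT, cited TEXT)")
    conn.executemany("INSERT INTO citation VALUES (?, ?)", pairs)
    return conn


def to_triples(pairs, predicate="cites"):
    """Express the same pairs as graph edges (subject, predicate, object)."""
    return [(citing, predicate, cited) for citing, cited in pairs]


conn = to_relational(citations)
print(conn.execute("SELECT COUNT(*) FROM citation").fetchone()[0])
print(to_triples(citations))
```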

Legal Knowledge Graph (long term vision)

Findability & Accessibility

Vary according to the kinds of data, how much free storage and some added features

Next steps

● Extract all citations & metadata for 2.6 million court decisions

● Convert information to graph (RDF) format - Data2Services pipeline

● Publish data in FAIR supporting repositories (Zenodo and OSF)
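The RDF conversion step can be pictured with a few lines that serialise citation pairs as N-Triples. This is not the Data2Services pipeline, only a hand-written sketch of the target format: the base IRI, the `cites` predicate IRI and the case identifiers are hypothetical `example.org` placeholders, and a real conversion would more likely reuse terms from an existing vocabulary.

```python
from urllib.parse import quote

BASE = "https://example.org/case/"          # hypothetical base IRI for cases
CITES = "https://example.org/vocab/cites"   # hypothetical predicate IRI


def case_iri(case_id: str) -> str:
    # Keep ':' so ECLI-style identifiers stay readable inside the IRI.
    return BASE + quote(case_id, safe=":")


def to_ntriples(pairs):
    """Serialise (citing, cited) pairs as N-Triples lines."""
    return [f"<{case_iri(s)}> <{CITES}> <{case_iri(o)}> ." for s, o in pairs]


for line in to_ntriples([("CASE-A", "CASE-B")]):
    print(line)
```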

Lessons learned

● FAIR is not binary (your data is not either FAIR or not FAIR)

● FAIR != open

● A little FAIRness goes a long way

● Findability and accessibility were easier for us

● Interoperability and reusability can be a challenge when there are few standards in your community

● Steps for making data FAIR may vary depending on the nature of the project and the data

Thank you!

@MoodleyKody
