Discovery Metadata: Metadata Standards vs. New Requirements

Lambert Heller, TIB Open Science Lab

11th euroCRIS Strategic Seminar

Brussels, September 9-10, 2013


About me

Lambert Heller

• Social scientist & librarian

• Involved with OA & CRIS at Hannover University until 2012

• Since 2013: “Open Science Lab” at TIB; WGL “Science 2.0”

• Co-author of the book “Opening Science” (Springer, October 2013)

• @Lambo on Twitter

Agenda

1. Scholarly objects on Uni Hannover website – some examples

2. Ways to manage discoverable “patchwork metadata”

3. Possible challenges & stuff to discuss …


1. Scholarly objects on Uni Hannover website

• Linked and/or embedded YouTube videos of individual lectures (consider technical metadata for videos!)

• Professors’ own YouTube channels, but also Twitter accounts

• Structured “recommended literature” lists next to their own articles

• Entire institute websites run as wikis, containing collaborative work by faculty members & alumni (example)

• In wikis and elsewhere: connections between objects, e.g. between videos and traditional materials, forming “clusters”

• …

• Bottom line: a complex “patchwork” of heterogeneous, connected scholarly objects. Objects are hosted in many places, often change, are collaboratively authored, etc.

Some examples (all fairly common)

2. Ways to manage discoverable “patchwork metadata”

a. Individual researchers’ websites vs. CRIS approach

b. Facebook-like business models, e.g. ResearchGate

c. Aggregation-based scientist profile / network services

d. Possible future directions for aggregation services

a. Individual researchers’ websites vs. CRIS approach

• Individual institute and / or researcher websites

  • Up and running since the inception of the WWW (think CERN!)

  • Won’t go away with CRIS; sometimes mixed with it

  • Will remain an unpredictable, chaotic metadata patchwork

• New(ish) breed of institutional CRIS databases / portals

  • Planned and driven by research administration staff

  • Staff has authority to use (some) existing data

  • Tries to make the CRIS part of the loop in which scientists prepare their records

  • …sometimes with incentives (money)

  • Almost always cuts off the richness of the patchwork, as well as much data that is “unrelated” to the faculty / institution

b. Facebook-like business models, e.g. ResearchGate

• Will work for many, but most probably never for all scientists

• …and “not all” is a dealbreaker for the expectation of complete answers to any query

• But there is at least one lesson to take from them: recency, simplicity, and putting the researcher in control! (To the extent possible within an FB-like business model.)

c. Aggregation-based scientist profile / network services

• BiomedExperts – was based on literature harvesting; predecessor / building block of Elsevier SciVal Experts

• Direct2Experts – network of CTSA member institutions (and others), mostly in the biomedical area; institutions open up + prepare their researcher metadata for harvesting

• Several new nationwide / funder approaches, aimed not only at discovery but at research measurement as well

d. Possible future directions for aggregation services

• AgriVIVO et al. – harvesting heterogeneous institutional data for better discovery in one research area; may well be reproducible for a few more communities?

• ORCID – bulk registration of scientists through institutions, with scientists then ultimately in control of their data; certainly an important building block for future aggregation services

• Proposal currently being prepared by L3S and TIB: large-scale web crawl, then analysis with few upfront assumptions – may result in reusable “web observatory”-style collections / snapshots of heterogeneous institutional data?

3. Possible challenges & stuff to discuss …

a. Aggregators & assessment tools will be widely scattered – let’s make them linked & discoverable, too!

b. “Social Media” will be a workbench – get the details!

c. Let’s extend & apply trusted vocabulary for “social web” type object relations

a. Aggregators & assessment tools will be widely scattered – let’s make them linked & discoverable, too!

• As good metadata on scholarly objects becomes openly available, services that collect, compute and compare data will become abundant. This may even help to end the reliance on single research metrics (cf. San Francisco Declaration on Research Assessment, DORA).

• Problem: Archives / hosts (e.g. university repositories, publishers, Wikimedia) won’t include / link to every meaningful service.

• Challenge: Establish a standard similar to “semantic pingback” so that archives / hosts receive structured information on how their metadata is used by third parties. Users can then be offered multiple contexts in which to view / compare the object they are interested in.

• Example: Algorithms and services that compare / rate individual Wikipedia contributions and contributors (examples 1, 2, 3).

• Example: The aggregation services mentioned on the previous slides.
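A minimal sketch of the plumbing behind such a standard, assuming the plain Pingback 1.0 mechanism: a third party sends the XML-RPC call `pingback.ping(sourceURI, targetURI)` to the host of the object it uses. Semantic Pingback adds typed, RDF-based links on top of this; that layer is not modelled here. The URLs, the `UsageLog` class and `build_ping` are hypothetical, and no network request is made: the request body is built and handed directly to the receiving side.

```python
import xmlrpc.client
from collections import defaultdict


def build_ping(source_uri: str, target_uri: str) -> str:
    """Serialize a Pingback-style XML-RPC request body.

    source_uri: the third-party page that uses the object's metadata
                (e.g. an aggregator's comparison view).
    target_uri: the object as hosted by the archive / repository.
    """
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


class UsageLog:
    """Hypothetical archive-side store: which third parties use which object."""

    def __init__(self) -> None:
        self._users: dict[str, list[str]] = defaultdict(list)

    def handle_ping(self, request_body: str) -> None:
        # xmlrpc.client.loads returns (params, method_name)
        params, method = xmlrpc.client.loads(request_body)
        if method != "pingback.ping" or len(params) != 2:
            raise ValueError("not a pingback.ping request")
        source_uri, target_uri = params
        # Record each third-party context only once per object
        if source_uri not in self._users[target_uri]:
            self._users[target_uri].append(source_uri)

    def contexts_for(self, target_uri: str) -> list[str]:
        """Third-party views the archive could offer users for this object."""
        return list(self._users.get(target_uri, []))


# Example with placeholder URLs: two services report using the same record.
log = UsageLog()
record = "https://repository.example.edu/record/123"
log.handle_ping(build_ping("https://aggregator.example.org/profile/42", record))
log.handle_ping(build_ping("https://metrics.example.net/compare/123", record))
print(log.contexts_for(record))
```

Deduplicating sources keeps the list usable as a menu of contexts for the user. A real receiver would also check that the source page actually references the target before recording the ping.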


b. “Social Media” will be a workbench – get the details!


• Problem: Blogs, wikis etc. are often perceived as an anonymous heap of data that can be queried as a whole to derive some altmetrics number for a given DOI (or similar identifier).

• Instead, researchers’ “SM” profiles will deliver dynamic, rich information that is often (but not always) connected to their traditional research products.

• Challenge: Get meaningful, rich metadata at the level of single objects (and of changes to collaborative objects) into the CRIS. By definition: no trusted archive = no DOI. So maybe we need a layer (think URL shorteners) of HTTP handles, one per object, that deliver machine-readable, well-linked metadata (e.g. JSON-LD) on request.

• Examples: Make each Wikipedia edit, GitHub pull request, MathOverflow forum post… easily linkable, countable and citable.
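One way to picture the proposed handle layer, as a sketch under assumptions: each non-archived object (here a Wikipedia edit) gets a short handle under a hypothetical resolver base URL; a request asking for `application/ld+json` receives JSON-LD metadata, while any other request is redirected (HTTP 303) to the human-readable page. `HANDLE_BASE`, `SocialObject`, `register` and `resolve` are illustrative names, the diff URL and DOI are placeholders, and the use of schema.org terms is an assumption. No HTTP server is started.

```python
import json
from dataclasses import dataclass, field

HANDLE_BASE = "https://handle.example.org/"  # hypothetical resolver base URL


@dataclass
class SocialObject:
    handle_id: str
    landing_url: str      # human-readable page, e.g. a wiki diff
    schema_type: str      # schema.org type, e.g. "CreativeWork"
    name: str
    author_orcid: str
    date_modified: str    # ISO 8601 date of this version / change
    citations: list[str] = field(default_factory=list)  # related traditional objects (DOIs)

    def to_jsonld(self) -> dict:
        """Machine-readable, linked description of this single object."""
        return {
            "@context": "https://schema.org",
            "@id": HANDLE_BASE + self.handle_id,
            "@type": self.schema_type,
            "name": self.name,
            "url": self.landing_url,
            "author": {"@type": "Person", "@id": self.author_orcid},
            "dateModified": self.date_modified,
            "citation": [{"@id": doi} for doi in self.citations],
        }


REGISTRY: dict[str, SocialObject] = {}


def register(obj: SocialObject) -> str:
    """Mint a handle for the object and return its full URL."""
    REGISTRY[obj.handle_id] = obj
    return HANDLE_BASE + obj.handle_id


def resolve(handle_id: str, accept: str) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) as a resolver might, based on the Accept header."""
    obj = REGISTRY.get(handle_id)
    if obj is None:
        return 404, {}, ""
    if "application/ld+json" in accept:
        return 200, {"Content-Type": "application/ld+json"}, json.dumps(obj.to_jsonld())
    # Humans (and unknown clients) are sent to the original page
    return 303, {"Location": obj.landing_url}, ""


# Example: a placeholder Wikipedia edit that cites a placeholder journal article.
handle_url = register(SocialObject(
    handle_id="wp-edit-1",
    landing_url="https://en.wikipedia.org/w/index.php?diff=000000",  # placeholder diff ID
    schema_type="CreativeWork",
    name="Wikipedia edit adding a literature reference",
    author_orcid="https://orcid.org/0000-0002-1825-0097",  # ORCID's fictitious test researcher
    date_modified="2013-09-09",
    citations=["https://doi.org/10.1234/example"],  # hypothetical DOI
))
status, headers, body = resolve("wp-edit-1", accept="application/ld+json")
print(handle_url, status, headers["Content-Type"])
```

The 303 redirect follows common Linked Data practice of separating an identifier from the documents about it; whether a CRIS harvests such records in bulk or fetches them on demand is left open.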

c. Let’s extend & apply trusted vocabulary for “social web” type object relations

• Problem: Declaring new object types (e.g. program code, blog entries, wiki contributions…) to be “research data” and appending them to an article, or merely appending “SM” profiles to scientists’ profile pages, is not sufficient for discovery.

• Challenge: We need ontologies that relate objects to each other and to “traditional objects” such as journal articles. We should work with and build upon David Shotton’s CiTO ontology. Model implementations will be the tricky and interesting part.

• Example: The “Viewed / cited / saved / discussed” ontology of article-level metrics, proposed in NISO ISQ 2013, v.25, no.2.
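To make the CiTO idea concrete, a sketch that relates non-traditional objects to a journal article with CiTO properties (namespace `http://purl.org/spar/cito/`), serializes the statements as N-Triples, and answers a simple discovery question: which objects relate to this article, and how? Only a small subset of CiTO properties is allowed; all object URLs and the DOI are placeholders, and `relate`, `to_ntriples` and `related_to` are hypothetical helpers. Extending CiTO with new social-web relations is not attempted here.

```python
from collections import defaultdict

CITO = "http://purl.org/spar/cito/"
# A small subset of CiTO citation properties (the CiTO specification lists many more).
CITO_PROPERTIES = {"cites", "discusses", "critiques", "extends", "usesDataFrom", "usesMethodIn"}

Triple = tuple[str, str, str]


def relate(subject: str, relation: str, obj: str) -> Triple:
    """Build one (subject, predicate, object) statement with a CiTO predicate."""
    if relation not in CITO_PROPERTIES:
        raise ValueError(f"unsupported CiTO property: {relation}")
    return (subject, CITO + relation, obj)


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def related_to(triples: list[Triple], target: str) -> dict[str, list[str]]:
    """Group all objects pointing at `target` by the CiTO relation they use."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if o == target and p.startswith(CITO):
            groups[p[len(CITO):]].append(s)
    return dict(groups)


article = "https://doi.org/10.1234/example"  # hypothetical journal article
triples = [
    relate("https://blog.example.org/2013/09/post", "discusses", article),
    relate("https://code.example.org/analysis", "usesMethodIn", article),
    relate("https://en.wikipedia.org/w/index.php?diff=000000", "cites", article),
    relate("https://blog.example.org/2013/10/reply", "critiques", article),
]
print(to_ntriples(triples))
print(related_to(triples, article))
```

The grouping step is what a plain list of appended links lacks: a CRIS could show that an article is discussed in one blog post and critiqued in another, rather than reporting a single count.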


Thank you for your attention!
