CORE: Aggregating and Enriching Content to Support Open Access
DESCRIPTION
The last 10 years have seen a massive increase in the amount of Open Access publications available in journals and institutional repositories. The existence of large volumes of free state-of-the-art knowledge online has the potential to provide huge savings and benefits in many fields. However, in order to fully leverage this knowledge, it is necessary to develop systems that (a) make it easy for users to access, discover and explore this knowledge, (b) lower the barriers to the development of systems and services building on top of this knowledge, and (c) make it possible to freely analyse how this knowledge is organised and used. In this paper, we explain why these requirements should be fulfilled and show that current systems do not satisfy them. We also present CORE, a large-scale Open Access aggregation system, outline its functionality and architecture, and demonstrate how it addresses the above-mentioned needs and how it can be applied to benefit the whole ecosystem, including institutional repositories, researchers, the general public and government.
TRANSCRIPT
![Page 1: CORE: Aggregating and Enriching Content to Support Open Access](https://reader036.vdocuments.us/reader036/viewer/2022070313/554bdb9bb4c905ac708b5366/html5/thumbnails/1.jpg)
1/52
CORE: Aggregating and Enriching Content to Support Open Access
Petr Knoth, The Open University
2/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications (DiggiCORE)
3/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications (DiggiCORE)
4/52
Growth of items in Open Access repositories
5/52
Growth of Open Access repositories
6/52
Growth of articles in OA journals
7/52
Growth of OA journals
8/52
Green Open Access - statistics
9/52
Why do we need aggregations?
“Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.”
[COAR manifesto]
10/52
Access to information according to the level of abstraction
[Diagram: several repositories feed an aggregation built from layers labelled Metadata Transfer / Interoperability, Metadata, Content, Semantic Enrichment, OLTP, OLAP and Interfaces; the aggregation serves raw data access, transaction information access and analytical information access.]
11/52
Who should be supported by aggregations?
The following user groups (divided according to the level of abstraction of the information they need):
• Raw data access.
• Transaction information access.
• Analytical information access.
12/52
Who should be supported by aggregations?
• The following user groups (divided according to the level of abstraction of the information they need):
  • Raw data access: developers, digital libraries (DLs), DL researchers, companies …
  • Transaction information access: researchers, students, lifelong learners …
  • Analytical information access: funders, government, business intelligence …
13/52
Layers of an aggregation system
[Diagram: layers labelled Metadata Transfer / Interoperability, Metadata, Content, Enrichment, OLTP, OLAP and Interfaces]
14/52
Layers of an aggregation system
• Metadata transfer / interoperability: OAI-PMH, OAI-ORE …
• Metadata: Dublin Core, XML, RDF …
• Content: PDF, Word …
• Enrichment: annotations
• OLTP: catalog records
• OLAP: statistics
• Interfaces: APIs (REST, SOAP, XML-RPC), UIs, dashboards
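The metadata layer lists Dublin Core encoded in XML. As a minimal sketch, assuming records arrive in the standard OAI-PMH `oai_dc` wrapper, the Python standard library is enough to pull the Dublin Core fields out of one record. The function name `parse_oai_dc` and the sample record are hypothetical illustrations, not CORE code.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by OAI-PMH Dublin Core (oai_dc) records
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_oai_dc(xml_text):
    """Return a dict mapping each Dublin Core element name to its list of values."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <oai_dc:dc> element or a larger document containing one
    if root.tag == "{%s}dc" % NS["oai_dc"]:
        dc = root
    else:
        dc = root.find(".//oai_dc:dc", NS)
    if dc is None:
        return {}
    fields = {}
    for child in dc:
        # child.tag looks like "{http://purl.org/dc/elements/1.1/}title"
        name = child.tag.split("}", 1)[1]
        if child.text and child.text.strip():
            fields.setdefault(name, []).append(child.text.strip())
    return fields

sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:format>application/pdf</dc:format>
</oai_dc:dc>"""

record = parse_oai_dc(sample)
print(record)
```

Repeatable elements such as `dc:creator` are kept as lists, since Dublin Core allows any element to occur more than once.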
15/52
Access to information according to the level of abstraction
[Diagram: repositories, aggregation layers and the three levels of information access]
16/52
Related systems
17/52
Aggregation projects – BASE
[Diagram: repositories, aggregation layers and the three levels of information access]
18/52
Aggregation projects – OAIster/WorldCat
[Diagram: repositories, aggregation layers and the three levels of information access]
19/52
Aggregation projects – RepUK
[Diagram: repositories, aggregation layers and the three levels of information access]
20/52
Aggregations need access to content, not just metadata!
• Certain metadata types can be created only at the level of the aggregation
• Certain metadata can change over time
• Ensuring content:
  • accessibility
  • availability
  • validity
  • quality
  • …
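Checks such as content validity, and noticing that a file has changed since the last harvest, are only possible once the aggregator holds the files themselves. A minimal sketch, assuming the harvested full text is expected to be a PDF: the hypothetical helper `check_fulltext` tests that the bytes are non-empty and begin with the `%PDF-` signature, and fingerprints them with SHA-256 so unchanged files can be skipped on re-harvest. The slides do not describe which checks CORE actually runs.

```python
import hashlib

PDF_MAGIC = b"%PDF-"  # a well-formed PDF normally begins with this header

def check_fulltext(data, previous_sha256=None):
    """Return (is_valid_pdf, sha256_hex, changed) for harvested file bytes."""
    digest = hashlib.sha256(data).hexdigest()
    is_valid = len(data) > len(PDF_MAGIC) and data.startswith(PDF_MAGIC)
    # Treat the file as changed if there is no earlier fingerprint or it differs
    changed = previous_sha256 is None or previous_sha256 != digest
    return is_valid, digest, changed

good = b"%PDF-1.4\n%...minimal example body..."
ok, digest, changed = check_fulltext(good)
print(ok, changed)

# An HTML error page served instead of the PDF fails the validity check
print(check_fulltext(b"<html>Not found</html>")[0])

# Re-harvesting identical bytes is detected as unchanged
print(check_fulltext(good, previous_sha256=digest)[2])
```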
21/52
Aggregation projects – CiteSeerX
[Diagram: repositories, aggregation layers and the three levels of information access]
22/52
Should an aggregation system support all three user types?
It can be realised by more than one system, provided that the dataset is the same!
23/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications (DiggiCORE)
24/52
CORE objectives
• CORE aims to provide a comprehensive technical infrastructure for Open Access scholarly publications that will support access to and reuse of scholarly materials at different levels of abstraction.
• A nation-wide aggregation system that will improve the discovery of publications stored in British Open Access Repositories (OARs).
25/52
What does CORE provide at different aggregation levels?
[Diagram: repositories, aggregation layers and the three levels of information access]
26/52
CORE functionality
27/52
CORE functionality
Step 1: Metadata and full-text harvesting
Content harvesting, processing
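Metadata harvesting over OAI-PMH means issuing `ListRecords` requests and following each `resumptionToken` until the repository returns an empty one. A minimal sketch, assuming a `fetch(url)` callable that returns the response body as a string (injected so the paging loop can be exercised offline); the repository URL and the two canned pages are hypothetical, and this is not CORE's harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        list_records = root.find(OAI + "ListRecords")
        if list_records is None:  # e.g. an <error code="noRecordsMatch"/> response
            return
        yield from list_records.findall(OAI + "record")
        token = list_records.find(OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # an absent or empty token marks the last page
        # Follow-up requests carry only the verb and the token
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Two canned response pages standing in for a real repository
PAGES = {
    "verb=ListRecords&metadataPrefix=oai_dc": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListRecords><record><header><identifier>oai:x:1</identifier></header></record>
<resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>""",
    "verb=ListRecords&resumptionToken=page2": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListRecords><record><header><identifier>oai:x:2</identifier></header></record>
<resumptionToken/></ListRecords></OAI-PMH>""",
}

def fake_fetch(url):
    return PAGES[url.split("?", 1)[1]]

ids = [r.find(OAI + "header/" + OAI + "identifier").text
       for r in harvest("https://repository.example/oai", fake_fetch)]
print(ids)
```

In a real deployment `fetch` would wrap an HTTP client; full texts would then be downloaded separately using links found in the harvested metadata.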
28/52
What does CORE provide at different aggregation levels?
[Diagram: repositories, aggregation layers and the three levels of information access; the Enrichment layer is annotated with semantic similarity, citation extraction, classification, …]
29/52
CORE functionality
Step 2: Semantic enrichment
Semantic enrichment
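Semantic similarity is one of the enrichment outputs. As an illustration only, assuming plain TF-IDF term vectors compared with cosine similarity (a standard baseline; the slides do not say which model CORE uses), related papers can be scored like this:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighting tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = [
    "open access repository aggregation",
    "aggregation of open access repository content",
    "protein folding simulation",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

The first pair shares vocabulary and scores well above zero; the unrelated third document scores exactly zero.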
30/52
What does CORE provide at different aggregation levels?
[Diagram: repositories, aggregation layers and the three levels of information access]
31/52
CORE functionality
Step 3: Providing a set of services on top of the aggregation
Providing services
32/52
CORE applications
• CORE Portal
• CORE Mobile
• CORE Plugin
• CORE API
• Repository Analytics
33/52
What does CORE provide at different aggregation levels?
[Diagram: repositories, aggregation layers and the three levels of information access]
34/52
CORE Applications
CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories.
35/52
CORE Applications
CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories
36/52
CORE Applications
CORE Plugin – A plugin for repository systems that provides recommendations of related items.
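Given pairwise similarity scores such as those produced by the semantic enrichment step, a related-items widget only needs the top-k neighbours of the item being viewed. A minimal sketch; the function name `related_items`, the score-dictionary layout and the `min_score` threshold are hypothetical choices, not the plugin's actual interface.

```python
import heapq

def related_items(item_id, scores, k=3, min_score=0.1):
    """Return up to k (other_id, score) pairs, best first, at or above min_score.

    `scores` maps frozenset({a, b}) -> similarity, so each pair is stored once.
    """
    candidates = []
    for pair, score in scores.items():
        if item_id in pair and score >= min_score:
            (other,) = pair - {item_id}
            candidates.append((score, other))
    # nlargest sorts by score first, then by id as a deterministic tie-breaker
    return [(other, score) for score, other in heapq.nlargest(k, candidates)]

scores = {
    frozenset({"p1", "p2"}): 0.8,
    frozenset({"p1", "p3"}): 0.05,  # below the threshold, never recommended
    frozenset({"p1", "p4"}): 0.4,
    frozenset({"p2", "p4"}): 0.9,   # does not involve p1
}
print(related_items("p1", scores))
```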
37/52
What does CORE provide at different aggregation levels?
[Diagram: repositories, aggregation layers and the three levels of information access]
38/52
CORE Applications
CORE API – Enables external systems and services to interact with the CORE repository.
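An external service typically calls such an API over HTTP and reads back JSON. The endpoint URL, query parameters and response fields below are hypothetical placeholders, not the documented CORE API; the sketch only shows the shape of a client, with the network call kept separate from URL building and parsing so those parts can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.org/search"  # hypothetical endpoint

def build_search_url(query, page=1, page_size=10):
    return BASE_URL + "?" + urlencode({"q": query, "page": page, "pageSize": page_size})

def parse_results(body):
    """Extract (id, title) pairs from a hypothetical {"results": [...]} payload."""
    data = json.loads(body)
    return [(r["id"], r.get("title", "")) for r in data.get("results", [])]

def search(query, **kwargs):
    # Performs a real HTTP request; not executed in the offline example below
    with urlopen(build_search_url(query, **kwargs)) as resp:
        return parse_results(resp.read().decode("utf-8"))

print(build_search_url("open access", page=2))
print(parse_results('{"results": [{"id": 1, "title": "A paper"}]}'))
```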
39/52
What does CORE provide at different aggregation levels?
[Diagram: repositories, aggregation layers and the three levels of information access]
40/52
CORE Applications
Repository Analytics – An analytical tool supporting providers of Open Access content (in particular repository managers).
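Much of what a repository manager needs from such a tool reduces to per-repository counts. As a sketch, assuming harvested records are available as `(repository, has_full_text)` pairs (a hypothetical input format; the repository names are placeholders), the headline numbers can be computed as follows:

```python
from collections import defaultdict

def repository_stats(records):
    """Return {repository: (records, full_texts, full_text_share)}."""
    totals = defaultdict(lambda: [0, 0])
    for repo, has_full_text in records:
        totals[repo][0] += 1
        if has_full_text:
            totals[repo][1] += 1
    return {repo: (n, ft, ft / n) for repo, (n, ft) in totals.items()}

sample = [("repo_a", True), ("repo_a", False), ("repo_a", False), ("repo_b", True)]
stats = repository_stats(sample)
print(stats)
```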
41/52
What does CORE provide at different aggregation levels?
[Diagram: CORE applications placed against the three access levels: CORE API for raw data access; CORE Portal, CORE Mobile and CORE Plugin for transaction information access; Repository Analytics for analytical information access.]
42/52
CORE statistics
• Content:
  • 5.4M records
  • 192 repositories
  • 402k full texts
• Started: February 2011
• Budget: £140k
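The content figures imply a few derived ratios. Because the inputs are rounded (5.4M, 402k) and are averages over very differently sized repositories, the results are only approximate:

```latex
\begin{aligned}
\text{full-text share} &\approx \frac{402\,000}{5\,400\,000} \approx 0.074 \;(\approx 7.4\%)\\
\text{records per repository} &\approx \frac{5\,400\,000}{192} \approx 28\,000\\
\text{full texts per repository} &\approx \frac{402\,000}{192} \approx 2\,100
\end{aligned}
```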
43/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications (DiggiCORE)
44/52
Partners
Advisory Board
45/52
Objective
Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR).
46/52
DiggiCORE networks
Three networks: (a) semantically related papers, (b) citation network, (c) author citation network
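An author citation network can be derived from the paper citation network by projecting each paper-to-paper citation onto the papers' authors. A minimal sketch, assuming citations are given as `(citing_paper, cited_paper)` pairs and authorship as a paper-to-authors mapping (both hypothetical input formats). Edge weights count how often one author cites another, and author self-citations are skipped as one possible design choice; DiggiCORE's actual construction is not described in the slides.

```python
from collections import Counter

def author_citation_network(citations, authors):
    """Return Counter{(citing_author, cited_author): count} from paper citations."""
    edges = Counter()
    for citing, cited in citations:
        for a in authors.get(citing, ()):
            for b in authors.get(cited, ()):
                if a != b:  # skip author self-citations
                    edges[(a, b)] += 1
    return edges

authors = {"p1": ["Ann", "Bob"], "p2": ["Cid"], "p3": ["Cid", "Ann"]}
citations = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
net = author_citation_network(citations, authors)
print(net)
```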
47/52
DiggiCORE objectives
Allow researchers to use this platform to analyse publications. Why?
• To identify patterns in the behaviour of research communities
• To detect trends in research disciplines
• To gain new insights into the citation behaviour of researchers
• To discover features that distinguish papers with high impact
48/52
Which questions can the system help answer?
• What are the attributes of high-impact publications?
• Do these attributes differ in the humanities, social sciences and computer sciences?
• What are the features of research groups within disciplines and how do these features relate to the contributions generated by the group?
• What are the attributes of high-impact authors and what is their role within the group?
• What are the dynamics of successful research groups?
49/52
Which questions can the system help answer?
• What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences?
• Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines?
• How should a novice in the discipline get acquainted with its key achievements?
• How should he/she search for the most important publications?
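One common way to surface the most important publications in a citation network is PageRank, which credits a paper for being cited by papers that are themselves well cited. The slides do not say that DiggiCORE uses PageRank; the sketch below is a plain power-iteration version, assuming citations as `(citing, cited)` pairs, the conventional damping factor 0.85, and papers that cite nothing spreading their score evenly over all papers.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph given as (citing, cited) pairs."""
    nodes = sorted({p for edge in citations for p in edge})
    out_links = {p: [] for p in nodes}
    for citing, cited in citations:
        out_links[citing].append(cited)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Papers that cite nothing ("dangling" nodes) share their score with every paper
        dangling = sum(rank[p] for p in nodes if not out_links[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        for p in nodes:
            for q in out_links[p]:
                new[q] += damping * rank[p] / len(out_links[p])
        rank = new
    return rank

citations = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")]
ranks = pagerank(citations)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Paper `d` ranks first even though it has the same number of incoming citations as `c`, because one of its citations comes from the well-cited `c`.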
50/52
Summary
• The rapid growth of OA content provides both an opportunity and a challenge.
• Aggregations should serve the needs of different user groups.
• Aggregations need to aggregate content, not just metadata.
• We can have many services that are part of the infrastructure, but they should work with the same data.
51/52
Thank you!
Yes we can!
52/52