NEEO project EC Final review meeting Gateway and portal 23 March 2010 Benoit Pauwels Université Libre de Bruxelles, Belgium 1


NEEO project

EC Final review meeting
Gateway and portal

23 March 2010

Benoit Pauwels
Université Libre de Bruxelles, Belgium


Plan

• Overview of technical infrastructure
• EO as a network of data providers – descriptive metadata
• EO as a network of data providers – usage statistics
• Added value services
    • Publication lists
    • Enriched metadata
    • Full-text searching
    • Multilinguality
• Collaboration with RePEc
• EO gateway and portal

[Figure: technical infrastructure diagram. Labels: Meresco (Metadata, Lucene); Harvester (OAI-PMH, Metadata); Crawler (HTTP, Objects); Logs; EO portal (homemade, FOSS); Exporter engine (homemade, FOSS); OAI-PMH and RSS/Atom to other portals; SRU; RePEc; Enrichment service (SRU, OAI-PMH); DIDL / MODS; SWUP]


Descriptive metadata exchange format

Desired EO functionality                            Technical decision
Facetted search&find experience                     Normalized/normalizable metadata
APA formatted citations                             Granular metadata
Publication list per EO author                      Unambiguous identification of authors
Full text indexing/searching                        Unambiguous links to full texts
Enrichment of metadata (JEL, datasets, citations)   Extensible metadata format


Descriptive metadata exchange format

• DIDL – XML container structure that can hold semantically distinct metadata
    • Descriptive, object files (by-ref), splash page, enriched metadata
    • Based on existing container structure defined by SurfShare
• MODS (3.2) – granular descriptive metadata
    • Based on existing metadata structure defined by SurfShare
• DAI – Unambiguous identification of authors
    • National or institution-unique persistent identifier
• Continuous aim of standardization at a level that surpasses the NEEO project
    • NEEO adaptations fed back to SurfShare


EO descriptive metadata model

DIDL[1]
    Item[1]
        Descriptor/Identifier (persistent identifier)
        Descriptor/modified
        Item[1..∞] (of type descriptiveMetadata)
            Descriptor/type (« descriptiveMetadata »)
            Descriptor/Identifier (persistent identifier)
            Descriptor/modified
            Component/Resource -- representation by value (XML)
        Item[0..∞] (of type objectFile)
            Descriptor/type (« objectFile »)
            Descriptor/Identifier (persistent identifier)
            Descriptor/modified
            Component/Resource -- representation by ref. (URL)
        Item[0..1] (of type humanStartPage)
            Descriptor/type (« humanStartPage »)
            Component/Resource -- representation by ref. (URL)

• Publication is described as a complex (compound) object
    – persistent identifier
• Aggregation of 3 types of components
    – descriptiveMetadata (MODS)
    – objectFiles
    – humanStartPage
• Extensible
    – additional items can be stored within the complex object
• MODS contains DAI of EO author
• Semantic Web - Linked Data – OAI-ORE ready
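
The compound-object model above can be sketched in Python. This is only an illustration under stated assumptions, not the SurfShare or NEEO application profile: the element names follow the outline on the slide, but XML namespaces and the exact nesting of the real DIDL schema are omitted, the `ref` attribute is a simplification, and the identifier and URLs are made up.

```python
import xml.etree.ElementTree as ET


def add_descriptor(item, name, value):
    """Attach a <Descriptor> holding one statement (Identifier, modified or type)."""
    descriptor = ET.SubElement(item, "Descriptor")
    ET.SubElement(descriptor, name).text = value
    return descriptor


def build_publication(pid, modified, mods_xml, file_urls, start_page_url=None):
    """Build a simplified DIDL tree: one top Item aggregating typed child Items."""
    didl = ET.Element("DIDL")
    top = ET.SubElement(didl, "Item")
    add_descriptor(top, "Identifier", pid)
    add_descriptor(top, "modified", modified)

    # descriptiveMetadata: represented by value (the MODS XML is embedded).
    meta = ET.SubElement(top, "Item")
    add_descriptor(meta, "type", "descriptiveMetadata")
    resource = ET.SubElement(ET.SubElement(meta, "Component"), "Resource")
    resource.append(ET.fromstring(mods_xml))

    # objectFile: represented by reference (only the URL is stored).
    for url in file_urls:
        obj = ET.SubElement(top, "Item")
        add_descriptor(obj, "type", "objectFile")
        ET.SubElement(ET.SubElement(obj, "Component"), "Resource", ref=url)

    # humanStartPage: optional, also by reference.
    if start_page_url:
        page = ET.SubElement(top, "Item")
        add_descriptor(page, "type", "humanStartPage")
        ET.SubElement(ET.SubElement(page, "Component"), "Resource", ref=start_page_url)
    return didl


record = build_publication(
    pid="urn:example:publication-1",  # hypothetical persistent identifier
    modified="2010-03-23T00:00:00Z",
    mods_xml="<mods><titleInfo><title>Sample paper</title></titleInfo></mods>",
    file_urls=["http://repository.example.org/paper.pdf"],
    start_page_url="http://repository.example.org/record/1",
)
print(ET.tostring(record, encoding="unicode"))
```

Because every component is just another typed Item, a further kind of item (for example an enrichment record) could be added without changing the rest of the structure, which is the extensibility the slide refers to.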


Descriptive metadata exchange format

• Central EO gateway
    • DIDL and MODS application profiles
        • Vocabularies in DIDL and MODS
    • Technical guidelines for project partners
        • All documentation is OA available
• Partner solutions: home-made or with external support
    • ARNO: home-made
    • DSpace: home-made, AtMire
    • EPrints: home-made, ECS-University of Southampton
    • Fedora: METS/MODS -> DIDL/MODS
    • DigiTool: METS/MARC -> DIDL/MODS
• All original partners + 2 new partners


Decentralized registry service

• Aim: sustainable solution for big network with many partners
• Decentralized Admin file
    • Format: XML-RDF | FOAF + NEEO-specific vocabulary
    • Decentralized file sits on local web server of project partner
    • Content
        - information on institution: name, description, ...
        - OAI baseURL + OAI sets to harvest
        - EO authors: DAI, photograph, full name, affiliation
    • EO gateway fetches the file (HTTP GET) and validates it at regular intervals
    • Used for
        - information in EO portal screens
        - publication lists (match on DAI)
        - automated harvesting process
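
A rough sketch of how a gateway might read and validate such an Admin file. The FOAF and RDF namespaces are the standard ones, but the `neeo` namespace URI and its property names (`oaiBaseURL`, `oaiSet`, `dai`) are hypothetical stand-ins for the NEEO-specific vocabulary, which is not given in the slides. The sample file is inline; in practice the text would come from an HTTP GET on the partner's web server.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "neeo": "http://example.org/neeo#",  # hypothetical namespace, for illustration only
}

SAMPLE_ADMIN_FILE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:neeo="http://example.org/neeo#">
  <foaf:Organization>
    <foaf:name>Example University</foaf:name>
    <neeo:oaiBaseURL>http://repository.example.org/oai</neeo:oaiBaseURL>
    <neeo:oaiSet>economics</neeo:oaiSet>
  </foaf:Organization>
  <foaf:Person>
    <foaf:name>Jane Doe</foaf:name>
    <neeo:dai>info:example/dai/0001</neeo:dai>
  </foaf:Person>
</rdf:RDF>
"""


def parse_admin_file(xml_text):
    """Extract institution, harvesting and author information; raise if incomplete."""
    root = ET.fromstring(xml_text)
    org = root.find("foaf:Organization", NS)
    if org is None:
        raise ValueError("admin file has no foaf:Organization")
    base_url = org.findtext("neeo:oaiBaseURL", namespaces=NS)
    if not base_url:
        raise ValueError("admin file has no OAI baseURL")

    # Authors are keyed by DAI so that publication lists can be matched on it.
    authors = {}
    for person in root.findall("foaf:Person", NS):
        dai = person.findtext("neeo:dai", namespaces=NS)
        if dai:
            authors[dai] = person.findtext("foaf:name", namespaces=NS)

    return {
        "institution": org.findtext("foaf:name", namespaces=NS),
        "oai_base_url": base_url,
        "oai_sets": [s.text for s in org.findall("neeo:oaiSet", NS)],
        "authors": authors,
    }


info = parse_admin_file(SAMPLE_ADMIN_FILE)
print(info["oai_base_url"], info["oai_sets"], sorted(info["authors"]))
```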


Usage statistics – EO use case

• EO use case: present download rates through EO portal per publication, scholar, institution
• Normalization of exchange format and communication protocol
    • OAI-PMH exchange of SWUP OpenURL ContextObjects (Scholarly Works Usage Community Profile)
• Special considerations:
    • Encryption of IP address of requester (MD5)
    • Filtering out robot requests (list of 50 regular expressions)
    • Filtering out double clicks
• Similar initiatives come together at Knowledge Exchange workshop, Berlin 29-30 March 2010
    • JISC (Usage Statistics Review project), Pirus2, SurfSure, Counter, Mesur, OA-Statistik, Economists Online
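
The three special considerations can be sketched as a small cleaning step over download events. This is a minimal illustration, not the project's code: the three robot patterns stand in for the list of 50 regular expressions, the 10-second double-click window is an assumed value, and the event tuple layout is invented for the example.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Stand-ins for the real robot list; the slides mention 50 expressions.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot\b", r"crawler", r"spider")]
DOUBLE_CLICK_WINDOW = timedelta(seconds=10)  # assumed threshold


def anonymize_ip(ip):
    """Replace the requester's IP address by its MD5 digest."""
    return hashlib.md5(ip.encode("utf-8")).hexdigest()


def is_robot(user_agent):
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)


def clean_events(events):
    """events: iterable of (timestamp, ip, user_agent, item_id), sorted by timestamp."""
    last_seen = {}
    kept = []
    for ts, ip, user_agent, item_id in events:
        if is_robot(user_agent):
            continue
        key = (anonymize_ip(ip), item_id)
        previous = last_seen.get(key)
        last_seen[key] = ts
        # Same requester, same item, within the window: treat as a double click.
        if previous is not None and ts - previous <= DOUBLE_CLICK_WINDOW:
            continue
        kept.append((ts, key[0], item_id))
    return kept


t0 = datetime(2010, 3, 23, 12, 0, 0)
events = [
    (t0, "192.0.2.1", "Mozilla/5.0", "item-1"),
    (t0 + timedelta(seconds=3), "192.0.2.1", "Mozilla/5.0", "item-1"),   # double click
    (t0 + timedelta(seconds=5), "192.0.2.7", "Googlebot/2.1", "item-1"),  # robot
    (t0 + timedelta(seconds=60), "192.0.2.1", "Mozilla/5.0", "item-1"),
]
cleaned = clean_events(events)
print(len(cleaned))
```

Note that MD5 is a one-way hash rather than encryption in the strict sense; it hides the address while still allowing repeated requests from the same address to be recognized.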


Usage statistics – implementation status

• Central EO Gateway – DoDoCo (Document Download Counter)
    • PMH harvesting of SWUP ContextObjects into SQL database
    • Enrich with information on item, scholar, institution
    • Web service: level (item, scholar, institution) + date range
• Technical guidelines for project partners (OA available)
• Partners
    • Implementation
        - for all major IR platforms
        - solution for Combined Log Format web logs
    • Registration through Admin file
    • 7 original + 1 new partner
• Not enough data available
• Not visible through EO portal yet, although DoDoCo software is ready
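
To illustrate the kind of query such a download counter answers (a level plus a date range), here is a small sketch using an in-memory SQLite database. It is not the DoDoCo implementation: the table layout, column names and sample rows are assumptions made for the example.

```python
import sqlite3

# Maps the requested level to a column; also acts as a whitelist for the query.
LEVEL_COLUMNS = {"item": "item_id", "scholar": "scholar_dai", "institution": "institution"}


def create_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE downloads (item_id TEXT, scholar_dai TEXT, institution TEXT, day TEXT)"
    )
    return conn


def record_download(conn, item_id, scholar_dai, institution, day):
    """Store one harvested usage event, already enriched with scholar and institution."""
    conn.execute(
        "INSERT INTO downloads VALUES (?, ?, ?, ?)",
        (item_id, scholar_dai, institution, day),
    )


def count_downloads(conn, level, value, start, end):
    """Count downloads for one item, scholar or institution between two ISO dates."""
    column = LEVEL_COLUMNS[level]  # raises KeyError for an unknown level
    row = conn.execute(
        f"SELECT COUNT(*) FROM downloads WHERE {column} = ? AND day BETWEEN ? AND ?",
        (value, start, end),
    ).fetchone()
    return row[0]


conn = create_store()
record_download(conn, "item-1", "dai-1", "inst-A", "2010-03-01")
record_download(conn, "item-1", "dai-1", "inst-A", "2010-03-15")
record_download(conn, "item-2", "dai-2", "inst-A", "2010-04-02")

print(count_downloads(conn, "item", "item-1", "2010-03-01", "2010-03-31"))
print(count_downloads(conn, "institution", "inst-A", "2010-03-01", "2010-04-30"))
```

Storing dates as ISO 8601 strings keeps the `BETWEEN` comparison correct with plain text ordering.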


Added value services

• Publication lists
    • Per DAI of authors who are registered in Admin file
    • Publications are extracted from the EO gateway over SRU and formatted as
        • APA+ in HTML
            • with links to full text in EO partner repository
            • with links to publisher sites (through OpenURL resolution)
        • APA in PDF
        • APA in RTF
        • RIS
        • BibTeX
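
A minimal sketch of the two halves of this service: building an SRU searchRetrieve request for one author, and formatting a retrieved publication as BibTeX and RIS. The SRU parameter names are standard, but the gateway URL, the CQL index name `dai` and the dictionary layout of a publication are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_query_url(base_url, dai, max_records=50):
    """Build an SRU searchRetrieve URL selecting the publications of one author."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'dai exact "{dai}"',  # hypothetical CQL index name
        "maximumRecords": str(max_records),
    }
    return f"{base_url}?{urlencode(params)}"


def to_bibtex(pub):
    fields = [
        ("author", " and ".join(pub["authors"])),
        ("title", pub["title"]),
        ("year", str(pub["year"])),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{pub['key']},\n{body}\n}}"


def to_ris(pub):
    lines = ["TY  - GEN"]
    lines += [f"AU  - {author}" for author in pub["authors"]]
    lines += [f"TI  - {pub['title']}", f"PY  - {pub['year']}", "ER  - "]
    return "\n".join(lines)


url = sru_query_url("http://gateway.example.org/sru", "info:example/dai/0001")
pub = {
    "key": "doe2010",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "Sample paper",
    "year": 2010,
}
print(url)
print(to_bibtex(pub))
print(to_ris(pub))
```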


Added value services

• Enriched descriptive metadata
    • JEL classification
        • Enrichment service (ES) gets records to be enriched from EO, over SRU
        • ES creates enrichment record(s), using text mining technology
        • ES makes enrichment record(s) available to EO, over OAI-PMH
        • EO harvests enrichment records from ES and integrates into original record
        • EO reuses enrichment information in its services: index & present
    • Bibliographic references
        • Through collaboration with RePEc/CitEc
    • Visible through EO portal
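
The enrichment round trip can be sketched with plain dictionaries. This is only an illustration of the data flow: the SRU and OAI-PMH transport is left out, the keyword lookup is a toy stand-in for the text mining technology (which the slides do not describe), and the record layout is invented for the example.

```python
# Toy stand-in for text mining: a keyword maps to a JEL code.
KEYWORD_TO_JEL = {"monetary policy": "E52", "wage": "J31"}


def create_enrichment(record):
    """Enrichment service side: derive JEL codes from the title and abstract."""
    text = f"{record['title']} {record.get('abstract', '')}".lower()
    codes = sorted({jel for keyword, jel in KEYWORD_TO_JEL.items() if keyword in text})
    return {"record_id": record["id"], "jel": codes}


def integrate(records, enrichments):
    """Gateway side: merge harvested enrichment records into the original records."""
    for enrichment in enrichments:
        target = records.get(enrichment["record_id"])
        if target is None:
            continue  # enrichment for a record the gateway no longer holds
        merged = set(target.get("jel", [])) | set(enrichment["jel"])
        target["jel"] = sorted(merged)
    return records


records = {
    "r1": {"id": "r1", "title": "Monetary policy and wage dynamics"},
    "r2": {"id": "r2", "title": "A note on auctions", "jel": ["D44"]},
}
enrichments = [create_enrichment(r) for r in records.values()]
integrate(records, enrichments)
print(records["r1"]["jel"], records["r2"]["jel"])
```

Keeping the enrichment as a separate record that is merged in afterwards means codes already supplied by the repository (as for `r2`) are preserved rather than overwritten.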


Added value services

• Full-text search service
    • Process
        • Full-text indexer component in Meresco fetches relevant records from EO Gateway over SRU
        • Follow links to PDF object files
        • Text is extracted from PDF, and added to record through SRU Update
        • EO can now index & present
    • Prototype exists
    • Not yet fully deployed in EO portal


Added value services

• Multilinguality (EN, FR, DE, ES)
    • Complete EO portal interface
    • JEL classification
    • MLIA functionality in EO portal
        • Student thesis – Prof. Bouillon (Univ. of Geneva, multilingual information processing department)
            • (uncustomized) Systran and Google Translate show equivalent results
        • Contacts with CACAO (also through Europeana)
            • comes as a complete portal solution, not as an add-in for existing portals like EO
        • Considerations:
            • Lingua franca in economics = EN
            • NEEO = NOT research project in linguistics, aim: reuse best existing technology
        • Decision: use “Google Translate” for translation of queries


Collaboration with RePEc

• Harvesting metadata from RePEc into EO
    • AMF to DIDL/MODS mapping
• Push metadata from EO to RePEc
    • “RePEc:ner” archive, with separate series for each EO institution
    • According to agreed-upon reviewed ReDIF format
• Admin file directives in order to limit overlap
• Contribute to LogEc
• Reuse CitEc data in EO portal
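
As an illustration of the push towards RePEc, the sketch below writes one publication as a ReDIF paper template inside the “RePEc:ner” archive. The field names are standard ReDIF, but the series code `exampl`, the publication dictionary and the choice of the ReDIF-Paper template are assumptions; the agreed-upon format itself is not reproduced in the slides.

```python
def to_redif(pub, series):
    """Render one publication as a ReDIF-Paper template for the given series."""
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    lines += [f"Author-Name: {name}" for name in pub["authors"]]
    lines.append(f"Title: {pub['title']}")
    if pub.get("year"):
        lines.append(f"Creation-Date: {pub['year']}")
    if pub.get("url"):
        lines.append(f"File-URL: {pub['url']}")
        lines.append("File-Format: application/pdf")
    # One series per EO institution inside the shared "ner" archive.
    lines.append(f"Handle: RePEc:ner:{series}:{pub['id']}")
    return "\n".join(lines) + "\n"


sample = {
    "id": "1",
    "authors": ["Jane Doe", "Richard Roe"],
    "title": "Sample paper",
    "year": 2010,
    "url": "http://repository.example.org/paper.pdf",
}
template = to_redif(sample, series="exampl")  # hypothetical series code
print(template)
```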


EO gateway and portal

• Gateway – metadata store and search engine
    • Choice between Summa, SOLR/Lucene, Meresco
    • Open source solution, based on Lucene search engine
    • Support available from software developers (CQ2 company)
    • Has proven its qualities in the past (DARENet)
• Portal
    • First version: home-made
    • Final version:
        • outsourced design to private company
        • HTML, CSS, JavaScript, all images