mind the lexical gap- eurovoc luxembourg, 18-19 november 2010 automatic eurovoc indexing of...

Post on 27-Mar-2015

215 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Mind the lexical gap- EurovocLuxembourg, 18-19 November 2010

Automatic Eurovoc indexing of parliamentary documentation

Live demostration

Victoria Fernández MeraCongreso de los Diputadosvictoria.fernandez@congreso.eshttp://www.congreso.es

JRC tool was retrained on more than 80.000 parliamentary Spanish texts (short abstracts, manually indexed with 3 and 3.1 Eurovoc versions).

5th June 2005, the European Community and the Congress of Deputies signed a Software License Agreement to grant a free of charge licence on the software.

It has been the main indexing tool since November 2005.

Available from any computer with a web browser inside the Congress of Deputies.

Login and an associated password to access.

Joint Research Centre automatic indexing software at the Congress of Deputies

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

How does the system work?

Web interface

Web interface

USER

Argo Database

Gets texts

Stores indexation

Simple computer with a Web browser

Server (Linux fedora) With:- Perl installed V5.8.5- Oracle Client 9.0.1- Apache server 2.0.55

eic servernogal.congreso.es

From Bruno Pouliquen. Technical documentation, overview of the tool. Global architecture and requirements. (Information brochure unpublished). 17 p.

ORACLE database

The information is organized on text, numerical and data fields

Gathers information on any and all written communications submitted to the Congress of Deputies.

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Congress of DeputiesParliamentary activities information system

Argo

Types and numerical codes of parliamentary texts

Legislative initiatives: Governments bills (121) Private Members´ bills (122, 123, 124,124) Decree-laws (130) International Treaties (110,111,112)

Control of the Executive: Granting and withdrawal of confidence:

Investiture of the Government (80) Censure motions (82) Question of confidence (81)

Checking on the Government´s performance

Interpelations and motions (161,162,170,171,172,173) Oral and written questions (180,181,184)

Attendances: Members of Government (210, 213, 214) Other Authorities (212, 219)

Government communications, programmes, plans and other reports Nominations and appointments of high-ranking officials to certain State

bodies

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Main indexing language since 1987

Eurovoc official edition

Spanish geographical application

Short abstracts or titles are indexed

Descriptors are only assigned to the one document that start the procedure in the House

Average number of three descriptors assigned

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Eurovoc at the Argo database

Mind the lexical gap- Eurovoc

Luxembourg, 18-19 November 2010

Welcome page

Clicking on Index a Congreso text, we will be ready to index

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Mind the lexical gap-Eurovoc Luxembourg, 18-19 November 2010

Document indexing page

(to tap the numerical code of the texts to index)

• The system always displays all the texts that have not been indexed yet

• Clicking on the box ”ready to index”, we will go to the validation interface.

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Indexation interface

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Validation interface

• It displays a ranked list of 30 descriptors.

• The descriptors assigned are ranked by their score.

• Ticking the corresponding box to choose the good descriptors

• Clicking on the link below “Id”, the browser shows all the thesaurus relations descriptor.

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Thesaurus relations descriptor

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Look for new descriptors (in the box “Add a new descriptor” tap a Eurovoc descriptor code, if known, or a plain text)

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Look for new descriptors (The box “search for …. in Eurovoc” allows to look for new descriptors and look through the thesaurus on line)

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

To display geographical descriptors click on the button “show INE”

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Clicking on “some additional administrative tools here” a new interface performs several funtions

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Clicking on “Add documents”, the system is ready to plan text indexation

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Planning indexation (this interface resumes the codes to be indexed)

Conclusions

The software is able to assign keywords from a controlled language

It performs a high average of correct descriptors among the 10 first assigned

It is able to retrain continuously the assignment of new descriptors

It is a reliable system

It gives a list of Eurovoc descriptors, which have to be validated by the human indexers. So, we can define it as a good automatic assignment tool to help and support indexers work.

victoria.fernandez@congreso.es

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

top related