mind the lexical gap- eurovoc luxembourg, 18-19 november 2010 automatic eurovoc indexing of...

19
Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández Mera Congreso de los Diputados victoria.fernandez@congreso .es http://www.congreso.es

Upload: alyssa-martin

Post on 27-Mar-2015

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- EurovocLuxembourg, 18-19 November 2010

Automatic Eurovoc indexing of parliamentary documentation

Live demostration

Victoria Fernández MeraCongreso de los [email protected]://www.congreso.es

Page 2: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

JRC tool was retrained on more than 80.000 parliamentary Spanish texts (short abstracts, manually indexed with 3 and 3.1 Eurovoc versions).

5th June 2005, the European Community and the Congress of Deputies signed a Software License Agreement to grant a free of charge licence on the software.

It has been the main indexing tool since November 2005.

Available from any computer with a web browser inside the Congress of Deputies.

Login and an associated password to access.

Joint Research Centre automatic indexing software at the Congress of Deputies

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Page 3: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

How does the system work?

Web interface

Web interface

USER

Argo Database

Gets texts

Stores indexation

Simple computer with a Web browser

Server (Linux fedora) With:- Perl installed V5.8.5- Oracle Client 9.0.1- Apache server 2.0.55

eic servernogal.congreso.es

From Bruno Pouliquen. Technical documentation, overview of the tool. Global architecture and requirements. (Information brochure unpublished). 17 p.

Page 4: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

ORACLE database

The information is organized on text, numerical and data fields

Gathers information on any and all written communications submitted to the Congress of Deputies.

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Congress of DeputiesParliamentary activities information system

Argo

Page 5: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Types and numerical codes of parliamentary texts

Legislative initiatives: Governments bills (121) Private Members´ bills (122, 123, 124,124) Decree-laws (130) International Treaties (110,111,112)

Control of the Executive: Granting and withdrawal of confidence:

Investiture of the Government (80) Censure motions (82) Question of confidence (81)

Checking on the Government´s performance

Interpelations and motions (161,162,170,171,172,173) Oral and written questions (180,181,184)

Attendances: Members of Government (210, 213, 214) Other Authorities (212, 219)

Government communications, programmes, plans and other reports Nominations and appointments of high-ranking officials to certain State

bodies

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Page 6: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Main indexing language since 1987

Eurovoc official edition

Spanish geographical application

Short abstracts or titles are indexed

Descriptors are only assigned to the one document that start the procedure in the House

Average number of three descriptors assigned

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Eurovoc at the Argo database

Page 7: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc

Luxembourg, 18-19 November 2010

Welcome page

Page 8: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Clicking on Index a Congreso text, we will be ready to index

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Page 9: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap-Eurovoc Luxembourg, 18-19 November 2010

Document indexing page

(to tap the numerical code of the texts to index)

Page 10: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

• The system always displays all the texts that have not been indexed yet

• Clicking on the box ”ready to index”, we will go to the validation interface.

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Indexation interface

Page 11: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Validation interface

• It displays a ranked list of 30 descriptors.

• The descriptors assigned are ranked by their score.

• Ticking the corresponding box to choose the good descriptors

• Clicking on the link below “Id”, the browser shows all the thesaurus relations descriptor.

Page 12: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Thesaurus relations descriptor

Page 13: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Look for new descriptors (in the box “Add a new descriptor” tap a Eurovoc descriptor code, if known, or a plain text)

Page 14: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Look for new descriptors (The box “search for …. in Eurovoc” allows to look for new descriptors and look through the thesaurus on line)

Page 15: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

To display geographical descriptors click on the button “show INE”

Page 16: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Clicking on “some additional administrative tools here” a new interface performs several funtions

Page 17: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Clicking on “Add documents”, the system is ready to plan text indexation

Page 18: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010

Planning indexation (this interface resumes the codes to be indexed)

Page 19: Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010 Automatic Eurovoc indexing of parliamentary documentation Live demostration Victoria Fernández

Conclusions

The software is able to assign keywords from a controlled language

It performs a high average of correct descriptors among the 10 first assigned

It is able to retrain continuously the assignment of new descriptors

It is a reliable system

It gives a list of Eurovoc descriptors, which have to be validated by the human indexers. So, we can define it as a good automatic assignment tool to help and support indexers work.

[email protected]

Mind the lexical gap- Eurovoc Luxembourg, 18-19 November 2010