Mind the lexical gap - Eurovoc
Luxembourg, 18-19 November 2010
Automatic Eurovoc indexing of parliamentary documentation
Live demonstration
Victoria Fernández Mera
Congreso de los Diputados
http://www.congreso.es
The JRC tool was retrained on more than 80,000 Spanish parliamentary texts (short abstracts, manually indexed with Eurovoc versions 3 and 3.1).
On 5 June 2005, the European Community and the Congress of Deputies signed a Software License Agreement granting a free-of-charge licence for the software.
It has been the main indexing tool since November 2005.
Available from any computer with a web browser inside the Congress of Deputies.
Access requires a login and an associated password.
Joint Research Centre automatic indexing software at the Congress of Deputies
How does the system work?
User: simple computer with a Web browser
Web interface
Server (Linux Fedora), eic server nogal.congreso.es, with:
• Perl 5.8.5
• Oracle Client 9.0.1
• Apache server 2.0.55
Argo database: the server gets the texts from it and stores the indexation in it
From Bruno Pouliquen. Technical documentation, overview of the tool. Global architecture and requirements. (Information brochure unpublished). 17 p.
ORACLE database
The information is organized into text, numerical and data fields
Gathers information on any and all written communications submitted to the Congress of Deputies.
Congress of Deputies: Parliamentary activities information system
Argo
Types and numerical codes of parliamentary texts
Legislative initiatives:
• Government bills (121)
• Private Members' bills (122, 123, 124, 124)
• Decree-laws (130)
• International treaties (110, 111, 112)
Control of the Executive:
Granting and withdrawal of confidence:
• Investiture of the Government (80)
• Censure motions (82)
• Question of confidence (81)
Checking on the Government's performance:
• Interpellations and motions (161, 162, 170, 171, 172, 173)
• Oral and written questions (180, 181, 184)
Attendances:
• Members of Government (210, 213, 214)
• Other authorities (212, 219)
Government communications, programmes, plans and other reports
Nominations and appointments of high-ranking officials to certain State bodies
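The numerical codes above lend themselves to a simple lookup table. The following is a minimal sketch only, using a subset of the codes listed on this slide; the names TEXT_TYPE_CODES and text_type_for are hypothetical and are not taken from Argo or the JRC tool.

```python
# Illustrative lookup built from some of the codes listed on the slide.
# TEXT_TYPE_CODES and text_type_for are hypothetical names, not part of Argo.
TEXT_TYPE_CODES = {
    "Government bills": [121],
    "Decree-laws": [130],
    "International treaties": [110, 111, 112],
    "Investiture of the Government": [80],
    "Question of confidence": [81],
    "Censure motions": [82],
    "Oral and written questions": [180, 181, 184],
}


def text_type_for(code):
    """Return the type name for a numerical code, or None if it is unknown."""
    for name, codes in TEXT_TYPE_CODES.items():
        if code in codes:
            return name
    return None


if __name__ == "__main__":
    print(text_type_for(121))
    print(text_type_for(999))
```

Typing a code such as 121 in the indexing page would then correspond to selecting all texts of that type.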
Main indexing language since 1987
Eurovoc official edition
Spanish geographical application
Short abstracts or titles are indexed
Descriptors are assigned only to the document that starts the procedure in the House
On average, three descriptors are assigned per document
Eurovoc at the Argo database
Welcome page
Clicking on “Index a Congreso text” takes us to the document indexing page
Document indexing page
(type the numerical code of the texts to index)
• The system always displays all the texts that have not been indexed yet
• Clicking on the box “ready to index” opens the validation interface.
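The selection behaviour described on this slide (show every text not yet indexed, optionally narrowed to one numerical code) can be sketched as follows. This is an illustration under assumptions: the ParliamentaryText record and the pending_texts function are hypothetical and do not reflect the actual Argo schema or the Perl code of the tool.

```python
from dataclasses import dataclass, field


@dataclass
class ParliamentaryText:
    # Hypothetical record; the real Argo fields are not described in the slides.
    text_id: str
    type_code: int
    abstract: str
    descriptors: list = field(default_factory=list)


def pending_texts(texts, type_code=None):
    """Return texts without descriptors, optionally limited to one type code."""
    return [
        t for t in texts
        if not t.descriptors and (type_code is None or t.type_code == type_code)
    ]


if __name__ == "__main__":
    sample = [
        ParliamentaryText("a", 121, "Bill on ...", ["descriptor-1"]),
        ParliamentaryText("b", 121, "Bill on ..."),
        ParliamentaryText("c", 180, "Question on ..."),
    ]
    print([t.text_id for t in pending_texts(sample)])
    print([t.text_id for t in pending_texts(sample, type_code=121)])
```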
Indexation interface
Validation interface
• It displays a ranked list of 30 descriptors.
• The descriptors assigned are ranked by their score.
• Tick the corresponding box to select the correct descriptors.
• Clicking on the link under “Id” shows all the thesaurus relations of the descriptor.
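The validation step above amounts to sorting candidate descriptors by score, keeping the top 30, and letting the indexer tick the ones to keep. A minimal sketch, assuming scores arrive as a mapping from descriptor identifier to a number; the function names and the identifiers used in the example are hypothetical, and how the JRC tool actually computes its scores is not covered in the slides.

```python
def rank_candidates(scores, limit=30):
    """Sort (descriptor_id, score) pairs by descending score and keep the top ones."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def validate(ranked, ticked_ids):
    """Keep only the descriptors the human indexer ticked, in ranked order."""
    return [descriptor_id for descriptor_id, _ in ranked if descriptor_id in ticked_ids]


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    scores = {"id-a": 0.42, "id-b": 0.91, "id-c": 0.10, "id-d": 0.67}
    ranked = rank_candidates(scores, limit=3)
    print(ranked)
    print(validate(ranked, ticked_ids={"id-b", "id-a"}))
```

Keeping the human choice as a separate step mirrors the workflow in the slides: the system proposes, the indexer decides.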
Thesaurus relations of the descriptor
Look for new descriptors (in the box “Add a new descriptor”, type a Eurovoc descriptor code, if known, or plain text)
Look for new descriptors (the box “search for …. in Eurovoc” lets you look for new descriptors and browse the thesaurus online)
To display geographical descriptors click on the button “show INE”
Clicking on “some additional administrative tools here” opens a new interface that performs several functions
Clicking on “Add documents” lets us plan the indexation of texts
Planning indexation (this interface summarizes the codes to be indexed)
Conclusions
The software is able to assign keywords from a controlled language
It achieves a high average proportion of correct descriptors among the first 10 assigned
It can be retrained continuously so that it learns to assign new descriptors
It is a reliable system
It produces a list of Eurovoc descriptors that the human indexers have to validate, so it is best described as a good automatic assignment tool that helps and supports the indexers' work.
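One common way to quantify "correct descriptors among the first 10 assigned" is precision at 10: the share of the top 10 proposals that the indexer validated. The slides do not state which measure was used, so the sketch below is an assumption, and precision_at_k is a hypothetical helper name.

```python
def precision_at_k(proposed, validated, k=10):
    """Fraction of the first k proposed descriptors that were validated.

    Divides by the number of proposals actually considered, so a list
    shorter than k is not penalised for its length.
    """
    top = proposed[:k]
    if not top:
        return 0.0
    hits = sum(1 for descriptor_id in top if descriptor_id in validated)
    return hits / len(top)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    proposed = ["id-a", "id-b", "id-c", "id-d"]
    validated = {"id-a", "id-c"}
    print(precision_at_k(proposed, validated, k=4))
```

Averaging this value over many indexed texts gives a single figure that can be tracked as the tool is retrained.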