Eurovoc and parliamentary documents: a semi-automatic classification experience at the Camera dei deputati

Calogero Salamone

Luxembourg, 19 November 2010

General

Establishing techniques that give citizens access to legal information is of primary importance to the fundamentals of public service

Classification of parliamentary and legal resources provides important support for research

History

In 1969/1970, Italy’s Chamber of Deputies and Senate began to consider the classification of laws, in the context of early automation projects of the Parliament

An automatic machine dictionary of the Italian language (“Camera 72”) was planned for use in the information retrieval of legal texts

The project was to include a search system based on storage of the full text of laws, decrees, treaties etc. dating back to 1848

An accurate legal-linguistic analysis was to establish a classification system to identify and resolve the problems of homographs, polysemy and shifts of meaning

This project was abandoned

In 1992 the thesaurus TESEO (TEsauro Senato per l’Organizzazione dei documenti parlamentari) was adopted for the classification of the bills’ database managed by the Senate

The same thesaurus was adopted for the database of parliamentary oversight (Sindacato ispettivo) managed by the Chamber of Deputies (questions to the government, motions and resolutions)

TESEO includes 3650 terms grouped into 45 thematic areas (Top Terms), derived from an old home-made classification system and arranged according to the logical structure of the Universal Decimal Classification (UDC)

There are only 358 language equivalent terms (non-descriptors) used for cross-referencing

From TESEO to EUROVOC

The use of TESEO at the Chamber of Deputies was satisfactory overall

Difficulties were sometimes encountered in some areas due to the vagueness or absence of appropriate descriptors

These problems led to creating a supplementary list with additional descriptors

In 2005 the Chamber began to consider whether to switch from TESEO to EUROVOC

We considered inter alia the advantages of multilingual classification, including the possibility of connecting different legal and social phenomena under a single system of categorization

We also considered the larger number of descriptors available and the even larger number of language equivalent terms (non-descriptors) available for the Italian language

Some areas are arranged from an EU perspective and can be difficult to use from a national perspective

We hope to gradually extend classification with the Eurovoc thesaurus from policy-setting and oversight documents to the whole information system

For this reason we developed a map that matches and links the descriptors of Eurovoc to those of TESEO
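As an illustration only, such a map can be thought of as a lookup table from each Eurovoc descriptor to zero or more TESEO descriptors. The sketch below assumes that shape; the descriptor labels are invented placeholders rather than real thesaurus entries, and the names `EUROVOC_TO_TESEO`, `to_teseo` and `invert` are hypothetical.

```python
from collections import defaultdict

# Hypothetical concordance table. Keys and values are placeholder labels,
# not real Eurovoc or TESEO descriptors.
EUROVOC_TO_TESEO = {
    "eurovoc:term-1": ["teseo:term-a"],
    "eurovoc:term-2": ["teseo:term-b", "teseo:term-c"],
    "eurovoc:term-3": [],  # no TESEO counterpart identified
}


def to_teseo(eurovoc_terms, mapping=EUROVOC_TO_TESEO):
    """Return (matched TESEO descriptors, Eurovoc descriptors with no match)."""
    matched, unmatched = [], []
    for term in eurovoc_terms:
        targets = mapping.get(term, [])
        if not targets:
            unmatched.append(term)
        for target in targets:
            if target not in matched:  # keep first occurrence, preserve order
                matched.append(target)
    return matched, unmatched


def invert(mapping):
    """Build the reverse table (TESEO descriptor -> Eurovoc descriptors)."""
    reverse = defaultdict(list)
    for source, targets in mapping.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)
```

Reporting the unmatched descriptors separately makes the gaps between the two thesauri visible, instead of silently dropping terms that have no counterpart.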

Automatic indexing

We know that automatic classification processes do not achieve the same quality as human indexing does

They can be efficient enough to be used for specific purposes, e.g. to automatically index documents that otherwise would not be indexed at all, or to support the process of human indexing

The Chamber of Deputies chose to test automatic indexing on policy-setting and oversight documents

These are texts written in everyday language whose length is usually limited

The application of automatic indexing to the classification of legislative texts is probably more difficult

Legislative texts use more formalized language, and the documentary units that should be indexed (down to the level of individual paragraphs) may be too short for automated tools

The Chamber of Deputies' decision to use an automated classification system was finalised in 2005

In an initial phase we tested automatic classification with TESEO descriptors

In a second phase, started in 2006, the program was switched to automatic classification with the Eurovoc thesaurus

In 2008, with the beginning of the 16th Parliament, the Eurovoc classification of policy-setting and oversight documents of the Chamber of Deputies and the Senate was launched

We selected a semantic technology solution (COGITO by Expert System), which automatically suggests a set of descriptors to be applied to each document

Each document is analyzed and interpreted in order to be archived quickly in the corresponding category

The categorizer automatically analyzes each document and suggests a list of descriptors that could be used

This list is checked, modified and validated by a professional operator

The current procedure is in fact semi-automatic

Automatic suggestions are amended and supplemented

The operator is responsible for the selection and final results
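The semi-automatic procedure can be sketched as follows. This is a minimal illustration, not the Cogito or Chamber implementation: the `Classification` record and the `review` function are hypothetical, and the descriptor names used with them are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    document_id: str
    suggested: list                      # descriptors proposed automatically
    final: list = field(default_factory=list)
    validated: bool = False


def review(classification, remove=(), add=()):
    """Apply the operator's amendments, then mark the result as validated."""
    removed = set(remove)
    final = [d for d in classification.suggested if d not in removed]
    for descriptor in add:
        if descriptor not in final:      # supplement without duplicating
            final.append(descriptor)
    classification.final = final
    classification.validated = True
    return classification
```

For example, `review(Classification("doc-1", ["a", "b"]), remove=["b"], add=["c"])` leaves `final` as `["a", "c"]`. The automatic suggestions stay untouched in `suggested`, so they can still be compared with what the operator finally chose.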

So far, the classification suggested by the Cogito categorizer has been transferred manually to another application in order to record the Eurovoc descriptors in the database used for searching

History

A new integrated application, Camer@voc, is now available. It lets the Cogito categorizer analyse all the texts automatically, and then lets operators revise the suggestions and validate and record the Eurovoc descriptors

Camer@voc is a Web application created to manage the automatic classification of policy-setting and oversight documents

The application also manages the various stages of classification and its history

Camer@voc is developed entirely in an open source environment using a three-tier architecture

The application infrastructure is divided into three modules, dedicated respectively to the user interface (View), the functional or business logic (Model) and data persistence management (Controller)

Automatic indexing

Main functionalities:

Sampling of new texts needing to be classified

Display lists of documents automatically classified, divided by classification status

Viewing and editing the automatic classification of a document; confirmation and subsequent storage of the final classification
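The document lists divided by classification status could be produced along the following lines. This is a sketch under stated assumptions: the source does not name the actual statuses, so `STATUSES` and the function `group_by_status` are hypothetical.

```python
# Hypothetical status values; the source does not name the actual ones.
STATUSES = ("to_review", "in_review", "validated")


def group_by_status(documents):
    """Group (document_id, status) pairs into one list per status."""
    groups = {status: [] for status in STATUSES}
    for document_id, status in documents:
        if status not in groups:
            raise ValueError(f"unknown status: {status!r}")
        groups[status].append(document_id)
    return groups
```

Every status gets a list, even when it is empty, so the interface can always show the same set of lists.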

Future developments include a phase of extensive and deep fine-tuning

The aim is to check whether the system can ultimately deliver results of high enough quality to be considered acceptable, even temporarily, without human intervention

If the results are positive, we can consider publishing the automatic classification before it is revised

Users would be warned about this characteristic by a message like “Classification to be reviewed”
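One way such a warning might be attached at display time is sketched below. The function `render_classification` and its `reviewed` flag are hypothetical; only the message text comes from the slide.

```python
REVIEW_WARNING = "Classification to be reviewed"


def render_classification(descriptors, reviewed):
    """Format descriptors for display, flagging unreviewed classifications."""
    text = ", ".join(descriptors)
    if reviewed:
        return text
    return f"{text} [{REVIEW_WARNING}]"
```

Deriving the warning from a stored flag, rather than writing it into the descriptor list, means it disappears as soon as an operator validates the classification.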

Questions to:

[email protected]