MADONNE
MAsses de DONnées issues de la Numérisation du patrimoiNE
Project Leader: Jean-Marc OGIER
L3i Laboratory, La Rochelle University
Tel: 0033 5 46 45 82 15
[email protected]
NAVIDOMASS
NAVigation In DOcument MASSes
Two French Projects on Analysis of Cultural Heritage Documents
Mathieu Delalandre (CVC)
IDoc Meeting, Valencia (Spain)
22nd February 2007
MADONNE: MAsses de DONnées issues de la Numérisation du patrimoiNE
French ANR program “Masse de données”
Length: 36 months
Funding: 110 000 €
NAVIDOMASS: NAVigation In DOcument MASSes
French ANR program “Masse de données et connaissances”
Length: 36 months
Funding: 550 000 €
Introduction
[Figure: system diagram with the labels Cultural Document Images, System, Strategy, Model, Processing, GUI, High-Level Meta-Data of images, Structured and Indexed Information]
Scope of projects …
Cultural heritage documents represent a very large mass of data.
The MADONNE and NaviDoMass projects develop document analysis systems for indexing and browsing this mass of data.
Calendar …
[Figure: project timeline; axis: Years, 2003 to 2009]
Consortium
Centre de Recherche en Informatique de Paris 5 (Paris)
Institut de Recherche en Informatique et Systèmes Aléatoires (Rennes)
Laboratoire Informatique (Tours)
Laboratoire d'InfoRmatique en Image et Systèmes d'information (Lyon)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (Nancy)
Laboratoire d'Informatique de Traitement de l'Information (Rouen)
Laboratoire d’informatique image et interaction (La Rochelle)
Centre d’Études Supérieures de la Renaissance (Tours)
55 Project Members
Professor: 8
Lecturer: 14
Post-Doctoral: 3
PhD Student: 9
Master Student: 15
Engineer: 6
Permanent
Over the last 3 years
20 Project Partners
Companies: 5 (HP, APROGEIDE …)
Libraries: 5 (CHAN, British Library …)
Research Centers: 10 (CVC, Indian SI …)
Overview
Document Layout [Ramel’05]
Block segmentation into footnote, text zone, dropcap, figure, ...
Background analysis Foreground analysis
Merging
10 000 pages of old printed books
Text density
Graphic density
Collection Modelling [Journet’06]
Directional rose
Old printed books
Overview
Grapheme-based signature for handwritten patronymic retrieval
Document Layout and Retrieval [Couasnon’05]
Segmented Cells
(1) Line extraction based on Kalman Filter
(2) Positioning Grammar to correct and build cells from extracted lines
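Step (1) mentions line extraction based on a Kalman filter. As a rough illustration of the general idea only, and not of the method in [Couasnon’05], the sketch below tracks the vertical position of one ruling line across image columns with a scalar Kalman filter. The random-walk state model, the function name `track_line`, and the noise parameters `q` and `r` are all assumptions made for this example. A `None` observation stands for a column where the line is hidden (for instance by handwriting), in which case only the prediction step runs.

```python
def track_line(observations, q=0.01, r=1.0):
    """Track a line's y-position across columns (hypothetical sketch).

    observations: measured y per column, or None where the line is not visible.
    q: assumed process noise variance (how much the line may drift per column).
    r: assumed measurement noise variance.
    """
    x = None  # current estimate of the y-position
    p = 1.0   # variance of the current estimate
    estimates = []
    for z in observations:
        if x is None:
            # Wait for the first visible measurement to initialise the state.
            if z is None:
                estimates.append(None)
                continue
            x, p = float(z), r
            estimates.append(x)
            continue
        # Predict: random-walk model, the position stays put, uncertainty grows.
        p = p + q
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    noisy = [10, 11, 9, None, 10, 10, 11, 9, 10]
    print(track_line(noisy))
```

The extracted line segments would then be handed to a later stage, such as the positioning grammar of step (2), which is not sketched here.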
60 000 forms from the 19th century
Form viewer
Retrieved patronymic
“access to form”
Query Text Field
Overview
text, erasure, interline
Document Layout [Nicolas’06]
Handwritten pages from the 19th century
Segmentation based on Markov Random Field
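To give a feel for what segmentation with a Markov Random Field involves, here is a minimal sketch that labels the cells of a small grid using a Potts smoothness prior and Iterated Conditional Modes (ICM). It is not the model of [Nicolas’06]: the per-class mean intensities, the absolute-difference data cost, the 4-neighbourhood, the weight `beta`, and the function name `icm_segment` are all assumptions chosen for illustration.

```python
def icm_segment(image, means, beta=1.0, iterations=5):
    """Label each cell of a 2-D grid with a class index (hypothetical sketch).

    image: list of rows of float intensities.
    means: assumed mean intensity of each class (index = label).
    beta:  weight of the Potts prior penalising label changes between neighbours.
    """
    h, w = len(image), len(image[0])
    classes = range(len(means))
    # Initialise with the nearest class mean, ignoring neighbours.
    labels = [[min(classes, key=lambda k: abs(image[y][x] - means[k]))
               for x in range(w)] for y in range(h)]
    for _ in range(iterations):
        changed = False
        for y in range(h):
            for x in range(w):
                best_label, best_energy = labels[y][x], None
                for k in classes:
                    # Data term: how well the intensity fits class k.
                    data = abs(image[y][x] - means[k])
                    # Prior term: number of 4-neighbours with a different label.
                    disagree = 0
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != k:
                            disagree += 1
                    energy = data + beta * disagree
                    if best_energy is None or energy < best_energy:
                        best_label, best_energy = k, energy
                if best_label != labels[y][x]:
                    labels[y][x] = best_label
                    changed = True
        if not changed:
            break  # local minimum of the energy reached
    return labels


if __name__ == "__main__":
    page = [[0.1, 0.1, 0.1],
            [0.1, 0.6, 0.1],
            [0.1, 0.1, 0.1]]
    print(icm_segment(page, means=[0.0, 1.0], beta=1.0))
```

With `beta=0` the result is a plain per-pixel classification; raising `beta` lets the neighbourhood override isolated, weakly supported labels, which is the main reason to use a random field for layout classes such as text, erasure, and interline.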
Dropcap Retrieval [Parreti’05] [Uttama’05] [Delalandre’06] [Salmon’05]
10 000 dropcap images
[Plot: frequency versus pattern rank]
Style retrieval
textures, MST, image
Structure retrieval
Printing retrieval
query, compactness, RLE, accuracy
Letter retrieval
image, capital letter
combination of shape descriptors
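The dropcap retrieval labels mention RLE (run-length encoding). For readers unfamiliar with it, the sketch below encodes one row of a binary image as (value, run length) pairs and decodes it back. How the cited works use RLE within retrieval is not stated on the slide, so this only illustrates the encoding itself; the function names are chosen for this example.

```python
def rle_encode(row):
    """Encode a sequence of pixel values as a list of (value, run_length) pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Rebuild the original pixel sequence from (value, run_length) pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


if __name__ == "__main__":
    row = [0, 0, 1, 1, 1, 0]
    encoded = rle_encode(row)
    print(encoded)
    print(rle_decode(encoded) == row)
```

Binary images of printed material tend to contain long uniform runs, so this representation is usually much shorter than the raw pixel row.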
76 Publications
PhD Thesis: 4
Journal Paper: 8
Conference Paper: 43
Master Thesis: 15
Technical Report: 6
http://l3iexp.univ-lr.fr/madonne/publications.html
33 Software Packages
Licence: 2
Free: 4
Prototype: 27
http://l3iexp.univ-lr.fr/madonne/ressources.html
Conclusion
Results
Consortium 8 laboratories, 55 members
Renewal of the project as NaviDoMass
WP related to MADONNE
Perspectives
NaviDoMass started in November 2007 …
5 Work Packages (WP)
1. Document Layout analysis and structure based indexing
2. Information spotting
3. Structuring the feature space
4. User needs, participative design and groundtruthing
5. Interactive extraction and relevance feedback
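Work package 5 names relevance feedback without saying which technique is planned. One classical option, shown here purely as an assumed example, is the Rocchio update: the query feature vector is moved toward the items the user marked as relevant and away from those marked as non-relevant. The function names and the weights `alpha`, `beta`, and `gamma` are illustrative choices, not project parameters.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query vector after one round of user feedback.

    query:        current query feature vector.
    relevant:     feature vectors of results the user accepted.
    non_relevant: feature vectors of results the user rejected.
    """
    dim = len(query)
    pos = mean_vector(relevant, dim)
    neg = mean_vector(non_relevant, dim)
    return [alpha * query[i] + beta * pos[i] - gamma * neg[i]
            for i in range(dim)]


if __name__ == "__main__":
    q = [1.0, 0.0]
    liked = [[0.0, 2.0], [0.0, 4.0]]
    disliked = [[2.0, 0.0]]
    print(rocchio_update(q, liked, disliked, alpha=1.0, beta=0.5, gamma=0.5))
```

In an interactive browsing session, the refined vector would be used to re-rank the collection before the next round of user judgments.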
New topics