Presented by Dr. Anup Kumar Das at the International Conference on the Memory of the World in the Digital Age: Digitization and Preservation, 26-28 September 2012, Vancouver, British Columbia, Canada
Digitization of Documentary Heritage Collections in Indic Language
Comparative Study of Five Major Digital Library Initiatives in India
Dr. Anup Kumar Das
Jawaharlal Nehru University (JNU)
New Delhi, India
http://www.anupkumardas.blogspot.in/
Presented at the International Conference on the Memory of the World in the Digital Age: Digitization and Preservation, 26-28 September 2012, Vancouver, British Columbia, Canada
Outline
• Introduction
• Indicative Multilingual DL Initiatives in India
• Digital Library of India (DLI) project
• IGNCA maintained Digital Libraries
• National Mission for Manuscripts
• DL Initiatives with Single Indic Language Contents
• Challenges Ahead
• Examining Semantic Web Principles
• Conclusion
Introduction
• Article 6 of the UNESCO Universal Declaration on Cultural Diversity: “Towards Access for All to Cultural Diversity”.
• Mandates of Networked Knowledge Societies.
• DLs as a vehicle for widely disseminating documentary heritage.
• Indian DL initiatives aim to produce a vast amount of multilingual, multicultural digitized content covering different forms of recorded human knowledge, from rare manuscripts to current literature.
• Culturally diverse content in multilingual DLs promotes intercultural understanding and dialogue, a building block for inclusive knowledge societies.
• Establishing a digital library with a large collection makes collaboration inevitable.
• Indian DL initiatives achieved multi-stakeholder participation through increased international, regional, national and local collaboration.
• Providing metadata in Indic languages is one of the major challenges for DLs with Indic-language content.
Indicative Multi-/ Bi-lingual DL Initiatives in India
Name of the Initiative | Implementing Agency | Funding Agency | Website
Digital Library of India (DLI) | Indian Institute of Science; IIIT Hyderabad; C-DAC | MCIT and others | http://www.new1.dli.ernet.in; http://www.new.dli.ernet.in; http://dli.cdacnoida.in
Kalasampada: Digital Library Resources for Indian Cultural Heritage (DL-RICH) | IGNCA | MCIT | http://www.ignca.nic.in/dlrich.html
National Databank on Indian Art and Culture (NDBIAC) | IGNCA | MCIT | http://ignca.nic.in/ndb_0001.htm
Kritisampada: National Database of Manuscripts | National Mission for Manuscripts, IGNCA | Ministry of Culture | http://www.namami.org/pdatabase.aspx
Panjab Digital Library (PDL) | Panjab Digital Library | Nanakshahi Trust and others | http://www.panjabdigilib.org/
Digital Repository of WBPLN (DR-WBPLN) | West Bengal Public Library Network (WBPLN), CDAC Kolkata | Directorate of Library Services, West Bengal | http://dspace.wbpublibnet.gov.in/dspace/
Indicative Multi-/ Bi-lingual DL Initiatives in India (continued)
Name of the Initiative | Implementing Agency | Funding Agency | Website
Open Access to Oriya Books – Project OaOb | National Institute of Technology, Rourkela | NITR; Srujanika, Bhubaneswar; Pragati Utkal Sangh R | http://oaob.nitrkl.ac.in
Archives of Indian Labour (AIL) | V. V. Giri National Labour Institute & Association of Indian Labour Historians | Ministry of Labour | http://www.indialabourarchives.org
Muktabodha Digital Library | Muktabodha Indological Research Institute | Donations from Individuals & Trusts | http://muktalib5.org/digital_library.htm
Traditional Knowledge Digital Library (TKDL) | Council of Scientific and Industrial Research (CSIR) | Department of Ayurveda, Yoga… (AYUSH) | http://www.tkdl.res.in
National Science Digital Library | NISCAIR, India | Council of Scientific and Industrial Research (CSIR) | http://nsdl.niscair.res.in
Vigyan Prasar Digital Library | Vigyan Prasar, India | Department of Science and Technology | http://www.vigyanprasar.gov.in/digilib/
Digital Library of India
• A partner project of the Universal Digital Library (UDL), or Million Books Project (MBP).
• Initiated in India in 2002 as a spin-off of the Universal Digital Library project.
• 355,000+ documents; the top six languages are, in order, English, Sanskrit, Hindi, Telugu, Bengali and Urdu, together covering about 91.3% of the books on the main DLI site, http://www.new1.dli.ernet.in.
• Serves as a testbed for Indian language technologies, facilitating development of OCR (optical character recognition), TTS (text-to-speech) and other related software for Indian language computing.
• Challenge 1: Indic-language content is not OCR-ed.
• Challenge 2: Metadata is not available in Indic languages for Indic-language documents.
• Challenge 3: Documents are downloaded page by page in image, HTML and TXT formats; a whole document cannot be downloaded in a single click, e.g. as a PDF file.
• Challenge 4: Broken links and unavailable pages, which are signs of aging.
Multi-stakeholders’ Participation
• Principal Coordinator (International) – Carnegie Mellon University
• Principal Coordinator (National) – Indian Institute of Science (IISc), Bangalore
• Research Coordinator (National) – International Institute of Information Technology (IIIT), Hyderabad
• Infrastructure Agency – ERNET Society
• Funding Agencies – MCIT, NSF, PSA
• Software and Hardware Solutions – Industrial Partners
• Operational Agencies
– Regional Mega Scanning Centres (RMSCs)
– Scanning Centres
– Source Libraries
Participation in Content Generation
(Diagram: Digital Library of India, Content Generation Process, Coordination)
• Cultural Institutions (e.g. Salarjung Museum)
• Religious Institutions (e.g. Tirumala Tirupati Devasthanam)
• Government Agencies (e.g. Rashtrapati Bhavan)
• Industrial Agencies (e.g. Thrinaina Informatics Ltd.)
• Research Agencies (e.g. CDAC-Noida)
• Academic Institutions (e.g. Anna University)
IGNCA maintained Digital Libraries
• Partially open access multilingual and multimedia digital contents
– Kalasampada: Digital Library Resources for Indian Cultural Heritage (DL-RICH)
– Cultural Heritage Digital Library in Hindi (CHDLH)
– National Databank on Indian Art and Culture
– National Digital Library of Manuscripts
• Supported by DIT, MCIT; Ministry of Culture
– Content Development and IT Localisation Network (COILNET) Programme
– Technology Development for Indian Languages (TDIL) Programme
– National Mission for Manuscripts
Collaborative Digital Libraries on Indian Cultural Heritage
(Diagram: IGNCA’s Partner Institutions)
• Manuscript Libraries (e.g. Allama Iqbal Library)
• Government Agencies (e.g. Asiatic Society)
• Academic Institutions (e.g. Visva-Bharati)
• Museums (e.g. National Museum)
• Oriental Institutions (e.g. Oriental Research Library)
• National Mission for Manuscripts
• Archaeological Survey of India
National Mission for Manuscripts
• Established in February 2003 by the Ministry of Tourism and Culture, Government of India.
• An ambitious five-year project with the specific objectives of locating, documenting, conserving and disseminating the knowledge content of India's manuscripts.
• Established a network of 47 Manuscript Resource Centres, 32 Manuscript Conservation Centres (MCCs), 32 Manuscript Partner Centres (MPCs) and more than 200 Manuscript Conservation Partner Centres (MCPCs) across the country.
• NMM identified 45 collections as Manuscript Treasures of India (MTI). These are unique and rare collections of manuscripts.
• 5 MTIs have already been inscribed on the Memory of the World Register (MoWR).
• Of the 6 inscriptions from India, 5 are MTIs.
• The National Digital Manuscripts Library will provide full-text access to all MTIs, including those covered in the MoWR.
• Kritisampada: The National Database of Manuscripts provides access to metadata information on the manuscript collections of NMM partners.
DL Initiatives with Single Indic Language Contents
Name of Digital Library | Organization | Focused Indic Language | Whether Metadata in Indic Language | S/W used
Digital Repository of W.B. Public Library Network | West Bengal State Central Library & CDAC Kolkata | Bengali | Yes, Partial* | DSpace
Panjab Digital Library | Panjab Digital Library; Nanakshahi | Punjabi | No* | -
Open Access to Oriya Books – Project OaOb | National Institute of Technology, Rourkela; Srujanika, BBS | Oriya | No* | EPrints
Digital Repository of VPM | Vidya Prasarak Mandal, Thane | Marathi | Yes, Partial* | DSpace
ASI Digital Library | Archaeological Survey of India; IGNCA New Delhi | English and Sanskrit | No* | -
E-Gyankosh | Indira Gandhi National Open University, New Delhi | English and Hindi | No* | DSpace
* Metadata available mostly in transliterated English
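Because most of the metadata above is transliterated English, the same title may be catalogued with or without diacritical marks. The following is a minimal Python sketch, not a method from the presentation, of one way to derive a diacritic-free matching key using only the standard `unicodedata` module. The function name `fold_diacritics` and the sample titles are illustrative assumptions.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a case-folded, diacritic-free key for matching transliterated titles.

    Hypothetical helper for illustration; it is lossy and meant only for
    building search keys, not for display.
    """
    # NFKD splits a letter such as "ā" into "a" plus a combining macron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Assumed sample entries: one with scholarly diacritics, one without.
with_marks = "Mahābhārata"
without_marks = "Mahabharata"
same_key = fold_diacritics(with_marks) == fold_diacritics(without_marks)
print(same_key)
```

Folding of this kind conflates distinct sounds (for example, "ś" and "ṣ" both become "s"), so it suits cross-search keys rather than the stored metadata itself.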
Challenges Ahead
• Lack of a national practice for establishing principles of interoperability, cross-search, metadata harvesting, etc.
• Enabling harvesting of metadata from South Asian digital libraries
– The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) can be adopted
– Other similar harvesting methods can be applied
• Standardization of transliterated metadata or metadata with diacritical marks
• Stock-taking of South Asian documentary heritage collections available worldwide
• Innovation in DL development is needed to integrate features of interactive Web 2.0 (such as user interaction and content sharing), Multimedia, and M-Science (accessibility using mobile devices).
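As an illustration of the OAI-PMH option listed above, here is a minimal Python sketch of a harvester's two basic steps: building a `ListRecords` request and reading Dublin Core titles from the response. The `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications; the base URL, the function names and the sample response are assumptions made for the example, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list[tuple[str, str]]:
    """Return (record identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        for title in record.iterfind(".//dc:title", NS):
            results.append((identifier, title.text or ""))
    return results


# Assumed sample response: one record carrying a Bengali and a transliterated title.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>গীতাঞ্জলি</dc:title>
          <dc:title>Gitanjali</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# "http://example.org/oai" is a placeholder endpoint, not a real repository.
request_url = list_records_url("http://example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
print(request_url)
print(titles)
```

A real harvester would also need to follow `resumptionToken` values for large result sets and handle protocol errors, which this sketch leaves out.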
Examining Semantic Web Principles
• Indic language metadata – providing metadata in all major Indian languages for a full-text document
• Whether an ontology-based structure is followed (RT, BT, NT…)
– Standard vocabulary / structured subject headings / subject thesaurus vs. user-generated keywords
• Whether a permanent link is available for a document or a dynamic link is generated
– Rate of link failure or dead links (links to full-text contents, images, etc.)
• Whether contents can be accessed using handheld devices
• Whether text-to-speech (TTS) can be applied
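The rate of link failure mentioned above can be measured with a short script. The following is a minimal Python sketch under stated assumptions: the function names, the HEAD-request check and the sample URLs are illustrative and not taken from the presentation. The checker is passed in as a parameter so that the rate calculation can be tried without network access.

```python
import urllib.request
from typing import Callable, Iterable


def http_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL succeeds with a non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def link_failure_rate(urls: Iterable[str], is_alive: Callable[[str], bool]) -> float:
    """Return the fraction of URLs that the checker reports as dead."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    dead = sum(1 for url in url_list if not is_alive(url))
    return dead / len(url_list)


# Assumed sample results standing in for a real crawl: True means reachable.
sample_status = {
    "http://example.org/book/1": True,
    "http://example.org/book/2": False,
    "http://example.org/book/3": True,
    "http://example.org/book/4": True,
}
rate = link_failure_rate(sample_status, sample_status.get)
print(f"{rate:.0%} of sampled links are dead")
```

For a live check, `link_failure_rate(urls, http_is_alive)` would replace the stubbed lookup. Some servers reject HEAD requests, so a production check might fall back to GET.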
Conclusion
• Helped bridge the digital divide in the country by making Indian-language documents freely available to the masses.
• Helped push content localization efforts.
• “Lean backward” to digitize important documentary heritage collections.
• “Lean forward” to include born-digital content in multilingual OA repositories.
• National DLs should include rare and out-of-print books and manuscripts in all Indian languages.
• Metadata harvesters are needed for these DLs.
Acknowledgement
• UNESCO, UBC and JNU for travel and technical support
Thank You
Any Questions?
http://www.anupkumardas.blogspot.in/