Digitization Practices in India: Issues and Challenges V.N. Shukla

Post on 22-Dec-2015



C-DAC, NOIDA UNIT

C-DAC MISSION

• NATURAL LANGUAGE PROCESSING AND INTERFACES
• HUMAN RESOURCE DEVELOPMENT IN HITECH AREAS
• INFRASTRUCTURE AND SUPPORT SERVICES
• SPECIAL INDUSTRIAL APPLICATIONS

AREAS OF COMPETENCE

NOIDA

• Graphical Display System
• Security Systems
• Embedded System
• System Engineering and Consultancy
• NLP
• Solar Energy System
• E-Governance
• Internet on CATV & E-Commerce

Digital Library Activities: CDAC Noida

• Digital Library Projects
• Mega Centre for Digital Library
• Mobile Digital Library: Dware Dware Gyan Sampada
• Digital Library at President’s House
• Digital Library at Nagari Pracharini Sabha, Varanasi
• Digital Library at Uttaranchal
• GyanNidhi: Multilingual Parallel Corpus in Indian Languages
• Digital Library at Gujarat Vidyapeeth, Ahmedabad
• Digitization of Libraries

Digital Library Mission

Online Content: Billions of web pages

Offline Content: Billions of items still unindexed

To organize the information and make it universally accessible and useful.

DL Initiatives

~85% of books are out of print and/or out of copyright – these books are only found in libraries

GOAL: Create a comprehensive virtual card catalog of all books in all languages, while respecting publishers’ rights

Only ~15% of books are in print

Source: Google

[Diagram: DL creation & processes. Labels: Users, Traditional Libraries, Digital Libraries, Metadata Search, Index, Hyperlinks]

A Typical Library Collection

92% of the world's books are neither generating revenue for the copyright holder nor easily accessible to potential readers.*

The value is in the middle

• In-Print: ~15%
• Unclear copyright status: ~65% or more
  • May be in copyright, but not for sale
  • Rights may have reverted to author
  • May be in the public domain
• Public Domain: Less than 20%**

*Source: Covey, Denise Troll. "Global Cooperation for Global Access: The Million Book Project"

**OCLC analysis of the Google Books Library Project: http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

A Digital Library (DL) may be seen as a “collection of intelligent creations by human beings through their own language and culture. It also reflects cultural heritage besides providing an archive and generating many research issues pertaining to Natural Language Processing”.

According to another definition, digital libraries are

“Organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily available for use by a defined community or set of communities”.

Digital Library?

Sun Microsystems defines a digital library as the electronic extension of functions users typically perform and the resources they access in a traditional library.

These information resources can be translated into digital form, stored in multimedia repositories, and made available through Web-based services.

What is a Digital Library?

• A Service?
• An Architecture?
• A set of Information Resources?
• A set of tools to locate, search, retrieve information?
• Possibly the tools to create such resources and services also fall within the purview of DLs
• Digital face of traditional libraries
• Include both digital and traditional collections
• Backbone and nervous system of libraries

• Efficient, high-quality services through collecting, organizing, storing, disseminating, retrieving and preserving information.

• Preservation benefits, besides making information retrieval & delivery more convenient.

• Online access to historical and cultural documents whose existence is endangered by physical decay.

Digital libraries necessarily include a strong focus on the management of digital content, just as traditional libraries have long focused on the management of content in physical forms.

Digital Library vs Traditional Library

The major areas for exploitation are:

• Information retrieval
• Multimedia
• Database
• Data mining
• Data warehouse
• On-line information repositories
• Image processing, hypertext
• World Wide Web and wide area information services (WAIS)

Most of the digital content that is being managed includes:

• Human language in various forms: character-coded electronic text, scanned images, printed or handwritten text, or human speech.

• Language technology helps in managing digital content

• Learning from past experience also helps in managing content

Digital Content Management

• Access anywhere

• Reducing delays

• Distributed storage – central access

• Better cataloguing

• Cross references to other documents

• Full text search

• Protected information source

• Wide exploration and exploitation of the information

A Few Advantages of Digital Libraries

The information explosion, the wide bandwidth data networks and the potential of Internet-based technologies - such as the Web - make digital libraries one of the important application areas of computer science.
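The "Full text search" advantage listed above is commonly built on an inverted index. The following is a minimal sketch only; the sample documents, the function names `build_index` and `search`, and the simple regex tokenizer are illustrative assumptions, not something described in the slides.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumption: a simple \w+ tokenizer, adequate for plain Latin-script
        # text only. Indic scripts need a script-aware tokenizer.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data, e.g. OCR output keyed by a book identifier.
sample_docs = {
    "book1": "Digital library of India",
    "book2": "Traditional library catalogue",
}
sample_index = build_index(sample_docs)
print(search(sample_index, "digital library"))
```

A query is answered by intersecting the per-word document sets, so lookup cost depends on the number of query words rather than on the size of the collection.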

Process of Digital Preservation

[Flowchart. Box labels as extracted:]

• Centralized Server
• Book scanning status
• XML Meta File Creation using Dublin Core Std.
• Scanned Image in TIFF format
• S/w to divide even & odd pages
• Batch cropping & Cleaning
• OCR
• Conversion to TXT/RTF/HTML
• Decision: Yes (Uploading) / No (Reject the Book)
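The "XML Meta File Creation using Dublin core Std." step in the process above could look roughly like the following sketch. The `metadata` wrapper element, the function name `build_dc_record` and the sample field values are assumptions made for illustration; the slides do not specify the record layout used at C-DAC. Only the Dublin Core element namespace itself is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements (title, creator, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def build_dc_record(fields):
    """Return an XML string with one dc:* element per field value.

    `fields` maps a Dublin Core element name to a string or list of strings.
    The <metadata> root is a hypothetical wrapper, not part of the standard.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record for one scanned book.
record = build_dc_record({
    "title": "Sample Book Title",
    "creator": "Sample Author",
    "language": "hi",
    "format": "image/tiff",
})
print(record)
```

Keeping the metadata in a separate XML file next to the TIFF images means the record can be indexed or exchanged without opening the page images.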

Goals of DL

• Focused on digitization technology, metadata schemes, data management techniques, and digital preservation.
• Second-generation digital library: exploring new opportunities and developing new competencies.
• Third-generation digital library: focusing instead on fully integrating digital material into the library’s collections through a modular systems architecture.

Ingredients for DLs

• Hardware: The minimum machinery to do the job
• Software: The programs for handling data
• Digital Objects: Articles, Conference Papers, Theses, …
• Basic Skills: Things one has to learn

Hardware

• A Server: You’ll need access to a web server
• A good PC
• Scanners
  • Flatbed – Auto feed, Back to back
  • MF
  • Book Scanner

Software

• Open Source Software (OSS): DSpace, EPrints, Fedora, GSDL, …
• Proprietary software you can’t avoid: Image Editing and Optical Character Recognition software have to be purchased

Content is King

The information content is more important than the systems used for its storage, management and retrieval

Objects should not be “locked” in specific DLs or archives

Creating DLs …

Six steps: Selecting, Acquiring, Digitization, Creation of Metadata, Organizing, Archiving, Providing Access

Possible Delivery Formats

• Pure image formats: TIFF, JPEG
• Open encoded formats: XML, HTML, ASCII, and Unicode
• Hybrid formats: PDF, DjVu – can contain both image and text
• Proprietary formats: Microsoft Word, WordPerfect
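As a small illustration of the four format categories above, a delivery pipeline might tag each file by category before deciding how to serve it. This is only a sketch: the extension-to-format mapping and the name `delivery_category` are assumptions for illustration, not part of the slides.

```python
from pathlib import Path

# Assumed mapping from common file extensions to the categories listed above.
FORMAT_CATEGORIES = {
    "tif": "pure image", "tiff": "pure image",
    "jpg": "pure image", "jpeg": "pure image",
    "xml": "open encoded", "html": "open encoded",
    "htm": "open encoded", "txt": "open encoded",
    "pdf": "hybrid", "djvu": "hybrid",
    "doc": "proprietary", "wpd": "proprietary",
}

def delivery_category(filename):
    """Return the delivery-format category for a file name, or 'unknown'."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return FORMAT_CATEGORIES.get(extension, "unknown")

print(delivery_category("page_0001.TIFF"))
print(delivery_category("book.djvu"))
```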

Digitization: Issues

• Copyright
• Access copy and archive copy
• File size
• Storage media (CD, hard disk, …)
• File format (TIFF, JPEG, …)
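The file size and storage media issues above can be made concrete with a back-of-the-envelope estimate of uncompressed scan size. The page dimensions, resolution, bit depths and page count below are illustrative assumptions, not figures from the slides, and the estimate ignores file headers, row padding and compression.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size: pixels across x pixels down x bits / 8."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed example: an 8.5 x 11 inch page scanned at 300 dpi.
colour_page = uncompressed_scan_bytes(8.5, 11, 300, 24)   # 24-bit colour
bitonal_page = uncompressed_scan_bytes(8.5, 11, 300, 1)   # 1-bit black/white

# Assumed example: a 300-page book kept as uncompressed colour archive copies.
book_bytes = colour_page * 300

print(colour_page)    # 25245000 bytes, about 25 MB per page
print(bitonal_page)   # 1051875 bytes, about 1 MB per page
print(book_bytes)     # 7573500000 bytes, about 7.6 GB per book
```

Under these assumptions a single colour archive copy of one book would not fit on a CD, which is one reason a smaller access copy is often kept alongside the archive copy.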

Challenges in Digitization

• Building digital collections of national importance from existing texts, documents, images . . .
• Creating new digital documents & linking them
• Subject portals: Selecting and maintaining open source digital resources
• Developing / adapting management tools for digital collections
• Providing access to digital collections

Challenges..

• Integrating digital & other library collections
  • incl. integration of OPACs, subscribed e-resources and subject portals
• Establishing services for digital libraries
  • online access & offline support
  • education & training of users and librarians
• Addressing social, legal, policy issues

Challenges in Publishing

Preservation of layout

Searchability of content and metadata

Efficient image compression

Easy browsing of books

Accommodating low-bandwidth users

Multilingual text support

Multipaging

Digital Library Support in India

Funding

• Ministry of Communication & Information Technology (MIT)
• Ministry of Human Resource Development (MHRD)
• Manuscript Mission of India
• Department of Scientific & Industrial Research (DSIR-TRP)
• All India Council for Technical Education (AICTE)
• University Grants Commission (UGC)

• Library Consortium in India
• Scholarly Science Journals
• Theses & Dissertations
• Institutional E-Print Archives
• Books (out of copyright)
• Manuscripts
• Newspapers
• Online Courseware
• Open Access at Metadata Level
• Portal and Gateway Services

Digital Library Initiatives in India

• Government of India
• Min. of C&IT
• Min. of Culture
• INDEST-AICTE Consortium
• Others
• CSIR E-Journals Consortium
• UGC Infonet Consortium
• FORSA Consortium
• National Manuscript Library
• Universal Digital Library
• IIM Libraries Consortium

Digital Library of India

Participating centers of DLI

• IISc
• IIIT-H
• State & City Central Library
• University of Hyderabad
• MIDC Pune University
• AKCE
• SASTRA
• ASR Melkote
• Sringeri Mutt
• Anna University
• TTD Tirupati
• IIIT-Allahabad
• CDAC Noida
• Rashtrapathi Bhavan
• Mega Scanning Centres at IIITH, IIITA, CDAC-Noida and Kolkata
• PTU-1, PTU-2, PTU-3
• Goa University
• Kanchi Mutt
• IISc, IIAP, PoornaPragya
• CDAC Kolkata
• ERNET

Digital Library Initiatives in India

Some Examples

April 20, 2009 Workshop on Institutional Repositories 33

Digital Library of India

http://www.dli.ernet.in/

http://www.ias.ac.in/

http://www.insa.ac.in/

http://medind.nic.in/


Manuscripts

India has the largest collection of manuscripts in the world (approximately 5 million).

India is the repository of an astounding wealth of ancient knowledge belonging to different periods of history, going back thousands of years. Most of this knowledge, belonging to different areas of intellectual activity such as religion, philosophy, science, arts and literature, is preserved in the form of manuscripts. Composed in different Indian languages and scripts, they are preserved on materials such as birch bark, palm leaf, cloth, wood, stone and paper.

The National Manuscript Mission was launched as a five-year programme in February 2003 by the Ministry of Human Resource Development, Govt. of India, to collect and conserve all the manuscripts.

http://namami.nic.in/


Archives of Indian Labour

V.V. Giri National Labour Institute

Heritage of Indian Working Class

Commissions on Labour

Oral History Collections

Trade Union Collections

Regional Collections

Strike Collections

Powered by Greenstone Digital Library

http://www.indialabourarchives.org/

Digital Library Benefits: Individual

Gain access to the holdings of libraries worldwide through automated catalogs. Locate both physical and digitized versions of scholarly articles and books.

Optimize searches by simultaneously searching the Internet, commercial databases, and library collections.

Save search results and conduct additional processing to narrow or qualify results.

From search results, click through to access the digitized content or locate additional items of interest.

All of these capabilities are available from the desktop or other Web-enabled device such as a personal digital assistant or cellular telephone.

Conclusion

• Digital libraries are redefining the role of libraries in society and the role of librarians and information specialists
• A national-level mechanism is essential to promote and coordinate open access and public domain digital library systems
  • Improve awareness of open access
  • Regular training – tools, processes, standards
  • Support setting up of working models, services
  • National Resource Centre for open access publishing
• International agencies like UNESCO, ICSU, ICSTI, CODATA need to actively promote and support developing country initiatives


Thank You