
NDLI: LOOKING BEYOND THE HORIZON

Development of National Digital Library of India

Towards Building a National Asset

Dr. Plaban Kumar Bhowmick

plaban@cet.iitkgp.ernet.in

Co-PI, NDL Project, NME-ICT, MHRD

Indian Institute of Technology, Kharagpur

LINE 2017 - LIBRARIES IN NEXT ERA, KALYANI UNIVERSITY

16-DEC-2017

NMEICT: National Mission on Education Through Information and Communication Technology

16/12/2017 IIT, Kharagpur

The current Vision

Technology to Realize current Vision

What’s there beyond the horizon…?

Initiatives to realize the future

NDL Vision

BUILD UP NATIONAL DIGITAL LIBRARY OF INDIA AS A NATIONAL KNOWLEDGE ASSET: THE KEY DRIVING FORCE FOR EDUCATION, RESEARCH, INNOVATION, AND KNOWLEDGE ECONOMY IN INDIA

NDL Mission

TO CREATE A 24X7-ENABLED INTEGRATED NDL AS A UBIQUITOUS DIGITAL KNOWLEDGE SOURCE OF THE NATION, CATERING TO IMMERSIVE E-LEARNING FOR ALL LEARNERS AT ALL LEVELS IN ALL AREAS

TO INITIATE A MOVEMENT FOR INTEGRATED DIGITAL LEARNING ACROSS INDIA

NDL Motto

INCLUSIVE & OPEN

National Digital Library: Issues

User-side:
• Wide geographic expanse & large population
• Huge number of students
• Large number of institutions
• Wide linguistic diversity
• Severe lack of teachers

Provider-side: a wealth of digital content
• Books and Articles
• ETD
• Question Papers and Solutions
• Video Lectures - MOOCs
• Simulations & Animations
• NMEICT Projects
• Data …

Gaps:
• No single-window search
• Google search is keyword-based, with no metadata search
• Widely varied DL technology
• Lack of interactivity and vernacular support
• Low integration between content and learning systems
• Weak ecosystem between learners and teachers

Presentation Model

• Not a new library, but an umbrella
• Collects and ingests metadata only
• Presents full text from the source view
• Provides: Search, Browse

NDLIndia Live: https://ndl.iitkgp.ac.in/

NDL Project: http://www.ndlproject.iitkgp.ac.in/ndl/

NDLIndia on Facebook: https://web.facebook.com/NDLIndia/

NDLIndia on YouTube: https://www.youtube.com/watch?v=LEwAyHGKeLw, https://www.youtube.com/watch?v=qIZB-G9ywF0, https://www.youtube.com/watch?v=UCoJwfPrQFs&t=115s

TARGETS

CONTENTS, STAKEHOLDERS, CONTRIBUTORS, USERS, ARCHITECTURE, AND THE BIG PICTURE

Objective and Scope

S. R. Ranganathan's five laws of library science:
• Books are for use
• Every reader his [or her] book
• Every book its reader
• Save the time of the reader
• The library is a growing organism

Objectives

• Create a 24x7-enabled infrastructure for NDL with a single-window search facility, including hardware systems, networks, software tools, applications, and interoperability standards
• Harvest IDRs (Institutional Digital Repositories) across institutions of the nation to provide integrated access
• Facilitate select institutes to disseminate existing content and create new digital content
• Provide support for immersive e-learning environments at multiple levels, spanning:
  • All academic levels: school to college to university to life-long learning
  • All disciplines: Science, Arts, Humanities, Engineering, Medical, and Law
  • All languages (vernacular) used as a medium of instruction
• Support interfaces in vernacular languages and for differently abled users

Digital Contents: Born Digital, Surrogate Digital, Metadata Digital, etc.

Content at NDL
• Born-digital object
• Digital surrogate of a physical object
• Digital metadata of a physical object

Metadata at NDL
• NDL does not store contents
• NDL only ingests metadata for Search & Browse
• Content (full text) is delivered from the source

A content item is included in NDL (its metadata ingested) if it is expected to have educational value.
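The presentation model (ingest metadata, serve full text from the source) can be illustrated with a minimal in-memory sketch. The names `MetadataRecord`, `MetadataIndex`, `source_url`, and `open_full_text` are hypothetical and do not describe NDL's actual implementation; the point is only that the index holds metadata and hands the reader back to the source for the content.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Metadata only: the aggregator never stores the full text itself."""
    record_id: str
    title: str
    source_url: str  # full text is delivered from here, by the source
    keywords: list[str] = field(default_factory=list)


class MetadataIndex:
    """Toy single-window index over harvested metadata (hypothetical design)."""

    def __init__(self) -> None:
        self._records: dict[str, MetadataRecord] = {}

    def ingest(self, record: MetadataRecord) -> None:
        # Re-ingesting the same id replaces the earlier metadata record.
        self._records[record.record_id] = record

    def search(self, term: str) -> list[MetadataRecord]:
        # Case-insensitive substring match on the title, exact match on keywords.
        t = term.lower()
        return [
            r for r in self._records.values()
            if t in r.title.lower() or t in (k.lower() for k in r.keywords)
        ]


def open_full_text(record: MetadataRecord) -> str:
    # The user is redirected to the source; no content is served locally.
    return record.source_url
```

Searching returns metadata records, and "opening" one yields only the source's URL, which mirrors the "does not store contents" rule above.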


Range of Contents

• Institutional Digital Repository of Contributing Institutes
• Faculty Publications, ETD (Electronic Thesis & Dissertation): DSc-PhD-Masters-Undergrad, Research Projects
• Books & Periodicals, Open Access Journals, E-Books & Subscribed E-Resources
• Annual Reports, Project Reports, Convocation, Working Papers, Others
• Encyclopaedias, Dictionaries, Directories, Others
• Lecture Slides, Videos, Class Notes, Courseware
• Institutions of School & Higher Education, Boards
• Term Papers, Assignments, Solutions
• Lab Experiments, Manuals, Case Studies
• Datasets, Benchmarks, Models, Maps, Software
• Audio & Video Content
• Manuscripts, Painting, Sculpture, Music, Dance, Drama
• Question Banks (JEE / GATE / NET / CAT), Model Answers
• Research and Professional Institutions, Central / State Universities

Institutional and open contributions: multi-modal, multi-faceted

Content View Architecture

Content Baseline

Verticals:
• School Vertical
• Domain Vertical (Medical/Legal/…)
• Competitive Exam Vertical
• Data Vertical
• Application Vertical

Vertical-Specific Custom Interface and Search

Generic Interface and Search

App Launcher; MCQ / MSQ / Textbook, Lesson View

Stakeholders: Roles and Responsibilities

Government (Ministries / Departments, R & D Labs)
1. Sponsor and facilitator
2. Content contributor

Institutions (Public / Private; Academic / R & D / Educational)
1. Host Institution: IIT Kharagpur
2. Contributing Institution: supporting IDRs
3. Participating Institution: providing users & feedback

Public (NGOs, Individuals)
1. Use and feedback
2. Metadata by crowd sourcing
3. Content by crowd sourcing

Industry
1. Technology providers

Publishers
1. Metadata provider
2. Content provider (under various licensing schemes)

Contributors

CFTI, State and Central Universities, R & D Labs, Govt. Depts, Free Portals, Publishers, etc.

130 Contributors and counting


Users & Access

Registration is open to all.

Registration types:
• Individual: registers directly
• Institutional: by request from the institution; managed by an authenticated nodal person; convenient for bulk upload of users

User groups (figure): College, Life-long Learner

The Big Picture

• NDL: Content Repository
• LMS: Content Delivery for Learning
• VUC: Certification & Credit Transfer

(Figure: National Digital Library of India, MOOCs, LMS, Virtual University and Certification)


Technology of NDL

METADATA ENGINEERING

SOFTWARE ARCHITECTURE

MULTI-LINGUAL INTERFACE

EXPERIENCE TRACKING

NDL Data Model

Challenges in Metadata Engineering for NDL

• Wide range of resource categories
  • Generic metadata or domain-specific?
• Openness of the repository
  • A closed metadata standard may fail to describe a new resource
• Enormous scale
  • Manual annotation is infeasible
  • Automatic annotation guided by crowd sourcing?

Metadata Specification Requirement

• To describe any digital resource: generic content metadata
  • Contributor, Description, Language, Format, etc.
• To describe domain-specific resources:
  • Educational content metadata: educational level, ToC, type of learning material, etc.
  • Medical domain: disease, patient condition, case studies, etc.
  • Thesis metadata: institution, advisor, degree, researcher
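One way to read the requirement above is as a generic core that every record carries, plus optional domain-specific extensions. The sketch below models that with dataclasses; the class and field names are hypothetical and are not the NDL schema's actual element names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenericMetadata:
    """Describes any digital resource."""
    title: str
    contributor: list[str] = field(default_factory=list)
    description: str = ""
    language: str = ""
    format: str = ""


@dataclass
class EducationalMetadata:
    educational_level: str = ""
    toc: list[str] = field(default_factory=list)
    learning_material_type: str = ""


@dataclass
class MedicalMetadata:
    disease: str = ""
    patient_condition: str = ""
    case_studies: list[str] = field(default_factory=list)


@dataclass
class ThesisMetadata:
    institution: str = ""
    advisor: str = ""
    degree: str = ""
    researcher: str = ""


@dataclass
class Resource:
    generic: GenericMetadata  # always present
    # Domain-specific extensions are optional and attached only when relevant.
    educational: Optional[EducationalMetadata] = None
    medical: Optional[MedicalMetadata] = None
    thesis: Optional[ThesisMetadata] = None

    def domains(self) -> list[str]:
        """Names of the domain-specific extensions attached to this resource."""
        return [n for n in ("educational", "medical", "thesis") if getattr(self, n) is not None]
```

Keeping the extensions optional lets a new kind of resource get the generic core immediately, with a domain block added later.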

NDL Metadata Envelope

(Figure: NDL Metadata Envelope; labels include Shodhganga (thesis), pedagogicObjective, keyword, NDL Metadata.)

NDL metadata schema: http://www.ndlproject.iitkgp.ac.in/ndl/header.php?mname=Metadata%20Schema
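The envelope idea can be sketched as a crosswalk: fields a source shares with the common schema are mapped into it, and source-specific fields are kept under their own namespace instead of being dropped. The crosswalk table and field names below (`dc.title`, `thesis.advisor`, `guide`, and so on) are hypothetical illustrations, not the published NDL schema.

```python
# Hypothetical crosswalks: source field name -> envelope field name.
CROSSWALKS: dict[str, dict[str, str]] = {
    "shodhganga": {
        "title": "dc.title",
        "keyword": "dc.subject",
        "researcher": "thesis.researcher",
        "guide": "thesis.advisor",
        "university": "thesis.institution",
    },
}


def wrap_in_envelope(source: str, record: dict[str, str]) -> dict[str, str]:
    """Map known fields into the common schema; keep the rest under a source namespace."""
    mapping = CROSSWALKS.get(source, {})
    envelope: dict[str, str] = {}
    for key, value in record.items():
        # Unmapped fields are preserved rather than dropped, so domain detail survives.
        target = mapping.get(key, f"{source}.{key}")
        envelope[target] = value
    return envelope
```

An unknown source still produces a valid envelope, with every field namespaced, which suits an open repository where new resource types keep arriving.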

Acquisition Scenarios

1. Locate Content
2. Acquire Metadata
   • Harvest Institutional IDRs
   • Crawl Websites
   • In Bulk, from Publishers
   • Donated by Source
   • Source-supported API
3. Creation: Manual, Automated
4. Translation: Format, Standard / Schema
5. Curation: Manual, Assisted
6. Ingestion
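The slides do not say which protocol is used to harvest IDRs. Many institutional repository platforms (e.g. DSpace, EPrints) expose OAI-PMH, so the sketch below assumes an OAI-PMH endpoint serving Dublin Core (`oai_dc`) records; the base URL in the usage is a placeholder. It builds `ListRecords` request URLs and parses one response page, following the `resumptionToken` for paging. The HTTP call itself is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is exclusive of other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no metadata to ingest
        identifier = header.findtext("oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would loop: fetch `list_records_url(base)`, parse, then fetch `list_records_url(base, token)` until the token comes back as `None`.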


Smart Metadata Curation Workflow


Software Architecture


Experience Tracking Technology

Record:
1. Partha searched ‘tiger’
2. Partha navigated to Tiger Wiki
3. Partha studied Wiki (3 min)
4. Partha downloaded tiger image (4 images)
5. Partha checked tiger map (2 min)
6. Partha enlarged map at Sunderbans (twice)
7. Partha searched ‘national animal of India’

Infer:
• Partha learnt that the tiger is the national animal of India
• … possibly

Experience API (xAPI) / Tin Can API
• Connects learning content and learning systems to record and track all types of learning experiences

Learning Record Store (LRS)
• Stores learning experiences


Source: https://tincanapi.com/overview/
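To make the record above concrete, here is a minimal sketch of Partha's steps as xAPI-style actor/verb/object statements kept in a toy store. `InMemoryLRS` is a hypothetical stand-in for a real LRS (which would receive statements over HTTP). Only the "experienced" verb IRI comes from the ADL vocabulary; the "searched" IRI and all activity IRIs are placeholders.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Only "experienced" is an ADL verb IRI; the "searched" IRI is a hypothetical placeholder.
VERBS = {
    "experienced": "http://adlnet.gov/expapi/verbs/experienced",
    "searched": "https://example.org/xapi/verbs/searched",
}


def make_statement(name: str, mbox: str, verb: str, activity_id: str,
                   activity_name: str, duration: Optional[str] = None) -> dict:
    """Build an actor-verb-object xAPI statement as a plain dict."""
    stmt = {
        "actor": {"objectType": "Agent", "name": name, "mbox": mbox},
        "verb": {"id": VERBS[verb], "display": {"en-US": verb}},
        "object": {
            "objectType": "Activity",
            "id": activity_id,
            "definition": {"name": {"en-US": activity_name}},
        },
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if duration is not None:
        stmt["result"] = {"duration": duration}  # ISO 8601 duration, e.g. "PT3M"
    return stmt


class InMemoryLRS:
    """Stand-in for a Learning Record Store: an append-only statement log."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def store(self, stmt: dict) -> None:
        # Statements are kept as JSON, the form in which an LRS receives them.
        self._log.append(json.dumps(stmt))

    def by_learner(self, mbox: str) -> list[dict]:
        stmts = (json.loads(s) for s in self._log)
        return [s for s in stmts if s["actor"]["mbox"] == mbox]
```

Step 3 of the record ("studied Wiki (3 min)") would become an "experienced" statement with `duration="PT3M"`. Inferences such as "Partha learnt…" would then be computed over `by_learner(...)`.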


UI Technology: Experience Tracking

(Figure components: LRS, NDL Repository, LMS; LMS Front End, Search/Browsing, Visualization & Analytics; LMS Tracker, Search Tracker)

UI Technology: Web Interface

API End Points:
• RESTful API endpoint
• CMS API, Index API and LMS API
• Parameterized access to the index
• API Sandbox

The web interface is an app built on the NDL API.

Extended Search Features:
• Facet-based search result refinement
• DDC topic-tree-based content browsing
• Tagging and commenting on content
• Target-group-specific interface
• Bookshelf
• Rating and sharing

NDL-Specific Features:
• Multi-lingual enablement
• Multi-lingual query interface powered by Google transliteration
• Cross-lingual search
• Personalization
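Facet-based refinement, listed above, usually works in two steps: count the values of a metadata field over the current results, then filter by the values the user picks. The sketch below shows both over plain dicts. The field names (`type`, `language`) are hypothetical, and this is not the NDL Index API.

```python
from collections import Counter


def _values(record: dict, field: str) -> list:
    """Treat single-valued and multi-valued metadata fields uniformly."""
    value = record.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(results: list[dict], field: str) -> Counter:
    """Count each value of one metadata field over the current result set."""
    counts: Counter = Counter()
    for r in results:
        counts.update(_values(r, field))
    return counts


def refine(results: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep results that match every selected facet value (AND across facets)."""
    return [
        r for r in results
        if all(v in _values(r, f) for f, v in selected.items())
    ]
```

Selecting "Video Lecture" under `type` narrows the list, and recomputing `facet_counts` on the narrowed list updates the remaining facet options.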



Development of Large Scale Repository

Current status:
• A single search index caters to all domains

Issues:
• Inefficient retrieval (both relevance and retrieval time)
• Domain-specific information is lost
• A single metadata schema caters to all, e.g., the same schema is used to index the education, medical, and cultural domains

Solution: distributed indices, each targeting a specific domain

User Query → Intelligent Query Forwarding → Education / Medical / Culture indices → Search Result Aggregation → Search Results
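A minimal sketch of that flow follows, under simple assumptions. Each domain index scores documents by query-term overlap. "Intelligent" forwarding is approximated by sending the query only to the indices whose vocabulary covers the most query terms. Aggregation merges the hits by score. Real selective search would use far better domain classifiers and score normalization; `DomainIndex` and `forward_and_aggregate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float
    domain: str


class DomainIndex:
    """Toy per-domain index that scores documents by query-term overlap."""

    def __init__(self, domain: str, docs: dict[str, str]) -> None:
        self.domain = domain
        self.docs = {d: set(text.lower().split()) for d, text in docs.items()}
        self.vocabulary = set().union(*self.docs.values())

    def search(self, terms: set[str]) -> list[Hit]:
        hits = []
        for doc_id, words in self.docs.items():
            overlap = len(terms & words)
            if overlap:
                # Normalize by query length so scores are comparable across indices.
                hits.append(Hit(doc_id, overlap / len(terms), self.domain))
        return hits


def forward_and_aggregate(query: str, indices: list[DomainIndex],
                          max_domains: int = 2) -> list[Hit]:
    terms = set(query.lower().split())
    # Query forwarding: rank indices by how many query terms their vocabulary covers,
    # and send the query only to the best-matching ones.
    ranked = sorted(indices, key=lambda ix: len(terms & ix.vocabulary), reverse=True)
    selected = [ix for ix in ranked[:max_domains] if terms & ix.vocabulary]
    # Aggregation: merge the per-index hits into one ranked list.
    merged = [h for ix in selected for h in ix.search(terms)]
    return sorted(merged, key=lambda h: h.score, reverse=True)
```

Skipping indices that cannot match saves retrieval time. The catch, as the next slide notes, is that short queries give the router very little evidence.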

Intelligent Query Forwarding (Selective Search)
• Identify the domain of the query
• Challenging, since queries tend to be brief and carry little context

Search Result Aggregation
• Aggregating domain-specific facets
• Subject categorization (MeSH or DDC)
• On-the-fly mapping without compromising response time
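Aggregating subject facets from indices that use different schemes (e.g. MeSH in the medical index, DDC elsewhere) requires mapping them onto one shared scheme at query time. The sketch below assumes a precomputed lookup table, so the per-hit mapping is a constant-time dictionary lookup and adds little to response time. The table entries are illustrative examples, not an authoritative MeSH-to-DDC crosswalk.

```python
from collections import Counter

# Illustrative entries only: (scheme, subject term) -> shared top-level DDC class.
SUBJECT_MAP = {
    ("mesh", "Neoplasms"): "610 Medicine & health",
    ("mesh", "Diabetes Mellitus"): "610 Medicine & health",
    ("ddc", "371"): "370 Education",
}


def aggregate_subject_facets(hits: list[tuple[str, str]]) -> Counter:
    """hits: (scheme, subject) pairs collected from different domain indices."""
    counts: Counter = Counter()
    for scheme, subject in hits:
        # One dictionary lookup per hit keeps the mapping cheap enough for query time.
        counts[SUBJECT_MAP.get((scheme, subject), "Unmapped")] += 1
    return counts
```

Unmapped subjects are counted separately rather than dropped, so gaps in the crosswalk stay visible.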

Semantic Search

• Books where the author is Satyajit Ray
• Books where the subject is Satyajit Ray
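These two queries contain the same name but ask for different things. A plain keyword search cannot tell them apart. A search that knows the role of the name (author vs. subject) can. The sketch below contrasts the two over toy records; the records and titles are made up for illustration.

```python
def _values(record: dict, field: str) -> list[str]:
    value = record.get(field, [])
    return value if isinstance(value, list) else [value]


def keyword_search(records: list[dict], text: str) -> list[dict]:
    # Matches the text anywhere, so "author" and "subject" readings collide.
    t = text.casefold()
    return [r for r in records
            if any(t in v.casefold() for f in r for v in _values(r, f))]


def fielded_search(records: list[dict], field: str, text: str) -> list[dict]:
    # The match must occur in the requested role (field).
    t = text.casefold()
    return [r for r in records if any(t == v.casefold() for v in _values(r, field))]
```

Resolving "where the author is" to `field="author"` is the query-analysis step that the following slides discuss.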

Example query phrases and their semantic roles:
• "video lectures", "video tutorial": Learning Resource Type
• "Data structure": Keyword/Subject
• "NPTEL": Source
• "A. Basu": Author
• "IIT Kharagpur": Source Organization
• "Computer Science": Subject Domain
• Connecting words in the queries: on, from, by, of
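A very simple way to assign such roles is to split the query at connecting words and treat each word as a hint for the role of the phrase that follows it. The cue table below is a hypothetical heuristic, not NDL's query analyzer. It cannot separate "IIT Kharagpur" from "Computer Science" in one trailing phrase, and it misfires on phrases that contain "of". Both cases call for an entity dictionary or a trained sequence tagger.

```python
# Hypothetical cue words: the word introducing a phrase hints at its role.
CUES = {"on": "Keyword/Subject", "from": "Source", "by": "Author", "of": "Source Organization"}


def tag_query(query: str) -> dict[str, str]:
    """Split a query at cue words; the leading phrase is taken as the resource type."""
    slots: dict[str, str] = {}
    role, phrase = "Learning Resource Type", []
    for tok in query.split():
        cue = CUES.get(tok.lower())
        if cue and phrase:
            slots[role] = " ".join(phrase)  # close the current phrase
            role, phrase = cue, []
        else:
            phrase.append(tok)
    if phrase:
        slots[role] = " ".join(phrase)
    return slots
```

For "video lectures on Data structure from NPTEL", the sketch yields the resource type "video lectures", the subject "Data structure", and the source "NPTEL".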


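A Complex Query

Query: "Works of Indian about European Colonization of South East Asia" (i.e., works by Indian authors about the European colonization of South East Asia)

Knowledge graph, with a reified statement:
• Novel isA CreativeWork; NonFiction isA CreativeWork; Article isA CreativeWork
• The author node: type Author; nationality Indian; isAuthorOf the work
• The work asserts a Statement (reified): subject Europe, predicate colonized, object Indonesia
• SouthEastAsia also appears as a node in the graph

Result: OCEAN OF CHURN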
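Answering this query amounts to matching a graph pattern that passes through the reified statement. The sketch below stores the slide's graph as plain triples and matches the pattern by hand. "X" is a placeholder for the author node, which the slide leaves unnamed. The `Indonesia isPartOf SouthEastAsia` triple is the one shown on the following slide. Class reasoning (NonFiction isA CreativeWork) is left out here.

```python
# The slide's graph as plain (subject, predicate, object) triples.
# "X" stands in for the author node, which the slide leaves unnamed.
TRIPLES = {
    ("X", "type", "Author"),
    ("X", "nationality", "Indian"),
    ("X", "isAuthorOf", "Ocean_of_Churn"),
    ("Ocean_of_Churn", "type", "NonFiction"),
    ("Ocean_of_Churn", "asserts", "Statement1"),
    ("Statement1", "subject", "Europe"),
    ("Statement1", "predicate", "colonized"),
    ("Statement1", "object", "Indonesia"),
    ("Indonesia", "isPartOf", "SouthEastAsia"),
}


def objects(s: str, p: str) -> set[str]:
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def subjects(p: str, o: str) -> set[str]:
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def within(place: str, region: str) -> bool:
    """True if place is the region or is (transitively) part of it."""
    seen, frontier = set(), {place}
    while frontier:
        node = frontier.pop()
        if node == region:
            return True
        seen.add(node)
        frontier |= objects(node, "isPartOf") - seen
    return False


def works_on_colonization(nationality: str, colonizer: str, region: str) -> set[str]:
    """Works by authors of the given nationality asserting '<colonizer> colonized <place in region>'."""
    found = set()
    for author in subjects("type", "Author") & subjects("nationality", nationality):
        for work in objects(author, "isAuthorOf"):
            for stmt in objects(work, "asserts"):
                if (colonizer in objects(stmt, "subject")
                        and "colonized" in objects(stmt, "predicate")
                        and any(within(p, region) for p in objects(stmt, "object"))):
                    found.add(work)
    return found
```

Reification is what makes this possible: the claim "Europe colonized Indonesia" is a node that the book points to, rather than a fact asserted by the graph itself.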


(Figure components: Core Knowledge Graph Library; Resource Description; External Knowledge Graph; Unstructured Text; Semantic Analysis of Query; Query)

Indonesia isPartOf SouthEastAsia


Linked Open Data (LOD) Cloud

The data is out there; we have to use it. (Figure: http://lod-cloud.net/versions/2017-08-22/lod.png)

Semantic Search: Research Challenges


Linking Data to the LOD Cloud
• Linking entities to LOD entities (entity disambiguation)
• e.g., the local entity Indonesia sameAs dbr:Indonesia (http://dbpedia.org/resource/Indonesia; human-readable page: http://dbpedia.org/page/Indonesia)

Unstructured Data to Structured Knowledge Graph Entries
• Extract entities and relationships from free text, video, and images
• e.g., "Indonesia experienced a long colonial history under Dutch rule" (https://www.indonesia-investments.com/culture/politics/colonial-history/item178?)
• Extracted triple: dbr:dutch dbp:colonized dbr:indonesia
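Entity disambiguation is needed because one surface form can name several LOD entities. "Java" may be the Indonesian island or the programming language. The sketch below picks the candidate whose cue words overlap most with the surrounding text. The candidate table and cue words are hypothetical and hand-written, whereas a real linker would derive them from the knowledge graph (labels, abstracts, link structure). The two DBpedia resource names are real.

```python
# Hypothetical candidate table: surface form -> {DBpedia resource: context cue words}.
CANDIDATES = {
    "java": {
        "dbr:Java": {"island", "indonesia", "dutch", "colonial"},
        "dbr:Java_(programming_language)": {"programming", "software", "class", "compiler"},
    },
}


def link_entity(mention: str, context: str):
    """Return the candidate with the largest cue-word overlap, or None if nothing matches."""
    words = {w.strip(".,;:()").lower() for w in context.split()}
    best, best_overlap = None, 0
    for iri, cues in CANDIDATES.get(mention.lower(), {}).items():
        overlap = len(cues & words)
        if overlap > best_overlap:
            best, best_overlap = iri, overlap
    return best
```

Returning `None` when no cue matches lets uncertain mentions go to curation instead of being linked wrongly.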


Semantic Analysis of Query
• Identifying meaningful phrases in the query
• Mapping phrases to Knowledge Graph vocabulary
  • e.g., colonization → conquer

Inference over Knowledge Graph
• Graph traversal-based inference
  • Ocean_of_Churn type NonFiction, NonFiction isA CreativeWork ⇒ Ocean_of_Churn type CreativeWork
• Rule-based inference
  • X type Author AND X nationality India ⇒ X type IndianAuthor
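Both inference styles above can be run as forward chaining: apply the rules to the known triples and repeat until no new triple appears. The sketch hard-codes the two rules from the slide. A production system would instead use an RDFS/OWL reasoner or a rule engine.

```python
def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Forward-chain the two slide rules until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        # Graph traversal rule: X type C, C isA D  =>  X type D
        for (x, p, c) in facts:
            if p == "type":
                for (c2, p2, d) in facts:
                    if c2 == c and p2 == "isA":
                        new.add((x, "type", d))
        # Rule-based inference: X type Author AND X nationality India  =>  X type IndianAuthor
        for (x, p, o) in facts:
            if p == "type" and o == "Author" and (x, "nationality", "India") in facts:
                new.add((x, "type", "IndianAuthor"))
        if new <= facts:  # nothing new: fixpoint reached
            return facts
        facts |= new
```

Repeating until the fixpoint also handles chains of `isA` (A isA B, B isA C), since each pass climbs one level.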

Crowd Sourcing for User Engagement

Examples:
• https://pro.europeana.eu/post/writing-the-past-transcribing-handwritten-documents-from-world-war-one
• https://www.nla.gov.au/content/many-hands-make-light-work-public-collaborative-ocr-text-correction-in-australian-historic

User Engagement: Research Challenges

Effective Crowdsourcing Strategy
• Designing hackathons that are interesting
• Incentives and motivation
• Strategy for moderation

OCR Technology
• Indian-language OCR technology still has a long way to go
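The slide leaves the moderation strategy open. One common option, assumed here for illustration, is agreement among volunteers: accept a crowd correction of an OCR line only when enough people submit the same text, and send everything else to a human moderator. `consensus` and its `min_agreement` threshold are hypothetical.

```python
from collections import Counter
from typing import Optional


def consensus(corrections: list[str], min_agreement: int = 2) -> Optional[str]:
    """Accept the most common correction of one OCR line if enough volunteers agree.

    Returns None when agreement is too low, so the line goes to a moderator.
    """
    if not corrections:
        return None
    # Collapse whitespace differences so trivially different submissions still agree.
    normalized = [" ".join(c.split()) for c in corrections]
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= min_agreement else None
```

Raising `min_agreement` trades throughput for accuracy, which is one of the tuning decisions a moderation strategy has to make.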

Enhancing User Experience

Integrated Resource Presentation
• Video Lectures, Assessment Items, Simulation, Game
• Tools, Dataset, Experts, Research Groups

Integrated Resource Presentation: Research Challenges

Selecting Candidate Resources for Integration
• Hybrid similarity model based on the knowledge graph and unstructured text

Judging Supplementarity
• Given one resource, to what extent is another resource a supplement to it?
• Diversity in resource modalities
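A hybrid similarity model can be as simple as a weighted mix of two signals. The first is overlap between the knowledge-graph entities linked to each resource (Jaccard). The second is similarity between their text (cosine over word counts). The weighting `alpha` and the resource layout (`entities`, `text`) are assumptions for this sketch. The slide does not specify the model, and judging supplementarity would need more than similarity.

```python
import math
from collections import Counter


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(text_a: str, text_b: str) -> float:
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def hybrid_similarity(res_a: dict, res_b: dict, alpha: float = 0.5) -> float:
    """alpha weights knowledge-graph overlap; (1 - alpha) weights text similarity."""
    kg = jaccard(res_a["entities"], res_b["entities"])
    txt = cosine(res_a["text"], res_b["text"])
    return alpha * kg + (1 - alpha) * txt
```

The entity term helps when resources of different modalities (a video and a quiz) share concepts but little wording. The text term covers resources that are not yet linked to the graph.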

Personalized and Context-based Notification

Example context 1: "I am a traveler. I want to visit Kolkata to enjoy and have fun."

Example context 2: "I am a research scholar on an exploration tour for my research on Kolkata."

Context-based Notification: Research Challenges

Context Representation
• Demographic data: age range, ethnicity, etc.
• Dynamic data: location, current interest, tour plan, etc.

Context Profiling
• Demographic data: survey, prediction model over user behavior data
• Dynamic data: physical sensors (location, time), social sensors (Facebook, Twitter feeds)

Notification Generation
• Context-profile-based retrieval model
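The traveler and the research scholar from the previous slide can stand in the same place yet need different notifications. The sketch below represents a context as a demographic part plus a dynamic part (location and current interests). Its retrieval model is deliberately simple: keep items within a radius and rank them by interest-tag overlap. The class names, tags, radius, and item titles are hypothetical. The coordinates are approximate Kolkata values used only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Context:
    age_range: str                   # demographic part (carried, unused by this simple ranker)
    location: tuple[float, float]    # dynamic part: (lat, lon) from a physical sensor
    interests: set[str] = field(default_factory=set)  # dynamic part: current interests


@dataclass
class Item:
    title: str
    location: tuple[float, float]
    tags: set[str]


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_notifications(ctx: Context, items: list[Item], radius_km: float = 10.0) -> list[str]:
    """Keep nearby items that share interests with the context, best overlap first."""
    scored = []
    for it in items:
        if haversine_km(ctx.location, it.location) > radius_km:
            continue
        score = len(ctx.interests & it.tags)
        if score:
            scored.append((score, it.title))
    return [title for _, title in sorted(scored, key=lambda s: -s[0])]
```

With the same location, a traveler profile with interests in sightseeing and food gets a different list from a scholar profile with interests in archives and history. This is the behaviour the two example contexts call for.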
