dynamic content discovery, harvesting and delivery, from open corpus sources, for adaptive systems...

1
Dynamic Content Discovery, Harvesting and Delivery, from Open Corpus Sources, for Adaptive Systems Séamus Lawless Knowledge & Data Engineering Group, Trinity College Dublin [email protected] For more information visit: www.cs.tcd.ie/~slawless Methodology Personalised elearning is being heralded as one of the grand challenges of next generation learning systems, in particular, its ability to support greater effectiveness, efficiency and student empowerment. However, a key problem with such systems is their reliance on bespoke content. The challenge for adaptive systems in scalably supporting personalised elearning is its ability to discover, harvest and deliver open corpus content to adaptive content services and personalised elearning systems. The Problem Research Goal and Scope The goal of this research is to provide dynamic contextual retrieval of open corpus content for eLearning systems. There are many sources of open corpus content, the primary examples of which are the Worldwide Web and digital content repositories. Open corpus content lacks consistency in its structuring. These inconsistencies are both semantic, relating to the vocabulary used to describe the content, and syntactic, relating to the metadata standards implemented. Metadata mappings to a canonical model using a fixed vocabulary ontology will be implemented to ensure consistency and accuracy in content discovery, analysis and description. Lexical analysis and metadata tag generation will be required for some open corpus content. The aesthetic qualities and presentation of retrieved content and IP/DRM are both deemed to be outside the scope of this research. It is proposed that a series of content requirements can be extracted from educational environments. These requirements are both technical, relating to content structure, metadata standards and taxonomies used, and semantic, relating to subject matter and pedagogical needs. A cache of candidate learning content is generated through a combination of web crawling and OAI harvesting of repositories. This content is then analysed and indexed to enable discovery. Upon creation of the content index a mapping will occur to a canonical metadata model using a fixed ontology of terms. Metadata tags will be created for content with insufficient descriptions. A query engine will take the content requirements and generate queries to run against the content index and identify suitable candidate content. Once suitable content is identified it will be harvested and delivered to a learning object generation environment for restructuring. Status Work underway Focused Web Crawling System operational Initial attempts at generating an index of the content cache Soon to commence Initial attempt at requirements extraction from Learning Designs Metadata canonical model & vocabulary ontology creation The research is supported by IRCSET’s embark initiative and conducted in conjunction with the IBM Centre for Advanced Studies Student Program. System Architecture C ontentC ache D igital R epositories WWW Analysis & Annotation O ntologial Mapping Metadata M odel Mapping Learning O bject Generation Q uery Engine AHS / Personalised eLearning Environm ent C ontent Index C ontent Harvesting O pen C orpus C ontentService C ontent Requirements N DLR U ser Input U ser M odel H eritrix W eb C raw ler R ainbow TextC lassifier O C C S Focused C raw ling System O AIC ontentH arvester ContentIndexing Learning Design

Upload: eliza-hiles

Post on 16-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dynamic Content Discovery, Harvesting and Delivery, from Open Corpus Sources, for Adaptive Systems Séamus Lawless Knowledge & Data Engineering Group, Trinity

Dynamic Content Discovery, Harvesting and Delivery,from Open Corpus Sources, for Adaptive Systems

Séamus Lawless

Knowledge & Data Engineering Group, Trinity College Dublin

[email protected]

For more information visit: www.cs.tcd.ie/~slawless

Methodology

Personalised elearning is being heralded as one of the grand challenges of next generation learning systems, in particular, its ability to support greater effectiveness, efficiency and student empowerment. However, a key problem with such systems is their reliance on bespoke content. The challenge for adaptive systems in scalably supporting personalised elearning is its ability to discover, harvest and deliver open corpus content to adaptive content services and personalised elearning systems.

The Problem

Research Goal and ScopeThe goal of this research is to provide dynamic contextual retrieval of open corpus content for eLearning systems. There are many sources of open corpus content, the primary examples of which are the Worldwide Web and digital content repositories. Open corpus content lacks consistency in its structuring. These inconsistencies are both semantic, relating to the vocabulary used to describe the content, and syntactic, relating to the metadata standards implemented. Metadata mappings to a canonical model using a fixed vocabulary ontology will be implemented to ensure consistency and accuracy in content discovery, analysis and description. Lexical analysis and metadata tag generation will be required for some open corpus content. The aesthetic qualities and presentation of retrieved content and IP/DRM are both deemed to be outside the scope of this research.

It is proposed that a series of content requirements can be extracted from educational environments. These requirements are both technical, relating to content structure, metadata standards and taxonomies used, and semantic, relating to subject matter and pedagogical needs. A cache of candidate learning content is generated through a combination of web crawling and OAI harvesting of repositories. This content is then analysed and indexed to enable discovery. Upon creation of the content index a mapping will occur to a canonical metadata model using a fixed ontology of terms. Metadata tags will be created for content with insufficient descriptions. A query engine will take the content requirements and generate queries to run against the content index and identify suitable candidate content. Once suitable content is identified it will be harvested and delivered to a learning object generation environment for restructuring.

Status Work underway

Focused Web Crawling System operational

Initial attempts at generating an index of the content cache

Soon to commence

Initial attempt at requirements extraction from Learning Designs

Metadata canonical model & vocabulary ontology creation

The research is supported by IRCSET’s embark initiative and conducted in conjunction with the IBM Centre for Advanced Studies Student Program.

System Architecture

Content Cache

Digital Repositories

WWW

Analysis & Annotation

Ontologial Mapping

Metadata Model

Mapping

Learning Object

Generation

Query Engine

AHS / Personalised

eLearning Environment

Content Index

Content Harvesting

Open Corpus Content Service

Content Requirements

NDLR

User Input

User Model

Heritrix Web Crawler

Rainbow Text Classifier

OCCS Focused Crawling System

OAI Content Harvester

Content Indexing

Learning Design