A Framework for Community-Based, Distributed and Semantically
Annotated Courseware Development, Sharing and Quality Management for
Higher Technical Education over a Publish/Subscribe P2P Overlay
Department of Computer Science and Engineering
Motilal Nehru National Institute of Technology Allahabad
and
Applied Artificial Intelligence Group
Centre for Development of Advanced Computing, Pune
©2010 CSED, MNNIT, Allahabad and AAIG, CDAC Pune 2
Higher Technical Education: Observations
Engineering institutions: approx. 2,500
Annual output: approx. 400,000
Computer Science graduates: approx. 300,000
Expected growth rate: 20% (NASSCOM)
Employable output: only 25% (McKinsey Global)
Higher Technical Education: Observations
M.Tech. output: 20,000
Ph.D. output:
  Engineering: fewer than 1,000
  Basic Sciences: around 5,000
Higher Technical Education: Observations
Number of researchers (2007-08):
  India: about 154,800
  China: 1,423,000
  US: 1,571,000
Higher Technical Education: Observations
Needs: order-of-magnitude growth in both quantity and quality
Rapid, large-scale growth of:
  Student enrollment
  Institutes/universities
  Research scholars
Total quality management of:
  Outputs: publications, patents, personnel
  Resources: courseware, training material, labs and evaluation services
Impact of the Internet
Highly scalable, anywhere/anytime access
Very large volume of:
  Courseware
  Research papers
  Training materials
No positive impact on the quality of education, which points to a disconnect between needs and availability
Possible Reasons for the Disconnect
Resources are targeted at specific groups and may not suit academically, linguistically and culturally different groups of users
A disproportionately large effort is required to search, owing to the lack of semantic annotation
Lack of quality assessment and quality indicators
Learning Methodologies
Traditional classroom teaching, with or without ICT:
  Face-to-face interaction with teacher and peers
  Valuable learning experience
  Peer interaction is dominant
E-learning:
  Unsupervised: no interaction; learners work in isolation
  Supervised: limited interaction
  Static resources: very limited support for the evolving, heterogeneous needs of learners
E-Learning Infrastructure
Content delivery: client/server mode
  Dedicated servers in a LAN environment
  Portals on the WWW
Communication paradigm: request/reply, synchronous, coupled
Scalability: limited
Fault tolerance: limited
Latent Knowledge Resources
Every institution has a large number of hosts.
Each host contains valuable knowledge resources.
These resources are latent: search engines cannot list them, because the hosts
  do not have public IP addresses,
  are not servers, and
  are hidden behind a proxy/NAT.
Sharing Latent Knowledge Resources
Interest-based cooperative sharing is desirable.
Difficulties:
  Heterogeneity of interests
  Dynamic evolution of interests
  Rendezvous of availability and interest
  Hosts are widely distributed
Sharing Latent Knowledge Resources
Visibility of interests and contents:
  Resource owners declare the availability of their resources
  Interested users submit their interests
Dynamic evolution of interest-based communities
Our Vision
Decentralized and autonomous middleware that is:
  Highly scalable
  Fault-tolerant
  Low in management and maintenance overhead
Support for the dynamic evolution of interest-based communities for:
  Collaborative generation of content, meta-data and domain ontology
  Seamless sharing of resources
  Peer interaction
Our Vision
Semantic searching based on meta-data and domain ontology
Quality assessment of resources by the community
Behavioral mining
Challenges
Heterogeneity:
  Users: interests and content
  Hosts: uptime, memory, CPU, bandwidth
Scalability and interoperability
Hosts without public IP addresses
Management of dynamics: content, user groups and their behaviors
Absence of domain ontology and meta-data
Requirements
Communication paradigm to support scalability:
  Decoupling in time, space and synchronization
  Anonymity
Network infrastructure to support:
  Peer-to-peer interaction
  Dynamic evolution of interest-based communities
  Interoperability
  Seamless, dynamic joining and leaving of nodes
Decoupling
Between providers and consumers:
  Increases scalability
  No dependencies
  No coordination or synchronization
Enables highly dynamic, decentralized systems
Dimensions of Decoupling
Three dimensions:
  Space: parties need not hold references to, or even know, each other
  Time: parties need not be available at the same time
  Synchronization (flow): control flow is not blocked by the interaction
Publish/Subscribe
Paradigm for scalable distributed applications
Provides:
  Decoupling
  Anonymity
  Asynchrony
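To make these properties concrete, here is a minimal single-process sketch. The class Broker and its methods subscribe, publish and poll are hypothetical names, not part of the framework; a real event service would be distributed over the overlay, with persistent queues.

```python
from collections import defaultdict, deque

class Broker:
    """Hypothetical in-memory event service: publishers and subscribers
    only ever talk to the broker, never to each other."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # topic -> subscriber ids
        self.inboxes = defaultdict(deque)     # subscriber id -> pending events

    def subscribe(self, subscriber_id, topic):
        self.subscribers[topic].add(subscriber_id)

    def publish(self, topic, event):
        # Space decoupling / anonymity: the publisher names a topic, not a
        # recipient, and only learns how many subscribers were reached.
        # Synchronization decoupling: events are only queued here, so the
        # call returns without waiting for any subscriber to process them.
        for sid in self.subscribers[topic]:
            self.inboxes[sid].append((topic, event))
        return len(self.subscribers[topic])

    def poll(self, subscriber_id):
        # Time decoupling: a subscriber that was offline at publish time
        # still receives the queued events when it next polls.
        inbox = self.inboxes[subscriber_id]
        events = list(inbox)
        inbox.clear()
        return events


broker = Broker()
broker.subscribe("learner-1", "data-structures")
broker.subscribe("learner-2", "data-structures")
delivered_to = broker.publish("data-structures", "Lecture notes: AVL trees")
print(delivered_to)               # 2
print(broker.poll("learner-1"))   # [('data-structures', 'Lecture notes: AVL trees')]
print(broker.poll("learner-1"))   # []
```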
Publish/Subscribe: High Level View
Publish/Subscribe: Subscription Models
  Topic (subject)-based
  Content-based
  Type-based
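The three subscription models differ only in how a subscription is matched against an event. The sketch below assumes hypothetical event classes (Resource, LectureNotes, Exercise) and expresses a content-based subscription as a conjunction of attribute constraints; real systems usually offer richer filter languages.

```python
import operator
from dataclasses import dataclass, field

@dataclass
class Resource:                 # hypothetical base event type
    title: str
    topic: str
    attrs: dict = field(default_factory=dict)

@dataclass
class LectureNotes(Resource):
    pass

@dataclass
class Exercise(Resource):
    pass

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

def matches_topic(subscribed_topic, event):
    """Topic-based: the subscription names a subject."""
    return event.topic == subscribed_topic

def matches_content(constraints, event):
    """Content-based: {attribute: (op, value)} constraints, all of which must hold."""
    return all(name in event.attrs and OPS[op](event.attrs[name], value)
               for name, (op, value) in constraints.items())

def matches_type(subscribed_type, event):
    """Type-based: a subscription to a type also matches its subtypes."""
    return isinstance(event, subscribed_type)

ev = LectureNotes("AVL trees", "data-structures", {"level": 2, "language": "en"})
print(matches_topic("data-structures", ev))                                 # True
print(matches_content({"level": ("<=", 2), "language": ("==", "en")}, ev))  # True
print(matches_content({"level": (">=", 3)}, ev))                            # False
print(matches_type(Resource, ev), matches_type(Exercise, ev))               # True False
```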
Implementation of Event Service
Centralized implementation:
  Event matching is easy
  No scalability
  No fault tolerance
Distributed implementation:
  A set of nodes is designated as brokers
  Improved scalability and fault tolerance
  Routing and matching of events is difficult
Implementation of Event Service
Role-based implementation:
  Every node can take any role, depending on context: broker, publisher or subscriber
  Highly scalable and fault-tolerant
Role-Based Implementation: Challenges
Management of scalability and fault tolerance:
  Application-layer overlay
  Hierarchy
  Informed/uninformed leaving
Routing of publications and subscriptions:
  Location of rendezvous points
  Life span
Role-Based Implementation: Challenges
Role assignment:
  Designated (fixed role)
  Dynamic
Matching:
  Content-based
  Type-based
Notification:
  Service guarantee (at least once, at most once, etc.)
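The difference between the two service guarantees can be shown with a deterministic stand-in for an unreliable network. In this hypothetical sketch each delivery attempt either loses the notification, loses only the acknowledgement, or succeeds. At-most-once tries once and never retries; at-least-once retries until acknowledged (a real implementation would retry on a timeout rather than from a fixed list) and can therefore deliver duplicates, which a subscriber can suppress by remembering message ids.

```python
class Subscriber:
    def __init__(self, dedup):
        self.dedup = dedup
        self.seen = set()        # ids of notifications already processed
        self.processed = []

    def receive(self, msg_id, payload):
        if self.dedup and msg_id in self.seen:
            return               # duplicate caused by a retry: drop it
        self.seen.add(msg_id)
        self.processed.append(payload)

def deliver(sub, msg_id, payload, attempts, retry):
    """attempts: list of (notification_arrives, ack_arrives) per try, a
    deterministic stand-in for an unreliable network.
    retry=False -> at-most-once: a single try, never repeated.
    retry=True  -> at-least-once: repeat until an acknowledgement arrives."""
    for data_ok, ack_ok in attempts:
        if data_ok:
            sub.receive(msg_id, payload)
        if not retry or (data_ok and ack_ok):
            return

network = [(False, False), (True, False), (True, True)]   # lost, ack lost, ok

at_most_once = Subscriber(dedup=False)
deliver(at_most_once, 1, "notes", network, retry=False)
at_least_once = Subscriber(dedup=False)
deliver(at_least_once, 1, "notes", network, retry=True)
deduplicated = Subscriber(dedup=True)
deliver(deduplicated, 1, "notes", network, retry=True)

print(at_most_once.processed)    # []                  notification lost
print(at_least_once.processed)   # ['notes', 'notes']  duplicate delivery
print(deduplicated.processed)    # ['notes']
```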
Current Network Infrastructure
Current Network Infrastructure
Within an institute/organization, nodes:
  Are assigned private IPs
  Are grouped in IP-based subnets
  Are physically connected with each other through layer-2 and layer-3 switches
  Are not visible to the outside world
  Connect to the outside world through a NAT/proxy
Our Network Architecture
Within the LAN of an institute/organization, nodes having the same interest:
  Are not aware of each other
  May be physically distant
Some virtualization is therefore required:
  Formation of interest-based virtual rings
  Virtual links are formed using logical connections (e.g., TCP)
  The virtual ring is termed an overlay
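A minimal sketch of forming interest-based virtual rings inside one LAN, assuming each node's interest is already known and that ring order is simply ascending IP address (both assumptions for illustration; the functions and addresses are hypothetical). Each successor entry stands for one logical connection, and a message forwarded along successors visits every node of the ring once.

```python
import ipaddress
from collections import defaultdict

def build_interest_rings(nodes):
    """nodes: {address: interest}. Returns {interest: {node: successor}};
    each successor link stands for one logical (e.g. TCP) connection."""
    members_by_interest = defaultdict(list)
    for addr, interest in nodes.items():
        members_by_interest[interest].append(addr)
    rings = {}
    for interest, members in members_by_interest.items():
        members.sort(key=ipaddress.ip_address)      # deterministic ring order
        rings[interest] = {m: members[(i + 1) % len(members)]
                           for i, m in enumerate(members)}
    return rings

def ring_walk(ring, start):
    """Forward a message around a ring until it comes back to `start`."""
    visited, node = [start], ring[start]
    while node != start:
        visited.append(node)
        node = ring[node]
    return visited

lan = {"10.0.1.5": "networks", "10.0.7.2": "compilers", "10.0.3.9": "networks",
       "10.0.9.1": "networks", "10.0.4.4": "compilers"}
rings = build_interest_rings(lan)
print(ring_walk(rings["networks"], "10.0.3.9"))   # ['10.0.3.9', '10.0.9.1', '10.0.1.5']
print(rings["compilers"])   # {'10.0.4.4': '10.0.7.2', '10.0.7.2': '10.0.4.4'}
```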
Our Network Architecture
Within the LAN of an Institute/Organization
Our Network Architecture
Node visibility: nodes are hidden behind a proxy/NAT
Virtual rings of the same interest may be behind different proxies/NATs:
  The rings are isolated
  Resource sharing is not possible because of this invisibility
  The rings have to be brought under one umbrella
Our Network Architecture
Proxies also form a virtual ring, which makes the overlay 2-tier.
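One way to picture the 2-tier overlay is as dissemination in two stages: a publication made behind one proxy travels once around the tier-1 ring of proxies, and each proxy injects it into its own tier-2 ring for that interest. The sketch below assumes this forwarding rule and uses hypothetical proxy and node names; it ignores NAT traversal, failures and duplicate suppression.

```python
def disseminate(proxy_ring, local_rings, origin_proxy, interest):
    """Nodes reached, in visiting order, by a publication on `interest`
    made behind `origin_proxy`.

    proxy_ring:  proxy ids in tier-1 ring order
    local_rings: {proxy: {interest: [node, ...]}}, the tier-2 rings behind each NAT
    """
    reached = []
    start = proxy_ring.index(origin_proxy)
    for k in range(len(proxy_ring)):                  # one lap of the proxy ring
        proxy = proxy_ring[(start + k) % len(proxy_ring)]
        # each proxy injects the publication into its local ring for this interest
        reached.extend(local_rings.get(proxy, {}).get(interest, []))
    return reached

proxies = ["mnnit-proxy", "cdac-proxy"]
rings = {
    "mnnit-proxy": {"networks": ["m1", "m2"]},
    "cdac-proxy": {"networks": ["c1"], "ontology": ["c2"]},
}
print(disseminate(proxies, rings, "cdac-proxy", "networks"))   # ['c1', 'm1', 'm2']
print(disseminate(proxies, rings, "mnnit-proxy", "ontology"))  # ['c2']
```

Without the proxy ring, the two "networks" rings would stay isolated; the single lap is what brings them under one umbrella.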
Our Network Architecture
Dynamic community evolution: an abstraction over the 2-tier overlay
  Isolated rings form communities
  Virtual, interest-based proximity: nodes may be physically far apart
Our Overall Network Architecture
Pub/Sub on Our Network Architecture
Every node acts as publisher, subscriber and broker
Rendezvous-point-based matching using a Distributed Hash Table (DHT)
Nodes:
  The majority are short-lived and have minimal capabilities
  A small percentage remain up for long periods and have relatively better storage, bandwidth and memory; these are termed super nodes
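One common way to realize rendezvous-point matching over a DHT is consistent hashing, as in Chord: topics and node names are hashed onto the same identifier circle, and the node that succeeds the topic's hash becomes the rendezvous point where subscriptions are stored and publications are matched. The sketch below assumes SHA-1 truncated to a 16-bit space, a full view of the ring (no multi-hop routing) and no churn; the class name is hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict

M = 16  # hypothetical identifier space: ids are 0 .. 2**M - 1

def ring_id(key):
    """Map a node name or topic onto the identifier circle."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class RendezvousRing:
    """Hypothetical DHT view: every node knows the full ring (no routing hops)."""

    def __init__(self, node_names):
        self.by_id = {ring_id(n): n for n in node_names}   # collisions ignored
        self.ids = sorted(self.by_id)
        self.subs = defaultdict(set)     # (rendezvous node, topic) -> subscribers

    def rendezvous(self, topic):
        # successor of ring_id(topic): first node id >= key, wrapping around
        i = bisect.bisect_left(self.ids, ring_id(topic)) % len(self.ids)
        return self.by_id[self.ids[i]]

    def subscribe(self, subscriber, topic):
        node = self.rendezvous(topic)
        self.subs[(node, topic)].add(subscriber)   # stored at the rendezvous node
        return node

    def publish(self, topic):
        node = self.rendezvous(topic)              # same node the subscription reached
        return node, sorted(self.subs[(node, topic)])

ring = RendezvousRing([f"node-{i}" for i in range(20)])
where = ring.subscribe("learner-7", "operating-systems")
node, matched = ring.publish("operating-systems")
print(node == where, matched)   # True ['learner-7']
```

Because publisher and subscriber independently compute the same rendezvous node, neither needs to know the other.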
Super Nodes
Candidate super nodes:
  May be elected dynamically
  Proxy nodes
  GARUDA nodes/NKN nodes
Super nodes may act as brokers for popular content (temporal locality): hot content is automatically cached
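Automatic caching of hot content on a super node could, for instance, combine a popularity threshold with LRU eviction. This is a hypothetical sketch: the admission rule, the parameters capacity and hot_threshold, and the fetch callback are assumptions, not the framework's policy.

```python
from collections import Counter, OrderedDict

class HotContentCache:
    """Hypothetical super-node cache. An item is admitted only once it has been
    fetched `hot_threshold` times (temporal locality); when more than
    `capacity` items are cached, the least recently used one is evicted."""

    def __init__(self, capacity, hot_threshold, fetch):
        self.capacity = capacity
        self.hot_threshold = hot_threshold
        self.fetch = fetch          # fallback: fetch from the owning peer
        self.misses = Counter()     # requests per item that missed the cache
        self.cache = OrderedDict()  # least recently used item first

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # now the most recently used
            return self.cache[key], "cache"
        self.misses[key] += 1
        value = self.fetch(key)
        if self.misses[key] >= self.hot_threshold:
            self.cache[key] = value          # item has become hot: cache it
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
        return value, "owner"

cache = HotContentCache(capacity=2, hot_threshold=2, fetch=lambda k: f"<{k}>")
print(cache.get("dbms-notes"))   # ('<dbms-notes>', 'owner')
print(cache.get("dbms-notes"))   # ('<dbms-notes>', 'owner'), and now cached
print(cache.get("dbms-notes"))   # ('<dbms-notes>', 'cache')
```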
Finding Content
Push/pull model: subscription instead of searching
  The learner need not spend effort on searching
  The learner subscribes to content
  The system delivers matching publications
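A small sketch of how one subscription can replace repeated searching, assuming (for illustration) that a new subscription pulls the matching publications already stored and then has future matches pushed to it. ContentStore and the topic strings are hypothetical.

```python
class ContentStore:
    """Hypothetical store combining pull (existing matches) and push (new matches)."""

    def __init__(self):
        self.publications = []     # (topic, item) pairs published so far
        self.subscriptions = []    # (topic, callback) pairs

    def publish(self, topic, item):
        self.publications.append((topic, item))
        for sub_topic, callback in self.subscriptions:
            if sub_topic == topic:
                callback(item)     # push: delivered without the learner asking

    def subscribe(self, topic, callback):
        self.subscriptions.append((topic, callback))
        # pull: publications that already match are returned immediately
        return [item for t, item in self.publications if t == topic]

store = ContentStore()
store.publish("algorithms", "Sorting slides")
inbox = []
existing = store.subscribe("algorithms", inbox.append)
store.publish("algorithms", "Graph lab manual")
store.publish("networks", "TCP notes")
print(existing)   # ['Sorting slides']
print(inbox)      # ['Graph lab manual']
```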
Finding Content
Semantic support:
  Publication with or without meta-data
  Subscription with or without meta-data
Knowledge resources enriched with meta-data
Use of domain-specific ontology
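One simple way a domain ontology can improve matching is is-a subsumption: a subscription to a general concept also matches publications annotated with more specific concepts. The ontology fragment, the "concepts" meta-data field and the function names below are hypothetical.

```python
# Hypothetical fragment of a computer-science domain ontology: child -> parent (is-a)
IS_A = {
    "AVL tree": "balanced search tree",
    "red-black tree": "balanced search tree",
    "balanced search tree": "search tree",
    "search tree": "data structure",
    "hash table": "data structure",
}

def generalizations(concept):
    """The concept itself followed by all of its ancestors in the is-a hierarchy."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain

def semantic_match(subscribed_concept, publication_meta):
    """Match if any concept annotated on the publication is the subscribed
    concept or a specialization of it. Publications without meta-data never
    match here; they would need a fallback such as keyword matching."""
    return any(subscribed_concept in generalizations(c)
               for c in publication_meta.get("concepts", []))

pub = {"title": "Rotations in AVL trees", "concepts": ["AVL tree"]}
print(semantic_match("search tree", pub))             # True: AVL tree is-a search tree
print(semantic_match("hash table", pub))              # False
print(semantic_match("search tree", {"title": "x"}))  # False: no meta-data
```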
Meta-data
Meta-data can be created in a distributed manner by:
  The content creator
  A designated meta-data expert from the community
  Automatic or semi-automatic means
Meta-data is published/subscribed, stored and retrieved like any other knowledge resource.
Ontology
Distributed ontology creation by experts from the community
Ontologies are published/subscribed, stored and retrieved like any other knowledge resource.
Our Universal Client
Every node will run a generic client application.
The universal client provides an interface for:
  Joining and leaving: virtual ring maintenance
  Fault tolerance: replication, caching
  Publishing and subscribing to content
  Event brokering
  Meta-data creation
  Ontology creation
  Behavior mining and quality assessment
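The client's responsibilities can be summarized as an abstract interface. The method names and signatures below are hypothetical, chosen only to mirror the list of functions; they are not the client's actual API.

```python
from abc import ABC, abstractmethod
from typing import Optional

class UniversalClient(ABC):
    """Hypothetical interface for the generic client run on every node.
    Method names and signatures are illustrative only."""

    # Joining and leaving: virtual ring maintenance
    @abstractmethod
    def join(self, interest: str) -> None: ...

    @abstractmethod
    def leave(self, informed: bool = True) -> None: ...

    # Fault tolerance: replication and caching
    @abstractmethod
    def replicate(self, resource_id: str, copies: int) -> None: ...

    # Publishing and subscribing to content
    @abstractmethod
    def publish(self, resource_id: str, metadata: Optional[dict] = None) -> None: ...

    @abstractmethod
    def subscribe(self, interest: str, metadata: Optional[dict] = None) -> None: ...

    # Event brokering on behalf of other peers
    @abstractmethod
    def broker_event(self, event: dict) -> None: ...

    # Meta-data and ontology creation
    @abstractmethod
    def create_metadata(self, resource_id: str, fields: dict) -> None: ...

    @abstractmethod
    def contribute_ontology(self, concept: str, parent: str) -> None: ...

    # Behavior mining and quality assessment
    @abstractmethod
    def log_access(self, resource_id: str) -> None: ...

    @abstractmethod
    def rate(self, resource_id: str, score: int) -> None: ...
```

A concrete client would subclass UniversalClient and implement every method; Python raises TypeError on any attempt to instantiate a class that still has abstract methods.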
Our Software Architecture
Layer 1: Distributed and Federated Database
It contains:
  Meta-data base
  Ontology base
  Knowledge resource base
  Access log
  User-profile base
Layer 1: Distributed and Federated Database
It also contains:
  Publication base
  Subscription base
  Event-brokering base
Layer 2: Publish/Subscribe, Overlay Layer
It has three sub-layers:
  Sub-layer 1: overlay sub-layer
  Sub-layer 2: community management sub-layer
  Sub-layer 3: publish/subscribe sub-layer
Layer 3: Service Layer
Provides services for:
  Distributed ontology creation
  Meta-data harvesting
  Inference engine
  Multilingual subscription/publication support
An Example Demonstration
Layer 3 of our software architecture (presentation by C-DAC)
Design Challenges and Trade-offs
Overlay architecture: structured, unstructured or hybrid
Unstructured:
  Stateless; minimal maintenance cost
  Flooding instead of routing, which wastes bandwidth
Structured:
  Stateful; maintenance required
  No flooding, which saves bandwidth
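The bandwidth side of this trade-off can be estimated with a small experiment. The sketch counts every message copy sent when one query is flooded (with duplicate suppression) through a hypothetical 1,024-node unstructured graph, and compares it with the hop count of greedy finger routing on a fully populated Chord-style ring of the same size. Both topologies and the absence of a TTL on flooding are simplifying assumptions.

```python
import random
from collections import deque

def flood_messages(adj, source):
    """Messages sent when `source` floods a query with duplicate suppression:
    every node forwards once, to all neighbours except the one it heard from."""
    seen = {source}
    queue = deque([(source, None)])
    sent = 0
    while queue:
        node, parent = queue.popleft()
        for nb in adj[node]:
            if nb == parent:
                continue
            sent += 1                      # every forwarded copy costs bandwidth
            if nb not in seen:             # duplicates are dropped, not re-forwarded
                seen.add(nb)
                queue.append((nb, node))
    return sent

def chord_hops(n_bits, source, key):
    """Hops for greedy finger routing on a fully populated Chord-style ring of
    2**n_bits ids: each hop takes the largest finger (power of two) that does
    not overshoot the key."""
    size = 2 ** n_bits
    hops, node = 0, source
    while node != key:
        dist = (key - node) % size
        node = (node + (1 << (dist.bit_length() - 1))) % size
        hops += 1
    return hops

N = 1024
rng = random.Random(42)
adj = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}   # ring keeps graph connected
for i in range(N):
    for j in rng.sample(range(N), 2):                       # two random extra links
        if j != i:
            adj[i].add(j)
            adj[j].add(i)

print(flood_messages(adj, 0))                       # several thousand copies
print(max(chord_hops(10, 0, k) for k in range(N)))  # 10
```

For a connected graph the flood always costs 2E - (N - 1) messages (E edges), whereas a structured lookup needs at most log2 N hops, paid for by keeping finger tables up to date.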
Design Challenges and Trade-offs
Implementation of the event service:
Purely distributed:
  Every node can be a broker
  High scalability
  Higher cost of event management, routing and matching
Partially distributed:
  Only proxies act as brokers
  Reduced scalability
  Lower cost of event management, routing and matching
Simulation
To evaluate design alternatives:
  Role: assignment vs. acquisition; static vs. dynamic
  Exploitation of skewness in subscriptions
  Replication of hot content
  Service guarantee
  Life span of knowledge resources
  Informed and uninformed leaving
Strengths: MNNIT
Implicit Invocation Systems and Semantic Web: a group of faculty members and research scholars (PhD, MTech) engaged in:
  Large-scale publish/subscribe for dynamic topologies
  Automatic meta-data extraction and generation
Networking and Distributed Computing: a group of faculty members and research scholars (PhD, MTech) engaged in:
  Peer-to-peer computing
  Cloud computing
Strengths: CDAC
Expertise in:
  Multilingual searching
  Meta-data extraction and generation
  Domain-specific ontology creation
  Behavioral mining
Proof of Concept
Demonstration of:
  Publish/subscribe over a P2P overlay
  Scalability in terms of participating nodes, by simulation
Proof of Concept: MNNIT Responsibilities
Design and implementation of a 2-tier peer-to-peer overlay involving nodes at:
  CDAC: one proxy and 10 nodes behind it
  MNNIT: one proxy and 10 nodes behind it
Demonstration of the scalability of the infrastructure in terms of participating nodes, by simulation
Design and implementation of a publish/subscribe interface over this P2P overlay
Integration of modules developed by CDAC, including the GUI
Proof of Concept: CDAC Responsibilities
Creation of an ontology for two or three core computer science domains, and of a subscription/publication meta-data structure, compliant with Semantic Web standards
Distributed database management of meta-data, ontology, subscriptions, publications, access-pattern logs and user profiles
GUI for search, publish and subscribe
Thanks