Khaled Shaban, PhD Candidate. Supervisors: Dr. Otman Basir, Dr. Mohammad Kamel


Page 1: ~Khaled Shaban PhD. Candidate Supervisors:  Dr. Otman Basir Dr. Mohammad Kamel

Khaled Shaban, PhD Candidate

Supervisors:

Dr. Otman Basir

Dr. Mohammad Kamel

Page 2: Previous work

MSc. Thesis, 2002, “Information Fusion in a Cooperative Multiagent System for Web Information Retrieval”

K. B. Shaban, O. A. Basir, K. Hassanein, and M. Kamel, "Intelligent Information Fusion Approach in Cooperative Multiagent Systems", World Automation Congress, June 2002.

K. B. Shaban, O. A. Basir, K. Hassanein, and M. Kamel, "Information Fusion in a Cooperative Multi-agent System for Web Information Retrieval", The Fifth International Conference on Information Fusion, July 2002.

Page 3: System vision

[Figure: Envisioned View of the System. Components shown: User (two), Personal Agent (two), Intermediate “Fusion” Agent, Resource “Information Retrieval” Agent, and Environment “The Web”.]

Page 4: Decision Fusion

[Figure: three team organizations for decision fusion: (a) Markovian team, (b) Centralized team, (c) Consensus team. Each panel shows agents A1 ... An with nodes labelled Z1 ... Zn, R1 ... Rn, and RG; panel (a) also shows the Environment, and panels (b) and (c) include a Decision Maker.]

Page 5: Implementation

[Figure: implementation architecture with three Retrieval Agents, a Fusion Agent, and a Personal Agent; search-engine labels as extracted: AltaVista, Excite, AltaVista.]

[Chart: uncertainty (vertical axis, 0 to 0.12) for Google Agent, Excite Agent, AltaVista Agent, LSI Fusion, and Team Consensus Fusion.]

Page 6: Current Project

“Semantic-based Document Clustering”

Page 7: Project Goals

Clustering documents based on semantic similarities of their contents

Lend ideas to other mining projects

PhD. thesis by 2005/2006!

Page 8: Document Clustering

[Figure: Documents pass through Clustering to produce several Document Clusters, with high intra-cluster similarity and low inter-cluster similarity.]
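The two criteria on this slide can be checked numerically. The sketch below is only an illustration: it assumes documents are reduced to word sets and uses Jaccard overlap as a stand-in similarity, neither of which comes from the slides. The function names and toy documents are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Word-set overlap |a & b| / |a | b| (a stand-in similarity, not the deck's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean(values):
    """Arithmetic mean that returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def intra_cluster_similarity(cluster, sim=jaccard):
    """Average similarity over all document pairs inside one cluster."""
    return mean(sim(a, b) for a, b in combinations(cluster, 2))

def inter_cluster_similarity(c1, c2, sim=jaccard):
    """Average similarity over all cross-cluster document pairs."""
    return mean(sim(a, b) for a in c1 for b in c2)

# Hypothetical toy documents, each reduced to its set of words.
fruit = [set("apple banana fruit".split()), set("banana fruit salad".split())]
cars = [set("car engine wheel".split()), set("engine wheel brake".split())]

intra_fruit = intra_cluster_similarity(fruit)
intra_cars = intra_cluster_similarity(cars)
inter = inter_cluster_similarity(fruit, cars)
print(intra_fruit, intra_cars, inter)
```

A good clustering of these toy documents keeps both intra-cluster values well above the inter-cluster value.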

Page 9: Applications

Improve information retrieval systems performance
Improve the organization and viewing of documents
Accelerate nearest-neighbour search
Generate directories of hierarchical clusters
Improve automatic speech recognition systems

Page 10: Existing Schemes

Data representation models
– Documents as bags-of-words (Vector Space Model (VSM))
– N-grams
– Latent Semantic Indexing (LSI)
– Phrase-based

Similarity measures
– Euclidean distances
– Minkowski distances
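As a minimal sketch of the bag-of-words model and the distance measures listed above, the code below builds term-count vectors over a shared vocabulary and compares them with the Minkowski distance, of which the Euclidean distance is the special case p = 2. The whitespace tokenization, function names, and sample documents are assumptions made for illustration.

```python
from collections import Counter

def term_vectors(docs):
    """Map each document to a term-count vector over the shared vocabulary."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def minkowski(x, y, p=2):
    """Minkowski distance of order p; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Hypothetical sample documents.
docs = ["web information retrieval", "web document clustering"]
vocab, (v1, v2) = term_vectors(docs)

print(vocab)
print(minkowski(v1, v2, p=1), minkowski(v1, v2, p=2))
```

Both vectors ignore word order, which is the property the later "Shortcomings" slide criticizes.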

Page 11: Existing Schemes, Cont.

Clustering algorithms
– Partitioning (k-means & Fuzzy C-means)
– Geometric (Self-Organizing Maps (SOM), LSI)
– Probabilistic (Expectation Maximization (EM), Probabilistic LSI)

Evaluation methods
– Entropy
– F-measure
– Overall Similarity
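Two of the evaluation methods above can be sketched as follows. This uses one common formulation, assumed here rather than taken from the slides: entropy is the size-weighted average of per-cluster label entropy, and the F-measure takes, for each true class, the best F score over all clusters, weighted by class size. Each cluster is represented as a list of the true class labels of its documents; the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def clustering_entropy(clusters):
    """Size-weighted average entropy of class labels per cluster (lower is better)."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        if not c:
            continue
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in Counter(c).values())
        total += (len(c) / n) * h
    return total

def clustering_f_measure(clusters):
    """Class-size-weighted best F score of each class over all clusters (higher is better)."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(label for c in clusters for label in c)
    score = 0.0
    for label, n_i in class_sizes.items():
        best = 0.0
        for c in clusters:
            n_ij = c.count(label)
            if n_ij == 0:
                continue
            precision, recall = n_ij / len(c), n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_i / n) * best
    return score

# Hypothetical toy clusterings of four documents with true classes "a" and "b".
perfect = [["a", "a"], ["b", "b"]]
mixed = [["a", "b"], ["a", "b"]]

print(clustering_entropy(perfect), clustering_f_measure(perfect))
print(clustering_entropy(mixed), clustering_f_measure(mixed))
```

Both measures need ground-truth class labels, so they apply only to labelled test collections.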

Page 12: Shortcomings

Abandoning meaning produces wrong results!

– Ex.
• “John eats the apple standing beside the tree” vs. “The apple tree stands beside John’s house”
• “John is an intelligent boy” vs. “John is a brilliant son”
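The effect can be reproduced with a small bag-of-words experiment. The sketch below assumes a simple letters-only tokenizer and cosine similarity over raw term counts; both are illustrative choices, not the setup used in the presentation.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased term counts; the letters-only tokenizer splits "John's" into "john", "s"."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Pair 1: shared words, different meanings.
sim_different = cosine(
    bag_of_words("John eats the apple standing beside the tree"),
    bag_of_words("The apple tree stands beside John's house"),
)
# Pair 2: close meanings, few shared words.
sim_same = cosine(
    bag_of_words("John is an intelligent boy"),
    bag_of_words("John is a brilliant son"),
)
print(round(sim_different, 2), round(sim_same, 2))
```

Under these assumptions the first pair scores about 0.67 and the second about 0.40, so the word-count model ranks the sentences with different meanings as more similar than the near-paraphrases.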

Page 13: Proposed Approach

[Figure: processing pipeline. Documents → Syntactic analysis → Parse Tree → Semantic analysis → Knowledge Representation scheme → Semantic-based document clustering → Document Clusters.]

Page 14: Proposed Approach - Steps

Preprocess text
– Remove tags, hyperlinks, etc.

Morphological analysis
– Identifying words, punctuation, etc.

Syntactic analysis
– Building sentences' grammatical structures (Parse Tree)

Semantic analysis
– Assigning meaning to words
– Discourse integration
– Pragmatic analysis
– Knowledge representation structure

Clustering using the produced representations
– New similarity measures
– New clustering algorithm
– Better document clustering results (hopefully!)
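The first two steps can be sketched with the standard library alone. The regular expressions, function names, and sample input below are assumptions made for illustration; the slides do not specify how tags, hyperlinks, or tokens are recognized, and the later syntactic and semantic steps are not covered here.

```python
import re
from html import unescape

TAG = re.compile(r"<[^>]+>")                # markup tags such as <p> or </b>
URL = re.compile(r"(?:https?://|www\.)\S+")  # bare hyperlinks
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")  # words (with apostrophes) or single punctuation marks

def preprocess(raw):
    """Strip markup tags and hyperlinks, decode entities, and normalise whitespace."""
    text = TAG.sub(" ", raw)
    text = unescape(text)
    text = URL.sub(" ", text)
    return " ".join(text.split())

def tokenize(text):
    """Split cleaned text into word and punctuation tokens."""
    return TOKEN.findall(text)

# Hypothetical input document.
raw = "<p>John eats the <b>apple</b>. See http://example.com/apple for more!</p>"
tokens = tokenize(preprocess(raw))
print(tokens)
```

The resulting token list is what a parser would consume in the syntactic analysis step.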

Page 15: Illustration

– “John eats the apple standing beside the tree.” vs. “The apple tree stands beside John’s house.”

[Figure: Parse Trees for the two sentences. Sentence 1 is split into clause 1 and clause 2; sentence 2 has a single clause. Node labels include np, vg, n, v, det, adv, prep, and apos, with the words of each sentence as leaves.]

Page 16: Illustration, Cont.

[Figure: Knowledge Representations of the two sentences. “John eats the apple standing beside the tree” is annotated with the labels Obj 1, Act 1, Obj 2, Act 2, and Obj 3; “The apple tree stands beside John’s house” is annotated with the labels Obj 1 and St 1.]

Page 17: Relation to LORNET?

– Findings can be applied to Learning Objects (LO) mining
• Knowledge Representations
• Clustering
• Classification
• Retrieval
• Knowledge Sharing

Page 18: Milestones

[Figure: timeline from Jan 03 through Jan 04 to Jan 05, divided into Phase 1, Phase 2, and Phase 3. Activities listed in order: Grad. courses, Lit. review, Proposal, Comp. Exam, Development, Experimentations, Evaluations, Reporting, Thesis writing, Defence.]

Page 19: Thank you!

Questions?