Khaled Shaban, PhD Candidate. Supervisors: Dr. Otman Basir, Dr. Mohammad Kamel
1
Khaled Shaban
PhD Candidate
Supervisors:
Dr. Otman Basir
Dr. Mohammad Kamel
2
Previous work
MSc. Thesis, 2002, “Information Fusion in a Cooperative Multiagent System for Web Information Retrieval”
K. B. Shaban, O. A. Basir, K. Hassanein, and M. Kamel, "Intelligent Information Fusion Approach in Cooperative Multiagent Systems", World Automation Congress, June 2002.
K. B. Shaban, O. A. Basir, K. Hassanein, and M. Kamel, "Information Fusion in a Cooperative Multi-agent System for Web Information Retrieval", The Fifth International Conference on Information Fusion, July 2002.
3
System vision
[Figure: Envisioned View of the System. Components: User, Personal Agent, Intermediate “Fusion” Agent, Resource “Information Retrieval” Agent, Environment (“The Web”).]
4
Decision Fusion
[Figure: three team structures for decision fusion: (a) Markovian team, (b) Centralized team, (c) Consensus team. Node labels in the panels: A1 ... An, Z1 ... Zn, R1 ... Rn, RG, Environment, Decision Maker.]
5
Implementation
[Figure: implementation architecture. Components: three Retrieval Agents (search engine labels as extracted: AltaVista, Excite, AltaVista), Fusion Agent, Personal Agent.]
[Chart: Uncertainty (axis from 0 to 0.12) for Google Agent, Excite Agent, AltaVista Agent, LSI Fusion, and Team Consensus Fusion.]
6
Current Project
“Semantic-based Document Clustering”
7
Project Goals
Clustering documents based on semantic similarities of their contents
Lend ideas to other mining projects
PhD. thesis by 2005/2006!
8
Document Clustering
[Figure: Documents are passed through Clustering to produce Document Clusters, with low inter-cluster similarity and high intra-cluster similarity.]
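The clustering objective above (high similarity inside a cluster, low similarity between clusters) can be sketched in a few lines. This is a minimal illustration, not the system described in these slides: the Jaccard word-set overlap is a stand-in similarity chosen for brevity, and the function names and toy documents are assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap between two word sets (stand-in similarity, an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def intra_cluster_similarity(cluster):
    """Average similarity over all document pairs inside one cluster."""
    return mean(jaccard(a, b) for a, b in combinations(cluster, 2))

def inter_cluster_similarity(c1, c2):
    """Average similarity over all pairs drawn from two different clusters."""
    return mean(jaccard(a, b) for a in c1 for b in c2)

# Toy documents as word sets (illustrative only).
fruit = [{"apple", "pie", "crust"}, {"apple", "tart", "pie"}]
cars = [{"car", "engine", "oil"}, {"engine", "oil", "filter"}]

print(intra_cluster_similarity(fruit))        # should be high for a good cluster
print(intra_cluster_similarity(cars))
print(inter_cluster_similarity(fruit, cars))  # should be low between good clusters
```

A good clustering is one where the first two numbers are large relative to the third.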
9
Applications
Improve information retrieval systems performance
Improve the organization and viewing of documents
Accelerate nearest-neighbour search
Generate directories of hierarchical clusters
Improve automatic speech recognition systems
10
Existing Schemes
Data representation models
– Documents as bags-of-words (Vector Space Model (VSM))
– N-grams
– Latent Semantic Indexing (LSI)
– Phrase-based
Similarity measures
– Euclidean distances
– Minkowski distances
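As a rough sketch of two of the representations and the distance measures listed above: documents become term-count vectors over a shared vocabulary, and the Minkowski distance with p = 2 gives the Euclidean distance. The helper names and toy sentences are assumptions made for illustration; LSI and phrase-based models are not covered here.

```python
from collections import Counter

def bag_of_words(text):
    """Term counts for one document (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())

def ngrams(tokens, n):
    """Word n-grams as tuples, e.g. n=2 gives bigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def to_vectors(bags):
    """Align bags-of-words on a shared, sorted vocabulary (Vector Space Model)."""
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[term] for term in vocab] for bag in bags]

def minkowski(x, y, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

docs = ["the cat sat on the mat", "the dog sat on the log"]
vocab, (v1, v2) = to_vectors([bag_of_words(d) for d in docs])

print(vocab)
print(v1, v2)
print(minkowski(v1, v2, p=1))
print(minkowski(v1, v2, p=2))
print(ngrams(docs[0].split(), 2)[:2])
```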
11
Existing Schemes, Cont.
Clustering algorithms
– Partitioning (k-means & Fuzzy C-means)
– Geometric (Self-Organizing Maps (SOM), LSI)
– Probabilistic (Expectation-Maximization (EM), Probabilistic LSI)
Evaluation methods
– Entropy
– F-measure
– Overall Similarity
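Two of the evaluation methods above can be sketched as follows, using their commonly cited definitions: entropy is the size-weighted average of each cluster's class entropy (lower is better), and the F-measure takes, for each class, the best F score over all clusters, weighted by class size (higher is better). The function names and toy labels are assumptions, and non-empty input is assumed; Overall Similarity is not shown.

```python
import math
from collections import Counter

def clustering_entropy(clusters, labels):
    """clusters: list of lists of document ids; labels: dict id -> true class."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for cluster in clusters:
        counts = Counter(labels[d] for d in cluster)
        size = len(cluster)
        entropy = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * entropy
    return total

def clustering_f_measure(clusters, labels):
    """Class-size-weighted best F score of each class over all clusters."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(labels[d] for c in clusters for d in c)
    score = 0.0
    for cls, size in class_sizes.items():
        best = 0.0
        for cluster in clusters:
            hits = sum(1 for d in cluster if labels[d] == cls)
            if hits:
                precision = hits / len(cluster)
                recall = hits / size
                best = max(best, 2 * precision * recall / (precision + recall))
        score += (size / n) * best
    return score

# Toy ground truth: four documents in two classes (illustrative only).
labels = {1: "a", 2: "a", 3: "b", 4: "b"}
perfect = [[1, 2], [3, 4]]
mixed = [[1, 3], [2, 4]]

print(clustering_entropy(perfect, labels), clustering_f_measure(perfect, labels))
print(clustering_entropy(mixed, labels), clustering_f_measure(mixed, labels))
```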
12
Shortcomings
Abandoning meaning produces wrong results!
– Ex.
• “John eats the apple standing beside the tree” vs. “The apple tree stands beside John’s house”
• “John is an intelligent boy” vs. “John is a brilliant son”
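The two example pairs above can be checked with a plain bag-of-words cosine similarity. This is a minimal sketch under stated assumptions: lowercase word tokens, possessive "'s" stripped, no stop-word removal or stemming, and helper names invented for illustration. With these choices the pair with different meanings scores higher than the pair with nearly the same meaning.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase word counts; the possessive 's is dropped (an assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower().replace("'s", "")))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Pair 1: many shared words, different meanings.
different_meaning = cosine(
    bag_of_words("John eats the apple standing beside the tree"),
    bag_of_words("The apple tree stands beside John's house"),
)

# Pair 2: few shared words, nearly the same meaning.
same_meaning = cosine(
    bag_of_words("John is an intelligent boy"),
    bag_of_words("John is a brilliant son"),
)

print(round(different_meaning, 3))
print(round(same_meaning, 3))
```

A word-overlap measure ranks the first pair as more similar, which is the opposite of what a reader would say.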
13
Proposed Approach
[Figure: proposed pipeline. Stages and artifacts shown: Documents, Syntactic analysis, Parse Tree, Semantic analysis, Knowledge Representation scheme, Semantic-based document clustering, Document Clusters.]
14
Proposed Approach - Steps
Preprocess text
– Remove tags, hyperlinks, etc.
Morphological analysis
– Identifying words, punctuation, etc.
Syntactic analysis
– Building the grammatical structure of sentences (Parse Tree)
Semantic analysis
– Assigning meaning to words
– Discourse integration
– Pragmatic analysis
– Knowledge representation structure
Clustering using the produced representations
– New similarity measures
– New clustering algorithm
– Better document clustering results (hopefully!)
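The first two steps above (preprocessing and morphological analysis) can be sketched with the Python standard library. This is only an illustration under assumptions: the regular expressions are simplistic, the function names and sample HTML are invented, and real pages would call for a proper HTML parser. The syntactic and semantic steps need a parser and a knowledge representation and are not shown.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                    # crude HTML/XML tag matcher
URL_RE = re.compile(r"(?:https?://|www\.)\S+")     # bare hyperlinks in text
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")     # words, or single punctuation marks

def preprocess(raw):
    """Remove tags and hyperlinks, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)

raw = ("<p>John eats the <b>apple</b> beside the tree. "
       "See http://example.com/apples for more!</p>")
clean = preprocess(raw)

print(clean)
print(tokenize(clean))
```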
15
Illustration
– “John eats the apple standing beside the tree.” vs. “The apple tree stands beside John’s house.”
[Figure: Parse Trees. Sentence 1 is split into clause 1 and clause 2; sentence 2 has a single clause. Node labels used: np, vg, n, v, det, adv, prep, apos.]
16
Illustration, Cont.
[Figure: Knowledge Representations. Sentence 1, “John eats the apple standing beside the tree”, is annotated with the labels Obj 1, Act 1, Obj 2, Act 2, Obj 3. Sentence 2, “The apple tree stands beside John’s house”, is annotated with the labels St 1, Obj 1.]
17
Relation to LORNET?
– Findings can be applied to Learning Objects (LO) mining
• Knowledge Representations
• Clustering
• Classification
• Retrieval
• Knowledge Sharing
18
Milestones
Jan 03 Jan 04 Jan 05
Phase 1
Phase 2
Phase 3
Grad. courses
Lit. review
Proposal
Comp. Exam
Development
Experimentations
Evaluations
Reporting
Thesis writing
Defence
19
Thank you!
Questions?