TRANSCRIPT
Seeking information: IR + AI approaches ECAI 2002 Lyon
Seeking Information: Methods from Information Retrieval and Artificial Intelligence
Alison Cawsey, Heriot-Watt University, Edinburgh, Scotland
Mounia Lalmas and Thomas Roelleke, Queen Mary University of London, London, England
http://www.dcs.qmul.ac.uk/~mounia/ECAI2002.html
Tutorial Outline
1. Introduction
2. Information Retrieval Approaches
3. Artificial Intelligence Approaches
4. Conclusions and Future
5. Demos
Introduction: Information need
Example of an information need in the context of the world wide web:
“Find information on sailing charters that: (1) can be skippered from the Greek Islands, and (2) are registered with the RYA. To be useful, the information must include boat specification, price per week, and e-mail and phone number for contact purposes.”
⇒ Information Retrieval (IR)
⇒ Artificial Intelligence (AI)
Introduction: Information Seeking

[Diagram: users seeking information from sources]
Introduction: Three main components
expressing the information need
extracting information from sources
matching
Introduction: Information Seeking
! Libraries and bibliographic systems
! World-wide web
! Digital libraries
! (Knowledge management)
! Areas: medical, journalism, broadcast, geographical and satellite systems, learning, leisure, …
Seeking Information: Information Retrieval
Mounia Lalmas and Thomas Roelleke
Department of Computer Science
Queen Mary University of London
London, E1 4NS, England
{mounia,thor}@dcs.qmul.ac.uk
IR Approaches: Outline
1. Introduction
2. Basics
   1. Indexing mechanisms
   2. Retrieval models
   3. Evaluation
3. Topics
   1. Query reformulation
   2. Web IR
   3. Structured document retrieval
   4. The use of AI in IR
Introduction: Information need
Example of an information need in the context of the world wide web:
“Find all documents containing information on computer courses which: (1) are offered by universities in South England, and (2) are accredited by the BCS/IEE bodies. To be relevant, the document must include information on admission requirements.”
⇒ Information Retrieval
Introduction: Information retrieval (IR) system
“Retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.”
Introduction: The (Basic) IR Process
[Diagram: the information need is expressed as a query; documents are indexed; the search/retrieval engine matches the query against the index; retrieved documents feed back (+/−) into query reformulation]
Introduction: Topics in IR

! Query reformulation
! Web IR and link analysis
! Structured document retrieval
! Thesaurus construction
! Parallel and distributed IR
! Text categorisation
! Filtering
! Hypertext and hypermedia
! Metadata and ontologies
! Integration of IR and DB technologies
! Agent-based technology
! Metasearch and data fusion
! Summarisation, abstraction
! Interface and visualisation
! Information-seeking and user modelling
! User studies
! Multimedia IR
! Multilingual IR
! Index structure
! Text compression
Introduction: IR is a multi-disciplinary approach
[Diagram: information retrieval at the centre, drawing on artificial intelligence, machine learning, human-computer interaction, linguistics, vision, cognitive science, mathematics, and information and library studies]
Information Retrieval: Basics
1. Indexing
2. Retrieval
3. Evaluation
Basics: Indexing
1. What is a document?
2. Representing the content of documents
   1. Conflation
   2. Weighting
3. Inverted files (index)
Indexing: What is a document?
[Diagram: a document "Sailing in Greece" by B. Smith, viewed four ways — content (index terms: sailing, greece, mediterranean, fish, sunset); structure (head with title and author; chapter with sections); layout; and facts (Author = "B. Smith", Crdate = "14.12.96", Ladate = "11.07.02")]
Indexing: Conflation

Documents → tokens → stop-word removal (stop list) → stems (suffix stripping, linguistic resources) → index terms
! Phrases: NLP
! Controlled vocabulary: catalogue, thesaurus
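The conflation pipeline above can be sketched in a few lines. This is a minimal illustration only: the stop list and suffix rules below are assumed toy values, not a real stemmer such as Porter's.

```python
# Hypothetical minimal conflation pipeline: tokenise, remove stop
# words, then apply a crude suffix-stripping "stemmer".
STOP_WORDS = {"the", "in", "of", "a", "is"}
SUFFIXES = ("ing", "ed", "s")          # illustrative, not Porter's rules

def conflate(text):
    terms = []
    for tok in text.lower().split():
        if tok in STOP_WORDS:
            continue                   # stop-word removal
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]  # naive suffix stripping
                break
        terms.append(tok)
    return terms

print(conflate("Sailing in the Greek islands"))  # ['sail', 'greek', 'island']
```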
Indexing: Weighting

weight(t,d) = tf(t,d) × idf(t)

d — document
t — term
N — number of documents in the collection
N(t) — number of documents in which term t occurs
idf(t) — inverse document frequency
occ(t,d) — occurrences of term t in document d
tmax — term in document d with the highest occurrence count
tf(t,d) — term frequency of term t in document d

! high frequency in a document (tf) leads to a high term weight
! low document frequency (high idf) leads to a high term weight

idf(t) = log( N / N(t) )
tf(t,d) = a + (1 − a) · occ(t,d) / occ(tmax,d)
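The weighting scheme above can be sketched directly. The collection counts and occurrence figures below are assumed toy values; a = 0.5 is a common choice for the smoothing constant.

```python
import math

# Sketch of the slide's scheme: weight(t,d) = tf(t,d) x idf(t), with tf
# normalised by the most frequent term in the document.
def tf(occ_t_d, occ_tmax_d, a=0.5):
    return a + (1 - a) * occ_t_d / occ_tmax_d

def idf(N, N_t):
    return math.log(N / N_t)

def weight(occ_t_d, occ_tmax_d, N, N_t, a=0.5):
    return tf(occ_t_d, occ_tmax_d, a) * idf(N, N_t)

# "sailing" occurs 3 times, the most frequent term occurs 6 times,
# and "sailing" appears in 100 of 1000 documents:
print(round(weight(3, 6, 1000, 100), 3))
```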
Indexing: Inverted file
Word-oriented mechanism for indexing document collections to speed up searching.

Searching:
! vocabulary search (query terms)
! retrieval of occurrences
! manipulation of occurrences

The index maps each TERM (with its IDF) to the documents in which it occurs and its term frequency (TF) in each document.
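A minimal inverted file can be built with a dictionary mapping terms to per-document frequencies. The document texts below are illustrative.

```python
from collections import defaultdict

# Toy collection:
docs = {
    "D1": "sailing greece sailing",
    "D2": "greece sunset",
    "D3": "fish sunset sunset",
}

# Inverted file: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

# Vocabulary search for a query term, then retrieval of its occurrences:
print(index["sailing"])
print(sorted(index["sunset"]))
```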
Basics: Retrieval
1. Boolean model
2. Vector space model
3. Probabilistic model
4. Other models
Retrieval: Boolean Model
Retrieve documents which are “true” for the query.

! Query: logical combination of index terms
Q = (K1 AND K2) OR (K3 AND (NOT K4))
“Retrieve all documents indexed by K1 AND K2, OR documents indexed by K3 AND (but) NOT indexed by K4”

! Inverted file for document collection {D1, D2, D3, D4}
K1-list: D1, D2, D3, D4
K2-list: D1, D2
K3-list: D1, D2, D3
K4-list: D1
! Result: {D1, D2, D3}
! Issue of normalisation ⇒ set-based models (Dice, Jaccard, …)
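The slide's example can be evaluated with set operations over the inverted lists:

```python
# Boolean retrieval over the slide's inverted lists.
# Q = (K1 AND K2) OR (K3 AND NOT K4)
lists = {
    "K1": {"D1", "D2", "D3", "D4"},
    "K2": {"D1", "D2"},
    "K3": {"D1", "D2", "D3"},
    "K4": {"D1"},
}

result = (lists["K1"] & lists["K2"]) | (lists["K3"] - lists["K4"])
print(sorted(result))  # ['D1', 'D2', 'D3'], as on the slide
```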
Retrieval: Vector Space Model (1)
! Set of terms {t1, t2, …, tn}
! Document vector D = <d1, d2, …, dn>
! Query vector Q = <q1, q2, …, qn>

di = term frequency of term ti in the document
qi = weight of term ti in the query

Retrieval status value:

RSV(D,Q) = Σi=1,n di·qi / ( (Σi=1,n di²)^1/2 · (Σi=1,n qi²)^1/2 ) = cos θ
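The cosine retrieval status value above is the dot product of D and Q divided by the product of their Euclidean norms. The vectors below are assumed toy weights.

```python
import math

# Cosine similarity between a document vector and a query vector.
def cosine(d, q):
    dot = sum(di * qi for di, qi in zip(d, q))
    norm_d = math.sqrt(sum(di * di for di in d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_d * norm_q)

D = [2.0, 1.0, 0.0]   # toy document weights over 3 terms
Q = [1.0, 0.0, 1.0]   # toy query weights
print(round(cosine(D, Q), 3))
```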
Retrieval: Vector Space Model (2)

[Diagram: with n = 2 index terms, D = <d1, d2> and Q = <q1, q2> plotted in the term space spanned by t1 and t2; θ is the angle between the two vectors]
Retrieval: Probabilistic Model
“Given a user query q and a document d, estimate the probability that the user will find d relevant.”

! Binary independence model (BIR)
  # index terms in relevant and non-relevant documents
  # assumes feedback information
  # BIR without user feedback
  # BIR with within-document frequency
! Use of polynomial functions and logistic regression
! Others
Retrieval: Binary Independence Model (BIR)

! Document described by presence/absence of terms
! D = <d1, d2, …, dn> where n is the number of terms:

di = 1 if the document is indexed by ti, 0 otherwise

R: relevant; ¬R: not relevant

Compute P(R|D) and P(¬R|D) to decide whether the document represented by D is relevant.
BIR: Bayes’ Decision Rule
if P(R|D) > P(¬R|D) then D is relevant; otherwise D is not relevant.

! Minimises the average probability of error: assigning a relevant document as non-relevant or vice versa (Probability Ranking Principle)
! Need to compute P(R|D) and P(¬R|D)
BIR: Bayes' theorem (1)

! P(D): probability of observing description D at random, i.e., probability of D irrespective of whether it is relevant or not.
! P(D|R): probability of observing D given that it is relevant.
! P(D|¬R): probability of observing D given that it is not relevant.
! P(R): prior probability of observing a relevant document.
! P(¬R): prior probability of observing a non-relevant document.
! Note: P(D) = P(D|R)P(R) + P(D|¬R)P(¬R).

P(R|D) = P(D|R) P(R) / P(D)
P(¬R|D) = P(D|¬R) P(¬R) / P(D)
BIR: Bayes’ theorem (2)
if P(D|R)P(R) > P(D|¬R)P(¬R) then D is relevant, otherwise D is not relevant

! From the above decision rule, we derive a retrieval function g(D) using independence assumptions:

P(D|R) = P(d1|R) P(d2|R) … P(dn|R)
P(D|¬R) = P(d1|¬R) P(d2|¬R) … P(dn|¬R)

                 Presence                    Absence
relevant:        pi = P(di = 1 | R)          1 − pi = P(di = 0 | R)
non-relevant:    qi = P(di = 1 | ¬R)         1 − qi = P(di = 0 | ¬R)

! pi (qi): probability that if the document is relevant (non-relevant) then the ith term ti is present in the document.
BIR: Retrieval function g(D)

g(D) = Σi=1,n ci di

where

ci = log [ pi (1 − qi) / ( qi (1 − pi) ) ]

! ci are weights associated with terms ti, e.g. discrimination power.
! Simple addition: add the coefficients ci for those terms ti present in the document.
! Rank documents using g(D).
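The retrieval function above is straightforward to sketch. The pi and qi values below are assumed estimates, not derived from any data.

```python
import math

# BIR: g(D) = sum_i c_i d_i with c_i = log( p_i(1-q_i) / (q_i(1-p_i)) ).
def c(p, q):
    return math.log(p * (1 - q) / (q * (1 - p)))

def g(d, ps, qs):
    return sum(c(p, q) * di for di, p, q in zip(d, ps, qs))

ps = [0.8, 0.4, 0.5]   # assumed P(term present | relevant)
qs = [0.2, 0.4, 0.1]   # assumed P(term present | non-relevant)
D1 = [1, 1, 0]
D2 = [0, 1, 1]
# Rank documents by g(D); term 1 is the strongest discriminator here.
print(g(D1, ps, qs) > g(D2, ps, qs))
```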
BIR: Estimating the cis (1)
For each term ti:
ni: number of documents with term ti
ri: number of relevant documents with term ti
R: number of relevant documents
N: number of documents
! not the total number of documents in the system
! some subset specially chosen to enable ci to be estimated
! relevance feedback data: number of displayed documents

          relevant    non-relevant       total
di = 1    ri          ni − ri            ni
di = 0    R − ri      N − ni − R + ri    N − ni
total     R           N − R              N
BIR: Estimating the cis (2)
! pi: probability that a relevant document contains the term ti
! qi: probability that a non-relevant document contains the term ti

pi = ri / R
qi = (ni − ri) / (N − R)

! So

ci = log [ ri (N − ni − R + ri) / ( (R − ri)(ni − ri) ) ]

the extent to which the ith term can discriminate between the relevant and non-relevant documents.
Retrieval: Other Models
! Set theoretical models
  # fuzzy set model
  # extended Boolean model
! Algebraic models
  # latent semantic indexing model
  # neural network model
! Probabilistic models
  # inference network
  # belief network
  # language model
! IR viewed as a logical inference
Other Models: IR viewed as a logical inference (1)

! document and query: logical formulae d and q
! retrieval: search for documents which imply the query: d → q
! Advantage: from term-based retrieval to knowledge-based retrieval
! d = square, q = rectangle, and thesaurus: square → rectangle: document d relevant to query q

Logical view: d = {t1, t2, t3} becomes d = t1 ∧ t2 ∧ t3; q = {t1, t3} becomes q = t1 ∧ t3; retrieval checks d → q.
Other Models: IR viewed as a logical inference (2)
! d = quadrangle, q = rectangle: document d may be relevant to query q
! Uncertainty: quadrangle → rectangle with uncertainty 0.3
! Retrieval: estimating the “probability” that the document implies the query: P(d → q)
! Logical Uncertainty Principle:“Given any two sentences x and y; a measure of uncertainty of y → x related to a given data set is determined by the minimal extent to which we have to add information to the data set, to establish that y → x”
"Use of non-classical logics and theories of uncertainty
Information Retrieval: Evaluation
1. Background
2. System-centred evaluation
3. User-centred evaluation
Evaluation: Background
! What to evaluate?
  # coverage of the text collection: extent to which the system includes relevant material
  # time lag (efficiency): average interval between the time a request is made and the time an answer is given
  # presentation of the output
  # effort involved by the user in obtaining answers to a request
  # recall of the system: proportion of relevant documents retrieved
  # precision of the system: proportion of the retrieved documents that are actually relevant
Evaluation: Background
! Originally
  # batch IR systems
  # small, textual collections
  # queries formulated by searchers
! Today
  # interactive IR systems
  # large collections of different or mixed media
  # queries formulated by end-users
Evaluation: System-centred evaluation
! (Comparative) evaluation of technical performance of IR system(s)
! Relevant = “having significant and demonstrable bearing on the matter at hand”
#Objectivity, Topicality, Binary nature, Independence
! Effectiveness = the ability of the IR system to retrieve relevant documents and suppress non-relevant documents
#Test collections: document collection, queries, relevance judgements
Effectiveness: Recall / Precision

[Diagram: the document collection, with the sets of retrieved and relevant documents overlapping in "retrieved and relevant"]

recall = number of relevant documents retrieved / number of relevant documents
precision = number of relevant documents retrieved / number of documents retrieved
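The two measures reduce to set arithmetic over document identifiers. Both sets below are toy data.

```python
# Recall and precision over sets of document ids.
retrieved = {"D1", "D2", "D3", "D4", "D5"}
relevant = {"D2", "D4", "D6", "D8"}

hits = retrieved & relevant                 # retrieved AND relevant
recall = len(hits) / len(relevant)          # 2 / 4
precision = len(hits) / len(retrieved)      # 2 / 5
print(recall, precision)
```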
Effectiveness: Recall / Precision
! For each system / system version
  # For each query in the test collection
    - run the query against the system to obtain a ranking
    - use the ranking and relevance judgements to calculate recall/precision (r/p) pairs at each recall point
    - interpolate to standard recall points if necessary
  # Average r/p values across all queries

[Plot: precision (0 to 1) against recall at the standard points 0, 0.1, …, 1, for system 1 and system 2]
Effectiveness: Example of test collections

! TREC (Text REtrieval Conference)
  # Started in 1992, run by the National Institute of Standards and Technology (NIST)
  # Components
    - huge document collection (several GB), taken from the Wall Street Journal, Financial Times, etc
    - new documents, topics (i.e. requests, including description and narrative fields) and relevance judgements (performed by retired civil servants) each year
  # Tracks
    - interactive, cross-lingual, Web, spoken document, short query, video, question-answering (factoid)
Evaluation: User-centred evaluation

! Evaluation of interface and user interaction
  # usability, task performance, user satisfaction
! Methodology based on interactive experiments and ethnographic studies
  # no standard user-centred methodology
  # elements often borrowed from other areas, e.g. human-computer interaction, experimental psychology
Information Retrieval: Topics in information retrieval
1. Query reformulation
2. Web information retrieval
3. Structured document retrieval
4. The use of AI in IR
Query reformulation
1. Introduction
2. Relevance feedback
3. Automatic local analysis
4. Automatic global analysis
5. Evaluation
6. Issues
Query reformulation: Introduction
With no detailed knowledge of the collection and the retrieval environment, it is difficult to formulate queries well designed for retrieval; many formulations are needed for good retrieval.
! First formulation: naïve attempt to retrieve relevant information
! Documents initially retrieved:
  # examined for relevance information
  # improved query formulations used to retrieve additional relevant documents
! Query reformulation:
  # expanding the original query with new terms
  # reweighting the terms in the expanded query
Query reformulation: Three approaches
1. Relevance feedback
   # approaches based on feedback from users: Rocchio, Binary Independence Model (BIR)
2. Local analysis (pseudo-relevance feedback)
   # approaches based on information derived from the set of initially retrieved documents (local set of documents)
3. Global analysis
   # approaches based on global information derived from the document collection
Seeking information: IR + AI approaches ECAI 2002 Lyon
24
Relevance feedback
! Cycle
  # user presented with a list of retrieved documents
  # user marks those which are relevant
    - in practice: top 10-20 ranked documents are examined
    - incremental
  # select important terms from documents assessed relevant by users
  # enhance the importance of these terms in a new query
! Variants
  1. Query expansion: add new terms from relevant documents
  2. Term reweighting: modify term weights based on user relevance judgements
  3. Query expansion + term reweighting
Relevance feedback: Rocchio
For query qi:
Dr: set of relevant documents among retrieved documents
Dn: set of non-relevant documents among retrieved documents
α, β, γ: tuning constants

qi+1 = α qi + (β/|Dr|) Σdj∈Dr dj − (γ/|Dn|) Σdj∈Dn dj

! Usually information in relevant documents is more important than in non-relevant documents (γ << β)
! Positive relevance feedback: γ = 0
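The Rocchio update can be sketched over plain term-weight vectors. The vectors and the β, γ values below are assumed toy choices.

```python
# Rocchio reformulation:
# q' = alpha*q + (beta/|Dr|)*sum(dj in Dr) - (gamma/|Dn|)*sum(dj in Dn)
def rocchio(q, Dr, Dn, alpha=1.0, beta=0.75, gamma=0.15):
    n = len(q)
    new_q = [alpha * qi for qi in q]
    for d in Dr:
        for i in range(n):
            new_q[i] += beta * d[i] / len(Dr)
    for d in Dn:
        for i in range(n):
            new_q[i] -= gamma * d[i] / len(Dn)
    # Negative weights are usually clipped to zero in practice.
    return [max(w, 0.0) for w in new_q]

q = [1.0, 0.0, 0.0]                        # original query
Dr = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]    # judged relevant
Dn = [[1.0, 0.0, 1.0]]                     # judged non-relevant
print(rocchio(q, Dr, Dn))
```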
Relevance feedback: Rocchio in practice (SMART)

! α = 1
! Terms used:
  # original query
  # appear in more relevant documents than non-relevant documents
  # appear in more than half the relevant documents
! Negative weights ignored

qi+1 = qi + (β/|Dr|) Σdj∈Dr dj − (γ/|Dn|) Σdj∈Dn dj
Relevance feedback: Binary independence model (BIR)

! Probabilistic (Bayes’ theorem and Probability Ranking Principle)
! Document D = <d1, …, dn>
  # n terms t1, …, tn in the collection
  # di = 1 if document D indexed by ti, otherwise 0

g(D) = Σi=1,n ci di

  # ci = discrimination power of term ti at retrieving relevant documents and ignoring non-relevant documents
  # predicts relevance
  # several formulations for ci
Seeking information: IR + AI approaches ECAI 2002 Lyon
26
BIR: formulations for ci
! Independence assumptions
  # I1: distribution of terms in relevant documents is independent and their distribution in all documents is independent
  # I2: distribution of terms in relevant documents is independent and their distribution in irrelevant documents is independent
! Ordering principles
  # O1: probable relevance based on the presence of search terms in documents
  # O2: probable relevance based on the presence of search terms in documents and their absence from documents

                         Independence I1    Independence I2
Ordering Principle O1    F1                 F2
Ordering Principle O2    F3                 F4
BIR: Various combinations

R = number of relevant documents
N = number of documents in the collection
ri = number of relevant documents containing ti
ni = number of documents containing ti

F1 (I1 + O1): ci = log [ (ri / R) / (ni / N) ]
F2 (I2 + O1): ci = log [ ri (N − R) / ( (ni − ri) R ) ]
F3 (I1 + O2): ci = log [ ri (N − ni) / ( (R − ri) ni ) ]
F4 (I2 + O2): ci = log [ ri (N − ni − R + ri) / ( (ni − ri)(R − ri) ) ]
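The four formulations can be computed directly from the contingency counts. The counts below are assumed toy values, and no smoothing is applied (real systems typically add 0.5 to each cell to avoid zeros).

```python
import math

# The four term-weight formulations from counts:
# N documents, R relevant, n_i containing t_i, r_i relevant and containing t_i.
def f1(N, R, n, r): return math.log((r / R) / (n / N))
def f2(N, R, n, r): return math.log((r * (N - R)) / ((n - r) * R))
def f3(N, R, n, r): return math.log((r * (N - n)) / ((R - r) * n))
def f4(N, R, n, r): return math.log((r * (N - n - R + r)) / ((n - r) * (R - r)))

N, R, n, r = 1000, 20, 100, 10   # toy counts
for f in (f1, f2, f3, f4):
    print(round(f(N, R, n, r), 3))
```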
BIR: Experiments
! F1, F2, F3 and F4 outperform no relevance weighting and ranking by IDF
! F1 and F2 perform in the same range; so do F3 and F4
! F3 and F4 > F1 and F2
! F4 slightly > F3
  # O2 is correct (looking at presence and absence of terms)
! No conclusion with respect to I1 and I2, although I2 seems a more realistic assumption.
Query reformulation: Local analysis

! Examine the documents retrieved for the query to determine query expansion
! No user assistance
! Two strategies
  # local clustering (synonyms, stemming variations)
  # local context analysis (terms close to query terms in text)
! Two issues
  # query “drift”
  # computation cost (on-line)
Query reformulation: Global analysis

! Expand the query using information from the whole set of documents in the collection
! No user assistance
! Thesaurus-like structure using all documents
! Two issues
  # approach to build the thesaurus (e.g. term co-occurrence)
  # approach to select terms for query expansion (e.g. the top 20 terms ranked according to IDF value)
Query reformulation: Evaluation
! Use qi and compute a precision/recall graph; use qi+1 and compute a precision/recall graph
! Use all documents in the collection
  # spectacular improvements
  # also due to relevant documents being ranked higher
  # documents known to the user; must evaluate with respect to documents not seen by the user
! (For example) use documents in the residual collection = set of documents minus those assessed relevant
  # measures lower than for the original query
  # more realistic evaluation
  # but result not comparable with the original ranking (fewer relevant documents)
Query reformulation: Issues
! Relevance feedback
  # often users are not reliable in making relevance assessments
  # positive, negative, neutral, partial relevance assessments
  # why is a document relevant?
! Interface and visualisation
  # allow the user to quickly identify relevant and non-relevant documents (e.g. the use of summaries)
  # what happens with 2D and 3D visualisation?
! Interactive query expansion (as opposed to automatic)
  # user chooses the terms to be added
Web information retrieval
1. Introduction
2. Tasks of web search engines
   1. Gathering
   2. Indexing
   3. Searching
   4. Document and query management
3. Metasearch
4. Issues
Introduction: Queries on the web

Measure                     Average value    Range
Number of words             2.35             0 – 393
Number of operators         0.41             0 – 958
Repetitions of queries      3.97             1 – 1.5 million
Queries per user session    2.02             1 – 173325
Screens per query           1.39             1 – 78496
Introduction: Users and the web
! Main purposes: research, leisure, business, education
  # products and services (e-commerce)
  # people and company names and home pages
  # factoids (from any one of a number of documents)
  # entire, broad documents
  # mp3, image, video, audio
! Some statistics
  # 80% do not modify the query
  # 85% look at the first screen only
  # 64% of queries are unique
  # 25% of users use single keywords (a problem for polysemous words and synonyms)
  # 10% of queries are empty!
Web IR: Tasks of a web search engine
! Document gathering
  # select the documents to be indexed
! Document indexing
  # represent the content of the selected documents
  # often 2 indices maintained (full + small for frequent queries)
! Searching
  # represent the user information need as a query
  # retrieval process (search algorithms, ranking of web pages)
! Document and query management
  # display the results
  # virtual collection (documents discarded after indexing) vs. physical collection (documents maintained after indexing)
Tasks: Document indexing
! Document indexing = building the indices
! Indices are variants of inverted files
  # metatag analysis
  # stop-word removal + stemming
  # position data (for phrase searches)
  # weights
    - tf × idf
    - downweight long URLs (not an important page)
    - upweight terms appearing at the top of the document, or emphasised terms
  # use de-spamming techniques
! Hyperlink information
  # count link popularity
  # anchor text from source links
  # hub and authority value of a page
Tasks: Searching
! Querying
  # 1 word, or all words must be in the retrieved pages
  # normalisation (stop-word removal, stemming, etc)
  # complex queries (date, structure, region, etc)
  # Boolean expressions (advanced search)
  # metadata
! Ranking algorithms: use of web links
  # anchor text
  # web page authority analysis
    - PageRank (Google)
    - HITS (Hyperlink-Induced Topic Search)
Ranking: Use of web links
! Web link: represents a relationship between the connected pages
! The main difference between standard IR algorithms and web IR algorithms is the massive presence of web links
  # web links are a source of evidence but also a source of noise
  # classical IR analogue: citation-based IR
  # web track in TREC, 2000, TREC-9: Small Web task (2 GB of web data); Large Web task (100 GB of web data, 18.5 million documents)
Ranking: Anchor text
! Represents the referenced document
  # why?
    - provides a more accurate and concise description than the page itself
    - (probably) contains more significant terms than the page itself
  # used by ‘WWW Worm’ (one of the first search engines, 1994)
  # representation of images, programs, …
! Generate page descriptions from anchor text
Ranking: PageRank (1)
! Designed by Brin and Page at Stanford University and used to implement Google
  # a page has a high rank if the sum of the ranks of its in-links is high
    - in-link of page p: a link from a page to page p
    - out-link of page p: a link from page p to a page
  # a high-PageRank page has many in-links or a few highly ranked in-links
! Retrieval: use cosine product (content, feature, term weight) combined with the PageRank value
Ranking: PageRank (2)
! Random Surfer Model: user randomly navigates
  # initially the surfer is at a random page
  # at each step the surfer proceeds
    - to a randomly chosen Web page with probability d, called the “damping factor” (e.g. probability of random jump = 0.2)
    - to a randomly chosen page linked to from the current page with probability 1 − d (e.g. probability of following a random out-link = 0.8)
! Process modelled by a Markov chain
  # PageRank PR of a page a = probability that the surfer is at page a at a given time

PR(a) = Kd + K(1 − d) Σi=1,n PR(ai)/C(ai)

d set by the system; K normalisation factor; a = page pointed to by ai for i = 1,n; C(ai) = number of out-links of ai
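The random-surfer process above can be sketched as power iteration, with d as the jump probability as on the slide (here the normalisation constant reduces to 1/n for the jump term). The three-page graph is an assumed toy example.

```python
# Power-iteration sketch of PageRank on a toy 3-page graph.
def pagerank(links, d=0.2, iterations=50):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {}
        for p in pages:
            # sum of PR(q)/C(q) over pages q linking to p
            inlink_sum = sum(pr[q] / len(links[q])
                             for q in pages if p in links[q])
            new[p] = d / n + (1 - d) * inlink_sum
        pr = new
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}   # out-links, toy graph
pr = pagerank(links)
# C has the most in-links, so it ends up with the highest rank.
print(max(pr, key=pr.get))
```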
Ranking: HITS = Hyperlink-Induced Topic Search
! Originated with Kleinberg, 1997 (also referred to as “The Connectivity Analysis Approach”)
! Broad topic queries produce large sets of retrieved results
  # abundance problem ⇒ too many relevant documents
  # new type of quality measure needed ⇒ distinguish the most “authoritative” pages ⇒ high-quality response to a broad query
! HITS: for a certain topic, it identifies
  # good authorities
    - pages that contain relevant information (good sources of content)
  # good hubs
    - pages that point to useful pages (good sources of links)
Ranking: HITS (2)
! Intuition
  # authority comes from in-links; being a good hub comes from out-links
  # better authority comes from in-links from good hubs; being a better hub comes from out-links to good authorities
! Mutual reinforcement between hubs and authorities
  # a good authority page is pointed to by many hub pages
  # a good hub page points to many authority pages
! Use the set of pages S that are retrieved (e.g. k = 200 top-ranked pages) + the set of pages T that point to or are pointed to by the retrieved set S
Ranking: HITS (3)
! Computation of hub and authority values of a page through iterative propagation of “authority weight” and “hub weight”
! Initially all values equal to 1
! Authority weight x(p) of page p
  # if p is pointed to by many pages with large y-values, then it should receive a large x-value:
    x(p) = Σqi→p y(qi)
! Hub weight y(p) of page p
  # if p points to many pages with large x-values, then it should receive a large y-value:
    y(p) = Σp→qi x(qi)
! After each computation (iteration), weights are normalised
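The mutual-reinforcement iteration above can be sketched on a toy graph (assumed example with two hubs and two content pages):

```python
import math

# HITS iteration: x(p) (authority) sums the hub weights of pages
# pointing to p; y(p) (hub) sums the authority weights of pages p
# points to; both vectors are normalised after each iteration.
def hits(links, iterations=20):
    pages = list(links)
    x = {p: 1.0 for p in pages}   # authority weights
    y = {p: 1.0 for p in pages}   # hub weights
    for _ in range(iterations):
        x = {p: sum(y[q] for q in pages if p in links[q]) for p in pages}
        y = {p: sum(x[q] for q in links[p]) for p in pages}
        for w in (x, y):
            norm = math.sqrt(sum(v * v for v in w.values()))
            for p in pages:
                w[p] /= norm
    return x, y

links = {"H1": ["A", "B"], "H2": ["A"], "A": [], "B": []}  # toy out-links
auth, hub = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # A H1
```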
Web IR: Metasearch (1)
! Problems of Web search engines:
  # limited coverage of the publicly indexable Web
  # index different, overlapping sections of the Web
  # based on different IR models
  # different results for the same query
  ⇒ users do not have the time or knowledge to select the most appropriate search engines with regard to their information need
! Metasearch engines
  # send the query to several search engines, Web directories, databases
  # collect the results
  # unify (merge) them - data fusion
Web IR: Metasearch (2)
! Divided into phases
  # search engine selection
    - topic-dependent, past queries, network traffic, etc
  # document selection
    - how many documents from each search engine?
  # merging algorithm
    - utilise rank positions, document retrieval scores, titles & abstracts, etc

Metasearcher    URL                   Sources used
MetaCrawler     www.metacrawler.com   13
Dogpile         www.dogpile.com       25
SavvySearch     www.search.com        > 1000
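One simple rank-position merging heuristic (among the many data-fusion options listed above) sums reciprocal-rank scores per document. This is an illustrative sketch, not the algorithm of any particular metasearcher; the engine result lists are toy data.

```python
# Fuse rankings from several engines by summing 1/rank per document.
def fuse(rankings):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)

engine1 = ["D1", "D2", "D3"]
engine2 = ["D2", "D1", "D4"]
engine3 = ["D2", "D3"]
print(fuse([engine1, engine2, engine3]))  # ['D2', 'D1', 'D3', 'D4']
```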
Web IR: Issues
! Modelling
! Querying
! Distributed architecture
! Ranking
! Indexing
! Dynamic pages
! Browsing
! User interface
! Duplicated data
! Multimedia
! Context
Web IR: Context
! Results of search engines are identical, independent of
  # the user
  # the context in which the user made the request
! Adding context information to improve search results ⇒ focus on the user need and answer it directly
  # explicit context
    - query + category
  # implicit context
    - based on documents edited or viewed by the user
  # personalised search
    - previous requests and interests, user profile
Structured document retrieval
! In standard IR, documents are considered as atomic information units, whatever their type or size
  # indexed as a whole
    - indexes do not express the internal organisation of the discourse set by the author(s)
  # retrieved as a whole
    - users cannot retrieve independent components of documents that might be better adapted (more focussed) to their information needs
! New standards (SGML, XML, HTML, ODA, …)
! MPEG-7 for audio-visual data

[Diagram: a structured document]
SDR: Impact of structure
! Searching = querying and browsing
  # complementary advantages and limitations
  # both based on explicit manipulation of structure
    - querying: attributes, logical structure
    - browsing: links
  # disorientation
SDR: Approaches
! Hypermedia
! Passage retrieval
! Indexing/retrieving hierarchical structure (aggregation)
SDR: Passage retrieval

! Apply IR techniques to parts, “passages”, rather than whole documents
  # return ranked documents based on passages
  # return ranked passages
    - combination of evidence (local + global)
! Three types of structure:
  # discourse: sentence, section, …
  # semantic: subject or content of text
  # window: based on a fixed number of words
SDR: Aggregation-based approaches

[Diagram: objects o1 and o2, with representations R1 and R2, aggregate into object o with representation R1 ⊕ R2, taking into account type of links, number of children, type of child, …]
SDR: Aggregation-based approaches

? = weight of a term in the document, an estimation of the document being represented by the term; it depends on:
! term weight of the term in the document
! aggregated term weight of the term from related components
! importance associated with components (abstract vs. conclusion)
! link type (hierarchy, linear, semantic, popularity)
! portion of related components indexed by the term
! distribution of related components indexed by the term

[Diagram: a document with section1 {0.7 wine}, section2 {0.9 cheese}, section3 {0.3 wine}, aggregated at document level to {? wine, ? cheese}]
Aggregation: Focussed Retrieval

[Diagram: a document tree with relevant (r) components; retrieval is focussed on Best Entry Points (BEPs), from which the user browses to the relevant components]

Focus retrieval to Best Entry Points (BEPs)
SDR: Issues
! Information-seeking process for SDR
! Interfaces that support browsing and querying
! IR models
  # retrieve at the appropriate level of granularity
  # focus retrieval to best entry points
  # structured queries
  # XML retrieval
! Evaluation and test collections for SDR
! Index structures for SDR
Areas of AI used in IR
! Natural language processing
! Knowledge representation
  # expert systems
  # logical formalisms, conceptual graphs, etc
! Machine learning
  # short term: over a single session
  # long term: over multiple searches by multiple users
! Computer vision
  # OCR
! Reasoning under uncertainty
  # Dempster-Shafer, Bayesian networks, probability theory, etc
! Cognitive theory
  # user modelling
Areas of AI used in IR: Three main roles
1. Information characterisation
2. Search formulation in information seeking
3. Support functions
Information characterisation - Approach 1
! Replace document text (natural language) with a knowledge base in an artificial language
! Directly manipulate the information available ⇒ knowledge-based retrieval
! Allows for question-answering queries
! Much of the (textual) information is lost
  # what will be put in the knowledge base?
  # issue of information extraction
! Problems with large collections, but shown successful in specific domains (SCISOR)
Information characterisation - Approach 2

! Keep documents and use a knowledge base as an access tool (query formulation)
  # semantic-based access, concept-based access
  # interface and presentation
! Better classification of document text and better access
! Criticism: problems of (automatic) linkage (documents have different style, language and level of discussion)
Information characterisation - Approach 3
! Abandon the knowledge base but use AI (at the syntactic level) to characterise document content
! Sophisticated matching
! Use NLP to derive
  # noun phrases: “the mother of Jane” ⇔ “Jane’s mother”
  # sentences: “The boy ate the apple” ⇔ “The apple was eaten by the boy”
  # normalisation is necessary!
! Little evidence of success (so far)
Information characterisation - Approach 4

! Use AI to select good natural language index terms
  # thesaurus construction
  # compound terms
! Use world knowledge and a bit of linguistics (e.g. noun vs. verb, discourse)
Information seeking
! Characterisation of the user’s information need (and not the actual matching)
! User modelling: “automating the intermediary”, giving the user an intelligent front-end
! Over iterative searching and dialogue, determine the user’s real information need
  # medical doctor vs. medical student
  # student and general topic: look for a survey document
! BUT: users have difficulty expressing their information need ⇒ difficulty of manually or automatically deriving rules for systems
! Based on Expert Systems technologies
Example of Rules
! Data abstraction rules
# if precision <= 20% then precision level is 1
# if precision > 80% then precision level is 5
# if retrieval size is 101-200 then retrieval level is 4
(levels: 1 - very low … 5 - very high)
! Heuristic matching rules
# if precision level is 2 or 3 and retrieval level > 2 then use narrowing strategy
! Refinement rules
# if a narrowing strategy is needed then select strategy “use terms that have high frequency in relevant records” with weight 0.8
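The rules above can be sketched in Python as a tiny rule-based strategy selector. This is a toy illustration, not the tutorial's actual system: the slide only states three of the band boundaries, so the remaining bands and the `choose_strategy` helper are assumptions.

```python
def precision_level(precision):
    """Data abstraction: map a raw precision percentage to a 1-5 level.
    Only the <=20% -> 1 and >80% -> 5 bands come from the slide; the
    intermediate bands are assumed."""
    if precision <= 20:
        return 1
    elif precision <= 40:
        return 2
    elif precision <= 60:
        return 3
    elif precision <= 80:
        return 4
    return 5

def retrieval_level(size):
    """Data abstraction: map retrieved-set size to a 1-5 level.
    Only 101-200 -> level 4 comes from the slide; other bands assumed."""
    if size <= 20:
        return 1
    elif size <= 50:
        return 2
    elif size <= 100:
        return 3
    elif size <= 200:
        return 4
    return 5

def choose_strategy(precision, size):
    """Heuristic matching + refinement: mid precision and a large
    retrieved set trigger the weighted narrowing strategy."""
    p, r = precision_level(precision), retrieval_level(size)
    if p in (2, 3) and r > 2:
        return ("narrowing",
                "use terms that have high frequency in relevant records",
                0.8)
    return (None, None, None)

print(choose_strategy(30, 150))   # precision level 2, retrieval level 4
```

Running this with 30% precision and 150 retrieved documents selects the narrowing strategy with weight 0.8, exactly the chain of data abstraction, heuristic matching and refinement the slide describes.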
Support functions
1. Information extraction
2. Abstracting and summarising
3. Cataloguing (ontology)
4. Automatically linking parts of texts (hypertext)
5. Thesaurus/dictionary building (linguistics)
6. Story telling (news)
IR: Bibliography
! [AA97] M Agosti and J Allan. Introduction to the Special Issue on Methods and Tools for the Automatic Construction of Hypertext. IP&M 33(2):129-131, 1997.
! [ACP00] M Agosti, F Crestani and G Pasi. Lectures on Information Retrieval, Third European Summer-School, ESSIR 2000, Varenna, Italy, September 11-15, 2000, Revised Lectures, 2001.
! [AS96] M Agosti and AF Smeaton (eds). Information Retrieval and Hypertext. Kluwer Academic Publishers, 1996.
! [Bel00] RK Belew. Finding Out About: Search Engine Technology from a Cognitive Perspective. Cambridge University Press, 2000.
! [BR99] R Baeza-Yates and B Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.
! [Cro87] WB Croft. Approaches to intelligent information retrieval. IP&M 23(4):249-254, 1987.
! [CLR98] F Crestani, M Lalmas and CJ van Rijsbergen (eds). Information Retrieval: Uncertainty and Logics - Advanced Models for the Representation and Retrieval of Information. Kluwer Academic Publishers, Boston, 1998.
! [CLRC98] F Crestani, M Lalmas, CJ van Rijsbergen and I Campbell. "Is This Document Relevant? ... Probably": A Survey of Probabilistic Models in Information Retrieval. ACM Computing Surveys 30(4):528-552, 1998.
! [FB92] W Frakes and R Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.
! [FR97] N Fuhr and T Roelleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. TOIS 15(1):32-66, 1997.
! [GF98] DA Grossman and O Frieder. Information Retrieval: Algorithms and Heuristics. Kluwer Academic Publishers, 1998.
! [Ing92] P Ingwersen. Information Retrieval Interaction. Taylor Graham, London, 1992.
! [Kor97] RR Korfhage. Information Storage and Retrieval. Wiley, 1997.
! [Kow97] G Kowalski. Information Retrieval Systems: Theory and Implementation. Kluwer Academic Publishers, Boston, 1997.
! [KP93] CSG Khoo and DCC Poo. An expert system approach to online catalog subject searching. IP&M 30(2):223-238, 1993.
! [Kra86] DH Kraft. Research into Fuzzy Extensions of Information Retrieval. SIGIR Forum 20(1-4):12-13, 1986.
! [JR90] P Jacobs and L Rau. SCISOR: Extracting information from on-line news. Communications of the ACM 33(11):88-97, 1990.
IR: Bibliography
! [LLD+02] RWP Luk, HV Leong, TS Dillon, ATS Chan, WB Croft and J Allan. A survey in indexing and searching XML documents. JASIST 53(6):415-437, 2002.
! [Pet01] C Peters. Cross-Language Information Retrieval and Evaluation, Workshop of the Cross-Language Evaluation Forum, CLEF 2000, Lisbon, Portugal, September 21-22, 2000, Revised Papers. Springer, 2001.
! [Rij79] CJ van Rijsbergen. Information Retrieval. Butterworths, 1979. http://www.dcs.glasgow.ac.uk/Keith/Preface.html
! [Rij86a] CJ van Rijsbergen. A New Theoretical Framework for Information Retrieval. ACM SIGIR'86, pp 194-200, 1986.
! [Rij86b] CJ van Rijsbergen. A Non-Classical Logic for Information Retrieval. The Computer Journal 29(6):481-485, 1986.
! [Rij92] CJ van Rijsbergen (ed). The Computer Journal, Special Issue on Information Retrieval, 35(3), 1992.
! [SKCT88] T Saracevic, P Kantor, AY Chamis and D Trivison. A study of information seeking and retrieving. I. Background and methodology. Journal of the American Society for Information Science 39(3):161-176, 1988.
! [SM83] G Salton and MJ McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
! [Sme92] AF Smeaton. Progress in the Application of Natural Language Processing to Information Retrieval Tasks. The Computer Journal 36(3), 1992.
! [Spa91] K Sparck Jones. The role of Artificial Intelligence in Information Retrieval. JASIS 42(8):558-565, 1991.
! [Spa99] K Sparck Jones. Information retrieval and artificial intelligence. Artificial Intelligence, 114:257-281, 1999.
! [Spa00] K Sparck Jones. Further reflections on TREC. Information Processing and Management 36(1):37-85, 2000.
! [SW92a] K Sparck Jones and P Willett. Readings in Information Retrieval. Morgan Kaufmann, 1997.
! [SW92b] C Stanfill and DL Waltz. Statistical Methods, Artificial Intelligence, and Information Retrieval. In Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval (ed PS Jacobs). Lawrence Erlbaum Associates, 1992.
! [TC90] H Turtle and WB Croft. Inference networks for document retrieval. ACM SIGIR, pp 1-24, Brussels, Belgium, 1990.
! [WMB94] IH Witten, A Moffat and TC Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York, 1994.
IR: Conferences and journals
! Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (www.acm.org/sigir/)
! The Text REtrieval Conference (TREC), NIST Special Publication (trec.nist.gov/)
! International Conference on Information and Knowledge Management (CIKM)
! Information Retrieval Colloquium, British Computer Society (BCS IRSG), now ECIR
! Hypermedia - Information Retrieval - Multimedia (HIM)
! RIAO Conference, Content-Based Multimedia Information Access
! ECDL and ADL (digital libraries)
! IP&M (Information Processing and Management), Elsevier
! TOIS (Transactions On Information Systems), ACM
! JASIS (Journal of the American Society for Information Science), ASIS
! JDOC (Journal of DOCumentation), ASLIB
! IR (Information Retrieval), Kluwer
! JIIS (Journal of Intelligent Information Systems), Kluwer
! International Journal of Digital Library (IJODL), Springer
! Journal of Digital Information, BCS and OUP
! British Computer Society, Information Retrieval Specialist Group (irsg.eu.org/)
Seeking for Information: Artificial Intelligence
Alison Cawsey
Department of Computing and Electrical Engineering
Heriot-Watt University
Edinburgh, EH14 4AS, Scotland
[email protected]
Introduction
! How can we improve retrieval by applying methods from Artificial Intelligence (AI)?
! Start by reviewing:
"What is AI?
"What is the retrieval task?
Artificial Intelligence
! Concerned with automating or modelling intelligent and commonsense behaviour.
! Represent and reason with information at the level of “meaning” (not surface strings).
! Use knowledge of the world, of people, of typical situations.
Revisiting Retrieval
! Goal: provide the user with the information that they need.
! How might an intelligent assistant do this?
"Analyse the user's requirements.
"Collect information from many sources.
"Read, interpret, filter.
"Create a report or summary.
Intelligent Retrieval
! About finding information, not documents.
! Work at the level of knowledge, not text.
! AI provides methods to extract knowledge from text, reason with it, and communicate results to the user.
AI Approaches: Outline
1. Speech and language technology: to extract meaning from text.
2. Ontologies and rich metadata: to represent document and domain semantics.
3. Intelligent filtering and presentation.
4. Practical examples using XML.
Speech and Language Technology
! Involves: analysis and synthesis of speech and language.
! Example: “John loves Mary.” => loves(john, mary)
! Speech recognition and synthesis.
! Natural language understanding and generation.
SLT: Uses in IR
! If we can extract the meaning of a document and query we can:
"Use more semantically significant query terms.
"Retrieve resources which use terms different from those in the query.
"Create coherent summaries and tables of key data.
"Improve cross-language retrieval.
[17]
Example: Information Extraction
! SLT can be used to extract key data from texts, using robust analysis techniques.
! Example: from “Celtic played Rangers in a 2-2 draw.”, IE fills a template: team1 = Celtic, team2 = Rangers, score = 2-2.
[9]
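The template-filling step in the example above can be sketched with a hand-written pattern. This is a toy, not a real IE system: the regular expression is tailored to this one sentence shape, whereas robust IE uses proper linguistic analysis.

```python
import re

# Hand-written pattern for sentences of the form
# "<team1> played <team2> in a <score> draw".
PATTERN = re.compile(
    r"(?P<team1>\w+) played (?P<team2>\w+) in a (?P<score>\d+-\d+) draw"
)

def extract(sentence):
    """Fill the {team1, team2, score} template, or return None."""
    m = PATTERN.search(sentence)
    return m.groupdict() if m else None

print(extract("Celtic played Rangers in a 2-2 draw."))
# -> {'team1': 'Celtic', 'team2': 'Rangers', 'score': '2-2'}
```

The output is exactly the filled template shown on the slide; sentences that do not fit the pattern simply yield no template, which is why real IE systems need many patterns per domain.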
Summarisation
! IE techniques can be combined with text generation techniques to create high quality summaries.
! Example: from several long match reports, generate: “Celtic played Rangers with a score of 2-2 in a match described variously as ‘exciting’ and ‘deadly dull’.”
[16] [26] [19]
Multilingual Retrieval
! Can also improve cross-language retrieval.
! Simple word-by-word translation may be inadequate.
! Example: the French query “fromage, anglais, histoire” should match the English document “A tale of a mouse and his cheddar”.
[23]
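A minimal sketch of dictionary-based query translation, illustrating why word-by-word translation is inadequate: the bilingual lexicon below is invented for this example, ambiguous words expand to several translations (adding noise), and "fromage" never matches "cheddar" without extra knowledge.

```python
# Invented toy French->English lexicon; real CLIR systems use large
# bilingual dictionaries, corpora or machine translation.
FR_EN = {
    "fromage": ["cheese"],
    "anglais": ["English"],
    "histoire": ["story", "history"],   # ambiguous: both senses kept
}

def translate_query(terms):
    """Word-by-word translation: expand each term to all its
    dictionary translations; unknown terms pass through unchanged."""
    translated = []
    for t in terms:
        translated.extend(FR_EN.get(t.lower(), [t]))
    return translated

print(translate_query(["fromage", "anglais", "histoire"]))
# -> ['cheese', 'English', 'story', 'history']
```

Even this tiny example shows the two classic failure modes: sense ambiguity ("histoire" becomes both "story" and "history") and vocabulary mismatch ("cheese" still does not match "cheddar" without a thesaurus or ontology).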
Meaningful Index Terms
! SLT can be used to try to extract more meaningful query terms and document indexes:
! Stemming (categorise -> category)
! Disambiguating words with many meanings (e.g., bank, table)
! Using noun phrases as index terms (e.g., “learning support centre”)
! The latter two have had limited success
[25][11]
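The stemming step above can be sketched as crude suffix stripping. This is a toy conflation rule set invented for the "categorise -> category" example; production systems use a proper algorithm such as the Porter stemmer.

```python
# Invented suffix-rewrite rules, checked in order; each pair is
# (suffix to strip, replacement). Not a real stemming algorithm.
SUFFIXES = [("ise", "y"), ("isation", "y"), ("ies", "y"), ("s", "")]

def stem(word):
    """Conflate morphological variants to a common index term."""
    for suffix, repl in SUFFIXES:
        # Require a reasonable remaining stem so short words survive.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

print(stem("categorise"))   # -> category
print(stem("categories"))   # -> category
```

Both variants now index under the same term "category", which is exactly the conflation effect the slide describes; the over- and under-stemming errors such toy rules produce are why real systems use carefully tuned stemmers.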
Multimedia Retrieval
! SLT is used to aid retrieval of video and speech (e.g., TV news).
! If there is no audio transcript, speech recognition can be used.
! Speech recognition doesn't have to be perfect: matching with the query can proceed probabilistically.
[1][20]
Summary: Speech and Language Technology
! Work at the level of information and meaning rather than words.
! Analyse whole documents:
"Extract information, translate, ...
! Improve query/index terms:
"e.g., extract noun phrases
! Recognise spoken language
Ontologies and Metadata
! SLT handles retrieval by extracting structure and meaning from text.
! Another approach is for authors to provide more structure:
! Add metadata to resources, describing each resource using set categories.
! Provide ontologies, defining concepts and relations.
Metadata
! Simple example:
"Title: Essence of AI
"Creator: Alison Cawsey
"Subject: AI
"Publisher: Prentice Hall
"Date: 1997
! Standard fields can be used - “Dublin Core” is one standard.
! Can then search on metadata: subject = AI AND Date = 1997
[32][33]
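The fielded search above can be sketched over Dublin-Core-style records. This is a minimal illustration with invented records; real systems index the metadata rather than scanning it.

```python
# Invented example records using Dublin-Core-style field names.
records = [
    {"Title": "Essence of AI", "Creator": "Alison Cawsey",
     "Subject": "AI", "Publisher": "Prentice Hall", "Date": 1997},
    {"Title": "Modern Information Retrieval", "Creator": "R Baeza-Yates",
     "Subject": "IR", "Publisher": "Addison Wesley", "Date": 1999},
]

def search(records, **conditions):
    """Return records matching every field=value condition (AND)."""
    return [r for r in records
            if all(r.get(field) == value
                   for field, value in conditions.items())]

hits = search(records, Subject="AI", Date=1997)
print([r["Title"] for r in hits])   # -> ['Essence of AI']
```

The query `Subject="AI", Date=1997` is exactly the slide's "subject = AI AND Date = 1997": a precise boolean match on fields, rather than a ranked match on free text.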
Rich Metadata
! Metadata can be based on a relational model providing rich descriptions of resources.
! Example: a resource node (http://..) linked by a “creator” relation to a person with name “Alison” and email “alison@cee”.
[29]
Semantic Web
! This leads to the idea of the Semantic Web.
! A rich network of resources with meaningful descriptions and relations.
! Augmented with an ontology giving relations between concepts (e.g., AI is part of CS).
! Search on knowledge/meaning, not text; inference on concepts used in search.
[5]
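The concept-level inference above ("AI part of CS") can be sketched in a few lines. The part-of facts and tagged resources below are invented for illustration; real systems store such facts as RDF triples and use a reasoner.

```python
# Invented ontology: each concept maps to the concept it is part of.
part_of = {"AI": "CS", "IR": "CS", "CS": "Science"}

# Invented resources, each tagged with one subject concept.
resources = {"http://mydoc": "AI", "http://otherdoc": "Maths"}

def covers(concept, query):
    """True if `concept` equals `query` or is (transitively) part of it,
    following the part_of chain upwards."""
    while concept is not None:
        if concept == query:
            return True
        concept = part_of.get(concept)
    return False

# A query for subject "CS" also finds the resource tagged "AI".
print([url for url, c in resources.items() if covers(c, "CS")])
# -> ['http://mydoc']
```

This is the payoff the slide describes: a text match for "CS" would miss the AI-tagged document, but inference over the ontology retrieves it.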
Resource Description Framework (RDF)
! RDF provides a rich metadata scheme; can be written in XML:
<rdf:Description about="http://mydoc">
  <dc:creator rdf:resource="http://me"/>
</rdf:Description>
! RDF schemas provide simple vocabularies/ontologies - other systems (e.g., OIL) augment these.
! Logics (e.g., description logics) are used for inference over these.
[29][31][10][12]
Summary: Metadata
! Authors create meaningful descriptions of resources.
! Schemas/ontologies provide a reference point for vocabularies and concepts used.
! Then reason at the level of concepts (e.g., subject=AI ⇒ subject=CS).
! But metadata authoring and maintenance load!
Intelligent Filtering and Presentation
! Finding information is much more than matching a query to a document:
! Pipeline: resources -> filter -> assemble presentation, guided by the user profile, context, task and query.
Adaptive web sites
! New generation web sites can adapt content and display to the user, generating pages dynamically.
! Personalised web pages involve information selection, assembly and presentation to suit the user.
! Source data can be extracted from text, or be available in structured form (XML or databases).
[3][6][7][18]
User profile and context
! An adaptive site needs to know about the context:
"user's preferences, task, location, time, hardware (display), ...
! Context may be determined:
"automatically (e.g., from the browser)
"by asking the user
"by monitoring user behaviour
! Consider: a travel guide [15]
Examples
! Personalised Travel Guide
"Adapts content to the user's location, time and interests, so that nearby, open attractions are highlighted.
! Personalised Health Information
"Adapts content to highlight information of relevance to the user's problems and treatments.
! Privacy issues may be paramount. [8][4]
Netbots
! Some virtual web sites use intelligent agents.
! An agent acts on behalf of the user to collect, assemble and comment on information retrieved.
! An agent may hold information about the user, and negotiate with other agents for services/information to support the user's task.
[13]
Example: PPP
! PPP (personalised plan-based presenter) collects and assembles information, using an animated agent to present it with respect to the user's information needs. Follow-ons: MIAU, SmartKom.
[28][24][2]
Presentation, Filtering and Search
! All are concerned with supplying the user with needed information.
! Search is generally based on a query, often run once.
! Filtering is based on a profile or filter acting over an extended period.
! Presentation may allow adjusting the selection and emphasis of information from given basic data.
[21][22][27]
Dangers of Personalisation
Personalisation may:
! Result in less coherent resources, if automatically constructed (e.g., summarisation systems)
! Result in misleading documents, if aspects included by the author are omitted in third-party adaptation
[35]
Summary: Intelligent Presentation
! Providing the right information may involve selection, assembly and presentation.
! All may “intelligently” take into account aspects of context.
! But there is a danger of losing the coherence of purpose-authored human documents.
XML (eXtensible Markup Language)
! XML provides a basis for fairly simple illustrations of some ideas and techniques.
! This last section therefore introduces XML and gives some “try-this-at-home” examples of personalised presentation.
XML
! XML allows us to mark up text using tags which represent meaningful domain concepts:
"<library-list>
   <book><author> Alison </author>
   …
! Allows search and presentation based on domain concepts:
"e.g., find all resources containing books written by Alison.
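The concept-based search above can be sketched with Python's standard ElementTree. The document content is invented to match the fragment on the slide.

```python
import xml.etree.ElementTree as ET

# Invented document in the <library-list> markup shown above.
doc = ET.fromstring("""
<library-list>
  <book><author>Alison</author><title>Essence of AI</title></book>
  <book><author>Mounia</author><title>Uncertainty and Logics</title></book>
</library-list>
""")

# "Find all books written by Alison": match on the domain concept
# <author>, not on free text.
titles = [b.findtext("title")
          for b in doc.findall("book")
          if b.findtext("author") == "Alison"]
print(titles)   # -> ['Essence of AI']
```

Because the query addresses the `author` element rather than raw words, a document that merely mentions "Alison" in passing would not match: that is the gain of meaningful markup over plain text.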
Presenting XML
! Structure, content and presentation of XML documents are kept separate.
! Example: structure (<book><author>…), content (Alison Cawsey, “Essence of AI”) and presentation (Book list: • “Essence of AI” by Alison Cawsey) are specified independently.
Presenting XML
! Presentation controlled by stylesheets (usually XSL - eXtensible Stylesheet Language)
! Define templates describing how to present/transform part of a document:
<xsl:template match="book">
  <li>“<xsl:value-of select="title"/>” by <xsl:value-of select="author"/></li>
</xsl:template>
[30]
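Python's standard library has no XSLT processor, so the effect of the template above can be mimicked with ElementTree: each `<book>` becomes a list item showing title and author. The input data is invented.

```python
import xml.etree.ElementTree as ET

# Invented input document.
doc = ET.fromstring(
    "<library-list>"
    "<book><title>Essence of AI</title><author>Alison Cawsey</author></book>"
    "</library-list>"
)

# Equivalent of the XSL template: one <li> per <book>, pulling out
# the title and author sub-elements.
items = ['<li>"{}" by {}</li>'.format(b.findtext("title"),
                                      b.findtext("author"))
         for b in doc.findall("book")]
print("\n".join(items))
# -> <li>"Essence of AI" by Alison Cawsey</li>
```

The point is the same as with XSLT: the presentation logic lives outside the document, so swapping the transform changes the output without touching the content.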
Presenting XML
! Using two different stylesheets we get two very different presentations:
"A bulleted book list: “Essence of AI” by Alison Cawsey; “Something else” by A.N.Other
"A table with columns Title and Author holding the same data
Personalised Presentation
! We can keep data about the user in another XML file (so we have a basic user model + domain knowledge).
! A stylesheet can contain rules so that the output depends on the user:
<xsl:if test="document($UP)/user/interest[. = 'books']">
  Special for book lovers..
</xsl:if>
! But awkward for serious reasoning/ inference.
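The profile-driven rule above can also be sketched outside XSLT, which hints at why "serious" reasoning moves to a programming language: read the user profile from XML and include extra content only when an interest matches. The profile format is invented to fit the slide's test expression.

```python
import xml.etree.ElementTree as ET

# Invented user-profile document ($UP in the stylesheet above).
profile = ET.fromstring(
    "<user><interest>books</interest><interest>sailing</interest></user>"
)

# Equivalent of the xsl:if: emit the extra fragment only for users
# whose interests include 'books'.
output = ["Book list"]
if any(i.text == "books" for i in profile.findall("interest")):
    output.append("Special for book lovers..")
print("\n".join(output))
```

A single conditional is easy in XSLT too; the advantage of parsing into program data structures only appears once the rules need real inference, which is the slide's point.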
XML and Resource Descriptions
! XML is used for RDF (Resource Description Framework).
! May support better querying - but also flexible description of resources to the user, e.g.:
Query: type = lesson plan and grade = 1-5
Description: “Astronomy” by Jim Bloggs is a lesson plan for primary school teachers. A large black umbrella is required.
[36]
Summary: XML
! XML allows authors to create documents with meaningful markup.
! Simple adaptations of presentation can easily be done using XSL.
! But for real “intelligence” in presentation we have to parse the XML and create a structured data format suitable for reasoning systems.
Summary and Issues
! The challenge is to give the user the information they need.
! Given a resource we may want to:
"describe it
"retrieve it, given a query/context
"extract information from it
"create a new integrated presentation
! But there are costs in terms of coherence of resources and user effort.
Problems
! Metadata approaches require the author to create and maintain descriptions.
! Information extraction approaches require configuration for the domain.
! Personalising and assembling presentations risks creating pages that do not reflect the source document author's intentions.
The Future
! More people will use XML, enabling powerful querying and presentation control.
! More dynamic pages:
"but they have to be done well to be better than good human-authored documents!
! We may see an expansion of metadata:
"if the human cost of creation is reduced.
! The role of SLT may be in supporting metadata extraction.
AI: References
! [1] Allan, James. "Perspectives on Information Retrieval and Speech," in Information Retrieval
Techniques for Speech Applications, Coden, Brown and Srinivasan, editors. pp. 1-10, 2001
! [2] E. André, J. Müller, and T. Rist. WIP/PPP: Automatic Generation of Personalized Multimedia Presentations. In ACM Multimedia 96, pages 407-408. ACM Press, November 1996.
! [3] D Bental, L MacKinnon, H Williams, D Marwick, D Pacey, E Dempster and A Cawsey. Dynamic Information Presentation through Web-based Personalisation and Adaptation - An Initial Review. In Joint Proceedings of HCI 2001 and IHM 2001, A Blandford, J Vanderdonckt and P Gray (Eds), pp 485-500, Springer, 2001.
! [4] Bental, D.S., Williams, W.H., Pacey, D., Cawsey, A.J., McKinnon, L.M., and Marwick, D.H. 2001a. Dynamic personalization of Web resources for presenting healthcare information. In Proceedings of MEDICON 2001, Croatia, June 2001, IFMBE Proceedings, 86-89.
! [5] Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web. Scientific American, May 2001.
! [6] Bordegoni, M., Faconti, G., Feiner, S., Maybury, M., Rist, T., Ruggieri, S., Trahanias, P., and Wilson, M. 1997. A standard Reference Model for Intelligent Multimedia Presentation Systems. Computer Standards and Interfaces 18, 477-496.
! [7] Brusilovsky, P. 1996. Methods and Techniques of Adaptive Hypermedia. User Modelling and User-Adapted Interaction, 6(2-3), 87-129
Seeking information: IR + AI approaches ECAI 2002 Lyon
69
AI: References
! [8] Cheverst, K., Davies, N., Mitchell, K., and Smith, P. 2000. Providing tailored (context-aware)
information to city visitors. In Adaptive Hypermedia and Adaptive Web-Based Systems, 2000, P. Brusilovsky, O. Stock, And C. Strapparava Eds. Springer, 73-85.
! [9] J. Cowie, Y. Wilks. Information Extraction. In R. Dale, H. Moisl and H. Somers (eds.) Handbook of Natural Language Processing. New York: Marcel Dekker. (2000)
! [10] Stefan Decker, Dan Brickley, Janne Saarela and Jürgen Angele. A Query and Inference Service for RDF. In QL'98 - The Query Languages Workshop, 1998.
! [11] Feng, F. and Croft, W.B. "Probabilistic Techniques for Phrase Extraction," in Information Processing and Management, March 2001, vol. 37, no. 2, pp. 199-220.
! [12] D. Fensel, I. Horrocks, F. Van Harmelen, S. Decker, M. Erdmann, and M. Klein. Oil in a Nutshell, In Knowledge Acquisition, Modeling, and Management. Proceedings of the European Knowledge Acquisition Conference (EKAW-2000). Lecture Notes in Artificial Intelligence, LNAI, Springer-Verlag, October 2000
! [13] Klusch, M. Information Agent Technology for the Internet: A Survey Journal on Data and Knowledge Engineering, Special Issue on Intelligent Information Integration, D.Fensel (Ed.), Vol. 36(3), Elsevier Science, 2001
! [14] Kobsa, A., Koenemann, J., And Pohl, W. 2001. Personalized Hypermedia Presentation Techniques for Improving Online Customer Relationships. The Knowledge Engineering Review 16(2), 111-155
! [15] Kobsa, A., And Koychev, I. 2000. Learning about Users from Observation. In Adaptive User Interfaces: Papers from the 2000 AAAI Spring Symposium. Menlo Park, CA: AAAI Press.
AI: References
! [16] Julian M. Kupiec, Jan Pedersen, and Francine Chen. A Trainable Document Summarizer.
In Edward A. Fox, Peter Ingwersen, and Raya Fidel, editors, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 68--73, Seattle, Washington, July 1995.
! [17] David D. Lewis and Karen Sparck Jones. Natural language processing for information
retrieval. Communications of the ACM, 39(1):92-101, January 1996.
! [18] Manber, U., Patel, A., And Robison, J. 2000. Experience with Personalization of Yahoo. Communications of the ACM, Vol 43, Number 8, 35-39.
! [19] Inderjeet Mani (Editor), Mark T. Maybury (Editor) , Advances in Automatic Text Summarization, The MIT Press, 1999.
! [20] Pedro J. Moreno, J.M. Van Thong, Beth Logan, Gareth J.F. Jones. From Multimedia Retrieval to Knowledge Management, Computer Vol 35 no 4 2002 pp 58-66
! [21] Mooney. R.J., And Roy, L. 1999. Content-based book recommending using learning for text categorization. In SIGIR'99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
! [22] Murthy, K.R.K., And Keerthi, S.S. 1999. Context Filters for Document-Based Information Filtering. In Proceedings of International Conference on Document Analysis and Recognition'99 (ICDAR '99), Bangalore, India, 1999.
AI: References
! [23] Douglas W. Oard, Serving Users in Many Languages: Cross-Language Information
Retrieval for Digital Libraries, D-Lib Magazine, December 1997 http://www.dlib.org/dlib/december97/oard/12oard.html
! [24] T. Rist, E. André, and J. Müller. Adding Animated Presentation Agents to the Interface. In Proceedings of the 1997 International Conference on Intelligent User Interfaces, pages 79-86, Orlando, Florida, 1997.
! [25] Mark Sanderson. Word Sense Disambiguation and Information Retrieval (1997) Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval
! [26] Sanderson, M. Accurate user directed summarization from existing tools Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM 98), ps 45-51, 1998.
! [27] Shardanand, U., And Maes, P. 1995. Social Information Filtering: Algorithms for Automating “Word of Mouth”. In Proceedings CHI'95, Denver CO, May 1995, ACM Press, 210-217.
AI: References
! [28] Wahlster, W., Reithinger, N., Blocher, A. (2001): SmartKom: Towards Multimodal Dialogues
with Anthropomorphic Interface Agents. In: Wolf, G., Klein, G. (eds.), Proceedings of International Status Conference "Human-Computer Interaction", DLR, Berlin, Germany, October 2001, p. 23 - 34.
! [29] World Wide Web Consortium Resource Description Framework (RDF) Model and Syntax, http://www.w3.org/TR/REC-rdf-syntax/, 1999.
! [30] World Wide Web Consortium XSL Transformations (XSLT) W3C Recommendation, http://www.w3.org/TR/xslt, 1999.
! [31] World Wide Web Consortium. Resource Description Framework (RDF) Schema Specification (W3C Proposed Recommendation), http://www.w3.org/TR/PR-rdf-schema/, 1999.
! [32] The Dublin Core Metadata Initiative, http://www.purl.org/DC
! [33] IMS Metadata Specification, http://www.imsproject.org/
! [34] World Wide Web Consortium. XML Schema Part 1: Structures, W3C Working Draft, http://www.w3.org/TR/xmlschema-1/
! [35] Cawsey, A., “Presenting tailored resource descriptions: will XSLT do the job?”, in Computer Networks (3) 713-722, 2000.
! [36] Cawsey, A., et al, “Preventing misleading presentations of XML documents: Some initial proposals”, in Proc 2nd International Conference on Adaptive Hypermedia, Aix-en-provence, 2002.
AI: Useful Links
! Cross-language IR: http://raven.umd.edu/dlrg/clir/
! Summarisation: http://www.cs.columbia.edu/~radev/summarization/
! Information extraction: http://www.isi.edu/~muslea/RISE/Resources.html
! Intelligent IR (at UMASS): http://ciir.cs.umass.edu/
! Multimedia retrieval: http://www-sal.cs.uiuc.edu/~sharad/cs491/readinglist.html
! IR and natural language processing: http://web.syr.edu/~diekemar/ir.html
! Language technology: http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
! OIL: http://xml.coverpages.org/oil.html
! Semantic Web: http://www.w3.org/2001/sw/
! Netbots (intelligent information agents): http://www.dbgroup.unimo.it/IIA/
! Information filtering: http://www.clis2.umd.edu/dlrg/filter/
! Adaptive hypermedia: http://wwwis.win.tue.nl/ah/
! XML: http://xml.coverpages.org/
IR+AI: Conclusions
! Information seeking: “everyday” users looking for information to satisfy their information needs
! Information retrieval approaches
# Return “information containers”
# Mostly at word level, but it works well, although context is needed for web retrieval
! Artificial intelligence approaches
# Return “answers”
# Attempt to capture meaning, but this is hard, in particular with large data sets (efficiency)
IR+AI: Four scenarios
[Diagram: four ways of combining IR and AI components, from separate systems to fully integrated ones; the appropriate combination is application dependent]
IR+AI: Demos
1. HySpirit: Experimental platform for developing, implementing and evaluating IR systems.
2. Information personalisation using XML and XSLT.
Demo: HySpirit
! Retrieval platform extending database models with probability theory
"relational, logical and object-oriented layers modelling hypermedia and knowledge retrieval
"uncertainty, incompleteness and inconsistency
"aggregation of uncertain evidence
! Knowledge-oriented retrieval in semi-structured and heterogeneous data sources
"spatial, temporal and semantic relationships
"fact-oriented and content-oriented searching and browsing
! Easy parameter setting to support retrieval experiments and evaluation
(qmir.dcs.qmul.ac.uk)