outline - university of otago · is different: query formulation search can contain both structural...
TRANSCRIPT
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Query Formulation for XMLRetrieval with Bricks
Bricks: , the building blocks to tackle queryformulation issues in structured document retrieval
Roelof van Zwol
Jeroen Baas
Herre van Oostendorp
Frans Wiering
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
OutlineMotivationObjectives with BricksTheory behind BricksUsability experimentConclusions
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
MotivationStructured document retrieval (XML retrieval)is different:
Query formulationSearch can contain both structural and textual conditions.
Retrieval strategyExploit the document structure to retrieve relevantdocument fragments.
Result presentationPresent individual document fragments, or clusteredfragments (browse and fetch), requires new navigationtechniques.
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Running example:A user planning his holiday, has thefollowing specific information need:
“Find information about traveling todestinations with major airports, where theweather has a tropical climate.”
Based on the Lonely Planet collection.
Legend:Structural conditions
Textual conditions
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Query formulation
Techniques:
Keyword based (NEXI-CO)
Natural Language Processing
Combination of keyword- and structure-based search (NEXI-CAS)
Bricks…
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Query formulationNEXI-Content Only:
Keyword-based query, containing words andphrases.User is ignorant of document structure.Any document or XML document fragmentcan be returned.
Information request:“major airport” destination weather “tropicalclimate”
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Query formulationNatural language processing:
Complex formulation, including structure is very wellpossible in theory.Close to user’s “mental model”.
User needs to know about structure.
Brisbane’s NLP2NEXI engine:Kindly provided by [Geva et al.]We ran into conversion problems, with respect to recursivestructure, and generation of complex NEXI-queries.
Therefore not included into experiment.
Information request:Find information about traveling to destinations with majorairports, where the weather has a tropical climate.
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Query formulationNEXI - Content and Structure:
Query contains structural requirements.User is aware of document structure, and …User is capable to specify structural constraints.
Powerful query language for structured documentretrieval.
Information request://destination[about(.//weather, “tropical climate”)] //getting_there_and_away[about(., “major airport”]
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Query formulation
Task Complexity 71
Form
ulat
ion
com
plex
ity
Easy
Hard
ContentOnly
NEXIContent
AndStructure
Bricks (?)
Natural LanguageProcessing
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Objectives with BricksMinimize the complexity of the queryformulation process
Minimize the required knowledge of thedocument structure
Maximize the expression power asprovided by NEXI
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Running example in bricks
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Theory behind BricksGraphical approachIntuition of a mental modelBuilding blocksAvoid information overload
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Graphical approachReduces syntactical formulation issues.
Bricks is NEXI-compatible.
Reduces/eliminates knowledge of documentstructure.
Bricks uses of pull-down lists.
Alternative in development: TreeSearch.
Avoids formulation of malformed informationrequests.
Bricks uses extensive checks for both query syntaxand structure.
Referred to in literature as: Direct manipulation [Preece et al.]
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Intuition of a mental modelA user has a mental model of the informationhe is looking for.
Effectivity and efficiency of user increaseswhen the formulation process is close to theuser’s mental model.
User thinks in ‘natural language’…… and is likely to specify the requested element ofretrieval first:
“Find information about traveling to…”
… NEXI-spefication is not user oriented, butstructure oriented…
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Building blocks
In Brick, the formulation process is splitinto small building blocks, that a userneeds to complete, to specify hisinformation need.
Similar theory as used for form-wizards.
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Avoid information overload
The risk of too many options…
Syntax: allow a minimal (logical) set of nextsteps.
Structure: present a limited (logical) set ofstructural elements. (LP: 271 unique elements)
Semantic
Logical
Presentation
<destination><weather>
<attractions>
<p> <b> <i>
<ul><li>
<section><chapter><image>
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Usability experimentHypothesisSetupResultsTask complexityObservations
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Hypothesis
H1:
Use of sophisticated query formulationtechniques will lead to a higher effectivenessof the task performance.
NEXI-CO and Bricks should perform significantlybetter than NEXI-CO, with respect to successfultask completion.
The difference in effectiveness is dependant onthe task complexity.
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Hypothesis
H2:
The Bricks approach for query formulationwill increase the efficiency of the user for agiven task.
When time is taken into account, with respect toeffectivity, we expect that users will needsignificantly less time to (successfully) completethe task, when using NEXI-CO and Bricks.
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
SetupDocument collection: Lonely Planet.
Systems: NEXI-CO, NEXI-CAS, and Bricks.
Users: 54 MIR-students, trained in NEXI andSDR.
Topics: 27 topics, sorted in three complexitygroups.
Survey: prior and after the experiment theparticipants filled in a survey (satisfaction)
Experience: Before the experiment the userspracticed with the TERS-interface and searchengines. (reduces learning effects)
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Overall results
Effectivity: significant diff. between Brick and NEXI-CAS vs. NEXI-CO. (H1)
Efficiency: Bricks most efficient. (sign. diff.) (H2)
Time: users need significantly more time to formulateinformation need with NEXI-CAS.
Satisfaction: no significant difference, but NEXI-CASpreferred, followed by Bricks.
4.10.150.27198NEXI-CO
4.70.140.34245NEXI-CAS
214
Time
0.32
Effectivity
0.16
Efficiency
4.6Bricks
SatisfactionSystem
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Task complexity
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Task complexity
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Task complexity
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
Observations
Completely different working procedures:
NEXI-CO: formulate query, inspect resultsand refine. (many iteration steps)
NEXI-CAS: step-wise construction andvalidation of NEXI-query, submissions tocheck syntax (few iteration steps)
Bricks: Longer construction time, almost noiterations, submit to check results.
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
ConclusionsUser need to be capable to adequately use thestructure of a document, to make SDR work inpractice.
Three objectives for query formulation:Adequate expression power
Minimize syntactical formulation problems
Minimize required knowledge of document structure
Bricks: graphical approach, intuition of mentalmodel, building blocks, avoid informationoverload. (Online-demo available)
Outline - Motivation - Objectives - Theory - Usability - Conclusions
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands
ConclusionsExperiment:
Sophisticated query formulation techniques have apositive influence of task performance (effectivenessof NEXI-CAS and Bricks)
Bricks is more efficient, since it allows the user tosuccessfully perform their task in a shorter amountof time.
Task complexity vs. effectivity: negative correlation,but Bricks and NEXI-CAS are more effective for mid-and highly complex task.
Task complexity vs. efficiency: strong negativecorrelation.
NEXI-CAS is sensitive with respect to time needed for taskcompletion.
Outline - Motivation - Objectives - Theory - Usability - Conclusions