![Page 1: ISKO 2010Marianne Lykke Royal School of Library and Information Science Susan L. Price and Lois M. L. Delcambre Portland State University ISKO 2010 Conference](https://reader035.vdocuments.us/reader035/viewer/2022062618/5514bab455034640138b557b/html5/thumbnails/1.jpg)
ISKO 2010 Marianne Lykke
Marianne Lykke, Royal School of Library and Information Science
Susan L. Price and Lois M. L. Delcambre, Portland State University
ISKO 2010 Conference, Sapienza University of Rome, Faculty of Philosophy
February 23 - 26, 2010
Using semantic components to represent and search domain-specific documents: An evaluation of indexing accuracy and consistency
Agenda
• Problem and motivation
• Semantic component model
• Research questions
• Test design
• Results
• Conclusions
Problem and motivation
Challenges for information retrieval in domain-specific digital libraries:
• Domain-specific libraries often contain large sets of similar documents about a few topics
o Important to be able to distinguish between topically similar documents
• Domain experts often have specific information needs targeting a single “right answer”, specified by domain-specific facets
o Important to be able to limit searches to domain-specific dimensions
(e.g. Leckie et al., 1996; Fagin et al., 2003; Freund et al., 2005; Hearst et al., 2006)
Problem and motivation
• Little time for information retrieval
o Important that the relevant documents are ranked highly and retrieved by the first query
• Distributed indexing, carried out by indexers with varying degrees of indexing competence
o Important to address classical indexing problems: quality, exhaustivity, specificity, consistency (e.g. Leckie et al., 1996; Fagin et al., 2003; Freund et al., 2005; Hearst et al., 2006)
Semantic component model
• Semantic component model developed to support the formulation of specific, structured queries that cover the search topic exhaustively along domain-specific dimensions
• Two-level model dividing a given collection into a set of document classes, each class with an associated set of semantic components
• Based on the assumptions that:
o Domain experts know the document genres within a given domain: their content and structure (Dillon, 1991; Orlikowski & Yates, 1994; Bishop, 1999; Vaughan & Dillon, 2005)
o Domain-specific document content and structure correspond to domain-specific information needs (Ely et al., 1999, 2000; Price, Delcambre & Nielsen, 2006)
HIO 2009 Marianne Lykke
[Figure: example document. Document class: Clinical method; marked semantic components (SC): General information, Practical information.]
[Figure: example document. Document class: Clinical method; annotations: SC: General information; SC: Risk factors; After treatment.]
Semantic component model
| Document class | Semantic components |
|---|---|
| Clinical problem | General information, Diagnosis, Referral, Treatment |
| Clinical unit | Function and specialty, Practical information, Referral, Staff and organization |
| Clinical method | General information, Practical information, Referral, Aftercare, Risks, Expected results |
| Drugs | General information, Practical information, Target group, Effect, Side effects |
| Services | General information, Practical information, Referral |
| Notice | General information, Practical information, Qualification |
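The two-level model in this table can be read as a mapping from each document class to the semantic components it may contain. The following is a minimal Python sketch, assuming a marked-up document is represented as its class plus a dictionary from semantic component name to the marked text; the names `SC_MODEL` and `validate_markup` are hypothetical and not part of sundhed.dk or the authors' system.

```python
# Hypothetical encoding of the two-level semantic component model shown in
# the table: each document class maps to the semantic components it may hold.
SC_MODEL: dict[str, tuple[str, ...]] = {
    "Clinical problem": ("General information", "Diagnosis", "Referral", "Treatment"),
    "Clinical unit": ("Function and specialty", "Practical information",
                      "Referral", "Staff and organization"),
    "Clinical method": ("General information", "Practical information", "Referral",
                        "Aftercare", "Risks", "Expected results"),
    "Drugs": ("General information", "Practical information", "Target group",
              "Effect", "Side effects"),
    "Services": ("General information", "Practical information", "Referral"),
    "Notice": ("General information", "Practical information", "Qualification"),
}


def validate_markup(doc_class: str, components: dict[str, str]) -> list[str]:
    """List problems in a document's markup; an empty list means it fits the model.

    `components` maps each marked-up semantic component to its text span.
    """
    if doc_class not in SC_MODEL:
        return [f"unknown document class: {doc_class!r}"]
    allowed = set(SC_MODEL[doc_class])
    return [f"{name!r} is not a semantic component of {doc_class!r}"
            for name in components if name not in allowed]


if __name__ == "__main__":
    marked_up = {"General information": "text", "Risks": "text", "Diagnosis": "text"}
    for problem in validate_markup("Clinical method", marked_up):
        print(problem)  # flags "Diagnosis", which belongs to "Clinical problem"
```

Keeping the model as plain data means a new document class or component can be added without code changes, which fits the claim that the model is easy to customize to a collection.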
Case study
• sundhed.dk: Danish, national health portal
• Active since 2001; 25,000 documents
• Two main target groups: citizens and medical professionals
• Combination of full-text indexing and controlled, assigned indexing:
o ICPC, International Classification of Primary Care
o ICD-10, International Classification of Diseases
o Home-grown citizens' thesaurus
• Large and varied group of indexers:
o 5 regions
o Up to 250 indexers per region
• Specific target group: family doctors
Test design
• Comparative, experimental indexing study
o Baseline: keyword indexing (controlled and free terms)
o Experimental: semantic component indexing
• Participants: 16 sundhed.dk indexers (convenience sample)
• Indexing task: 12 sundhed.dk documents per indexer
o 6 documents indexed with semantic components (SC)
o 6 documents indexed with keywords
• Random assignment of documents and indexing methods
• Training session
• Evaluation measures:
o Accuracy
o Consistency
o Indexing time
o Easiness
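The slide states only that documents and indexing methods were assigned at random. One way to get a random yet balanced design that fits the reported maxima (16 indexers, 6 documents per method per indexer, 96 indexing instances per method) is to pair indexers and give each pair complementary halves of a shuffled document list. This is a hypothetical illustration of that idea, not the procedure the study used; `assign` and its parameters are assumptions.

```python
import random
from typing import Optional


def assign(n_indexers: int = 16, n_docs: int = 12,
           seed: Optional[int] = None) -> dict[int, dict[str, list[int]]]:
    """Hypothetical balanced random assignment of documents to indexing methods.

    Indexers are handled in pairs. Each pair gets one fresh shuffle of the
    documents; the first indexer indexes one half with semantic components (SC)
    and the other half with keywords, and the second indexer gets the swap.
    Every document is therefore indexed equally often with each method.
    """
    if n_indexers % 2 or n_docs % 2:
        raise ValueError("need an even number of indexers and of documents")
    rng = random.Random(seed)
    docs = list(range(1, n_docs + 1))
    half = n_docs // 2
    plan: dict[int, dict[str, list[int]]] = {}
    for first in range(0, n_indexers, 2):
        rng.shuffle(docs)
        sc_half, kw_half = sorted(docs[:half]), sorted(docs[half:])
        plan[first] = {"SC": sc_half, "keywords": kw_half}
        plan[first + 1] = {"SC": kw_half, "keywords": sc_half}
    return plan


if __name__ == "__main__":
    for indexer, methods in assign(seed=2010).items():
        print(indexer, methods)
```

With 8 pairs, each document ends up indexed 8 times with each method, so per-document comparisons between methods rest on the same number of indexers.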
Research questions
• Is semantic component indexing more accurate than keyword indexing compared to a reference standard?
• Is semantic component indexing more consistent than keyword indexing?
• Is semantic component indexing faster than keyword indexing?
• Is semantic component indexing easier than keyword indexing?
Accuracy
| Document | Semantic component: recall (macroaverage) | Semantic component: precision (macroaverage) | Keywords: recall (macroaverage) | Keywords: precision (macroaverage) |
|---|---|---|---|---|
| 1 | 0.74 ± 0.37 | 0.89 ± 0.26 | 0.14 ± 0.33 | 0.74 ± 0.43 |
| 2 | 0.56 ± 0.33 | 0.61 ± 0.39 | 0.35 ± 0.47 | 0.74 ± 0.42 |
| 3 | 0.59 ± 0.45 | 0.72 ± 0.38 | 0.10 ± 0.23 | 0.72 ± 0.42 |
| 4 | 0.33 ± 0.29 | 0.72 ± 0.41 | 0.16 ± 0.35 | 0.70 ± 0.45 |
| 5 | 0.74 ± 0.39 | 0.68 ± 0.47 | 0.38 ± 0.47 | 0.85 ± 0.30 |
| 6 | 0.59 ± 0.13 | 0.81 ± 0.35 | 0.01 ± 0.04 | 0.88 ± 0.31 |
| 7 | 0.63 ± 0.39 | 0.79 ± 0.31 | 0.28 ± 0.36 | 0.62 ± 0.41 |
| 8 | 0.70 ± 0.31 | 0.93 ± 0.17 | 0.01 ± 0.02 | 0.61 ± 0.49 |
| 9 | 0.66 ± 0.33 | 0.76 ± 0.43 | 0.21 ± 0.39 | 0.79 ± 0.39 |
| 10 | 0.61 ± 0.35 | 0.75 ± 0.26 | 0.25 ± 0.42 | 0.79 ± 0.39 |
| 11 | 0.65 ± 0.43 | 0.86 ± 0.31 | 0.12 ± 0.27 | 0.80 ± 0.36 |
| 12 | 0.63 ± 0.48 | 0.83 ± 0.30 | 0.03 ± 0.08 | 0.85 ± 0.34 |
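A minimal sketch of how macroaveraged precision and recall against a reference standard can be computed for one document. It assumes each indexer's output and the reference standard are sets of assigned items (keywords or semantic-component labels), that precision and recall are computed per indexer and then averaged over indexers, and that "±" is the sample standard deviation; the slide does not spell out these details, and the function names are hypothetical.

```python
from statistics import mean, stdev


def precision_recall(assigned: set[str], reference: set[str]) -> tuple[float, float]:
    """Precision and recall of one indexer's assignments against the reference standard."""
    hits = len(assigned & reference)
    # Assumption: an empty assignment (or empty reference) scores 0.0.
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def _mean_sd(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (SD is 0.0 for a single value)."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def macro_average(indexings: list[set[str]], reference: set[str]):
    """Macroaveraged (precision, recall) for one document: each indexer is
    scored separately, then the scores are averaged over indexers."""
    scores = [precision_recall(a, reference) for a in indexings]
    precision = _mean_sd([p for p, _ in scores])
    recall = _mean_sd([r for _, r in scores])
    return precision, recall


if __name__ == "__main__":
    # Illustrative sets, not study data.
    reference = {"cataract", "lens replacement", "eye drops", "referral"}
    indexers = [{"cataract", "lens replacement"}, {"cataract", "glaucoma"}]
    (p, p_sd), (r, r_sd) = macro_average(indexers, reference)
    print(f"precision {p:.2f} ± {p_sd:.2f}, recall {r:.2f} ± {r_sd:.2f}")
```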
Consistency
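| Document | Semantic components: mean κ ± SD (over all semantic components in the document) | Keywords: binary κ (all vocabularies) | Keywords: traditional consistency¹ ± SD |
|---|---|---|---|
| 1 | 0.46 ± 0.35 | -0.08 | 0.05 ± 0.13 |
| 2 | 0.21 ± 0.16 | 0.001 | 0.18 ± 0.19 |
| 3 | 0.25 ± 0.30 | -0.08 | 0.05 ± 0.11 |
| 4 | 0.35 ± 0.23 | 0.02 | 0.19 ± 0.30 |
| 5 | 0.50 ± 0.30 | 0.32 | 0.33 ± 0.23 |
| 6 | 0.05 ± 0.11 | -0.07 | 0.23 ± 0.41 |
| 7 | 0.40 ± 0.48 | 0.26 | 0.27 ± 0.18 |
| 8 | 0.66 ± 0.11 | -0.08 | 0.05 ± 0.11 |
| 9 | 0.04 ± 0.24 | -0.02 | 0.09 ± 0.14 |
| 10 | 0.44 ± 0.16 | 0.27 | 0.29 ± 0.13 |
| 11 | 0.48 ± 0.41 | -0.06 | 0.04 ± 0.09 |
| 12 | 0.01 ± 0.07 | -0.12 | 0.08 ± 0.24 |

¹ Traditional consistency = c / (a + b - c), where a and b are the numbers of terms assigned by each of two indexers and c is the number of terms both assigned.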
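The traditional (Hooper) consistency measure compares two indexers at a time. A minimal sketch, assuming the per-document figure is the mean ± sample SD over all pairs of indexers, together with Cohen's kappa on binary assign/not-assign decisions over a fixed candidate vocabulary as one possible reading of "binary κ". The slide does not say which kappa variant (for example Cohen's or Fleiss') was used, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def hooper(x: set[str], y: set[str]) -> float:
    """Traditional consistency c / (a + b - c) between two indexers' term sets."""
    a, b, c = len(x), len(y), len(x & y)
    denominator = a + b - c  # equals the size of the union x | y
    # Assumption: two empty term sets are treated as full agreement.
    return c / denominator if denominator else 1.0


def mean_pairwise_consistency(indexings: list[set[str]]) -> tuple[float, float]:
    """Mean and sample SD of Hooper consistency over all pairs (needs >= 2 indexers)."""
    scores = [hooper(x, y) for x, y in combinations(indexings, 2)]
    return mean(scores), (stdev(scores) if len(scores) > 1 else 0.0)


def cohen_kappa(x: set[str], y: set[str], vocabulary: set[str]) -> float:
    """Cohen's kappa for two indexers' binary assign / not-assign decisions
    over every term in `vocabulary` (terms outside it are ignored)."""
    n = len(vocabulary)
    x, y = x & vocabulary, y & vocabulary
    observed = (len(x & y) + len(vocabulary - x - y)) / n
    px, py = len(x) / n, len(y) / n
    expected = px * py + (1 - px) * (1 - py)
    # Kappa is undefined when chance agreement is 1; both indexers then made
    # identical all-or-nothing choices, so report perfect agreement.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)
```

Kappa corrects for agreement expected by chance, which is why it can be negative (as several keyword values in the table are), whereas Hooper's measure ranges from 0 to 1.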
Time to index
[Figure: Time to index. Number of indexing instances per time bin (< 2 min, 2-5 min, 5-10 min, 10-15 min, > 15 min), for semantic component indexing and keyword indexing.]
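The histogram groups indexing instances into five time bins. A minimal sketch of that grouping, assuming times are recorded in seconds and that a time equal to a bin's upper bound falls into the next bin (the slide does not say how boundary values were assigned); `bin_times` and `BINS` are hypothetical names.

```python
from collections import Counter

# Hypothetical bin definitions: (label, exclusive upper bound in seconds).
BINS = [("< 2 min", 120), ("2-5 min", 300), ("5-10 min", 600),
        ("10-15 min", 900), ("> 15 min", float("inf"))]


def bin_times(seconds: list[float]) -> dict[str, int]:
    """Count indexing instances per time bin, keeping the bins in display order."""
    counts = Counter(
        next(label for label, upper in BINS if t < upper) for t in seconds
    )
    return {label: counts[label] for label, _ in BINS}


if __name__ == "__main__":
    # Illustrative durations in seconds, not study data.
    print(bin_times([24, 119, 120, 423, 1625]))
```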
Easiness
[Figure: Easiness. Number of indexers rating each task on a scale from very difficult to very easy. Tasks: choose concept, choose keyword, what each SC is, designate SC, mark boundaries, choose document class.]
Conclusions
• Accuracy varied for both indexing methods, but the data suggest that semantic component indexing may be more accurate
• Indications that the two indexing methods are similar in feasibility and ease of use
• Semantic component indexing may be a preferable alternative when no appropriate controlled vocabulary is available, because it takes little time to develop and is easy to customize to a specific document collection
• Limitations:
o Small sample and a single domain
o Evaluation measures not directly comparable between the two methods
• A retrieval test showed a 25.6% improvement in document ranking, measured by nDCG (normalized Discounted Cumulative Gain)
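nDCG rewards rankings that place highly relevant documents near the top. A minimal sketch of the widely used formulation DCG = sum over ranks i of rel_i / log2(i + 1), normalized by the DCG of the ideal (relevance-sorted) ranking. The slide does not state which nDCG variant, cutoff or relevance scale the retrieval test used, so those are assumptions here, as is the function name `ndcg`.

```python
from math import log2
from typing import Optional


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1, 2, ..."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: list[float], k: Optional[int] = None) -> float:
    """nDCG@k for graded relevance scores listed in ranked order (whole list if k is None).

    The ideal ranking is the same scores sorted in descending order, which
    assumes every judged relevant document appears in the list.
    """
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Illustrative graded judgments (0 = not relevant, 3 = highly relevant).
    print(round(ndcg([3, 2, 3, 0, 1]), 3))
```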
Future research
• Development of model:
o Simpler version
o Markup by users (social tagging)
o Automatic markup
o Markup using XML
• Larger-scale evaluation
• Evaluation in other domains
References
Dillon, A. (1991). Readers' models of text structures: the case of academic articles. International Journal of Man-Machine Studies, 35, 913-925.
Ely, J., Osheroff, J., Ebell, M., Bergus, G., Levy, B., Chambliss, M. & Evans, E. (1999). Analysis of questions asked by family doctors regarding patient care. BMJ, 319(7206), 358-361.
Ely, J., Osheroff, J., Gorman, P., Ebell, M., Bergus, G., Levy, B., Chambliss, M., Pifer, E. & Stavri, P. (2000). A taxonomy of generic clinical questions: classification study. BMJ, 321(7278), 429-432.
Fagin, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Tomlin, J.A. & Williamson, D.P. (2003). Searching the workplace web. In: Proceedings of the 12th International World Wide Web Conference (WWW '03), Budapest, Hungary, May 20-24, 2003, 366-375.
Freund, L., Toms, E. & Waterhouse, J. (2005). Modeling the information behaviour of software engineers using a work-task framework. In: Grove, A. (ed.), ASIS&T '05: Proceedings of the 68th Annual Meeting, Charlotte, NC, October 28 - November 2, 2005.
Hearst, M. & Plaunt, C. (1993). Subtopic structuring for full-length document access. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 59-69.
Leckie, G.J., Pettigrew, K.E. & Sylvain, C. (1996). Modeling the information seeking of professionals. Library Quarterly, 66(2), 161-193.
Orlikowski, W.J. & Yates, J. (1994). Genre repertoire: the structuring of communicative practices in organizations. Administrative Science Quarterly, 39, 541-574.
Price, S., Delcambre, L. & Nielsen, M.L. (2006). Using semantic components to express questions against document collections. In: Proceedings of the International Workshop on Health Information and Knowledge Management (HIKM 2006), Arlington, VA.
Price, S., Nielsen, M.L., Delcambre, L. & Vedsted, P. (2007). Semantic components enhance retrieval of domain-specific documents. In: Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM), Lisbon, November 6-8, 2007.
[Figure: search interface. Annotations: search term; the search term should appear in the specified semantic component.]
[Figure: search interface. Annotation: the semantic component should appear in the document.]
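The two search-interface annotations describe two kinds of constraint a semantic component query can express: a search term restricted to a particular semantic component, and a semantic component that must be present in the document. The sketch below treats both as a simple Boolean filter, assuming documents are stored as a mapping from semantic component to its text and that matching is case-insensitive substring matching. A real retrieval system would rank documents rather than just filter them, and `Query` and `matches` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical semantic component query."""
    # (term, semantic component) pairs: the term must occur in that component.
    terms_in_sc: list[tuple[str, str]] = field(default_factory=list)
    # Semantic components that must be present somewhere in the document.
    required_sc: list[str] = field(default_factory=list)


def matches(doc: dict[str, str], query: Query) -> bool:
    """True if `doc` (semantic component -> text) satisfies every constraint.

    Uses case-insensitive substring matching, a simplifying assumption.
    """
    if any(sc not in doc for sc in query.required_sc):
        return False
    return all(sc in doc and term.lower() in doc[sc].lower()
               for term, sc in query.terms_in_sc)


if __name__ == "__main__":
    # Illustrative document of class "Clinical method", not from sundhed.dk.
    doc = {
        "General information": "Cataract surgery replaces the clouded lens.",
        "Risks": "Infection after surgery is rare.",
        "Aftercare": "Use eye drops for four weeks.",
    }
    print(matches(doc, Query(terms_in_sc=[("infection", "Risks")])))      # True
    print(matches(doc, Query(terms_in_sc=[("infection", "Aftercare")])))  # False
    print(matches(doc, Query(required_sc=["Referral"])))                  # False
```

Restricting "infection" to the Risks component is what lets such a query separate topically similar documents that merely mention the term elsewhere.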
[Figure: Time to index. Number of documents per time bin (< 2 min, 2-5 min, 5-10 min, 10-15 min, > 15 min).]
Time to index
| Indexing type | Total documents indexed (max = 96) | Mean no. of documents indexed per indexer (max = 6) | Mean time (min:sec) | Min. time (min:sec) | Max. time (min:sec) |
|---|---|---|---|---|---|
| Semantic components | 83 | 5.2 | 07:03 | 00:24 | 27:05 |
| Keywords | 88 | 5.5 | 05:56 | 01:06 | 31:26 |

Time required for indexing documents
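The summary columns follow from per-instance timings. A minimal sketch that parses min:sec strings and produces one row of the table (count, mean documents per indexer, mean, minimum and maximum time), assuming the raw data are one min:sec string per indexing instance, 16 indexers, and the mean rounded to whole seconds; the function names are hypothetical and the sample input is illustrative, not study data.

```python
from statistics import mean


def to_seconds(mmss: str) -> int:
    """Convert a 'min:sec' string such as '07:03' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(seconds: float) -> str:
    """Format seconds as zero-padded 'min:sec', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def summarize(times: list[str], n_indexers: int = 16) -> dict[str, object]:
    """One row of the timing table from per-instance 'min:sec' times."""
    secs = [to_seconds(t) for t in times]
    return {
        "total": len(secs),
        "per_indexer": round(len(secs) / n_indexers, 1),
        "mean": to_mmss(mean(secs)),
        "min": to_mmss(min(secs)),
        "max": to_mmss(max(secs)),
    }


if __name__ == "__main__":
    # Illustrative input only; the study's raw timings are not given here.
    print(summarize(["00:24", "07:03", "27:05"]))
```

For example, 83 semantic component instances spread over 16 indexers gives 83 / 16 = 5.1875, reported as 5.2.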
[Figure: Preferred indexing method by task type (for indexing documents, for searching). Number of indexers who prefer keyword indexing, rate the methods about the same, or prefer semantic component indexing.]
Research team
General practice
• Peter Vedsted, MD, Ph.D., Research Unit for General Practice, Århus University
• Jens Rubak, MD, Praksis.dk, Region Midt
Information and computer science
• Lois Delcambre, Ph.D., Professor, Computer Science Department, Portland State University, USA
• Susan Price, MD, Ph.D. student, Computer Science Department, Portland State University, USA
• Marianne Lykke, Ph.D., Associate Professor, Information Interaction and Information Architecture, Royal School of Library and Information Science (Danmarks Biblioteksskole)
sundhed.dk
• Vibeke Luk, information specialist, sundhed.dk
• Frans la Cour, IT consultant, Autonomy
Supported by grants from the National Science Foundation (grant numbers 0514238, 0511050 and 0534762), the National Library of Medicine (Training Grant 5-T15-LM07088) and Kvalitetsudviklingsudvalget for Almen Praksis, Aarhus Amt.