1 “Informatie vinden” Conclusie Paul.Nieuwenhuysen @ vub.ac.be Vrije Universiteit Brussel Pleinlaan 2, B-1050 Brussels, Belgium Prepared for a presentation at the one-day conference organised by NVB-WB in KB, Den Haag, Nederland 27 April 2006


“Informatie vinden” (Finding information): Conclusion

Paul.Nieuwenhuysen@vub.ac.be

Vrije Universiteit Brussel

Pleinlaan 2, B-1050 Brussels, Belgium

Prepared for a presentation at the one-day conference organised by NVB-WB in KB, Den Haag, Nederland

27 April 2006


Basic difficulties in information retrieval

Difficulty: A word or phrase is not the same as a concept.

This may cause a low recall.

[Diagram: Word, Word → Concept]


Basic difficulties in information retrieval (continued)

• When the user needs information about a particular concept, or about a combination of more elementary concepts, the query should cover these concepts well: not with a single word or term per concept, but with several words and/or terms, including synonyms, spelling variations, narrower terms, related terms, translations, and so on.

• The aim is mainly to increase the recall of the search action, by covering the concept better, but also to increase the precision by including the most appropriate words and/or terms in the query.
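The idea of covering each concept with several words can be sketched as follows. This is only a minimal illustration: the mini-thesaurus, the concept names and the AND/OR query syntax are assumptions made for the example, not features of any particular retrieval system.

```python
# Hypothetical mini-thesaurus: each concept maps to words/terms that may express it.
THESAURUS = {
    "car": ["car", "automobile", "motor vehicle", "auto"],
    "pollution": ["pollution", "emissions", "contamination"],
}


def expand_concept(concept: str) -> str:
    """Return an OR-group that covers all known terms for one concept."""
    terms = THESAURUS.get(concept, [concept])
    # Multi-word terms are quoted so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts: list[str]) -> str:
    """Combine the OR-groups of several concepts with AND."""
    return " AND ".join(expand_concept(c) for c in concepts)


query = build_query(["car", "pollution"])
print(query)
```

The OR-groups aim at higher recall for each concept, while the AND between the groups keeps the combination of concepts that the user actually needs.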


Basic difficulties in information retrieval (continued)

Difficulty: Many words suffer from ambiguity of meaning.

This may cause low precision.

[Diagram: Word → Relevant concept; Word → Irrelevant concept (NOT wanted)]


Basic difficulties in information retrieval (continued)

• Many words and/or terms from a natural language suffer from ambiguity, because natural languages have evolved spontaneously rather than under strict control.

• An example is the word “pascal”, which can have several meanings:

»the philosopher Blaise Pascal,

»the programming language Pascal,

»the physical unit of pressure, and

»the name of many persons…
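Both difficulties can be made concrete with a small calculation. The sketch below assumes a toy collection of four invented documents and a user who wants the unit of pressure; precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.

```python
# Hypothetical toy collection: id -> (text, relevant to a user who wants the pressure unit?)
DOCS = {
    1: ("the pascal is the SI unit of pressure", True),
    2: ("Blaise Pascal wrote the Pensées", False),
    3: ("a compiler for the Pascal programming language", False),
    4: ("atmospheric pressure is about 101325 Pa", True),
}


def search(word: str) -> set[int]:
    """Return the ids of documents that contain the word (case-insensitive)."""
    return {
        doc_id
        for doc_id, (text, _) in DOCS.items()
        if word.lower() in text.lower().split()
    }


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = search("pascal")
relevant = {doc_id for doc_id, (_, is_relevant) in DOCS.items() if is_relevant}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the ambiguous word retrieves the philosopher and the programming language (lower precision), and it misses the document that uses only the abbreviation “Pa” (lower recall).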


• Subject descriptions should be adapted to the library = context = user community!

• In other words: A “general, typical user of all libraries” does not exist.

“Finding information” Conclusion: Proposition


• The subject description system (classification and/or thesaurus and/or…) should be

»clearly visible and usable

»well-integrated with the formal descriptions of documents

»well explained to the user!

• In most systems this is NOT well implemented.

“Finding information” Conclusion: Proposition


• Even better:

The system is invisible, but works well in the background

(for example: automatic expansion of queries)

• This is NOT done in most systems.

“Finding information” Conclusion: Proposition


• The level of subject description must depend on the resources (budget and personnel) that are available to the library!

• For example: no money, no subject descriptions.

“Finding information” Conclusion: Proposition


• Folksonomy will be accepted and implemented.

“Finding information” Conclusion: Proposition


1. Merging of different subject description systems is impossible or very expensive.

2. Collections with different subject description systems will be merged one day.

3. Therefore: Forget about subject descriptions.

“Finding information” Conclusion: Syllogism


1. Federated searching = meta-searching = one-stop searching is coming up.

2. This federated searching hinders exploitation of subject descriptions.

3. Therefore: Forget about subject descriptions!
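The second premise can be illustrated with a minimal sketch. It assumes two invented catalogues that describe subjects in different ways (a classification code in one, a subject heading in the other); the field names and records are hypothetical. A federated search can then only rely on a field that both sources share, here the title.

```python
# Two hypothetical sources with different subject description systems.
CATALOGUE_A = [
    {"title": "Introduction to thermodynamics", "class_code": "A-536"},
    {"title": "Pressure measurement handbook", "class_code": "A-531"},
]
CATALOGUE_B = [
    {"title": "Thermodynamics for engineers", "subject_heading": "Thermodynamics"},
    {"title": "Heat and work", "subject_heading": "Thermodynamics"},
]


def federated_title_search(word, *sources):
    """Search only the field that every source shares: the title."""
    word = word.lower()
    return [
        record["title"]
        for source in sources
        for record in source
        if word in record["title"].lower()
    ]


def subject_search_b(heading):
    """Search catalogue B alone through its own subject headings."""
    return [r["title"] for r in CATALOGUE_B if r["subject_heading"] == heading]


federated_hits = federated_title_search("thermodynamics", CATALOGUE_A, CATALOGUE_B)
subject_hits = subject_search_b("Thermodynamics")
print(federated_hits)
print(subject_hits)
```

In this toy case the federated title search misses “Heat and work”, which only the subject heading of catalogue B would have revealed.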

“Finding information” Conclusion: Syllogism


• Offer independent, external, horizontal, general thesaurus systems to users, so that they can find relevant terms.

• Then it is desirable to link thesaurus terms into the local catalogue for searching.

“Finding information” Conclusion: Tip


Horizontal thesaurus systems for natural human language

• Furthermore, Google Web Search also offers a more direct, automatic expansion of query words, at least for the English language.

• The user must request this explicitly through the Google command language, by preceding a search word in the query with a tilde, as in “~queryword”.

• However, most users probably do not know this. A more user-friendly implementation would be welcome.


• Offer users a view of the words that are related to a word used in a first query, so that they can find other, or more relevant, words to search with.

“Finding information” Conclusion: Tip


System based on words present in the context of the first query

• For instance, AquaBrowser Library software shows the query words of a user in the context of a selection of other words that occur in the document collection.

• More information is available from their WWW site http://www.medialab.nl/

• We can read there: “When you type in a word, you get a 'word cloud' that contains different associations and shades of meaning of that word. You click on the ones that most closely match your interest, and it will help you find the library resources you need. It’s a lot of fun to use, too.”
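The general idea of showing words from the context of a query word can be sketched with simple co-occurrence counting. This is not the method used by AquaBrowser, which is not described here; the mini-collection and the stopword list are assumptions made for the example.

```python
from collections import Counter

# Hypothetical mini-collection of document titles.
COLLECTION = [
    "pascal programming language compiler",
    "pascal unit of pressure",
    "pressure and temperature measurement",
    "blaise pascal philosopher and mathematician",
    "pascal language tutorial",
]
STOPWORDS = {"of", "and", "the"}


def related_words(query_word: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count the words that occur in the same documents as the query word."""
    counts = Counter()
    for doc in COLLECTION:
        words = set(doc.lower().split()) - STOPWORDS
        if query_word in words:
            counts.update(words - {query_word})
    return counts.most_common(top_n)


cloud = related_words("pascal", top_n=20)
print(cloud)
```

A user who typed “pascal” would then see candidate words such as “language”, “pressure” or “philosopher” and could pick the ones that match the intended meaning.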


• Advanced tool for retrieval of items about a subject: Clustering

“Finding information” Conclusion: Tip


Automatic topical clustering today

• The ambiguity of words and terms from natural languages lowers the precision of searches executed with relatively classical, simple retrieval software, as mentioned above.

• This problem can be tackled by topical clustering of search results on the basis of the words included in those results, hoping that this will result in clusters of documents about similar, semantically related concepts/topics/subjects.
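A minimal sketch of such word-based clustering is given below. It assumes invented result snippets for the query “pascal” and uses a simple greedy, single-pass grouping by Jaccard word overlap; this is chosen for brevity and is not the algorithm of any of the products mentioned in this presentation.

```python
STOPWORDS = {"the", "of", "a", "in", "and", "for"}

# Hypothetical result snippets for the ambiguous query "pascal".
RESULTS = [
    "pascal programming language tutorial",
    "the pascal unit of pressure",
    "free pascal programming compiler",
    "pressure in pascal and bar",
    "blaise pascal philosopher",
]


def tokens(text: str) -> set[str]:
    """Lowercased words of a snippet, without stopwords."""
    return set(text.lower().split()) - STOPWORDS


def jaccard(a: set[str], b: set[str]) -> float:
    """Word overlap: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(snippets: list[str], query_word: str, threshold: float = 0.15) -> list[list[str]]:
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters: list[list[str]] = []
    for snippet in snippets:
        # The query word occurs in every result, so it is ignored for similarity.
        words = tokens(snippet) - {query_word}
        for members in clusters:
            if jaccard(words, tokens(members[0]) - {query_word}) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append([snippet])
    return clusters


clusters = cluster(RESULTS, "pascal")
for members in clusters:
    print(members)
```

With these snippets the programming, pressure and philosopher results end up in three separate groups, which is the kind of outline that can help a user to skip the unwanted meanings.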


Clusty

• http://clusty.com/

• This is an internet meta-search engine that offers not only a conventional ranked list of search results but also search results clustered by topics or sources or URLs.

• The system is produced by the same company that produces the Vivisimo WWW meta-search system that is also mentioned further below. Both use the ‘Vivisimo Clustering Engine’.


Grokker

• http://www.grokker.com/

• A public access implementation of Grokker software offers federated searching free of charge through

»the Yahoo! WWW search engine database,

»the Amazon Book database, and

»the ACM Digital Library.

• The results are offered in an outline, a list of categories (and, if wanted, also in the more graphical form of an interactive map).


Vivisimo

• http://vivisimo.com/

• A public access implementation of Vivisimo software offers federated searching free of charge through many WWW search engine databases. It then clusters the results in an outline, a list of categories. (Clusty, mentioned above, uses the same ‘Vivisimo Clustering Engine’.)


Wisenut

• http://www.wisenut.com/

• Wisenut offers searching free of charge through WWW pages and clusters the results in an outline, a list of categories.


• Advanced tool for retrieval of items about a subject: Visualization

“Finding information” Conclusion: Tip


Information visualization: introduction

• Visualization can help users to interpret complex data sets so that better decisions can be made faster.

• On the one hand, the maps created by the system should help users to interpret and analyse a data set, but on the other hand they bring their own cognitive load. In other words, before the user can interpret the data set, the type of visualization must first be understood.

• Some mapping technique will probably prove to be useful and widely acceptable in the near future.


Visualization of the information source available

• It may be useful to visualize some aspects of information sources to a user, to give the user a better idea of what is available.

• Visualization of what is available can already be applied in the case of the hard disk on personal computers. Obviously it is interesting to get a clear view on the contents of a hard disk. Some utility programs are available that can be installed and applied for this purpose.


Visualization in a system that helps the user to formulate a query

• For instance the Thinkmap Visual Thesaurus can show relations among words in English in a graphical map on the computer display that is obviously 2-dimensional.

• Furthermore the map is dynamic: it moves to reveal and show the underlying 3-dimensional, spatial map of the related words and phrases.

• The software exploits the open access WordNet thesaurus (also mentioned above).

• http://www.visualthesaurus.com/


Visualization in a system that helps the user to formulate a query

• Another example: As mentioned and illustrated above, the AquaBrowser Library software visualizes relations between a user’s query and other words that are present in the information items that a library makes available and that may be relevant in the context of the query.


Visualization of the characteristics of query result sets

• In a next phase, when a user has formulated a query and executed the search, the set of search results is in most cases presented as a simple list of references, ordered or ranked in one way or another.

• Some systems go further and offer results in clusters (as outlined above).

• Moreover, some programs do not present the results as text only, but can visualize them in the form of a map.


Visualization of the characteristics of query result sets

• For instance: Kartoo software can be applied

»to search

»to cluster/categorize the search results

»and furthermore, to visualize these clusters in a map.

• A public access site offers meta-searching in several WWW search engines, free of charge, through http://www.kartoo.com/


Visualization of the characteristics of query result sets

• Another example: Grokker software

»can execute federated searches through several databases in 1 action,

»can cluster/categorize results from search actions, and

»can then visualize these in a map.

• A public access implementation allows anyone to perform a WWW search based on the Yahoo! database of WWW pages: http://www.grokker.com/

• Grokker has already been implemented in a university library.


• First use an external, more advanced or more specialised database; then check local availability using a catalogue database.

• Or: use a catalogue database, find a document, and then find more information in another external, more advanced or more specialised database.

• In this procedure, it is desirable to link/integrate both databases.

• This is feasible.

• For example: OpenURL linking from a local library catalogue deep into the Amazon book database.
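As a rough illustration of such linking, the sketch below builds an OpenURL for a book in the key/encoded-value style of OpenURL 1.0 (Z39.88-2004). The resolver address, the ISBN and the title are placeholders invented for the example; how a particular catalogue or Amazon handles the link is not covered by this sketch.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical link resolver address; a real library would use its own.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def book_openurl(isbn: str, title: str) -> str:
    """Build an OpenURL 1.0 (key/encoded-value) link that describes a book."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.isbn": isbn,
        "rft.btitle": title,
    }
    return RESOLVER_BASE + "?" + urlencode(params)


# Placeholder ISBN and title, for illustration only.
link = book_openurl("9780000000002", "An example book")
print(link)

# The receiving system can read the description back from the link.
description = parse_qs(urlparse(link).query)
print(description["rft.isbn"])
```

The link carries a description of the document rather than a fixed address, so the same catalogue record can be pointed at different external databases by changing only the resolver.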

“Finding information” Conclusion: Tip


Conclusion of this conclusion:

“The last word has not yet been said.”

“We will be living with this problem for some time yet.”


• You are free to copy, distribute, display this work under the following conditions:

»Attribution: You must mention the author.

»Noncommercial: You may not use this work for commercial purposes.

»No Derivative Works: You may not change, modify, alter, transform, or build upon this work.

• For any reuse or distribution, you must make clear to others the license terms of this work.