College of Education School of Information and Communication Studies Department of Information Studies 2018/2019 ACADEMIC YEAR Lecture 4: Information Retrieval Techniques/Model and Strategies LECTURER: DR. DE-GRAFT JOHNSON AMENUVEVE DEI [email protected] 0243775571

College of Education

School of Information and Communication Studies

Department of Information Studies

2018/2019 ACADEMIC YEAR

Lecture 4:

Information Retrieval

Techniques/Model and Strategies

LECTURER: DR. DE-GRAFT JOHNSON AMENUVEVE DEI [email protected]

0243775571

Lecture Objectives

• Recap of previous lessons/class/lecture

• Explain the IR techniques/models

• Identify the models of IR

• Explain the individual models

• Propose a technique/model for IR

Dr. De-Graft Johnson Dei, Dept of Information Studies

Slide 2

Retrieval Model/Techniques

• IR models/techniques are designed to help users locate the information they need effectively and efficiently.

• They make it easier for users to find the required information.

• They govern how a document and a query are represented and how the relevance of a document to a user query is defined.

Types of IR Model/Techniques

• There are two types of retrieval techniques.

– 1) Basic Retrieval Techniques:

   – Boolean searching

   – Truncation search

   – Proximity search/operators

– 2) Advanced Retrieval Techniques:

   – Vector space model

   – Probabilistic model

   – Statistical language model

Boolean Searching

• Named after George Boole (1815-1864), whose algebra of logic it is based on.

• By using these techniques, users can narrow down their search to get the required information.

• When performing a Boolean search, you must first choose keywords that best describe your topic.

• Boolean searches are carried out using the operators AND, OR and NOT.

• By adding these “operators” you tell the database precisely which words the results should or should not contain.

• In this way you define the connections between your keywords.

• The operators narrow or expand your search.

• Because they connect words, you are more likely to find relevant and focused results.

• Most web resources require you to type Boolean operators in CAPITAL LETTERS.

Boolean Operators

• OR: e.g. "I want info on college". Find synonyms: college, university, campus.

• AND: e.g. "I am interested in the relationship between poverty and crime".

• NOT: e.g. "I want info on cats but don’t want to read anything about dogs".
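The three operators correspond to set operations on the documents that contain each keyword. The sketch below is only an illustration under stated assumptions: the four documents and the helper name `posting` are made up, and real databases use far more elaborate indexes.

```python
# Toy collection: document id -> text (invented for illustration).
docs = {
    1: "cats make quiet pets",
    2: "dogs and cats can share a home",
    3: "dogs need daily walks",
    4: "poverty and crime in cities",
}

# Inverted index: term -> set of ids of the documents containing that term.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)


def posting(term):
    """Return the set of documents containing `term` (empty if unseen)."""
    return index.get(term, set())


cats_and_dogs = posting("cats") & posting("dogs")  # AND: intersection
cats_or_dogs = posting("cats") | posting("dogs")   # OR: union
cats_not_dogs = posting("cats") - posting("dogs")  # NOT: set difference

print(cats_and_dogs, cats_or_dogs, cats_not_dogs)
```

AND gives the smallest set here (one document), OR the largest (three documents), and NOT keeps only the cat document that never mentions dogs.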

Boolean Operator: AND

• It combines two different concepts in order to narrow down the search.

• It retrieves only those items in which all the constituent terms occur, so every search term must be present in each result when you use AND.

• For example, if you want material that discusses both cats and dogs, you would use the search cat AND dog.

• In a Venn diagram of the two terms, the overlapping area in the middle represents the result set for this search.

• It is a small set because the results have to include both search words.

• Remember that you can also combine phrase searches with Boolean operators, e.g. AND.

• Also be aware that in many, but not all, databases and search engines the AND is implied.

• You probably already use AND, as some resources add it in automatically.

• AND is sometimes indicated with a plus sign (+).

• When you connect your keywords with AND you are informing the resource that both words must appear.

• This limits or narrows your results, as both words must be there.

• A good time to use this connector is when your initial keyword search finds too many irrelevant results, so you can use it to refine the search further.

[Venn diagram: two overlapping circles labelled marketing and footwear]

Search results for marketing AND footwear: the intersection of both words is your results list, which features both marketing AND footwear.
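The points about implied AND and the plus sign can be sketched as a small query rewriter. This is a hypothetical illustration, not the behaviour of any particular search engine: the function name `normalize` and its rules (a space between keywords means AND, a leading + means AND, a leading - means NOT, operators are recognised only in CAPITAL LETTERS) are assumptions.

```python
OPERATORS = ("AND", "OR", "NOT")


def normalize(query):
    """Rewrite a simple query so that every connection is written out."""
    out = []
    for token in query.split():
        if token in OPERATORS:          # explicit operator, keep as typed
            out.append(token)
            continue
        if token.startswith("+"):       # +word is shorthand for AND word
            op, term = "AND", token[1:]
        elif token.startswith("-"):     # -word is shorthand for NOT word
            op, term = "NOT", token[1:]
        else:                           # bare keyword: AND is implied
            op, term = "AND", token
        previous_is_term = bool(out) and out[-1] not in OPERATORS
        if previous_is_term or op == "NOT":
            out.append(op)
        out.append(term)
    return " ".join(out)


print(normalize("marketing footwear"))    # marketing AND footwear
print(normalize("+marketing +footwear"))  # marketing AND footwear
print(normalize("cat -dog"))              # cat NOT dog
```

Under these assumed rules, the queries marketing footwear, +marketing +footwear and marketing AND footwear all describe the same search.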

NOT

• It separates complex concepts into simpler individual ones.

• It allows users to specify the terms that they do not want to occur in the retrieved records.

• It excludes unwanted results.

• The search output decreases as more NOT terms are added.

• Use NOT in a search to exclude search terms and narrow your search.

• For example, if you only want to read about cats and nothing about dogs, you would use the search cat NOT dog.

• In a Venn diagram of the two terms, the result set for this search is the part of the cat circle that does not overlap the dog circle.

• It is a small set because NOT excludes all material that mentions dogs from the result.

• Be careful when using the NOT operator as it might limit the results too much and leave out some valid results.
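The warning about NOT can be seen in a tiny example. The three document titles below are invented, and the `mentions` helper is a simplifying assumption (exact word match only).

```python
docs = {
    1: "caring for a cat",
    2: "why a cat is easier to keep than a dog",
    3: "training a dog",
}


def mentions(text, term):
    """True if `term` appears as a whole word in `text`."""
    return term in text.split()


# cat NOT dog: keep documents that mention cat and do not mention dog.
cat_not_dog = [d for d, t in docs.items()
               if mentions(t, "cat") and not mentions(t, "dog")]

# Documents about cats that the NOT clause threw away.
lost = [d for d, t in docs.items()
        if mentions(t, "cat") and mentions(t, "dog")]

print(cat_not_dog, lost)
```

Document 2 is mainly about cats, yet it is excluded because it also mentions a dog. This is the kind of valid result that NOT can leave out.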

The use of NOT

• When you connect your keywords with NOT you are excluding certain words from your search string.

• This narrows your results by not returning hits that contain the excluded words you have chosen.

• NOT is sometimes indicated with a minus sign (-), AND NOT or ANDNOT.

• It excludes the keyword that follows it.

• A good time to use this connector is when you scan the first couple of results pages and see numerous irrelevant hits.

[Venn diagram: two overlapping circles labelled conservative and liberal]

Search results for conservative NOT liberal: the highlighted section indicates results on conservatives but excludes those that discuss liberals.

OR

• It includes more concepts in order to expand the scope of the search.

• It is used for broadening a search.

• It allows users to combine two or more search terms.

• OR is commonly used to connect two or more similar concepts.

• For example, if you want material that discusses cats, dogs or both animals, you would use the search cat OR dog.

• In a Venn diagram of the two terms, the whole area of both circles represents the result set for this search.

• It is a large set because a result containing any one of the search words is valid under the OR operator.

The use of OR

• When you connect your keywords with OR you are informing the resource that the appearance of either is acceptable.

• So this Boolean operator expands or broadens your search.

• Every page or record must have at least one of the keywords on it. You are giving the resource a choice: either that word OR that word.

• A good time to use this connector is when your initial keyword search finds too few results, so you can use it to widen your search.

• It can be a time saver, as you can add in different words which have similar meanings instead of running each search separately.

[Venn diagram: two overlapping circles labelled renewable and sustainable]

Search results for renewable OR sustainable: the intersection of both words, as well as the keywords appearing separately, make up your results list.
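The time-saving point can be illustrated as follows: one OR query returns the same records as running each search separately and merging the results. The four records and the function names `search` and `search_any` are assumptions made for this sketch.

```python
docs = {
    1: "renewable energy policy",
    2: "sustainable farming methods",
    3: "renewable and sustainable power",
    4: "coal mining history",
}


def search(term):
    """Single-keyword search: ids of the records containing `term`."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


def search_any(*terms):
    """OR search: a record qualifies if it contains at least one term."""
    result = set()
    for term in terms:
        result |= search(term)
    return result


combined = search_any("renewable", "sustainable")
separate = search("renewable") | search("sustainable")

print(sorted(combined))
```

Record 3 contains both words but appears only once in the combined list, which is why the OR result is not simply the two separate lists placed end to end.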

TRUNCATION

• Truncation search is also known as wildcard searching, stemming, term masking, conflation algorithm, etc.

• There are three types of truncation:

– Right truncation: truncation is on the right side of the term. For example... Lib*

– Left truncation: truncation is on the left side of the term. For example... *rary

– Simultaneous truncation on the left and right sides. For example... *polymer*

• A truncation or stemming symbol allows you to search for all of the words with the same root.

• To use truncation, enter the root of a word and put the truncation symbol at the end.

• So for root words with multiple endings you can search for them all at the same time.

• The truncation symbols vary in each resource so it is worth looking at the online help facilities to make sure you are using the correct one.

• They can include *, !, ?, or #.

child* = child, childs, children, childrens, childhood

bank* = banks, banking, bankers, bankrupt

advert* = adverts, advertising, advertisement, advertised

sun* = suns, sunshine, sunny, sunlight
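Right, left and simultaneous truncation can be tried out with Python's standard `fnmatch` module, which treats * as "any run of characters". This is a minimal sketch: the word list is invented, the `expand` helper is hypothetical, and a real database would match against its own index and may use a different truncation symbol.

```python
from fnmatch import fnmatchcase

# Invented vocabulary standing in for the terms in a database index.
vocabulary = [
    "child", "children", "childhood",
    "library", "librarian", "itinerary",
    "polymer", "copolymers",
    "bank", "banking", "bankrupt",
]


def expand(pattern):
    """Return every vocabulary word matched by a truncated term."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]


print(expand("child*"))     # right truncation
print(expand("*rary"))      # left truncation
print(expand("*polymer*"))  # truncation on both sides
print(expand("bank*"))      # also picks up the unrelated word bankrupt
```

The last query shows why a very short root needs care: bank* matches bankrupt as well as banking.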

Proximity search

• A type of operator used by some search engines to tighten a search by instructing it to look for words that are within a short distance of each other in a document. Examples include:

– NEAR

– FBY (Followed BY)

– Within

– ADJACENT / ADJ

– Nesting

– NEXT

• A proximity search works like a stricter AND: both terms must be present, and they must also be close to each other.

Proximity search: NEAR

• For example, using a search engine that supports proximity operators, querying the phrase "cable NEAR modem" will instruct the search engine to look in documents for instances of the words "cable" and "modem" that are near each other.

• NEAR retrieves documents with the search terms appearing within a specified number of words of each other. The exact number of words will vary from one database or search engine to another; in some, you can specify the proximity by a certain number of words.

• Cancer NEAR horoscope would probably find information about the astrological sign Cancer, but likely not information about the disease.

• Fur NEAR coat would find fur coat, fur-lined leather coat, or coat of mink fur.

• Different search engines will specify different distances that the words must be within.
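A NEAR test can be sketched by comparing word positions. The function name `near`, the default distance of 5 and the way distance is counted (difference between word positions) are all assumptions for this illustration, since each search engine defines its own limit.

```python
import re


def near(text, a, b, max_gap=5):
    """True if words `a` and `b` occur within `max_gap` positions of each
    other, in either order."""
    words = re.findall(r"[a-z]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == a]
    positions_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_gap
               for i in positions_a for j in positions_b)


print(near("fur lined leather coat", "fur", "coat"))
print(near("coat of mink fur", "fur", "coat"))
print(near("a fur hat was left beside the door and later a coat was found",
           "fur", "coat"))
```

The first two texts match because the words are three positions apart. The third contains both words, so a plain AND would retrieve it, but they are ten positions apart and NEAR rejects it.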

Proximity search: ADJACENT/ADJ

• ADJ retrieves documents in which one word directly follows another.

• However, this does not necessarily mean they will appear in the same order in which you typed them.

• Yellow ADJ Gold would find the precious metal, but also the phrase "gold, yellow or bronze highlights."
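The Yellow ADJ Gold example can be checked with a short sketch that looks at each pair of neighbouring words. The function name `adjacent` and its `ordered` switch are hypothetical; punctuation is ignored here, which is an assumption about how the text is split into words.

```python
import re


def adjacent(text, a, b, ordered=False):
    """True if `a` and `b` are next to each other in `text`.

    With ordered=False (the behaviour described for ADJ above) either
    order counts; with ordered=True, `a` must come directly before `b`.
    """
    words = re.findall(r"[a-z]+", text.lower())
    for first, second in zip(words, words[1:]):
        if (first, second) == (a, b):
            return True
        if not ordered and (first, second) == (b, a):
            return True
    return False


print(adjacent("a ring of yellow gold", "yellow", "gold"))
print(adjacent("gold, yellow or bronze highlights", "yellow", "gold"))
print(adjacent("gold, yellow or bronze highlights", "yellow", "gold",
               ordered=True))
```

The second text matches only because order is not enforced; requiring the typed order rejects it.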

Range Searching

• It is very useful in numerical searching.

• It is important in selecting records within certain data ranges.

• For example...

– Greater than (>)

– Less than (<)

– Equal to (=)

– Not equal to (!= or <>)

– Greater than or equal to (>=)

– Less than or equal to (<=)
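The comparison symbols above can be applied to a numeric field as in the sketch below. The three records, the field name `year` and the helper names `range_search` and `between` are made up for illustration; the comparisons themselves come from Python's standard `operator` module.

```python
import operator

# Map each range symbol to the matching comparison function.
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "<>": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}

# Invented records with a numeric field.
records = [
    {"title": "Indexing basics", "year": 2012},
    {"title": "Search engines today", "year": 2018},
    {"title": "Digital libraries", "year": 2020},
]


def range_search(field, symbol, value):
    """Titles of the records whose `field` satisfies `symbol value`."""
    compare = COMPARISONS[symbol]
    return [r["title"] for r in records if compare(r[field], value)]


def between(field, low, high):
    """Titles of the records whose `field` lies in the range low..high."""
    return [r["title"] for r in records if low <= r[field] <= high]


print(range_search("year", ">=", 2018))
print(between("year", 2010, 2018))
```

Combining a lower and an upper limit, as `between` does, is how a search is restricted to a range such as 2010 to 2018.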
