Construction of Index (Page 197)

Upload: juliana-ortiz

Post on 30-Dec-2015

Page 1: Construction of Index:  (Page 197)

• Objective: Given a document, find the number of occurrences of each word in the document.

• Example: Computer Science students know computers and computer languages.

• Keywords: computer, computers, science, students, know, and, languages.

Linear time algorithm:

• Let T be the text and |T| its length. We can count the occurrences of every word in T in O(|T|) time.

Constructing an automaton:

[Figure: automaton with one path of letter-labeled transitions per keyword (know, science, computer, computers, languages, and, students).]

Remarks:

• There is a final state for each word.

• Each final state has a counter that stores the number of times that state has been reached.

• While reading the text, the algorithm creates new states for each word it has not seen before.

• For a word that has been seen before, we simply follow the existing states.

• When a final state is reached, add 1 to its counter.
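
The remarks above can be sketched in Python as follows. This is a minimal illustration, not the slides' own implementation: the names `State`, `build_index`, and `word_counts` are hypothetical, and it assumes that a word is a maximal run of letters and that upper and lower case are treated as the same letter. Each character of the text is handled once, which matches the O(|T|) bound stated earlier.

```python
class State:
    """One automaton state: outgoing transitions plus a counter (used if final)."""

    def __init__(self):
        self.next = {}      # letter -> State
        self.final = False  # True once some word has ended in this state
        self.count = 0      # number of times this final state was reached


def build_index(text):
    """Read the text once, creating states for new words and counting words."""
    root = State()
    state = root
    # Assumption: a trailing space is appended so the last word is also closed.
    for ch in text.lower() + " ":
        if ch.isalpha():
            nxt = state.next.get(ch)
            if nxt is None:          # unseen prefix: create a new state
                nxt = State()
                state.next[ch] = nxt
            state = nxt              # seen prefix: just follow the transition
        else:
            if state is not root:    # a word has just ended here
                state.final = True
                state.count += 1
            state = root             # restart at the initial state
    return root


def word_counts(state, prefix=""):
    """Collect {word: count} from every final state below `state`."""
    counts = {}
    if state.final:
        counts[prefix] = state.count
    for ch, nxt in state.next.items():
        counts.update(word_counts(nxt, prefix + ch))
    return counts


example = "Computer Science students know computers and computer languages."
index = word_counts(build_index(example))
```

For the example sentence, `index["computer"]` is 2 and every other keyword has count 1. Note that "computers" reuses the states of "computer" and adds a single extra state for the final "s".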

Assignment one (due in week 6 on Friday, 7:30 pm)

• Write a program to convert a text into a vector such that each element of the vector is the number of occurrences of the corresponding keyword.
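
As a rough illustration of the expected input and output, the sketch below maps a text to a vector of keyword counts. It is only one possible reading of the task: the function name `text_to_vector` is hypothetical, the keyword list and its order are assumed to be given, and the counting uses Python's `collections.Counter` rather than the automaton described earlier.

```python
import re
from collections import Counter


def text_to_vector(text, keywords):
    """Return a list whose i-th entry is the number of occurrences of keywords[i].

    Assumptions: words are maximal runs of the letters a-z after lowercasing,
    and keywords are given in lowercase.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # A Counter returns 0 for keywords that never occur in the text.
    return [counts[k] for k in keywords]


keywords = ["computer", "computers", "science", "students",
            "know", "and", "languages"]
text = "Computer Science students know computers and computer languages."
vector = text_to_vector(text, keywords)
```

Here `vector` is `[2, 1, 1, 1, 1, 1, 1]`, with one entry per keyword in the order listed.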