DB2 Net Search Extender

Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)


Posted on 28-Jun-2015




Topics to discuss:

– Information retrieval

– Text indexing

– DB2 Text Extenders

– DB2 Net Search Extender

– References

– Questions


A Little Background…

Information Retrieval (IR):

• Extraction of "relevant" information from huge volumes of data scattered across different databases.

• Examples: text search, image search, video search, etc.

• The efficiency (time and speed) of IR depends on the INDEXING technology used.

• Indexing increases the performance of the system.

• An example of an indexing technology: text indexing, used for textual search.


A Little Background…

Text indexing:

• The process of deciding what will be used to represent a given document.

• A text index consists of significant terms extracted from the text documents, each term stored together with information about the document that contains it.

• The search is then handled as a query to look up the index.
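The following is a minimal Python sketch of an inverted index, the usual data structure behind a text index like the one described above. It is a generic illustration and does not describe how DB2 Net Search Extender stores its indexes. The function names `build_index` and `search` are hypothetical, and the sample texts are the two book stories used later in this presentation. Each term maps to the set of document IDs (here, ISBNs) that contain it, so answering a one-word query takes a single dictionary lookup.

```python
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the IDs of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            # Strip surrounding punctuation so "street." and "street" match.
            term = token.strip(".,;:!?\"'")
            if term:
                index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Answer a single-term query by looking the term up in the index."""
    return index.get(term.lower(), set())


docs = {
    "0-13-086755-1": "A man was running down the street.",
    "0-13-086755-2": "The cat hunts some mice.",
}
index = build_index(docs)
print(search(index, "cat"))  # {'0-13-086755-2'}
```

The search never rescans the document texts. All the work happens once, when the index is built.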


A Little Background…

Text indexing (continued):

• Involves the following:

– Parsing the documents to recognize their structure, e.g. title, date, and other fields.

– Scanning for word tokens: numbers, special characters, hyphenation, capitalization, etc.

– Stopword removal: based on a short list of common words such as “the”, “and”, “or”.
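The token-scanning and stopword-removal steps can be sketched in Python as follows. Two simplifying assumptions apply. The tokenizer regular expression keeps lowercase letters and digits and preserves internal hyphens. The stopword list contains only the three words named in the bullet above. The structure-parsing step (title, date, fields) is not shown. Real products use language-specific tokenizers and much longer stopword lists.

```python
import re

# Assumption: only the three example stopwords from the slide.
STOPWORDS = {"the", "and", "or"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text (capitalization) and pull out word and number tokens.

    Internal hyphens are kept, so "dog-walker" stays one token; other special
    characters act as separators.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def significant_terms(text: str) -> list[str]:
    """Tokenize, then drop stopwords so only significant terms are indexed."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


print(significant_terms("The cat and the dog-walker ran 5 miles."))
# ['cat', 'dog-walker', 'ran', '5', 'miles']
```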


Indexing Only Significant Terms


DB2 Extenders

– A family of IBM products that provide support for data beyond the traditional character and numeric data types.

– Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.

– Trial and beta versions are available for testing.

– Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html


DB2 Text Extenders

– To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):

• DB2 Net Search Extender

• DB2 Text Information Extender

• DB2 Text Extender

– When to use what?

• Link for comparisons of the above: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html


DB2 Net Search Extender replaces DB2 Text Information Extender Version 7.2. Some important features:

– Indexing speed of about 1 GB per hour.

– Different text formats: ASCII plain text, HTML, XML, GPP.

– Base support for 37 languages, including English, Spanish, French, Japanese, and Chinese.

– Sub-second search response times.

– No decrease in search performance with up to 1000 concurrent queries per second.


Some text-search capabilities of DB2 Net Search Extender:

– Searches can be expressed in SQL (a fourth-generation language whose queries read almost like English).

– Searches can include:

• Boolean operations.

• Proximity search for words in the same sentence or paragraph (for HTML, XML, and GPP documents).

• "Fuzzy" searches for words with a spelling similar to the search term, e.g. Andrew and Andru.

• Thesaurus-related search.

• Searches restricted to sections within documents.

• Limits on the search results through a "hit count", and a choice of how the results are sorted.
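The slides do not say how Net Search Extender measures spelling similarity. The Python sketch below substitutes Levenshtein edit distance, which counts the minimum number of single-character insertions, deletions, and substitutions between two words. The cut-off of two edits is a hypothetical threshold. The names `levenshtein` and `fuzzy_matches` are illustrative and are not part of any DB2 API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary words within max_distance edits of term (case-insensitive).

    max_distance=2 is a hypothetical threshold, not a DB2 setting.
    """
    t = term.lower()
    return [w for w in vocabulary if levenshtein(t, w.lower()) <= max_distance]


print(levenshtein("andrew", "andru"))                        # 2
print(fuzzy_matches("Andrew", ["Andru", "Andrea", "Mike"]))  # ['Andru', 'Andrea']
```

"Andru" is two edits from "Andrew" (substitute e with u, delete w), so it falls within the threshold.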


System requirements:

– DB2 Version 8.1

– Java Runtime Environment (JRE) Version 1.3.1

Windows installation:

– Administrative rights are required.

– Run db2text start to start the DB2 Net Search Extender instance services.


A simple example with SQL queries

– The following steps are required to perform a basic text search in DB2 Net Search Extender:

1. Creating a database

2. Enabling a database for text search

3. Creating a table

4. Creating a full-text index

5. Loading sample data

6. Synchronizing the text index

7. Searching with the text index


1. Creating a database:

db2 "create database sample"

2. Enabling a database for text search:

• To start the Net Search Extender service:

db2text "START"

• To prepare the database for use with DB2 Net Search Extender:

db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"


3. Creating a table:

db2 "CREATE TABLE books (isbn VARCHAR(18) NOT NULL PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"

4. Creating a full-text index:

db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"


5. Loading sample data:

db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"

db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"

6. Synchronizing the text index:

db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"
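The ordering of steps 5 and 6 suggests that rows inserted into the table do not become searchable until the text index is synchronized. The Python toy model below illustrates that behavior. It assumes that table changes are first recorded and are applied to the index only when an update is requested. The class `ToyTextIndex` and its methods are hypothetical and only loosely mirror `UPDATE INDEX`. The model handles inserts only.

```python
class ToyTextIndex:
    """Hypothetical model of a text index that is refreshed only on demand."""

    def __init__(self) -> None:
        self.pending: list[tuple[str, str]] = []  # (primary key, text) not yet indexed
        self.postings: dict[str, set[str]] = {}   # term -> primary keys

    def record_insert(self, key: str, text: str) -> None:
        """Called when a row is inserted; the change is only logged here."""
        self.pending.append((key, text))

    def update(self) -> int:
        """Apply all logged changes to the index (compare UPDATE INDEX).

        Returns the number of changes applied.
        """
        for key, text in self.pending:
            for token in text.lower().split():
                term = token.strip(".,;:!?")
                if term:
                    self.postings.setdefault(term, set()).add(key)
        applied = len(self.pending)
        self.pending.clear()
        return applied

    def contains(self, key: str, term: str) -> bool:
        """True if the indexed text of row `key` contains `term`."""
        return key in self.postings.get(term.lower(), set())


idx = ToyTextIndex()
idx.record_insert("0-13-086755-2", "The cat hunts some mice.")
print(idx.contains("0-13-086755-2", "cat"))  # False: index not yet synchronized
print(idx.update())                          # 1
print(idx.contains("0-13-086755-2", "cat"))  # True
```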


7. Searching with the text index:

• Using the CONTAINS scalar search function:

db2 "SELECT author, story FROM books WHERE CONTAINS(story, '\"cat\"') = 1 AND year >= 2000"

The following result table is returned:

AUTHOR   STORY
Mike     The cat hunts some mice.

NOTE:

– To create a text index, the text column must have one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.
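The query in step 7 can be mimicked in plain Python. `CONTAINS` yields 1 for a matching row, and that result is combined with an ordinary predicate on `year`. The sketch below scans the two sample rows directly, whereas Net Search Extender answers the text condition from its full-text index. The helper names are hypothetical, and the simple whole-word match is an assumption that ignores the richer CONTAINS search syntax.

```python
# The two sample rows from step 5: (isbn, author, story, year)
BOOKS = [
    ("0-13-086755-1", "John", "A man was running down the street.", 2001),
    ("0-13-086755-2", "Mike", "The cat hunts some mice.", 2000),
]


def contains(text: str, word: str) -> int:
    """Hypothetical stand-in for CONTAINS: 1 if word occurs as a whole token, else 0."""
    tokens = {tok.strip(".,;:!?").lower() for tok in text.split()}
    return 1 if word.lower() in tokens else 0


def select_author_story(word: str, min_year: int) -> list[tuple[str, str]]:
    """Mirror of: SELECT author, story FROM books
    WHERE CONTAINS(story, word) = 1 AND year >= min_year"""
    return [
        (author, story)
        for _isbn, author, story, year in BOOKS
        if contains(story, word) == 1 and year >= min_year
    ]


print(select_author_story("cat", 2000))  # [('Mike', 'The cat hunts some mice.')]
```

John's row passes the year filter but fails the text condition, which is why only Mike's row appears in the result table.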


Thesaurus support:

– A thesaurus is structured as a network of nodes linked together by relations:

• Associative relations: RELATED_TO

• Synonym relations: SYNONYM_OF

• Hierarchical relations: LOWER_THAN, HIGHER_THAN

– Creating and compiling a thesaurus:

1. Create a thesaurus definition file (explained below).

2. Compile the definition file into a thesaurus dictionary using DB2EXTTH utility.


Creating a thesaurus definition file:

– Define its content in a definition file using a text editor.

Example of some definition groups:

:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer

:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
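The following is a minimal Python parser for the definition-group layout used in this example. A `:WORDS` marker starts each group, the head term sits on its own line, and `.RELATION term` lines follow. It is not the DB2EXTTH compiler and does not attempt to cover the full definition-file syntax. `parse_thesaurus` is a hypothetical helper meant only to make the group structure explicit.

```python
def parse_thesaurus(definition: str) -> dict[str, dict[str, list[str]]]:
    """Parse ':WORDS' groups into {head_term: {RELATION: [terms, ...]}}.

    Assumed layout, taken from the example above only: a ':WORDS' marker,
    then the head term on its own line, then '.RELATION term' lines.
    """
    thesaurus: dict[str, dict[str, list[str]]] = {}
    head = None
    for raw in definition.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == ":WORDS":
            head = None  # start of a new definition group
        elif line.startswith("."):
            if head is None:
                raise ValueError(f"relation without a head term: {line!r}")
            relation, _, term = line[1:].partition(" ")
            thesaurus[head].setdefault(relation, []).append(term.strip())
        else:
            head = line
            thesaurus.setdefault(head, {})
    return thesaurus


DEFINITION = """
:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer
:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
"""
print(parse_thesaurus(DEFINITION)["football"])
# {'RELATED_TO': ['goal'], 'SYNONYM_OF': ['soccer']}
```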


An example of the structure of a thesaurus:

[Figure: a thesaurus structure linking the terms Game, Ball Game, Tennis, Soccer, and Football through HIGHER_THAN and SYNONYM_OF relations]
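Thesaurus-related search typically works by expanding a query term along selected relations before searching. The Python sketch below illustrates that general idea and is not Net Search Extender's implementation. The edges are one plausible reading of the figure and are assumptions. `A HIGHER_THAN B` is taken to mean that A is the broader term. Under that reading, Game sits above Ball Game, Ball Game sits above Tennis and Soccer, and Football is a synonym of Soccer. `expand` and its `depth` parameter are hypothetical.

```python
from collections import deque

# Assumed reading of the figure: (broader, narrower) pairs for HIGHER_THAN.
HIGHER_THAN = [
    ("game", "ball game"),
    ("ball game", "tennis"),
    ("ball game", "soccer"),
]
SYNONYM_OF = [("football", "soccer")]


def expand(term: str, depth: int = 2) -> set[str]:
    """Return the term, its synonyms, and narrower terms down to `depth` levels."""
    synonyms: dict[str, set[str]] = {}
    for a, b in SYNONYM_OF:  # treat synonymy as symmetric
        synonyms.setdefault(a, set()).add(b)
        synonyms.setdefault(b, set()).add(a)
    narrower: dict[str, list[str]] = {}
    for broad, narrow in HIGHER_THAN:
        narrower.setdefault(broad, []).append(narrow)

    result = {term}
    queue = deque([(term, 0)])  # breadth-first walk: (term, hierarchy level)
    while queue:
        current, level = queue.popleft()
        for syn in synonyms.get(current, ()):
            if syn not in result:
                result.add(syn)
                queue.append((syn, level))  # synonyms stay on the same level
        if level < depth:
            for child in narrower.get(current, []):
                if child not in result:
                    result.add(child)
                    queue.append((child, level + 1))
    return result


print(sorted(expand("ball game")))  # ['ball game', 'football', 'soccer', 'tennis']
```

A search for "ball game" expanded this way would also match documents that mention only tennis, soccer, or football.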


References:

- http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC

- Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/

- Net Search Extender Administration and User's Guide, Version 8.1 (can be downloaded with the software)


Any questions?